# Chance News 67

## Quotations

“Next time someone tells you they don’t believe a small sample poll can possibly tell you anything, just say to them ‘OK, then. Next time you have to have a blood test, why don’t you ask them to take the whole lot?’”
British opinion pollster Nick Moon
in Significance, March 2010

Submitted by Margaret Cibes

“The greater the number of scores in a sport, the lower the chance for a lucky win by a team that is inferior. …. A sport should have enough scoring—but no more than enough scoring—so that (a) a team that, in a large sample of games, tends to lose to most everyone usually doesn't beat a team that tends to beat everyone, and (b) any one player error or referee call typically will not change the outcome. On this basis, it seems, soccer and hockey have too few scores, basketball and tennis have too many, and baseball and American football are somewhere near the sweet spot.”
Richard Bookstaber, in “The Scoring Problem”
The Wall Street Journal, July 10, 2010

Submitted by Margaret Cibes

"In listening to stories we tend to suspend disbelief in order to be entertained, whereas in evaluating statistics we generally have an opposite inclination to suspend belief in order not to be beguiled."
John Allen Paulos, in Stories vs. statistics
New York Times, 24 October 2010

Submitted by Bill Peterson

## Forsooth

An article describes two brands of athletic wear that are claimed to optimize performance via embedded holograms (Power Balance) and water-soluble titanium (Phiten).

“A lot of these products are a sort of merchandized superstition. …. [A French surfer states,] ‘But if wearing the thing makes you think you feel or perform better, who cares?’”

“Wrist Watch”, TIME, October 4, 2010

Submitted by Margaret Cibes

"A scant 1,391 people live in 91008 ZIP code, and only 12 homes are currently on the market. So a single high-priced listing (like the mammoth nine-bedroom, built this year, that's selling for $19.8 million) is enough to skew the median price skyward." America’s Most Expensive ZIP Codes 2010, Yahoo! Real Estate, September 27, 2010 Note that someone at Forbes must have spotted the potential error in the last 8 words. While the original sentence remains on the Yahoo website, the sentence now ends “may not adequately represent how everyone in the area lives” at the Forbes website[1]. Submitted by Margaret Cibes at the suggestion of Cris Wellington "The relationship between an area's income and mortality is so striking," the report says, "that on average, every$10,000 increase in an area's median income appears to buy its residents another year of life."

Key to long life? It may be in ... your ZIP code
Minneapolis Star Tribune, 7 October 2010

Submitted by Paul Alper

## More fuel to feed the fiery controversy over mammograms

Mammogram Benefit Seen for Women in Their 40s, Gina Kolata, The New York Times, September 29, 2010.

One of the most contentious debates in medicine is whether mammograms are beneficial to women between 40 and 50 years old. Earlier commentaries about this controversy appear in Chance News 8, Chance News 12, Chance News 14, Chance News 47, Chance News 58, and Chance News 59.

The first sentence in the latest article about mammography makes a bold claim...

Researchers reported Wednesday that mammograms can cut the breast cancer death rate by 26 percent for women in their 40s.

...and the second sentence contradicts this claim.

But their results were greeted with skepticism by some experts who say they may have overestimated the benefit.

The data set on which these bold claims were based is quite good.

The new study took advantage of circumstances in Sweden, where since 1986 some counties have offered mammograms to women in their 40s and others have not, according to the lead author, Hakan Jonsson, professor of cancer epidemiology at Umea University in Sweden. The researchers compared breast cancer deaths in women who had a breast cancer diagnosis in counties that had screening with deaths in counties that did not. The rate was 26 percent lower in counties with screening.

Why the skepticism?

One problem, said Dr. Peter C. Gotzsche of the Nordic Cochrane Center in Copenhagen, a nonprofit group that reviews health care research, is that the investigators counted the number of women who received a diagnosis of breast cancer and also died of it. They did not compare the broader breast cancer death rates in the counties.

### Questions

1. The research design in the current study was not randomized. Is this an issue?

2. What are the barriers to conducting a randomized trial for mammography?

## Even more fuel!

This is the way the Swedish mammography study could/should have been analyzed
by Gary Schwitzer, HealthNewsReview Blog, 4 October 2010

Schwitzer's blog (which we first mentioned in Chance News 59) discusses news reports on public health issues, rating the stories according to a set rubric.

His present post concerns the Swedish mammogram study. He reviews the New York Times article described above, as well as reports from the Los Angeles Times, the Associated Press and Health Day. The last is singled out as the only one of the four that fails to make any mention of methodological concerns. However, Schwitzer goes on to argue that none of the articles does an adequate job explaining the methodological issues or their implications for the conclusions of the study. Read the full post for an interesting extended discussion on this.

### Question

The discussion in the post notwithstanding, the individual HealthNewsReview ratings cited there give the NYT, the LA Times and AP stories 4 stars, 5 stars, and 5 stars (out of 5) respectively. What do you make of this?

Submitted by Bill Peterson

## Proofiness

Charles Seife is a marvelous writer on serious, interesting topics for the lay reader:

• Zero: The Biography of a Dangerous Idea, 2000
• Alpha & Omega: The Search for the Beginning and End of the Universe, 2004
• Decoding The Universe, 2007
• Sun in a Bottle: The Strange History of Fusion and the Science of Wishful Thinking, 2008

His latest book, Proofiness: The Dark Arts of Mathematical Deception, 2010, makes for especially good reading for students and teachers of statistics. The following web sites all comment on the book: The New York Times has a review and an excerpt; NPR ran a story, Lies, Damned Lies, And 'Proofiness'; additional reviews appeared in New York Journal of Books and Politics Daily.

The reviews are entirely favorable, but don’t quite do justice to his presentation, so readers of Chance News are encouraged to read the book as well as the above commentaries.

Seife defines proofiness as “the art of using bogus mathematical arguments to prove something that you know in your heart is true — even when it’s not.” However, he never makes the connection to Innumeracy:

A term meant to convey a person's inability to make sense of the numbers that run their lives. Innumeracy was coined by cognitive scientist Douglas R. Hofstadter in one of his Metamagical Themas columns for Scientific American in the early nineteen eighties. Later that decade mathematician John Allen Paulos published the book Innumeracy. In it he includes the notion of chance as well as that of numbers.

Seife also does not refer to Stephen Colbert’s even more famous neologism, truthiness, which

is a "truth" that a person claims to know intuitively "from the gut" without regard to evidence, logic, intellectual examination, or facts.

Colbert himself put truthiness this way: "We're not talking about truth, we're talking about something that seems like truth – the truth we want to exist."

Seife begins his Introduction with the famous quotation of Senator Joseph McCarthy on February 9, 1950:

"I have here in my hand a list of 205--a list of names that were made known to the Secretary of State as being members of the Communist party and who nevertheless are still working and shaping policy in the State Department."

The 205 later became 57 and then 81. “It really didn’t matter whether the list had 205 or 57 or 81 names. The very fact that McCarthy had attached a number to his accusations imbued them with an aura of truth.” This “outrageous falsehood was given the appearance of absolute fact.”

Seife attempts to categorize the types of proofiness:

A. Potemkin numbers--numerical facades that look like real numbers such as crowd estimates or the number of communists in the State Department.
B. Disestimation, another neologism--“the act of taking a number too literally, understating or ignoring the uncertainties that surround it.”
C. Fruit packing--“it’s not the individual numbers that are false; it is the presentation of the data that creates the proofiness.”
D. Cherry picking--a form of fruit packing in which there is a “careful selection of data, choosing those that support the argument you wish to make while underplaying or ignoring data that undermine it.”
E. Apples to oranges comparison--another form of fruit packing, for example, comparing dollar amounts without taking into account inflation.
F. Apple polishing--another form of fruit packing, for example, deceptive graphs where the origin is missing; or, algebraically, misuse of mean and median.
G. Causuistry, another neologism and a pun on the word casuistry--“a specialized form of casuistry where the fault in the argument comes from implying that there is a causal relationship between two things when in fact there isn’t any such linkage.”
H. Randumbness, another neologism--“insisting that there is order where there is only chaos” or, “creating a pattern where there is none to see.”
I. Regression to the moon--for example, extrapolating instead of interpolating regression results.

None of these categories is new to teachers of statistics, but his examples of the above forms of proofiness are detailed and, when not frightening, amusing. They include the O.J. Simpson trial; the Franken-Coleman Minnesota Senate election and Bush vs. Gore in 2000 (he terms them “electile dysfunctions”); nuclear testing; risk analysis; the space program; the Vietnam war; and determination of the perfect butt (page 66 contains the formula for callipygianness, a word which is not a neologism). He is particularly incisive in discussing how systematic error can overwhelm error due to sampling, thereby invalidating the so-called margin of error in polling.

### Discussion

1. If it is so obvious today that McCarthy was fabricating the numbers -- in the parlance of today, he was fact-free -- why was he so successful for so long in the 1950s? And why did his allegations and point of view live on well after his death in 1957?

2. Seife devotes a great deal of time to convincing the reader that the U.S. census would be more accurate if, rather than attempting to count everyone, it used statistical sampling and thereby avoided many of the systematic errors. Why would this be true? Why did the U.S. Supreme Court deem otherwise?

3. Some of his strongest criticism, in the chapter entitled “Poll Cats,” is directed at journalists and polling organizations. On page 120 he says, “Internet polls have no basis in reality whatsoever.” Why? “Yet, CNN.com has an Internet poll on its front page every day.” Again, why? Non-Internet polls do not come off much better, due to flagrant non-statistical faults.

4. With regard to the O.J. Simpson murder trial, Seife paraphrases the claim of one of Simpson's defense attorneys that “only one in a thousand wife-beaters winds up murdering his spouse. One in a thousand! Such a small probability means that O.J. Simpson almost certainly isn’t the murderer, right?” Use Bayes' theorem, along with reasonable numbers about how often wives are murdered, to show that the probability that Simpson was the culprit is much higher.

5. Regression to the moon also refers to totally nonsensical use of regression. A more detailed look (page 66) at callipygianness reveals

Callipygianness = (S + C) x (B + F) / (T - V) ,

where S is shape, C is circularity, B is bounciness, F is firmness, T is texture, and V is waist-to-hip ratio. Seife found this regression result, not surprisingly, on Fox News and the reporter was from another Murdoch enterprise, The New York Post. Why does Seife find this regression result so ridiculous? On the same page, there is a regression result for “Misery” which depends upon weather, debt, motivation, “the need to take action,” and some other variables. “[I]t proved --scientifically--that the most miserable day of the year [2005] was January 24.” The regression result for “Happiness” appears on the preceding page. Why does Seife claim that these three are examples of Potemkin numbers?

6. To return to McCarthy’s proofiness, his original speech about the 205 communists in the State Department was made in Wheeling, West Virginia to the Republican Women’s Club and made no waves whatsoever for days. Seife does not mention this, but only after the New York Times and the Washington Post publicized the speech did it ignite his fame. Contrast that time lag with today’s instant communication.

7. Seife on page 226 repeats a famous adage of the journalism world: “If your mother says she loves you, check it out.” He then looks at the Pentagon’s weekly body counts and monthly hamlet evaluations during the Vietnam War. By page 228 he describes an auto-industry market research report which shows that driving a Hummer H3 is “better for the environment than driving the energy-efficient Toyota Prius hybrid.” Why did he juxtapose these two examples?

8. The last paragraph of the book is: “Mathematical sophistication is the only antidote to proofiness and our degree of knowledge will determine whether we succumb to proofiness or fight against it. It’s more than mere rhetoric; our democracy may well rise or fall by the numbers.” Why might his “antidote” be insufficient?
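
For Discussion item 4, the Bayes' theorem calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-in-a-thousand figure is the defense claim taken at face value, while the rate of murder by someone other than the batterer is an invented placeholder, not a number from Seife's book or from the trial. The point is that the relevant condition is not "he battered her" alone but "he battered her and she was murdered."

```python
from fractions import Fraction

# Defense figure, taken at face value: a batterer murders his partner.
p_killed_by_batterer = Fraction(1, 1000)
# ASSUMED for illustration only: a battered woman is murdered by someone else.
p_killed_by_other = Fraction(1, 10000)

def prob_batterer_given_murder(p_batterer, p_other):
    """P(batterer is the killer | battered woman was murdered).

    The two causes are treated as mutually exclusive ways of being murdered.
    """
    return p_batterer / (p_batterer + p_other)

posterior = prob_batterer_given_murder(p_killed_by_batterer, p_killed_by_other)
print(posterior, round(float(posterior), 3))
```

With these inputs the posterior is 10/11, or about 0.91. Readers can substitute their own "reasonable numbers" for the assumed rate; the posterior stays far above one in a thousand unless murders by others are vastly more common than murders by batterers.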

Submitted by Paul Alper

## Sampling saliva

“Freshmen Specimen”
by Patricia J. Williams, The Nation, September 27, 2010

In this column, law professor Williams describes reactions to the University of California, Berkeley, project “Bring Your Genes to Cal”, in which 5500 incoming freshmen were asked to provide saliva samples for the purpose of “bring[ing] the student body together in the same manner that reading To Kill a Mockingbird might have in the past.” More than 700 students submitted their samples to an uncertified Berkeley lab, and the samples were analyzed for “susceptibility to alcoholism, lactose intolerance and relative metabolism of folic acid.”

[T]he California Department of Public Health barred the university from dispensing individual profiles on the grounds that genetic analysis is correlative only and is neither necessarily predictive nor diagnostic at this point. A collective comparison of the class's genetic data was permitted, however, and circulated in "anonymized" form at orientation.

Some ethical issues that have been raised include:
(a) privacy, despite the “anonymizing” of results;
(b) ownership of the data with respect to commercialization, patentability, remuneration, etc.;
(c) promotion of the concept that a genetic correlation is a “100 percent infallible guarantee” of anything;
(d) motive with respect to promoting sales of swab kits.

The article refers to a Stanford University medical school class “spit party” and to a University of Minnesota "Gopher Kids" program (free gifts for saliva swabs at a state fair).

Readers might be interested in a paper from ETC Group, a Canadian-based international organization, "Direct-to-Consumer DNA Testing and the Myth of Personalized Medicine: Spit Kits, SNP Chips and Human Genomics". Or they might want to google "spit party" to see how widespread these activities are.

### Questions

1. Explain what the Public Health Department meant by the clause "genetic analysis is correlative only."

2. Comment on the following statement in the article: “The university advertises participation as altruistic, a contribution to public health and human knowledge.”

3. The author of the article refers to the process of collecting saliva samples as a “commodity exchange.” Do you agree with the author?

Submitted by Margaret Cibes

## Correlation as investment tool

“Why the Math of Correlation Matters”
by Jonnelle Marte, The Wall Street Journal, October 4, 2010

This article discusses how a mutual-fund investor might employ the concept of correlation in aiming to diversify, and/or reduce volatility in, an investment portfolio.

If your investments move in lock step, or are highly correlated, "you'll either be all right or all wrong," says [an equity market strategist].

It describes how correlation is measured in comparing two investments:

A correlation close to zero means the performance of one asset has little or no connection to that of the other. A correlation of 1 is a perfect positive correlation, meaning the two assets always move in sync—in the same direction, and at a scale that doesn't vary. For instance, Asset A will always move at twice the magnitude of Asset B. A correlation of minus 1 is a perfect negative correlation. The assets move in opposite directions at a scale that doesn't vary.

And it points out that daily, weekly, or monthly returns can be compared, and provides a table of correlations of various assets to the S&P 500 over a 10-year period. At the extremes are a +0.89 correlation between international stocks and the S&P, versus a -0.39 correlation between intermediate U.S. bonds and the S&P.

There are two caveats. The term of the analysis is a key consideration; for example, the 10-year correlation of -0.39 referred to above became a +0.08 correlation for an over-80-year period. Also, a crisis such as the 2008 “crash,” when investments of all kinds decreased, may result in a “surge” in correlations.

The author states that investors may choose assets that are uncorrelated, or negatively correlated, to the S&P 500, in order to balance, or minimize, risk.

Such strategies are only recommended in the short term because they essentially cancel out returns. Holding too many negatively correlated assets can be a little like trying to hit the gas while slamming on the brakes, says [one financial analyst].

Interested readers are directed to the website "Asset Correlations", where they will find a table of correlations between pairs of asset categories, or they may create their own tables.

Two bloggers commented[2]:

(a) "This is NOT the way correlation works!!
"A correlation of negative one does NOT mean that when asset class A returns 5%, asset class B returns negative 5%. It means that when asset class A returns greater than its expected return (say expected return of 5% and A is returning 7%), then asset class B will be return less than its expected return (if asset class B also has an expected return of 5%, then it would be returning 3%)."

(b) "Good point. To summarize, correlation indicates if variables tend to move in the same direction, but gives no indication about the amplitude of these movements."
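
The bloggers' point can be checked numerically. The sketch below uses invented return series (purely hypothetical, not data from the article) and a hand-written Pearson correlation. It shows that an asset that always moves twice as much as another still has a correlation of +1, and that two assets can have a correlation of -1 while both earn positive returns every year.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical annual returns for asset B (mean 5%), invented for illustration.
b = [0.03, 0.05, 0.07, 0.04, 0.06]
a_double = [2 * r for r in b]      # A always moves at twice the magnitude of B
a_mirror = [0.10 - r for r in b]   # A is above its 5% mean exactly when B is below

r_double = pearson(a_double, b)
r_mirror = pearson(a_mirror, b)
print(round(r_double, 6), round(r_mirror, 6))
print("all mirror returns positive:", min(a_mirror) > 0)
```

In the second case, a year in which B returns 7% is a year in which A returns 3%, not -7%: correlation describes movement relative to each asset's own average, not the sign or size of the return itself.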

Submitted by Margaret Cibes

## Money isn’t everything, at least in baseball

“The Year Money Didn’t Matter”
by Matthew Futterman, The Wall Street Journal, September 16, 2010

This article reports that the correlation between Major League Baseball player payrolls and games won will be at its lowest level (0.14) since the 1994 players’ strike, if “current standings hold up through the end of the season.” And it contains a graph of correlations for the period 1995-2010 to date.

While all eight teams reaching the playoffs had among the 10 top payrolls in 1999, only three of the highest payroll teams – but four of the lowest – will probably make the 2010 playoffs, if standings hold up through the end of this season.

Despite the fact that top and bottom payrolls have grown farther apart in dollars, one factor in the current situation may be the 2002 revenue-sharing agreement, by which wealthier ball clubs now share increasing amounts of revenue with poorer clubs. Some of the revenue-receiving poorer teams have invested in non-payroll expenses such as scouting, trades, etc., with resulting improvements in performance, while some of the revenue-contributing teams’ performances have been constrained by long-term contracts with under-performing players, as well as by player injuries this year.
See more data:
(a) Forbes blog on 2010 baseball costs per win in “Baseball’s Most and Least Efficient Teams for the 2010 Season”
(b) New York Times chart of payrolls vs. win-loss records over the period 2001-2010 in “Putting a Price Tag on Winning”
(c) ESPN chart and table of payrolls vs. win-loss records over the period 1998-2008 in “The Biz: The Price of Winning”

Submitted by Margaret Cibes

## Medical misinformation

Lies, damned lies, and medical science
by David H. Freedman, The Atlantic, November 2010

To be continued...

## World Statistics Day

See the U.S. Census Bureau website[3] for videos and other information related to the first World Statistics Day: October 20, 2010.

Note that the date, written in day/month/year format, is 20/10/2010. It will be interesting to see how the day is chosen in year 2013 or subsequent years.

Submitted by Margaret Cibes

## Racial disparity in Wikimedia Commons photos

A short article was published October 20 on Examiner.com, drawing attention to a racial disparity found in two distinct sections of freely-licensed visual content published at Wikimedia Commons (a sister site of Wikipedia). While the subject matter of the photos may make some uncomfortable, the parent Wikimedia Foundation did in fact hire a consultant to evaluate the situation from an independent perspective.

What it boils down to is that the consultant said he evaluated 1,000 images of male sex organs that are found on Wikimedia Commons, and (by his count) not a single one was of a non-white male. In another (much smaller) category of photos and illustrations called "topless adolescent girls", some 25 of the 26 images portray non-white subjects.

As the Examiner article asks, is this a tacit form of racism? It seems hard to imagine that these two categories ended up 99.9% white and 96.2% non-white, respectively, by coincidence.

### Discussion

1. What other factors might explain why a racial disparity is found in these categories?
2. Do you see any problematic factors in how a collection of encyclopedic images is gathered, when there is no editorial board guiding acquisition?

Submitted by Gregory Kohs

## Age has its rewards?

“Trust the Wisdom of Older Managers”
by David Biderman, The Wall Street Journal, August 4, 2010

The article provides a table of baseball managers’ age ranges and corresponding average winning percentages. Data is taken from the records of anyone who has managed since 2000 and has had a minimum of 5 years managing. There were 44 such managers over 539 seasons.

| Age | Average winning percentage |
|-------|-------|
| 35-37 | 0.477 |
| 38-40 | 0.474 |
| 41-43 | 0.493 |
| 44-46 | 0.506 |
| 47-49 | 0.499 |
| 50-52 | 0.491 |
| 53-55 | 0.492 |
| 56-58 | 0.537 |
| 59-61 | 0.523 |
| 62-64 | 0.541 |
| 65+ | 0.515 |

### Questions

1. How do you think that the author counted “539 seasons”?

2. What is the average number of managers in each age category? Do you think that there would be enough managers in each age category to do a statistical comparison of the average winning percentages?

3. The correlation between age and average winning percentage is about 0.8 (using age interval midpoints, and 70 for the oldest category). Suppose that you knew the average number of years managed for each age category. How would you expect its correlation with average winning percentage to compare to 0.8 – weaker, the same, stronger?

4. Explain the author’s statements: “This idea that managers get better with age, though, might be somewhat self-fulfilling. Managers who perform poorly in their youth aren't able to drag down older managers' average[s] since they probably got canned.”
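
The correlation quoted in question 3 can be reproduced from the table. The sketch below follows the convention stated in that question (age interval midpoints, with 70 standing in for the 65+ category) and uses a hand-written Pearson formula. Note that it correlates eleven category averages, not individual manager seasons.

```python
from math import sqrt

# Midpoints of the age ranges in the table; 70 is used for "65+".
ages = [36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 70]
# Average winning percentages, in the same order as the table.
win_pct = [0.477, 0.474, 0.493, 0.506, 0.499, 0.491,
           0.492, 0.537, 0.523, 0.541, 0.515]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson(ages, win_pct)
print(round(r, 2))  # close to the "about 0.8" quoted in question 3
```

Because each point is an average over an unknown number of managers, this figure says little about how strongly age predicts the record of any individual manager.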

Submitted by Margaret Cibes

## Statisticians’ arithmetic

“Magic by Numbers”
by Daniel Gilbert, The New York Times, October 16, 2010

The author cites a 2006 randomized double-blind study, “Effectiveness of discontinuing antibiotic treatment after three days instead of eight days in mild to moderate-severe community acquired pneumonia”. The study concluded that the three-day period was "not inferior" to the eight-day period.

A New Zealand medical student responded to the study with a number of criticisms[4], including one about the researchers’ arithmetic:

“In the per protocol analysis the cure rates were 93% (50/54) in the three day treatment group compared with 93% (56/60) in the eight day treatment group (difference 0.1% ….”

### Questions

1. Do you agree with the “difference” stated in the report?
2. Can you suggest an arithmetic reason for the report’s 0.1% figure, assuming that it's not a typographical error?
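
As a starting point for these questions, the raw arithmetic on the quoted counts can be checked directly. The sketch below computes only the unadjusted cure rates and their difference; it makes no assumption about how the study's authors arrived at their published figure.

```python
# Per-protocol counts quoted from the study report.
cured_3day, n_3day = 50, 54
cured_8day, n_8day = 56, 60

rate_3day = 100 * cured_3day / n_3day   # cure rate in percent
rate_8day = 100 * cured_8day / n_8day
diff_points = rate_8day - rate_3day     # difference in percentage points

print(f"3-day: {rate_3day:.1f}%  8-day: {rate_8day:.1f}%  "
      f"difference: {diff_points:.1f} points")
# prints: 3-day: 92.6%  8-day: 93.3%  difference: 0.7 points
```

Both rates round to 93%, as reported, but the unrounded rates differ by about 0.7 percentage points.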

Submitted by Margaret Cibes