Chance News 96

From ChanceWiki
Revision as of 12:33, 18 November 2013 by Cibes (talk | contribs) (→‎Forsooth)

Quotations

"The world is a messy place. The scientific method imposes some order, but in the case of climate change, that order is probabilistic. For the sake of science and the planet, we should not become distracted by a false sense of certitude. Imprecise truths are the most inconvenient ones."

Gernot Wagner and Martin L. Weitzman, in: Inconvenient uncertainties, New York Times, 10 October 2013

Submitted by Bill Peterson


“I am particularly fond of this example [the Linda problem] because I know that the [conjoint] statement is least probable, yet a little homunculus in my head continues to jump up and down, shouting at me—‘but she can’t just be a bank teller; read the description.’”

Stephen Jay Gould

(See the discussion of the Linda Problem, Item 6 below.)

Submitted by Bill Peterson


“Once managers start asking themselves ‘what is this distribution?’ instead of ‘what is this number?’, they are in a position to use the various tools … for their own analysis. …. What is needed is a convenient way to pass distributions around ….”

Sam L. Savage, in “Statistical analysis for the masses”
Statistics and Public Policy (1997)

Submitted by Margaret Cibes


“The name MongoDB stems from ‘humongous database’, but chair and co-founder Dwight Merriman says there are more sizes and shapes to data than just ‘big’.”

“Whether your data is big or humongous, it has to work in the cloud”
Siliconrepublic, October 30, 2013

Submitted by Margaret Cibes

Forsooth

“A somewhat comic, though not necessarily typical reaction to the use of sampling by a Federal agency occurred in 1936 when the National Resources Planning Board published its report Consumer Incomes in the United States, …. It showed a highly skewed distribution of income, with the top 10 per cent of the families and single individuals receiving 36 per cent of the income. …. [T]his was the first time that a Federal agency had published such data, and it was based on a sample. The U.S. Chamber of Commerce issued a blast against this report, which it considered to be socialistic propaganda. It said that the estimates were based on ‘less than a 1 per cent sample, and a random sample at that!’”

Comment by Joe Duncan and Bill Shelton

cited by W. Allen Wallis, “Statistics in Washington, 1935-1945”
Statistics and Public Policy (1997)

(Full text appeared first in Chance, v. 7, no. 4, 1994)

Let's make an XKCD

Thanks to Brian Abend for sending a link to this cartoon from XKCD

Monty hall.png

The Monty Hall problem just won't stay solved, so it was only a matter of time before it was enshrined in XKCD. Here are two other recent appearances in the news:

Stick or switch? Probability and the Monty Hall problem, BBC News Magazine, 11 September 2013
“Readers’ Challenge report: The Monty Hall problem,” Significance, October 2013 (only available online by electronic subscription)
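
For readers who remain unconvinced, the standard answer can be checked by brute-force enumeration. The following is a minimal sketch, assuming the usual rules (the host knows where the car is, always opens a goat door other than the contestant's pick, and always offers the switch); the function name `win_probabilities` is ours and does not come from any of the articles above.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)

def win_probabilities():
    """Return (P(win if stick), P(win if switch)) by exact enumeration.

    Assumes the car position and the contestant's first pick are each
    uniform over the three doors and independent of one another.
    """
    stick_wins = 0
    switch_wins = 0
    cases = 0
    for car, pick in product(DOORS, repeat=2):
        # The host may only open a door that is neither the pick nor the car.
        openable = [d for d in DOORS if d != pick and d != car]
        # When two doors are openable, the host's choice does not change
        # who wins, so we simply take the first one.
        host = openable[0]
        # Switching means taking the one door that is neither picked nor opened.
        switched = next(d for d in DOORS if d != pick and d != host)
        stick_wins += (pick == car)
        switch_wins += (switched == car)
        cases += 1
    return Fraction(stick_wins, cases), Fraction(switch_wins, cases)

if __name__ == "__main__":
    stick, switch = win_probabilities()
    print("stick:", stick, " switch:", switch)
```

Enumeration is used here rather than random simulation so that the result is exact and does not depend on a seed. If the host's behavior is changed (for example, if he opens a door at random and only happens to reveal a goat), the assumptions above no longer hold and the answer changes.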

Corrupt Ivy League admissions?

Writing last year in the American Conservative (The myth of American meritocracy: How corrupt are Ivy League admissions?, 28 November 2012), Ron Unz claimed that Asian students are now being discriminated against in the ways that Jewish students were in the past. Namely, despite their growing numbers in the population and impressive academic accomplishments, their share of admissions to top institutions is being restricted by quotas. Furthermore, Unz went on to assert that Jewish students are now actually over-represented relative to equally qualified Asians and non-Jewish whites. Unz's article includes various statistical graphics to support these claims; for example, see

Jewishenrollment-unz.png

At the time of the article, a blog post by Andrew Gelman took the analysis at face value and asked, Should Harvard start admitting kids at random? (28 November 2012). Subsequently, however, Gelman and others were led to re-examine the Unz data, and found that many of the earlier claims do not stand up. For example, instead of inferring Jewishness via a family name, Janet Mertz actually contacted some of the individuals who were on the Math Olympiad team. She writes here that

The actual count of Jews is at least 10¼ out of 78 (counting part-Jews fractionally), i.e., 5-fold higher [than Unz's claim of only two]. When an author refuses to admit to an error about which there is no possibility he is correct, academics have no choice but to then question the validity of everything that author has ever written because they can no longer trust the veracity of his statements.

Most recently, in a post entitled Ivy Jew update (22 October 2013), Gelman quotes Nurit Baytch:

Unz’s conclusion that Jews are over-admitted to Harvard was erroneous, as he relied on faulty assumptions and spurious data: Unz substantially overestimated the percentage of Jews at Harvard while grossly underestimating the percentage of Jews among high academic achievers.

This latest post by Gelman has a number of interesting quotations that are worth bearing in mind when looking at any statistical claim:

  • "My take on all this is that it can be harder than it looks to do research using statistics."
  • "It’s perfectly natural to get excited when one’s initial hypothesis is confirmed by an examination of some data, but the next step is to recognize that these exciting discoveries do not always hold up."

Regarding the particular analysis in question, he writes

Unz, who spends so much of his time in the political arena, is used to politically-motivated criticisms and responds in kind, and so I think he sees the statistics provided by Mertz and Baytch as attacks to be dodged or parried rather than as useful information that can help him modify his understanding of the world. But for those of us who are not so invested in a particular position, Baytch’s article, and Mertz’s from a few months ago, should be helpful to anyone interested in further study of ethnicity and high-end college admissions.

In a related post, My beef with Brooks: the alternative to “good statistics” is not “no statistics,” it’s “bad statistics” (20 February 2013), Gelman takes columnist David Brooks to task for adopting an "anti-data" posture in a NYT column. He observes that Brooks had previously been happy to quote the Unz analysis in a column two months earlier. According to Gelman, "Janet Mertz contacted him and the Times to report that his published numbers were in error, and I also contacted Brooks (both directly and through an intermediary). But no correction has appeared."

Submitted by Paul Alper

Dance of the p-values

Paul Alper sent a link to a wonderful YouTube video, Dance of the p-values. This is an animated simulation--with sound effects keyed to emotional responses--designed to show how erratically the p-value can vary in replications of the same experiment.

Paul found it through Andrew Gelman's blog (7 November) which also features this cartoon:

Marginally significant.jpg
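
The erratic behavior of the p-value can also be reproduced with a few lines of code. The sketch below is not the simulation used in the video; it is a simplified stand-in that assumes normally distributed data with a known standard deviation and applies a two-sided z-test of the null hypothesis that the mean is zero. The sample size, true mean, and seed are arbitrary illustrative choices.

```python
import math
import random

def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: mean = 0, assuming known std. dev. sigma."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    # Two-sided normal tail area: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

def dance(replications=25, n=32, true_mean=0.5, sigma=1.0, seed=1):
    """Repeat the same experiment several times and collect the p-values.

    Every replication draws n observations from the same normal
    distribution, so any variation in p is due to sampling alone.
    """
    rng = random.Random(seed)
    return [
        z_test_p_value([rng.gauss(true_mean, sigma) for _ in range(n)], sigma)
        for _ in range(replications)
    ]

if __name__ == "__main__":
    for i, p in enumerate(dance(), start=1):
        print(f"replication {i:2d}: p = {p:.4f}")
```

Running the script prints one p-value per replication. Although every replication samples from the same population with the same true effect, the printed values typically differ a good deal from one run of the experiment to the next.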

The Linda Problem

Kahneman and Tversky's "Linda Problem" is a famous illustration of the conjunction fallacy:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which of the following is more probable?

  • Linda is a bank teller.
  • Linda is a bank teller and is active in the feminist movement.

From the standpoint of formal logic, the answer is obviously the first, since a conjunction can never be more probable than either of its parts. Yet many people, because of the information about Linda's behavior and outlook on life, choose the second. In his book Gut Feelings: The Intelligence of the Unconscious, Gerd Gigerenzer points out that the reason for the confusion about the conjunction "and" is that natural language does not work the way formal logic does. His first example to illustrate the discrepancy is

  • Peggy and Paul married and Peggy became pregnant.
  • Peggy became pregnant and Peggy and Paul married.

His second example is

  • Mark got angry and Mary left.
  • Mary left and Mark got angry.

In each of these pairs, reversing the order of the clauses changes the meaning, because "and" is read as "and then" or "and therefore." In that sense, both pairs violate Prob (A and B) = Prob (B and A). The following pair does not:

  • Verona is in Italy and Valencia is in Spain.
  • Valencia is in Spain and Verona is in Italy.

Further,

Even more surprising, we also know without thinking when "and" should be interpreted as the logical [inclusive] OR, as in the sentence

We invited friends and colleagues.

One of Gigerenzer's continuing themes is that instead of probability we should concentrate on frequency, which is much easier to understand. He would rephrase Linda to

There are a hundred persons who fit the description above (i.e., Linda's). How many of them are

  • bank tellers?
  • bank tellers and active in the feminist movement?

His empirical claim is that people easily figure things out with this rephrasing.
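
The frequency version makes the logic almost mechanical: to count the feminist bank tellers, one first has to pick out the bank tellers, so the second count can never exceed the first. The sketch below illustrates this with a made-up population. The proportions `p_teller` and `p_feminist` are purely hypothetical illustrations and are not estimates from Gigerenzer or from any study.

```python
import random

def make_hypothetical_population(n=100, p_teller=0.05, p_feminist=0.9, seed=0):
    """Build n imaginary people who all fit Linda's description.

    The two proportions are invented for illustration only.
    """
    rng = random.Random(seed)
    return [
        {
            "bank_teller": rng.random() < p_teller,
            "feminist": rng.random() < p_feminist,
        }
        for _ in range(n)
    ]

def count_tellers_and_feminist_tellers(people):
    """Return (number of bank tellers, number of feminist bank tellers)."""
    tellers = [p for p in people if p["bank_teller"]]
    # Feminist bank tellers are selected from among the bank tellers,
    # so this list can never be longer than the list of tellers.
    feminist_tellers = [p for p in tellers if p["feminist"]]
    return len(tellers), len(feminist_tellers)

if __name__ == "__main__":
    people = make_hypothetical_population()
    tellers, feminist_tellers = count_tellers_and_feminist_tellers(people)
    print(f"bank tellers: {tellers}")
    print(f"bank tellers active in the feminist movement: {feminist_tellers}")
```

Whatever proportions are chosen, the second count is at most the first, which is the frequency form of Prob (A and B) ≤ Prob (A).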

A Chance News item from a few years back discussed John Allen Paulos's take on the Linda Problem, from his NYT article "Stories and Statistics."

Submitted by Paul Alper