Chance News 96

From ChanceWiki

Revision as of 17:11, 6 December 2013

Quotations

"The world is a messy place. The scientific method imposes some order, but in the case of climate change, that order is probabilistic. For the sake of science and the planet, we should not become distracted by a false sense of certitude. Imprecise truths are the most inconvenient ones."

Gernot Wagner and Martin L. Weitzman, in: Inconvenient uncertainties, New York Times, 10 October 2013

Submitted by Bill Peterson


I am particularly fond of this example [the Linda problem] because I know that the [conjoint] statement is least probable, yet a little homunculus in my head continues to jump up and down, shouting at me—"but she can’t just be a bank teller; read the description.”

Stephen Jay Gould

(See the discussion of the Linda Problem in Item 6 below.)

Submitted by Bill Peterson


“Once managers start asking themselves ‘what is this distribution?’ instead of ‘what is this number?’, they are in a position to use the various tools … for their own analysis. …. What is needed is a convenient way to pass distributions around ….”

Sam L. Savage, in “Statistical analysis for the masses”
Statistics and Public Policy (1997)

Submitted by Margaret Cibes


“The name MongoDB stems from ‘humongous database’, but chair and co-founder Dwight Merriman says there are more sizes and shapes to data than just ‘big’.”

“Whether your data is big or humongous, it has to work in the cloud”
Siliconrepublic, October 30, 2013

Submitted by Margaret Cibes


From The Big Short, by Michael Lewis, Norton, 2011:

"Above the roulette tables [at The Venetian in Las Vegas], screens listed the results of the most recent twenty spins of the wheel. Gamblers would see that it had come up black the past eight spins, marvel at the improbability, and feel in their bones that the tiny silver ball was now more likely to land on red. That was the reason the casino bothered to list the wheel’s most recent spins: to help gamblers to delude themselves. To give people the false confidence they needed to lay their chips on a roulette table. The entire food chain of intermediaries in the subprime mortgage market was duping itself with the same trick, using the foreshortened, statistically meaningless past to predict the future." [p. 147]

"Craps offered the player the illusion of control – after all, he rolled the dice – and a surface complexity that masked its deeper idiocy. “For some reason, when these people are playing it they actually believe they have the power to make the dice work,” said [an analyst]." [pp. 150-151]
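The roulette passage describes the gambler's fallacy, and a quick simulation (illustrative, not from the book; it assumes an American double-zero wheel) shows why the streak display is pure theater: a run of eight blacks tells you nothing about the next spin.

```python
import random

random.seed(0)
POCKETS = ["red"] * 18 + ["black"] * 18 + ["green"] * 2  # American double-zero wheel

streak = 0         # current run of consecutive blacks
after_streak = []  # outcomes observed immediately after a run of 8+ blacks
for _ in range(2_000_000):
    result = random.choice(POCKETS)
    if streak >= 8:
        after_streak.append(result)
    streak = streak + 1 if result == "black" else 0

p_red = after_streak.count("red") / len(after_streak)
print(f"P(red | preceded by 8 blacks) ≈ {p_red:.3f}; unconditional P(red) = {18/38:.3f}")
```

The two numbers agree to within simulation noise: conditioning on the streak changes nothing.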

Submitted by Margaret Cibes

Forsooth

“A somewhat comic, though not necessarily typical reaction to the use of sampling by a Federal agency occurred in 1936 when the National Resources Planning Board published its report Consumer Incomes in the United States, …. It showed a highly skewed distribution of income, with the top 10 per cent of the families and single individuals receiving 36 per cent of the income. …. [T]his was the first time that a Federal agency had published such data, and it was based on a sample. The U.S. Chamber of Commerce issued a blast against this report, which it considered to be socialistic propaganda. It said that the estimates were based on ‘less than a 1 per cent sample, and a random sample at that!’”

Comment by Joe Duncan and Bill Shelton

cited by W. Allen Wallis, “Statistics in Washington, 1935-1945”
Statistics and Public Policy (1997)

(Full text appeared first in Chance, v. 7, no. 4, 1994)

Let's make an XKCD

Thanks to Brian Abend for sending a link to this cartoon from XKCD

Monty hall.png

The Monty Hall problem just won't stay solved, so it was only a matter of time before it was enshrined in XKCD. Here are two other recent appearances in the news:

Stick or switch? Probability and the Monty Hall problem, BBC News Magazine, 11 September 2013
“Readers’ Challenge report: The Monty Hall problem”, Significance, October 2013
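Since the problem "won't stay solved," a few lines of simulation settle the arithmetic. This sketch assumes the standard rules: the host always opens a door the player did not pick and that does not hide the car.

```python
import random

random.seed(1)

def play(switch: bool) -> bool:
    """One round of Monty Hall; returns True if the player wins the car."""
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # Host opens a door that hides a goat and was not picked
    opened = random.choice([d for d in doors if d not in (pick, car)])
    if switch:
        pick = next(d for d in doors if d not in (pick, opened))
    return pick == car

trials = 100_000
p_stay = sum(play(False) for _ in range(trials)) / trials
p_switch = sum(play(True) for _ in range(trials)) / trials
print(f"staying wins {p_stay:.3f} of the time; switching wins {p_switch:.3f}")
```

Staying wins about a third of the time, switching about two thirds.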

Corrupt Ivy League admissions?

Writing last year in The American Conservative (The myth of American meritocracy: How corrupt are Ivy League admissions?, 28 November 2012), Ron Unz claimed that today's Asian students were now being discriminated against in ways that Jewish students had been in the past. Namely, despite growing numbers in the population and impressive academic accomplishments, their share of the admissions to top institutions was being restricted by quotas. Furthermore, Unz went on to assert that Jewish students are now actually over-represented relative to equally qualified Asians and non-Jewish whites. Unz's article includes various statistical graphics to support these claims; for example, see

Jewishenrollment-unz.png

At the time of the article, a blog post by Andrew Gelman took the analysis at face value and asked, Should Harvard start admitting kids at random? (28 November 2012). Subsequently, however, Gelman and others were led to re-examine the Unz data, and found that many of the earlier claims do not stand up. For example, instead of inferring Jewishness via a family name, Janet Mertz actually contacted some of the individuals who were on the Math Olympiad team. She writes here that

The actual count of Jews is at least 10¼ out of 78 (counting part-Jews fractionally), i.e., 5-fold higher [than Unz's claim of only two]. When an author refuses to admit to an error about which there is no possibility he is correct, academics have no choice but to then question the validity of everything that author has ever written because they can no longer trust the veracity of his statements.

Most recently, in a post entitled Ivy Jew update (22 October 2013), Gelman quotes Nurit Baytch:

Unz’s conclusion that Jews are over-admitted to Harvard was erroneous, as he relied on faulty assumptions and spurious data: Unz substantially overestimated the percentage of Jews at Harvard while grossly underestimating the percentage of Jews among high academic achievers.

This latest post by Gelman has a number of interesting quotations, that are worth bearing in mind when looking at any statistical claim:

  • "My take on all this is that it can be harder than it looks to do research using statistics."
  • "It’s perfectly natural to get excited when one’s initial hypothesis is confirmed by an examination of some data, but the next step is to recognize that these exciting discoveries do not always hold up."

Regarding the particular analysis in question, he writes

Unz, who spends so much of his time in the political arena, is used to politically-motivated criticisms and responds in kind, and so I think he sees the statistics provided by Mertz and Baytch as attacks to be dodged or parried rather than as useful information that can help him modify his understanding of the world. But for those of us who are not so invested in a particular position, Baytch’s article, and Mertz’s from a few months ago, should be helpful to anyone interested in further study of ethnicity and high-end college admissions.

In a related post, My beef with Brooks: the alternative to “good statistics” is not “no statistics,” it’s “bad statistics” (20 February 2013), Gelman takes columnist David Brooks to task for adopting an "anti-data" posture in a NYT column. He observes that Brooks had previously been happy to quote the Unz analysis in a column two months earlier. According to Gelman, "Janet Mertz contacted him and the Times to report that his published numbers were in error, and I also contacted Brooks (both directly and through an intermediary). But no correction has appeared."

Submitted by Paul Alper

Dance of the p-values

Paul Alper sent a link to a wonderful YouTube video, Dance of the p-values. This is an animated simulation--with sound effects keyed to emotional responses--designed to show how erratically the p-value can vary in replications of the same experiment.
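The video's point can be reproduced in a few lines: replicate the same modestly powered experiment and watch the p-value jump around. The effect size, sample size, and use of a known-sd z-test (rather than a t-test, to avoid dependencies) are illustrative choices here, not taken from the video.

```python
import math
import random

random.seed(2)

def replicate(n=32, effect=0.5):
    """One experiment: two groups of size n, true mean difference = effect, sd = 1."""
    a = [random.gauss(0.0, 1.0) for _ in range(n)]
    b = [random.gauss(effect, 1.0) for _ in range(n)]
    z = (sum(b) / n - sum(a) / n) / math.sqrt(2.0 / n)
    return 1 - math.erf(abs(z) / math.sqrt(2))  # two-sided p, known-sd z-test

ps = sorted(replicate() for _ in range(25))
print(f"25 replications: min p = {ps[0]:.4f}, median = {ps[12]:.4f}, max = {ps[-1]:.4f}")
```

With roughly 50% power, the identical true effect produces p-values ranging from "highly significant" to nowhere near 0.05.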

Paul found it through Andrew Gelman's blog (7 November) which also features this cartoon:

Marginally significant.jpg

The Linda Problem

Kahneman and Tversky's "Linda Problem" is a famous illustration of the conjunction fallacy:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which of the following is more probable?

  • Linda is a bank teller.
  • Linda is a bank teller and is active in the feminist movement.

From a formal logic perspective, the answer is obviously the first. Yet many people, because of the information about Linda's behavior and outlook on life, choose the second. In his book Gut Feelings: The Intelligence of the Unconscious, Gerd Gigerenzer points out that the confusion about the conjunction "and" arises because natural language does not work the way formal logic does. His first example to illustrate the discrepancy is

  • Peggy and Paul married and Peggy became pregnant.
  • Peggy became pregnant and Peggy and Paul married.

His second example is

  • Mark got angry and Mary left.
  • Mary left and Mark got angry.

Obviously, each of the pairs above violates Prob (A and B) = Prob (B and A): reversing the order changes the implied story. The following pair does not violate Prob (A and B) = Prob (B and A):

  • Verona is in Italy and Valencia is in Spain.
  • Valencia is in Spain and Verona is in Italy.

Further,

Even more surprising, we also know without thinking when "and" should be interpreted as the logical [inclusive] OR, as in the sentence

We invited friends and colleagues.

One of Gigerenzer's continuing themes is that instead of probability we should concentrate on frequency, which is much easier to understand. He would rephrase the Linda problem as

There are a hundred persons who fit the description above (i.e., Linda's). How many of them are

  • bank tellers?
  • bank tellers and active in the feminist movement?

His empirical claim is that people easily figure things out with this rephrasing.
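Gigerenzer's frequency framing can be made concrete with a toy count. The 5% and 80% rates below are made-up numbers for illustration only; the point is that whatever the rates, counting makes the conjunction rule visible.

```python
import random

random.seed(3)

# 100 hypothetical people fitting Linda's description; the individual rates
# (5% bank tellers, 80% feminists) are invented purely for illustration.
people = [{"teller": random.random() < 0.05,
           "feminist": random.random() < 0.80} for _ in range(100)]

tellers = sum(p["teller"] for p in people)
teller_feminists = sum(p["teller"] and p["feminist"] for p in people)
print(f"bank tellers: {tellers}; tellers who are also feminists: {teller_feminists}")
```

However the counts come out, the second group is a subset of the first, so it can never be larger: exactly what respondents see at once in the frequency version.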

A Chance News item from a few years back discussed John Allen Paulos's take on the Linda Problem, from his NYT article "Stories and Statistics."

Submitted by Paul Alper

Flaws in cholesterol risk calculator?

Risk calculator for cholesterol appears flawed
by Gina Kolata, New York Times, 17 November 2013

The article includes an online graphic summarizing the findings:

EstimatingRisk.png

For commentary on a related story, see the post The Economics & Politics of Drugs for Mild Hypertension from HealthNewsReview.org (4 November 2013), which begins

The Cochrane Collaboration’s Hypertension Group published a systematic review of drug treatment for mild hypertension in August 2012 showing no evidence that drugs benefit patients while about 11% have side effects severe enough to stop treatment. As coauthor of that review, I [Dr. David Cundiff] will comment on the economics, politics, regulatory intrigue, financial conflicts, and subsequent media coverage involved.

Submitted by Paul Alper

Update

Bumps in the road to new cholesterol guidelines
by Gina Kolata, New York Times, 25 November 2013

Update on International Year of Statistics

Allie Weinstein provided a newspaper clipping of the following:

Odds lot: Statisticians party like it's 2.013 x 10³
by Daniel Michaels, Wall Street Journal, 15 November 2013

The article reports that, now 88% through the International Year of Statistics, the event is viewed as a success. It repeats the famous claim, made in 2009 by Google's Hal Varian, that statistics would be "the sexy job in the next 10 years."

The online article links to these videos from

Plotting political ideology

The center cannot hold
by Thomas Edsall, New York Times, 4 December 2012

It's not every day that you encounter a scatterplot in the New York Times. In this column, Edsall presents this plot of political ideology of the 2012 electorate:

ElectorateScatterNYT.png

To be continued…

Submitted by Bill Peterson

Human factor in investing

“Losing My Religion”
by Samuel Lee, Morningstar, December 3, 2013

The author discusses how many academics, and their investor-followers, continue to adhere to the “elegant” theoretical “efficient-market hypothesis,” even in the face of evidence to the contrary, without taking into account the potential effects of “momentum” – investor behavior that may be excessively optimistic or pessimistic – on fund behavior.

I’m a slow learner. It took me a while to realize that the sophistication of a study had little to do with its merit. I clued onto this when I began reading the work of John Ioannidis …, who published an influential 2005 paper[1] that argued most published findings are bound to be false. …. His argument is sensible. Publications tend to look for positive results … and ignore negative results. …. I’ve learned to not be overly impressed with a single study or even a series of studies, no matter how credentialed the authors. The data can be tortured to confess to anything. You need to apply liberal doses of common sense—more when the claims are outlandish. A new theory has to be backed by many independent sources of data, ideally data the theory’s originators have never seen, and you need to really kick the tires of any assumptions it makes. …. Risk manager Aaron Brown argues many finance academics would never bet money on their more arcane models—such models are optimized for publication, to show how clever you are, not optimized to say something true about the world.
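Ioannidis's argument is easy to simulate. The prior, power, and significance level below are illustrative values (not from his paper or the Morningstar piece): if few tested hypotheses are true and only "significant" results get published, false findings can make up a large share of the literature.

```python
import random

random.seed(4)

PRIOR_TRUE = 0.10  # fraction of tested hypotheses that are actually true (assumed)
POWER      = 0.80  # P(significant result | hypothesis is true)
ALPHA      = 0.05  # P(significant result | hypothesis is false)

published_true = published_false = 0
for _ in range(100_000):
    is_true = random.random() < PRIOR_TRUE
    significant = random.random() < (POWER if is_true else ALPHA)
    if significant:  # only significant results reach publication
        if is_true:
            published_true += 1
        else:
            published_false += 1

frac_false = published_false / (published_true + published_false)
print(f"share of 'published' findings that are false ≈ {frac_false:.2f}")
```

Analytically this is 0.9×0.05 / (0.9×0.05 + 0.1×0.8) ≈ 0.36; a lower prior or lower power pushes the false share past one half.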

He describes a 1984 debate between a Rochester professor and Warren Buffett. The professor maintained that the literature contained no examples of statistically significant skill in investing and described the fund industry as a coin-flipping game, in which someone will experience a long streak of successes given enough flippers. Buffett responded:

[I]magine a national coin-flipping competition with all 225 million Americans. Each morning the participants call out heads or tails. If they’re wrong, they drop out. After 20 days 215 coin-flippers will have called 20 coin flips in a row—literally a one in a million phenomenon for each individual flipper, but an expected outcome given the number of participants. [W]hat if 40 of those coin-flippers came from one place, say, Omaha? That’s no chance. Something’s going on there.
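The numbers in the quote check out; here is a two-line verification of the arithmetic.

```python
# Buffett's coin-flipping arithmetic: 225 million entrants, fair flips,
# 20 straight correct calls; each flip halves the surviving field.
flippers = 225_000_000
survivors = flippers / 2**20
print(f"expected survivors after 20 straight correct calls: {survivors:.0f}")
print(f"chance for any single flipper: 1 in {2**20:,}")
```

About 215 expected survivors, each at odds of one in 2²⁰ = 1,048,576 (roughly one in a million), matching the figures in the quote.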

The author concludes, “But the ultimate test of a theory isn’t how credentialed its proponents are or whether it’s published in a prestigious journal, it’s this: Does it have predictive power?”

Submitted by Margaret Cibes

Peer review

“Peer Review”
Nature, December 2006
“Who’s Afraid of Peer Review”
by John Bohannon, Science, October 2013

In the Nature issue, editors describe their trial of an open peer review process, which they started in June 2006, and provide 22 articles solicited from some leading scientists, publishers and others about aspects of the process.

In the Science article, Bohannon describes a “sting” operation in which he submitted to 304 open-access journals a “fatally flawed” paper (actually multiple versions) he had written under a pseudonym (with a pseudonymous university affiliation). Apparently, 157 of these journals accepted the paper, 98 rejected it, and the others either did not exist anymore or had not decided. One accepting journal’s focus was totally unrelated to his topic; another’s charge for publication was $3100.

He cites Jeffrey Beall, a University of Colorado library scientist, who has an Internet page that names “predatory” publishers. See “Beall’s List of Predatory Publishers 2013” for his criteria and his list of more than 200 open-access journals.

Submitted by Margaret Cibes