Chance News 80


December 23, 2011 to January 17, 2012


“The role of context. .... The focus on variability naturally gives statistics a particular content that sets it apart from mathematics itself and from other mathematical sciences, but there is more than just content that distinguishes statistical thinking from mathematics. Statistics requires a different kind of thinking, because data are not just numbers, they are numbers with a context. .... In mathematics, context obscures structure. .... In data analysis, context provides meaning. .... [A]lthough statistics cannot prosper without mathematics, the converse fails.”

"What About Probability? .... In the ideal Platonic world of mathematics, we can start with a probabilistic chicken and use deductive logic to lay a statistical egg, but in the messier world of empirical science, we must start with the egg as observed data and construct a prior probabilistic chicken as an inference."

George Cobb and David Moore (authors' emphasis)

in “Mathematics, Statistics, and Teaching”

The American Mathematical Monthly, 1997

Submitted by Margaret Cibes

"[W]hile disciplines like physics or psychology or statistics discard projects and methodologies no longer regarded as cutting edge, if you like the way literary studies were done in 1950 or even 1930, there will be a department or a journal that allows you to proceed as if nothing had happened in the last 50 or 75 years."

Stanley Fish, writing in The old order changeth, New York Times, 26 December 2011

Submitted by Paul Alper

"Apparently unsatisfied with the evidence that each study provides on its own, Bem et al. (2011) resort to the last refuge of the refuted: consider all experiments simultaneously."

Eric-Jan Wagenmakers, writing in Yes, psychologists must change the way they analyze their data:
Clarifications for Bem, Utts, and Johnson (2011)

Submitted by Paul Alper

“Statistical significance should be a tiny part of an inquiry concerned with the size and importance of relationships.” [p. 2]

“[Statistical significance] is a philosophical, qualitative test. It does not ask how much. It asks 'whether.' …. [S]ome of the putatively quantitative sciences have slipped into asking qualitatively whether there exists an effect …. Yes or no, they say, and then they stop. They have ceased asking the scientific question ‘How much is the effect?’ And they have therefore ceased being interested in the pragmatic questions that follow: ‘What Difference Does the Effect Make?’ and ‘Who Cares?’ They have become, as we put it, ‘sizeless.’” [pp. 4-5]

“[S]tatistical significance is a philosophy of mere existence. …. It concerns itself only with one kind of probability of a (allegedly) randomly sampled event – the so-called exact p-value or Student’s t – and not with other kinds of sampling probability, such as the “power of the test … [or] nonsampling sources of error, such as ... measurement error … [or] experimental error and sample selection bias.” [pp. 7-8]

Ziliak and McCloskey in The Cult of Statistical Significance, 2008

Submitted by Margaret Cibes


“[A researcher] has been funded in part by the U.S. government’s Monty Python-esquely named Office of Research Integrity’s Research on Research Integrity Program.”[1]

David H. Freedman, in Wrong, 2010, p. 106

Submitted by Margaret Cibes

“[A] political science professor at Southern Connecticut State University … has developed a mathematical formula to assess presidential success. … Her model, she said, ‘explains 50 percent in the variance in the quality of the president, which is awfully good ….’"

in “Southern Professor Links Presidential Success To Prior Experience”
The Hartford Courant, January 2, 2012

Submitted by Margaret Cibes

"When I was in NYC I went to this party by group of Japanese bio-scientists. There, one guy told me about how the biggest pharmaceutical company in Japan did their statistics. They ran 100 different tests and reported the most significant one. (This was in 2006 and he said they stopped doing this few years back so they were doing this until pretty recently…) I’m not sure if this was 100 multiple comparison or 100 different kinds of test but I’m sure they wouldn’t want to disclose their data…"

posted by a colleague on Andrew Gelman's blog, 24 December 2011

Submitted by Paul Alper


Why People Believe Weird Things: Pseudo-Science, Superstition, and Other Confusions of Our Time
by Michael Shermer, MIF Books, 1997, p. 54

Shermer is founding publisher of Skeptic magazine and a Scientific American columnist. This book contains his list of “Twenty-five Fallacies That Lead Us to Believe Weird Things.” The fallacies are not new, but are well illustrated by many interesting historic and contemporary stories.

See Shermer's 13-minute TED Talk, including a demonstration of a 900 dollar “dowser” designed to find marijuana in kids’ lockers. Shermer states:

Science is not a thing, it’s a verb. It’s a way of thinking about things. It’s a way of looking for natural explanations for all phenomena.


Shermer states:

[M]ost people have a very poor understanding of the laws of probability. …. The probability that two people in a room of thirty people will have the same birthday is .71.

Ignoring issues such as leap years or twins, and assuming a uniform distribution of real-life birthdays, do you agree with the probability as stated – or could you modify the statement to make it more accurate?
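As a quick check on Shermer's figure, here is a short Python sketch that computes the classic birthday probability under the stated assumptions (365 equally likely birthdays, independent people, no leap years):

```python
from math import prod

def p_shared(n):
    """Probability that at least two of n people share a birthday,
    assuming 365 equally likely and independent birthdays."""
    p_none = prod((365 - i) / 365 for i in range(n))
    return 1 - p_none

print(round(p_shared(30), 4))  # 0.7063 -- Shermer's .71, to two decimals
```

Note that this is the probability that at least two people share a birthday, which suggests one way the quoted statement might be made more precise.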

Submitted by Margaret Cibes

Improving IQs

“Ways to Inflate Your IQ”
by Sue Shellenbarger, The Wall Street Journal, November 29, 2011

This is a report about a potpourri of research projects that claim to show that an IQ can change over time. Some sample IQ test questions, with suggestions about how to increase an IQ, are also provided. There is no discussion about what an IQ test measures.

In the latest study, 33 British students were given IQ tests and brain scans at ages 12 to 16 and again about four years later …; 9% of the students showed a significant change of 15 points or more in IQ scores.
On a scale where 90 to 110 is considered average, one student's IQ rose 21 points to 128 from 107, lifting the student from the 68th percentile to the 97th compared with others the same age, [according to a co-author] of the study, published last month in Nature.


  1. We are told that 33 British students took an IQ test twice. It is conceivable that there were additional students who participated in the first test administration but not the second. Would it be helpful, for inference purposes, to have information about any such students, such as reasons for their non-participation in the second administration?
  2. When people are IQ tested over time, do you think that they are given the same test (or a parallel version), or might a subsequent test include different skills/concepts appropriate for an older group?
  3. On one commonly used IQ test, scores are standardized to mean 100 and standard deviation 15. This is consistent with the claim that a 107 score rising to 128 corresponds to a 68th percentile score rising to the 97th. How would you equate the test scores from two administrations if the tests were different, in order to account for the two tests' possibly different difficulty levels? (See “Equating Test Scores”, by Samuel Livingston, Educational Testing Service, 2004.)
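The percentile claim in question 3 can be checked with a few lines of Python, assuming scores are normally distributed with mean 100 and standard deviation 15:

```python
from statistics import NormalDist

# Standardized IQ scale: mean 100, standard deviation 15
iq = NormalDist(mu=100, sigma=15)

print(round(100 * iq.cdf(107)))  # 68 -> a 107 score sits at about the 68th percentile
print(round(100 * iq.cdf(128)))  # 97 -> a 128 score sits at about the 97th percentile
```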

Submitted by Margaret Cibes

Winning the fight against crime by putting your head in the sand

Police Tactic: Keeping Crime Reports Off the Books
by Al Baker and Joseph Goldstein, New York Times, December 30, 2011

Police officers are joining just about every other profession in trying to skew the statistics to make themselves look good.

Crime victims in New York sometimes struggle to persuade the police to write down what happened on an official report. The reasons are varied. Police officers are often busy, and few relish paperwork. But in interviews, more than half a dozen police officers, detectives and commanders also cited departmental pressure to keep crime statistics low.

The message about reducing police reports comes in many subtle ways.

Officers sometimes bend to pressure by supervisors to eschew report-taking. “Cops don’t want a bad reputation, and stigma,” one commander said. “They know they have to please the sergeants.”

This pressure comes from even higher up.

The sergeants, in turn, are acting on the wishes of higher-ups to keep crime statistics down, a desire that is usually communicated stealthily, the commander said. As an era of low crime continues, and as 2011 draws to a close with felony numbers running virtually even with last year’s figures, any new felony is a significant event in a precinct and a source of consternation to commanders.

Part of the problem is the broad discretion that police officers apply.

In one case, Sandra Ung, 37, went to the Fifth Precinct in Chinatown after her wallet disappeared at a Starbucks. "I had it and then it was gone," she said of the Feb. 23 episode. She said she believed her wallet had been stolen, but could not prove it. She assumed the police had recorded it as pickpocketing, but when she retrieved a copy of the report days later, she saw it was recorded not as a crime, but as lost property that had gone "missing in an unknown manner."

The report noted that the victim had not felt anything that would indicate the actions of a pickpocket. But interestingly, the standards for categorizing the event as a crime were not this strict.

The guidelines focused on the very words that the police used to discount her suspicions: "The victim does not need to have witnessed, felt or otherwise been aware of being bumped or jostled in order to properly record the occurrence as grand larceny."


  1. The report discusses ways in which the underreporting of crimes could be measured. Discuss those approaches and suggest any additional approaches that could be used to detect the extent of this problem.
  2. Why is the desire to keep crime statistics low a short-sighted policy?

Submitted by Steve Simon

Marilyn slips up on a drug testing question

Jerry Grossman wrote to point out an error in a recent "Ask Marilyn" column.

Ask Marilyn: What's the Probability of Being Chosen for a Drug Test?
by Marilyn vos Savant, Parade, 25 December 2011

A reader asks:

I manage a drug-testing program for an organization with 400 employees. Every three months, a random-number generator selects 100 names for testing. Afterward, these names go back into the selection pool. Obviously, the probability of an employee being chosen in one quarter is 25 percent. But what’s the likelihood of being chosen over the course of a year?

Marilyn responds, "The probability remains 25 percent, despite the repeated testing. One might think that as the number of tests grows, the likelihood of being chosen increases, but as long as the size of the pool remains the same, so does the probability." Jerry observes that she seems to be answering the wrong question (What is the chance of being chosen in any particular quarter during the year?), rather than the one the reader intended (What is the chance of being chosen at least once?), for which the answer would be <math>1-(3/4)^4</math>, or about 68 percent. He adds that Marilyn's readers quickly began commenting here.
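Jerry's calculation is easy to reproduce in Python. The sketch below assumes the four quarterly draws are independent, each selecting a given employee with probability 100/400 = 1/4:

```python
# Chance of being selected at least once in four independent quarterly draws,
# each with per-employee selection probability 1/4
p_quarter = 100 / 400
p_at_least_once = 1 - (1 - p_quarter) ** 4

print(round(p_at_least_once, 4))  # 0.6836, i.e. about 68 percent
```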

In another message, Dom Rosa noted the blunder and sent us a link to the following week's column:

Ask Marilyn: Did Marilyn Make a Mistake on the Drug-Testing Question?
by Marilyn vos Savant, Parade, 2 January 2012

Here Marilyn suggests that the original problem may have been ambiguous. Nevertheless, she does print corrections from Jerry and another reader. She also swears off eggnog, and promises a followup on January 22. Check back here for the update.

Impact of large philanthropists on research

“Got Dough? How Billionaires Rule Our Schools”
by Joanne Barkan, Dissent, Winter 2011

The author discusses what she sees as potential problems with the 4 billion dollars in annual funding of educational innovation and research by the Gates, Broad, and Walton families through their foundations. Here is an excerpt about Gates’ funding of malaria research:

[In 2008], the New York Times reported on a memo that it had obtained, written by Dr. Arata Kochi, head of the World Health Organization’s malaria programs, to WHO’s director general. Because the Gates Foundation was funding almost everyone studying malaria, Dr. Kochi complained, the cornerstone of scientific research—independent review—was falling apart.

Many of the world’s leading malaria scientists are now “locked up in a ‘cartel’ with their own research funding being linked to those of others within the group,” Dr. Kochi wrote. Because “each has a vested interest to safeguard the work of the others,” he wrote, getting independent reviews of research proposals “is becoming increasingly difficult.”

The director of global health at Gates responded predictably: “We encourage a lot of external review.” But a lot of external review does not solve the problem, which is structural. It warps the work of most philanthropies to some degree but is exponentially dangerous in the case of the Gates Foundation. Again, Frederick Hess in With the Best of Intentions: “…Researchers themselves compete fiercely for the right to evaluate high-profile reform initiatives. Almost without exception, the evaluators are hired by funders or grantees….Most evaluators are selected, at least in part, because they are perceived as being sympathetic to the reform in question.”

Submitted by Margaret Cibes

Scientific misconduct

Disgrace: On Marc Hauser; A case of scientific misconduct at Harvard.
by Charles Gross, The Nation, 21 December 2011 (in the print edition January 9-16, 2012)

This is a lengthy article about Marc Hauser's research fabrication at Harvard. The Boston Globe broke the story in August of 2010; Hauser resigned from Harvard last summer. Hauser's research involved cognition in chimpanzees. As described in the Globe, Hauser's 2002 paper in the journal Cognition was retracted because of unspecified data irregularities, for which Hauser reportedly took responsibility.

Gross is critical of the secrecy surrounding Harvard's internal investigation. He writes:

The procedures and conclusions of the investigation raise many questions. Its methods and results remain secret. Its procedures bore no relation to the due process that is the goal of our judicial system. We have no clear idea of the exact nature of the evidence, of how many studies were examined and if anyone besides the three whistleblowers and Hauser was asked to testify. I was told by one of the whistleblowers that, to this person’s surprise and relief, the committee, which included scientists, did look carefully at evidence, even going so far as to recalculate statistics.

Earlier in the article, we find this ironic note:

In an interview titled “On How We Judge Bad Behavior,” made a few months before the Globe broke the story of Harvard’s investigation and available on YouTube, Hauser discusses psychopaths and suggests that they “know right from wrong but just don’t care.”

Andrew Gelman notes on his blog that he received this comment on the Hauser case from E. J. Wagenmakers:

One of the problems is that the field of social psychology has become very competitive, and high-impact publications are only possible for results that are really surprising. Unfortunately, most surprising hypotheses are wrong. That is, unless you test them against data you’ve created yourself. There is a slippery slope here though; although very few researchers will go as far as to make up their own data, many will “torture the data until they confess”, and forget to mention that the results were obtained by torture….

Submitted by Paul Alper

More scientific misconduct

“Investigation Finds UConn Professor Fabricated Research”
by William Weir and Kathleen Megan, The Hartford Courant, January 12, 2012

After a three-year investigation, the University of Connecticut has stopped all externally funded research in a Health Center researcher’s lab and declined 890,000 dollars in research grants to the researcher, due to 145 instances of alleged “fabricated research.” The U.S. Office of Research Integrity is investigating, after receiving a 60,000-page report from UConn.

One focus of the investigation was a set of images in these research papers representing "Western blots" — an analytical technique used to detect certain proteins in tissue samples. Generally, results of these experiments are represented with a series of bands, one for each experiment conducted.
The investigation found several instances of these images being manipulated: Some were spliced together, some duplicated, some erased. Many bands that had nothing to do with the particular experiments were cut and pasted into the studies. The report states that these kinds of manipulations can be done simply with such software programs as Adobe Photoshop.

The report suggests that “one of the curious aspects” of this case relates to the division of responsibilities of the research team:

Some lab members — even when they were the first authors on the papers — had no role in biochemical analyses or preparing figures. Compartmentalizing the work in such a way, according to the report, would make it harder to trace any fabrication to its origin.

The researcher has denied any knowledge of the alleged manipulation.

Several other recent instances of alleged scientific fraud are briefly described in a separate article[2].

Submitted by Margaret Cibes

Surprising dreidel outcome

A one-in-trillions dreidel game
by Paul Grondahl, Times Union (Albany, NY), 28 December 2011

Here is a local news story concerning a remarkable streak in a family game of dreidel, whose rules are described in the article as follows:

The four-sided spinning top features letters of the Hebrew alphabet on each side: nun, gimel, hei and shin. The players put a penny in a pot at the center of the table. Each player took a turn and spun the dreidel. If the nun faced up, the player did nothing. A gimel meant the player got all the pennies in the pot. A hei roll won half the pot. Shin required the player to add a penny to the pot.

To the amazement of everyone present, Alfred Lorini compiled a streak of 68 spins that included 56 gimels and zero shins. According to the article, his great-nephew "used a binomial distribution and came up with 1-in-2.25 times 10 to the 22nd power for the order of magnitude."

In fact, this is the binomial probability for obtaining exactly 56 successes (gimels) in 68 trials (spins) with success probability 1/4; this assumes independent spins of a perfectly balanced dreidel. But there are two problems here. First, we should ask for the chance of 56 or more successes. This adjustment does not change the order of magnitude of the probability, which becomes 1 in <math>2.09 \times 10^{22}</math>. Second, the binomial description is not correct, because it allows the non-gimel rolls to be nun, hei or shin, whereas Mr. Lorini's feat was more unusual in that the non-gimels included no shins. Thus we really need to consider a multinomial situation, with categories (gimel, shin, neither), for which the probabilities are (1/4, 1/4, 1/2). The chance of 56 or more gimels in 68 rolls, with zero shins, is then calculated as 1 in <math>2.62 \times 10^{24}</math>.
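All three figures can be reproduced with a short Python script, again assuming independent spins of a perfectly balanced dreidel:

```python
from math import comb

n, k = 68, 56  # 68 spins, 56 gimels

# Exact binomial term: P(exactly 56 gimels), with P(gimel) = 1/4
p_exact = comb(n, k) * (1/4)**k * (3/4)**(n - k)

# Binomial upper tail: P(56 or more gimels)
p_tail = sum(comb(n, j) * (1/4)**j * (3/4)**(n - j) for j in range(k, n + 1))

# Multinomial tail: 56 or more gimels AND zero shins; every non-gimel spin
# must land on nun or hei, which together have probability 1/2
p_multi = sum(comb(n, j) * (1/4)**j * (1/2)**(n - j) for j in range(k, n + 1))

print(f"1 in {1/p_exact:.3g}")  # 1 in 2.25e+22  (the figure reported in the article)
print(f"1 in {1/p_tail:.3g}")   # 1 in 2.09e+22  (56 or more gimels)
print(f"1 in {1/p_multi:.3g}")  # (about 2.6e+24: 56 or more gimels, zero shins)
```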

In an effort to describe the order of magnitude of the (originally reported) answer, the article reports that the figure was 22.5 billion times 1 trillion. Do you think this helps the lay person to understand it? Can you suggest an alternative?

Submitted by Bill Peterson, based on a suggestion from Adam Peterson

A million monkeys

“A million monkeys and Shakespeare”
by Jesse Anderson, Significance, December 2011

The author is a software engineer who has “created a computer program using the Hadoop framework to simulate a million monkeys randomly typing,” and he claims that the program has reproduced all of the works of Shakespeare. He gives an overview and technical details about his project at his website[3], both in text and in two videos. He says that he ran out of cloud-computing space and had to revert to working on his home computer.

The Significance article is apparently available only with an online subscription. (I paid 52 dollars for the four 2011 paper issues; online access seems to require a separate fee.)

Submitted by Margaret Cibes


This story calls to mind the following tongue-in-cheek quotation, which also appeared in the first installment of the Chance News Wiki:

"We've heard that a million monkeys at a million keyboards could produce the Complete Works of Shakespeare; now, thanks to the Internet, we know this is not true."

Robert Wilensky, Professor Emeritus of Electrical Engineering and Computer Science, UC Berkeley

Ethics in economic reports

“Economists Set Rules on Ethics”
by Ben Casselman,The Wall Street Journal, January 9, 2012

Under new rules adopted by the American Economic Association at its annual meeting here last week, economists will have to disclose financial ties and other potential conflicts of interest in papers published in academic journals. .... Under the policy, which will be enacted over the course of the next year, authors submitting papers to academic journals must disclose to the journal's editors all sources of financing for the research and all "significant" financial relationships with groups or individuals with a "financial, ideological or political stake" in the research. The policy defines "significant" as financial support to an author and immediate family members totaling at least $10,000 in the past three years. The journals will then make public what their editors deem "relevant potential conflicts of interest."

[C]riticisms [related to the lack of transparency in financial reporting/analysis] were made most prominently in the 2010 film "Inside Job," which won an Academy Award for best documentary in 2011. The movie highlighted prominent economists' ties to companies and governments that later collapsed in the financial crisis.

The film “Inside Job” is available on Netflix for online or at-home-TV viewing. A brief excerpt on YouTube, “Inside Job Clip”, shows an interview with a Columbia economist who advocates deregulation of the financial markets, and includes a question about why one of his 2006 books (#17 on his CV) had its title changed from Financial Stability in Iceland to, currently, Financial Instability in Iceland; the professor suggests that this was a typo.


The WSJ article states “[A Harvard economist] drew a distinction between fields like medicine, where researchers can suppress data that don't support their or their sponsors' desired outcomes, and economics, where most research is based on publicly available information.” What do you think about such a distinction?

Submitted by Margaret Cibes

Polling content problems

“When Polls Turn Up the Wrong Number”
Blog by Carl Bialik, The Wall Street Journal, January 6, 2012

Carl Bialik (the WSJ Numbers Guy) writes about political scientists’ explanations for poll results that are less than helpful. Here are a few explanations from academics; while not all are new to Chance readers, they do remind us and our students of some important polling issues:

  • “Most [people] are not critical consumers of knowledge and rarely seek out multiple sources of information to verify statistical findings.”
  • “People have enough trouble estimating numbers they control, let alone ones that may never personally affect them.”
  • “Pollsters shouldn’t be so quick to stop surveys with people who say they don’t plan to vote: 55% of that group ended up voting in the 2008 general election, compared to 87% of those who were almost certain they’d cast a ballot.”[4]
  • “Everyone, no matter how well-educated and well-versed in quantitative thinking, is vulnerable to bad math: One classic study found that Wharton graduate students two decades ago — a decade before the Sept. 11 attacks — were willing to pay more for a travel-insurance policy that covered just terrorism than for one that covered all travel interruptions, including terrorism.”[5]
  • “Most survey respondents aren’t thinking too hard about these questions and there is no consequence to them giving a wrong answer. Would they do better if they thought harder or if they had more at stake?”
  • “Disproportionate media coverage of a certain issue might also spark inflated estimates of its scope.”

A Princeton neuroscientist suggests one way to improve numeracy: “Every policy story should contain an example that is typical, again reflecting true probabilities. ‘MAN LOSES LOTTERY’ — how’s that for a headline?”

And the saddest commentary of all, which has been backed up by some recent behavioral studies: “Even if people improved their statistical literacy and absorbed the correct number, then adjusted their views, any one issue is unlikely to shift their votes …. Even if they knew the right answers, there is little evidence that anyone would change their [sic] votes.”

See Bialik's related article, "Americans Stumble on Math of Big Issues", in The Wall Street Journal, January 7, 2012.

Submitted by Margaret Cibes


Paul Alper wrote to say he had taken Margaret's advice to see the related article. There we read:

"We found that people resisted any attempts to give them accurate information," says James Kuklinski, a political scientist at the University of Illinois. He and colleagues asked Illinois residents for their opinions and factual beliefs on welfare. More than 60% supplied an estimate of the percentage of U.S. families on welfare that was more than double the correct proportion, among other misfires. Those most misinformed were most confident in their estimates, according to the 2000 paper. And a subgroup supplied with the right numbers didn't change their views in a meaningful way.

This is not, Paul notes, a ringing endorsement for education.