Chance News 57

Quotations

An undefined problem has an infinite number of solutions.
Robert A. Humphrey

In the space of one hundred and seventy-six years the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the "Old Oolitic Silurian Period," just a million years ago next November, the Lower Mississippi River was upwards of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-rod. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo and New Orleans will have joined their streets together, and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.

Mark Twain, Life on the Mississippi (Chapter 17), 1883

Forsooths

According to a November 4, 2009, Wall Street Journal article, “Crisis Compels Economists To Reach for New Paradigm”, University of Chicago economist Robert Lucas stated in 2003:

[The] central problem of depression-prevention has been solved ... for many decades.

A 2009 blogger responded [1] to the article:

These guys build a Rube Goldberg machine, then come back to the brilliant decision that maybe common sense is best after all. That deserves a Nobel prize, minimum.

Submitted by Margaret Cibes

Vampirical

The following quotation comes from an article by Gelman and Weakliem entitled "Of beauty, sex and power: Statistical challenges in estimating small effects":

This ability of the theory to explain findings in any direction is also pointed out by Freese (2007), who describes this sort of argument as "more 'vampirical' than 'empirical'--unable to be killed by mere evidence."

Gelman and Weakliem criticize research that putatively detects an effect merely because statistical significance is obtained on either side of zero or, in the case of the ratio of females to males, 50%. In particular, they contest the results of studies which claim that “beautiful parents have more daughters, violent men have more sons and other sex-related patterns.” They also analyze so-called Type M (magnitude) errors and Type S (sign) errors.

This is a Type M (magnitude) error: the study is constructed in such a way that any statistically-significant finding will almost certainly be a huge overestimate of the true effect. In addition there will be Type S (sign) errors, in which the estimate will be in the opposite direction as the true effect.

Discussion

1. As a long-term research project, determine via literature and art how the notion of “beautiful” has changed through the ages and across cultures.

2. The imbalance between baby daughters and baby sons produced by beautiful people somehow went from the original article’s (not statistically significant) 4.7% to 8% when dealing with the largest comparison (the most beautiful parents on a scale of 1 to 5) to 26% and finally to 36% via a typo in the New York Times.

3. The authors, based on their analysis, say there is no compelling evidence that “beautiful parents produce more daughters.” Nevertheless, why did the original paper have so much appeal?

4. As a check, the authors used People magazine’s “list of the fifty most beautiful people” from 1995 to 2000 to find the offspring. There were “157 girls out of 329 children, or 47.7% girls (with a standard error 2.8%).” Instead of more females, fewer were produced.

5. The authors note “the structure of scientific publication and media attention seem to have a biasing effect on social science research.” Explain what they mean by a “biasing effect.”
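The figures quoted in item 4 can be reproduced with a few lines of arithmetic. The sketch below is only a check, not the authors' own code: it uses the usual binomial standard error for a proportion, and the comparison against 50% is an assumption made here for illustration (the function name is hypothetical).

```python
import math

def proportion_with_se(successes, n):
    """Return the sample proportion and its binomial standard error."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se

# Counts quoted in item 4: 157 girls out of 329 children.
p_girls, se_girls = proportion_with_se(157, 329)

# Assumed reference value of 50% girls, for illustration only.
z_vs_half = (p_girls - 0.5) / se_girls

print(f"proportion of girls: {p_girls:.3f}")
print(f"standard error:      {se_girls:.3f}")
print(f"z against 50%:       {z_vs_half:.2f}")
```

Under these assumptions the estimate sits less than one standard error below 50%, so the People magazine sample does not distinguish a small deficit of girls from no effect at all.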

Submitted by Paul Alper for Halloween.

How anyone can detect election fraud

Why Russians Ignore Ballot Fraud, Clifford J. Levy, The New York Times, October 24, 2009.

Russian Election Fraud? Steven D. Levitt, Freakonomics Blog, The New York Times, April 16, 2008.

All it takes is a bit of common sense and a careful review of the data to expose election fraud, at least in Russia.

Soon after polls closed in regional elections this month, a blogger who refers to himself as Uborshizzza huddled away in his Moscow apartment and began dicing up the results on his computer. It took him only a few hours to detect what he saw as a pattern of unabashed ballot-stuffing: how else was it possible that in districts with suspiciously high turnouts in this city, Vladimir V. Putin’s party received heaps of votes?

Here's a specific example.

Overall turnout was 18 percent in one Moscow district, and United Russia garnered 33 percent. In an adjacent district, turnout was 94 percent, and the party got 78 percent.

This was done by a statistician in his spare time, with access only to publicly available records.

Uborshizzza, who by day is a 50-year-old medical statistician named Andrei N. Gerasimov, sketched charts to accompany his conclusions and posted a report on his blog. It spread on the Russian Internet, along with similar findings by a small band of amateur sleuths, numbers junkies and assorted other muckrakers.

A similar study of open election records in 2008 also yielded obvious evidence of fraud.

Analyzing official returns on the Central Elections Committee Web site, blogger Sergei Shpilkin has concluded that a disproportionate number of polling stations nationwide reported round numbers — that is, numbers ending in zero and five — both for voter turnout and for Medvedev’s percentage of the vote.

It wasn't just any round numbers, though, but those at the high end of the distribution.

In most elections, one would expect turnout and returns to follow a normal, or Gaussian, distribution — meaning that a chart of the number of polling stations reporting a certain turnout or percentage of votes for a candidate would be shaped like a bell curve, with the top of the bell representing the average, median, and most popular value. But according to Shpilkin’s analysis, which he published on his LiveJournal blog, podmoskovnik.livejournal.com, the distribution both for turnout and Medvedev’s percentage looks normal only until it hits 60 percent. After that, it looks like sharks’ teeth. The spikes on multiples of five indicate a much greater number of polling stations reporting a specific turnout than a normal distribution would predict.
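A crude version of the round-number check described above can be sketched as follows. This is not Shpilkin's method, which relied on histograms of the official returns; it only asks what share of high-turnout stations report a percentage that rounds to a multiple of five. The function name, the 60 percent cutoff as a parameter, and the short list of turnout figures are all hypothetical and made up for illustration.

```python
def share_at_multiples_of_five(percentages, lower=60):
    """Share of values at or above `lower` that round to a multiple of 5.

    If last digits were spread evenly, roughly 1 in 5 (0.2) of
    whole-number percentages would be multiples of 5.
    """
    high = [round(p) for p in percentages if p >= lower]
    if not high:
        return None
    hits = sum(1 for p in high if p % 5 == 0)
    return hits / len(high)

# Hypothetical turnout percentages for twelve polling stations (invented data).
example_turnout = [61.2, 65.0, 70.0, 70.1, 72.4, 75.0,
                   80.0, 83.3, 85.0, 90.0, 45.0, 52.3]

share = share_at_multiples_of_five(example_turnout)
print(f"share at multiples of five above 60%: {share:.2f} (even spread: 0.20)")
```

With real data, a share far above 0.2 among high-turnout stations would correspond to the "sharks' teeth" described in the quotation; a formal test would be needed before drawing conclusions.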

Sadly, though, the reaction of the Russian people has been a collective shrug.

There was none of the sort of outrage on the streets that occurred in Iran in June, when backers of the incumbent president, Mahmoud Ahmadinejad, were accused of rigging the election for him. Nor the international clamor that greeted the voting in Afghanistan, which last week was deemed so tainted that President Hamid Karzai was forced into a runoff. The apparent brazenness of the fraud and the absence of a spirited reaction says a lot about the deep apathy in Russia, where people grew disillusioned with politics under Communism and have seen little reason to alter their view.

This disillusionment is easily demonstrated in public polling.

Opinion polls ... showed that 94 percent of respondents believed that they could not influence events in Russia. According to another, 62 percent did not think that elections reflect the people’s will.

Submitted by Steve Simon

Questions

1. Compare the Russians' reaction to these results with the reaction in the United States to the anomalously high vote for Patrick Buchanan in Palm Beach County during the 2000 election. What explains the difference?

2. What other measures of publicly available election records might be used to detect fraud?

Vaccine effectiveness

“Does the Vaccine Matter?”
by Shannon Brownlee and Jeanne Lenzer, The Atlantic, November 2009

This is a very long and detailed article about influenza in particular, vaccines in general, and related health and economic issues, including some historical information. Its focus is on skepticism in the biomedical community about vaccine effectiveness claims.

Since flu is seasonal and is more likely to “contribute to death” than to “kill people directly,” “researchers studying the impact of flu vaccination typically look at deaths from all causes during flu season, and compare the vaccinated and unvaccinated populations.”

Studies have found that “people who get a flu shot in the fall are about half as likely to die that winter—from any cause—as people who do not.” So people are advised to get vaccinated.

When researchers … included all deaths from illnesses that flu aggravates, like lung disease or chronic heart failure, they found that flu accounts for, at most, 10 percent of winter deaths among the elderly. So how could flu vaccine possibly reduce total deaths by half? [One researcher] says: “For a vaccine to reduce mortality by 50 percent and up to 90 percent in some studies means it has to prevent deaths not just from influenza, but also from falls, fires, heart disease, strokes, and car accidents. That’s not a vaccine, that’s a miracle.”
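The researcher's objection is an upper-bound argument, which can be written out as a short calculation. In this sketch, the 10 percent figure comes from the passage above; the assumption that the vaccine prevents every flu-related death (efficacy of 1.0) is deliberately generous and is introduced here only to make the bound as large as possible.

```python
def max_all_cause_reduction(flu_share_of_deaths, vaccine_efficacy=1.0):
    """Largest possible drop in all-cause deaths if only flu deaths are prevented."""
    return flu_share_of_deaths * vaccine_efficacy

# At most 10 percent of winter deaths among the elderly are attributed to flu.
bound = max_all_cause_reduction(0.10)

claimed = 0.50  # reduction reported by the cohort studies
print(f"upper bound on reduction: {bound:.0%}, claimed reduction: {claimed:.0%}")
```

Even a perfect vaccine could lower total winter deaths by no more than about 10 percent under these figures, so a 50 percent reduction has to come from something other than the vaccine.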

The 50-percent estimate is based on “cohort studies” of vaccinated versus unvaccinated people, studies which are “notoriously prone to bias,” due to “confounding factors … such as education, lifestyle, income, etc.”

When a medical investigator in Seattle started to question the 50-percent estimate:

People told me, “No good can come of [asking] this.” …. “Potentially a lot of bad could happen” for me professionally by raising any criticism that might dissuade people from getting vaccinated, because of course, “We know that vaccine works.”

In 2004 she and her colleagues began an investigation of whether “on average, people who get vaccinated are simply healthier than those who don’t, and thus less liable to die over the short term” (the “healthy user” effect). Based on 8 years of medical data on more than 72,000 people age 65-plus, they found:

[O]utside of flu season [author’s emphasis], the baseline risk of death among people who did not get vaccinated was approximately 60 percent higher than among those who did.

This suggested to the researchers that “the vaccine itself might not reduce mortality at all.”
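To see how large the "healthy user" effect alone could be, the 60 percent figure can be turned into an apparent mortality reduction. This is a back-of-the-envelope sketch, not the researchers' analysis, and it assumes the same baseline gap would persist during flu season with a vaccine that does nothing.

```python
def apparent_reduction(excess_risk_unvaccinated):
    """Apparent mortality reduction among the vaccinated with zero vaccine effect.

    If the unvaccinated have risk (1 + excess) times that of the vaccinated,
    the vaccinated have relative risk 1 / (1 + excess).
    """
    relative_risk = 1 / (1 + excess_risk_unvaccinated)
    return 1 - relative_risk

# Baseline risk about 60 percent higher among the unvaccinated, outside flu season.
healthy_user_gap = apparent_reduction(0.60)
print(f"apparent reduction from healthy-user effect alone: {healthy_user_gap:.1%}")
```

Under that assumption, a naive comparison would show the vaccinated dying at a rate 37.5 percent lower even if the shot were a placebo, which accounts for most of the 50 percent figure.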

What was the reaction in the scientific community?

The results were also so unexpected that many experts simply refused to believe them. [Her] papers were turned down for publication in the top-ranked medical journals. One flu expert who reviewed her studies for the Journal of the American Medical Association wrote, “To accept these results would be to say that the earth is flat!” When the papers were finally published in 2006, in the less prominent International Journal of Epidemiology, they were largely ignored by doctors and public-health officials. “The answer I got,” says [the researcher], “was not the right answer.”

A London-trained epidemiologist is so outspoken on this subject that he has become “something of a pariah” in his scientific community. He has reviewed all of the known studies on the effectiveness of flu vaccines, found them wanting, and recommends placebo-controlled studies. However, there are ethical issues associated with withholding potential relief from sick people or exposing at-risk people to the potentially harmful effects of a vaccine.

Submitted by Margaret Cibes

Game theoretic prediction model?

“Forecast: Self-Serving”
by Nicholas Thompson, The New York Times, November 5, 2009

This is a book review of The Predictioneer’s Game: Using the Logic of Brazen Self-Interest to See and Shape the Future [2], by Bueno de Mesquita, an NYU politics professor, Hoover Institution fellow, and consultant.

According to the reviewer, Bueno de Mesquita uses game theory to “model human behavior, divine the future and improve incentive systems … based on the premise that people are selfish.”

Bueno de Mesquita believes that Mother Teresa’s incentive for good works was a desire for a heavenly reward, no different from the incentive of global terrorists, and that Belgian King Leopold II’s incentive to behave more kindly at home in Belgium than in the Congo was a desire to keep his home environment more peaceful.

Bueno de Mesquita hopes to “engineer better behavior” by use of his “Policon” analysis system.

His simulations rely on four factors: who has a stake; what each of these people wants; how much they care; and how much influence they have on others. He surveys experts on the topic, assigns numerical values to the four factors, plugs the data into a computer and waits for his software to spit out the future. ….
In a legal dispute involving a corporate client … and the United States attorney’s office, he gave all the possible outcomes a score on a scale from zero (one misdemeanor count) to 100 (multiple felony charges …). He then identified the crucial players in the game … and numerically scored their desired outcomes, their influence and their adamancy. His client entered the talks prepared to end at a position of around 60 …. But the modeling showed that negotiations … would end with … [a] final agreement … closer to 80 …. After running a long series of simulations, [he] came up with a new strategy. …. His model said that this strategy would lead to the case’s resolving at a point closer to 40 on his scale — which is indeed, he claims, how matters turned out.

Using his “Policon” analyses, Bueno de Mesquita claims a “90 percent accuracy rate” in his CIA-declassified predictions, a vague claim according to the reviewer.

Bueno de Mesquita ends the book with some bold predictions, and the reviewer concludes:

[I]t’s hard not to feel the same sort of skepticism about the author that he feels toward Mother Teresa.

(My students enjoyed game theory examples from NYU Politics Professor Steven Brams’ Biblical Games: Game Theory and the Hebrew Bible [3]; the current 2002 edition is an update of my 1980 edition. The current table of contents and sample pages are also available online [4].)

Submitted by Margaret Cibes

Some recent studies of potential interest

“Pacifiers Tied to Speech Disorders”
by Jeremy Singer-Vine, The Wall Street Journal, November 3, 2009

The author summarizes the results of five recent studies, including some “caveats” to consider before relying on the results for future decision-making.

(a) An observational study of 128 Chilean children [5]:

Result: Preschoolers with speech disorders were three times as likely as other children to have used a pacifier for at least three years … and thrice as likely to have started bottle-feeding before nine months of age.
Caveat: The infants' sucking behaviors were based on parental recollections rather than direct observation. A larger, randomized trial is needed to validate the findings ….

(b) A controlled study of mice [6]:

Result: Nicotine patches appear to promote the spread and re-growth of cancer tumors ….
Caveat: Mouse and human cancers can differ significantly.

(c) A controlled study of 391 women [7]:

Result: Women who lie down for 15 minutes after receiving artificial insemination appear to have a 50% higher chance of becoming pregnant ….
Caveat: The overall rate of pregnancy in this study was significantly lower than at many fertility centers ….

(d) A study of nearly 32,000 Swedish twins [8]:

Result: Genetic factors appear to explain much of the connection between heart disease and hip fractures ….
Caveat: Though the study enrolled many subjects, there were fewer than 400 cases in which an identical twin fractured a hip after his or her sibling was diagnosed with heart disease.

(e) A controlled study of 49 patients [9]:

Result: A three-day course of antibiotics was no less effective than the standard seven-day course for helping children recover from tonsillectomy …. Patients on the three-day course returned to a normal diet after an average of 5.7 days, while patients on the seven-day course took 6.0 days on average—a statistically insignificant difference.
Caveat: Enrolling more patients could have revealed significant differences that this small study missed. Pain in this study was not measured directly, but rather by the use of pain relievers.

Discussion

1. Suppose that the study of Chilean children had been based upon a larger, randomized trial, including direct observations instead of parental recollections, and suppose the result had been a strong association between the length of use of pacifiers/bottles and the presence of speech disorders. How would you respond to a claim that increased use of pacifiers/bottles causes speech disorders in young children? Can you think of a possible alternate explanation for the association?
2. How many twin cases would you need to examine in order to be more confident of a genetic connection between heart disease and hip fractures?
3. Given the statistically non-significant results of the controlled study of 49 patients with tonsillectomies, would you feel more confident about the results if you found statistically significant differences based upon a controlled study of 4900 patients?
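For question 3, it may help to see how the standard error of a difference in means shrinks with sample size. The sketch below assumes two equal-sized groups and a common standard deviation of 2 days in time to normal diet; that standard deviation is invented for illustration and is not reported in the summary. Only the 0.3-day difference (6.0 versus 5.7 days) comes from the text.

```python
import math

def se_of_mean_difference(sd, n_total):
    """Standard error of a difference in means for two equal groups of n_total / 2."""
    return sd * math.sqrt(4 / n_total)

observed_difference = 6.0 - 5.7  # days, from the study summary
assumed_sd = 2.0                 # hypothetical value, not from the study

se_small = se_of_mean_difference(assumed_sd, 49)
se_large = se_of_mean_difference(assumed_sd, 4900)

z_small = observed_difference / se_small
z_large = observed_difference / se_large

print(f"n = 49:   SE = {se_small:.3f}, z = {z_small:.2f}")
print(f"n = 4900: SE = {se_large:.3f}, z = {z_large:.2f}")
```

Whatever the true standard deviation, multiplying the sample size by 100 divides the standard error by 10. Under the assumed value, the same 0.3-day difference would move from clearly non-significant to clearly significant, which raises the separate question of whether a difference that small matters in practice.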

Submitted by Margaret Cibes

Who's Counting: Probability and Risk in the News

ABC News, Nov. 3, 2009
John Paulos.

Paulos writes:

Probability and risk increasingly permeate our lives. Like it or not, we must be able to assess the threats and opportunities that face us. Here's a random sampling of half a dozen hypothetical questions (with answers at the end) inspired by a variety of recent news stories.

1. It's impossible to say with any precision what risk the Washington area snipers posed to individuals in suburban Maryland and Virginia, but certainly the likelihood of being attacked was quite small — 13 victims out of about four million people in the affected area over three weeks.

Our psychology, however, leads us to be more afraid of what's unfamiliar, out of our control, dramatic, omnipresent, or is the consequence of malevolence. On all these counts, the snipers were more terrifying than more common risks.

Still, let's consider one of these more common risks. How many traffic fatalities can be expected to occur in any given three-week period in the United States? How many in an area the size of suburban Washington?

2. Early in the sniper case the police arrested a man who owned a white van, a number of rifles, and a manual for snipers. It was thought at the time that there was one sniper and that he owned all these items, so for the purpose of this question let's assume that this turned out to be true.

Given this and other reasonable assumptions, which is higher — a.) the probability that an innocent man would own all these items or b.) the probability that a man who owned all these items would be innocent?

3. The Anaheim Angels and San Francisco Giants were in this year's World Series. The series ends, of course, when one team wins four games.

Is such a series, if played between equally capable opponents, more likely to end in six or seven games?

4. The rules of the series stipulate that team A plays in its home stadium for games 1 and 2 and however many of games 6 and 7 are necessary, whereas team B plays in its home stadium for games 3, 4, and, if necessary, game 5. If the teams are evenly matched, which team is likely to play in its home stadium more frequently?

5. Eleven million people went to the polls recently in Iraq and, the Iraqi news media assure us, 100 percent of them voted for Saddam Hussein for president. Let's just for a moment take this vote seriously and assume that Hussein was so wildly popular that 99 percent of his countrymen were sure to vote for him and that only 1 percent of the voters were undecided. Let's also assume that these latter people were equally likely to vote for or against him.

Given these assumptions, what was the probability of a unanimous 100 percent vote?

6. Politics in a democracy is vastly more complicated than it is under dictatorships. Witness the upcoming elections here.

What is the probability that the Republicans, the Democrats, or neither will take control of the Senate on Nov. 5?

Answer to 1. There are approximately 40,000 auto fatalities annually in this country, so in any given three-week period, there would be about 2,300 fatalities. The area around Washington has a population of about four million, or 4/280 of the population of the U.S., so as a first approximation, we could reasonably guess that 4/280 times 2,300, or about 30 auto fatalities, would occur there during any three-week period. Attention must then be paid to the ways in which this area and its accident rate are atypical.
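The arithmetic in Answer 1 can be retraced in a few lines. The inputs are the round figures quoted in the answer (40,000 annual fatalities, 4 million out of 280 million people); the variable names are chosen here for illustration.

```python
annual_fatalities = 40_000      # approximate U.S. auto fatalities per year
weeks_in_period = 3
area_population_millions = 4    # suburban Washington area
us_population_millions = 280

# Scale the annual total to three weeks, then to the area's share of the population.
three_week_fatalities = annual_fatalities * weeks_in_period / 52
area_fatalities = three_week_fatalities * area_population_millions / us_population_millions

print(f"national, three weeks: about {three_week_fatalities:.0f}")
print(f"Washington area, three weeks: about {area_fatalities:.0f}")
```

The result is a little over 30, roughly two to three times the 13 sniper victims over the same span, before any adjustment for how the area's accident rate differs from the national one.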

Answer to 2. The second probability would be vastly higher. To see this, let me make up some illustrative numbers. There are about four million innocent people in the area and, we'll assume, one guilty one. Let's estimate that 10 people (including the guilty one) own all the three of the items mentioned above. The first probability — that an innocent man owns all these items — would be 9/4,000,000 or less than 1 in 400,000. The second probability — that a man owning all three of these items is innocent — would be 9/10. Whatever the actual numbers, these probabilities usually differ substantially. Confusing them is dangerous (to defendants).
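The two conditional probabilities in Answer 2 share a numerator but have very different denominators. A short sketch using the illustrative numbers from the answer (which Paulos himself describes as made up) makes this explicit; exact fractions avoid any rounding.

```python
from fractions import Fraction

innocent_population = 4_000_000  # innocent people in the area (illustrative)
owners_total = 10                # people owning all three items, guilty one included
innocent_owners = owners_total - 1

# P(owns all items | innocent): innocent owners among all innocent people.
p_items_given_innocent = Fraction(innocent_owners, innocent_population)

# P(innocent | owns all items): innocent owners among all owners.
p_innocent_given_items = Fraction(innocent_owners, owners_total)

print(f"P(items | innocent) = {float(p_items_given_innocent):.2e}")
print(f"P(innocent | items) = {float(p_innocent_given_items):.2f}")
```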

Answer to 3. For the World Series to last 6 or 7 games, it must last at least 5 games, at which point one team would be ahead 3 games to 2. If the team that is ahead wins the 6th game, the Series is over in 6 games. If the team that is behind wins the 6th game, the Series goes to 7 games. Since the teams are equally matched, the Series is equally likely to end in 6 or 7 games.

Answer to 4. The solution requires that we use a bit of probability theory. Doing so, we find that, on average, team A will play 2.9375 games at its home stadium and team B 2.875 games at its home stadium. Thus team A is a bit more likely to play at home.
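Answers 3 and 4 can both be checked by brute force. The sketch below enumerates all 2^7 equally likely win/loss sequences for evenly matched teams, stops each series when a team reaches four wins, and tallies the series length and the home games under the schedule stated in question 4. Exact fractions are used so the results can be compared directly with the figures in the answers; the function and variable names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# Home team for games 1 through 7, as stated in question 4.
HOME_OF_GAME = "AABBBAA"

def series_stats():
    """Return (length distribution, expected home games) for evenly matched teams."""
    length_prob = {4: Fraction(0), 5: Fraction(0), 6: Fraction(0), 7: Fraction(0)}
    expected_home = {"A": Fraction(0), "B": Fraction(0)}
    weight = Fraction(1, 2 ** 7)  # each 7-game sequence is equally likely

    for outcome in product("AB", repeat=7):
        wins = {"A": 0, "B": 0}
        for game_index, winner in enumerate(outcome):
            wins[winner] += 1
            if wins[winner] == 4:
                length = game_index + 1  # games after this one are never played
                break
        length_prob[length] += weight
        for team in "AB":
            expected_home[team] += weight * HOME_OF_GAME[:length].count(team)

    return length_prob, expected_home

length_prob, expected_home = series_stats()
print({k: str(v) for k, v in length_prob.items()})
print({k: float(v) for k, v in expected_home.items()})
```

The enumeration gives probability 5/16 each for a six-game and a seven-game series, and expected home games of 2.9375 for team A and 2.875 for team B, matching both answers.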

Answer to 5. Even given the absurdly generous assumptions above, there would be 110,000 undecided voters (1 percent of 11 million). The probability of a 100 percent vote is thus equal to the probability of flipping a fair coin 110,000 times and having heads come up each and every time! The probability of this is 2 to the power of minus 110,000, or a 1 preceded by more than 30,000 0's and a decimal point. This would be the cosmic mother of all coincidences!
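The size of the number in Answer 5 is easiest to check with logarithms, since 2 to the power of minus 110,000 is far too small to hold in an ordinary floating-point value. A minimal sketch, using only the figures given in the question:

```python
import math

voters = 11_000_000
undecided = voters // 100  # 1 percent of voters, each a fair coin flip

# log10 of 2 ** (-undecided), computed without forming the tiny number itself.
log10_probability = -undecided * math.log10(2)

# Zeros between the decimal point and the first nonzero digit.
leading_zeros = math.ceil(-log10_probability) - 1

print(f"undecided voters: {undecided}")
print(f"log10 of the probability: {log10_probability:.1f}")
print(f"zeros after the decimal point: {leading_zeros}")
```

The probability is about 10 to the power of minus 33,113, consistent with the answer's "more than 30,000 0's".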

Answer to 6. As of this writing the Democrats hold a one vote edge in the Senate, and there are a number of races too close to call. Significant consequences will surely flow from small, but unpredictable factors so my prediction won't be ready until Wednesday, Nov. 6.

Medicine by the numbers

Making Health Care Better, David Leonhardt, The New York Times, November 3, 2009.

Intermountain Healthcare offers a course taught by Dr. Brent James that is very popular, but also very surprising.

His four-month course is called the Advanced Training Program, and it is a combination of statistical methods and management theory applied to the practice of medicine. “I’ve wanted to go for years,” Janet Porter, the chief operating officer of the Dana-Farber Cancer Institute in Boston, told me later. For anybody interested in improving the quality of health care, she said, the program is the equivalent of Harvard.

One way that Dr. James recommends to improve the quality of health care is to reduce variation. With a pulmonologist, Dr. Alan Morris, Dr. James prepared a treatment protocol for acute respiratory distress syndrome (ARDS).

Some of the recommendations were based on solid evidence. Many were educated guesses. The final document ran to 50 pages and was left at the patients’ bedsides in loose-leaf binders. Morris’s colleagues were naturally wary of it. “I thought there wasn’t anybody better in the world at twiddling the knobs than I was,” Jim Orme, a critical-care doctor, told me later, “so I was skeptical that any protocol generated by a group of people could do better.” Morris helped overcome this skepticism in part by inviting his colleagues to depart from the protocol whenever they wanted. He was merely giving them a set of defaults, which, he emphasized, were for the sake of a research trial.
The crucial thing about the protocol was that it reduced the variation in what the doctors did. That, in turn, allowed Morris and James to isolate the aspects of treatment that made a difference. There was no way to do that when the doctors were treating patients in dozens of different ways. James has a provocative way of describing his method to doctors: “Guys, it’s more important that you do it the same way than what you think is the right way.”
While the pulmonologists were working off of the protocol, Intermountain’s computerized records system was tracking patient outcomes. A pulmonology team met each week to talk about the outcomes and to rewrite the protocol when it seemed to be wrong. In the first few months, the team made dozens of changes. Just as the pulmonologists predicted, the initial protocol was deeply flawed. But it seemed to be successful anyway. One widely circulated national study overseen by doctors at Massachusetts General Hospital had found an ARDS survival rate of about 10 percent. For those in Intermountain’s study, the rate was 40 percent.

If this sounds like a recommendation straight from the playbook of W. Edwards Deming, that is because it is.

James peppers his classes with anecdotes about W. Edwards Deming, arguably the original quality guru, and it is easy to see why Deming would be attractive to James. Deming grew up on a farm in Iowa in the early 20th century and majored in electrical engineering at the University of Wyoming. During World War II, he was part of a committee that helped the government make wartime production more efficient. After the war, his statistical methods caught on in Japan, and the Japanese credit him with helping to make their postwar boom possible. The so-called Toyota way stems from Deming’s work. Eventually, the same ideas caught on at General Electric, Intel, Wal-Mart and elsewhere in this country.

The article has an interesting story about a doctor who had an unusual pattern of treatment.

Last summer, the members of the labor-and-delivery committee noticed some worrisome signs about an obstetrician at an Intermountain hospital outside Salt Lake City. His births were taking unusually long on average, and a relatively large number of them were Caesarian sections. So Ware Branch, the head of the labor-and-delivery committee, a fit obstetrician in his 50s, sent the doctor a letter asking him to think about what might be causing the trends. One item on the committee’s September agenda was talking about the doctor’s response.

The doctor responded, as expected, in a defensive manner.

In this case, the obstetrician suggested that Intermountain’s numbers were just not right. Branch and his colleagues were confident of their statistics, and they thought this might be what Janie Wilson, the lead nurse on the committee, called “a little growth opportunity.”

A careful response was important here, as doctors at Intermountain, like at many other hospitals, have a large degree of autonomy.

At the September meeting, Branch distributed his own response to the obstetrician’s response. It was a breezy letter full of doctor bonhomie, and it profusely thanked the obstetrician for taking the time to respond in writing. “You are perfectly right to question the data,” Branch wrote. “We have been found incorrect in numerous cases.” But for all its politeness, Branch’s letter was also pointed. With it, he attached a list of every elective induction the obstetrician had done recently and invited him to identify any that had been incorrectly classified. Branch also enclosed statistical profiles of other, similarly busy obstetricians. They performed fewer C-sections and had shorter delivery times. The letter’s final section included the following:
“Lastly, quality improvement is a process, not an event. In part it works by finding variation and drawing attention to it, as has happened with you and others in this effort. And well-done quality improvement is not punitive; it’s educational. It is also worth noting that those docs determined not to learn never do.”

Questions

1. Review Deming's fourteen principles. Which of these principles are illustrated in the care at Intermountain?

2. The article discusses the tension between an individual doctor's judgment (intuition) and the need to follow the best available recommendations. Under what situations should intuition be used, according to this article?

3. Are there limits to the extent to which quality control principles can be applied in a medical setting?

Census 1990 – Name frequency data

The U.S. Census Bureau has apparently received so many requests for name frequency data that it has begun a project to respond to these requests by providing the frequencies of all first names by gender and all last names from the 1990 census.

See “Documentation and Methodology” and “Names File”.

Note: To get the data into an Excel file, (1) save the web file, (2) open Excel, and (3) open the saved file from Excel.

Submitted by Margaret Cibes