# Chance News 24

## Contents

- 1 Quotations
- 2 Forsooths
- 3 A question
- 4 The danger of providing expert witness testimony when you are not an expert
- 5 Unrandomizing a Walk
- 6 A Stirling Approach to the Central Limit Theorem (CLT)
- 7 How to detect voting fiddles
- 8 Why so much medical research is rot
- 9 A curious discrepancy
- 10 A Car Talk puzzle
- 11 Solution to the Car Talk problem

## Quotations

Paul Alper suggested the following quotations from "Sense and Nonsense of Statistical Inference" by Chamont Wang, Marcel Dekker, 1993.

"As a rule of thumb, often the more computer printouts empirical studies produce, the less intellectual content they have." [Page 84]

"In brief, the previous design (the two-sample comparison) had three problems: (1) not scientific, (2) not ethical, and (3) not effective. Other than that, everything was fine." [Page 106]

"In comparison to engineering measurements, most modern-day psychometric instruments are still in the stone age." [Page 129]

Paul also wrote:

Here is a candidate for the most spectacularly incorrect prediction of the last century. In 1943 at the out-of-town tryouts for the new musical, Oklahoma, Mike Todd the noted showman and later a husband of Elizabeth Taylor, quipped, "No legs, no tits, no chance." Like the British Empire of the 19th century, it is speculated that there are road companies of Oklahoma on which the sun never sets.

## Forsooths

Archaeologists and clergymen in the Holy Land derided claims in a new documentary produced by James Cameron that contradict major Christian tenets, but the Oscar-winning director said the evidence was based on sound statistics.

Stephen Pfann, a biblical scholar at the University of the Holy Land in Jerusalem, said the film's hypothesis holds little weight.

"How possible is it?" Pfann said. "On a scale of one through 10, 10 being completely possible, it's probably a one, maybe a one and a half."

Scholars, clergy slam Jesus documentary

Associated Press, Feb. 26, 2007

Marshall Thompson

Steve Simon provided the following Forsooth:

Mr. Romney started Bain Capital in 1984 with an initial fund of about $40 million. During the fourteen years he ran it, Bain Capital's investments reportedly earned an annual rate of return of over 100 percent, potentially turning an initial investment of $1 million dollars into more than $14 million by the time he left in 1998.

The New York Times

In Romney's 2008 Bid, Wallet Opens to the Right

11 March 2007

These Forsooths are from the Feb. 2007 RSS News.

The car population went up 10 per cent over the 1997-2004 period, while daily car trips more than doubled, rising 23 percent.

The Straits Times (Singapore)

24 November 2006

Online banking fraud up 8000%

The UK has seen an 8000% increase in fake internet banking scams in the past two years, the government's financial watchdog has warned...The amount stolen is still relatively small but it is set to go up by 90% for the second year running.

BBC Ceefax

213 December 2006

## A question

If it's zero degrees today and it's supposed to be twice as cold tomorrow, how cold is it going to be?

Eugene Demidenko

## The danger of providing expert witness testimony when you are not an expert

Expert witness guidance: Likely implications of Professor Sir Roy Meadow’s GMC case

Sir Roy Meadow is an expert on child abuse, having published a landmark paper in 1977 on a condition known as Munchausen Syndrome by Proxy. An observation of his

one sudden infant death in a family is a tragedy, two is suspicious and three is murder, unless proven otherwise.

became known as "Meadow's Law".

In testimony at the trial of a woman, Sally Clark, who had two children who died from SIDS, Sir Roy tried to quantify this statement by arguing that the chances of observing two SIDS deaths would be 73 million to one. He arrived at this figure by squaring the probability of one SIDS death (8.5 thousand to one). Sally Clark was convicted of murder, but her conviction was overturned on appeal.

Dr. Meadow's testimony came under criticism, because squaring the probability only makes sense under independence. If there are genetic or environmental risk factors that influence SIDS deaths, then the probability estimate could be wrong. Not just wrong, but spectacularly wrong. It's an error that (hopefully) no statistician would make, but Dr. Meadow is not a statistician.
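To see how much the independence assumption matters, here is a minimal sketch of the arithmetic. It uses the rounded figure of 1 in 8,500 quoted above, so the squared result comes out near 72 million rather than exactly 73 million. The `relative_risk` value is purely hypothetical, chosen only to illustrate how quickly the answer changes if a second death is more likely in a family that has already had one.

```python
from fractions import Fraction

# Single-death risk as quoted in the text: about 1 in 8,500 (rounded).
p_one = Fraction(1, 8500)

# Squaring is only valid if the two deaths are independent events.
p_two_independent = p_one ** 2
print(f"Independent: 1 in {float(1 / p_two_independent):,.0f}")

# HYPOTHETICAL dependence: suppose a second death is 10 times as likely
# once a first has occurred. This number is illustrative, not from the case.
relative_risk = 10
p_two_dependent = p_one * (relative_risk * p_one)
print(f"Dependent:   1 in {float(1 / p_two_dependent):,.0f}")
```

Under this made-up dependence the "one in tens of millions" figure shrinks by the same factor of ten, which is the sense in which the estimate can be spectacularly wrong.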

The General Medical Council reviewed this case and found Dr. Meadow to be guilty of serious professional misconduct and erased his name from the medical register. This action, which would prevent Dr. Meadow from practicing medicine, was still largely symbolic since Dr. Meadow is currently retired from medical practice.

Dr. Meadow appealed this decision in the British courts, which ruled that the actions of the General Medical Council should be overturned because expert witnesses will refuse to testify if they believe that their testimony, if shown later to be invalid, could lead to sanctions.

Submitted by Steve Simon.

### Questions

1. What is the proper course of action if an expert witness is asked questions outside his/her area of expertise?

2. Do Sir Roy Meadow's actions constitute an honest mistake or serious professional misconduct?

3. Do expert witnesses need immunity from recrimination if their testimony is found to be in error?

### Additional readings

(1) Royal Statistical Society, Statement regarding statistical issues in the Sally Clark case, Press Release, October 23, 2001.

(2) Ray Hill, Multiple sudden infant deaths--coincidence or beyond coincidence?, *Paediatric and Perinatal Epidemiology*, 18 (2004), 320-326.

## Unrandomizing a Walk

America is a wealthy country. At times too wealthy. According to a New York Times article of February 10, 2007 by Benedict Carey, "James S. McDonnell, a founder of the McDonnell Douglas Corporation" donated "more than $10 million over the years" to Robert G. Jahn of Princeton University whose Princeton Engineering Anomalies Research Laboratory [PEAR] "has conducted studies on extrasensory perception and telekinesis." For 28 years Jahn has employed random number generators and other devices in order to show that the output could be influenced merely by thinking.

More specifically, according to the article, "Analyzing data from such trials, the PEAR team concluded that people could alter the behavior of these machines very slightly, changing about 2 or 3 flips out of 10,000." So to speak, unrandomizing a walk. From this meager beginning it supposedly follows that "thought could bring about changes in many other areas of life--helping to heal disease, for instance, in oneself and others." The beginning is meager because there is a suspicion that random number generators aren't all that accurate. An even larger suspicion that pervades all such ESP investigations is that the data may be unconsciously or otherwise manipulated.
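A rough sketch of why such a tiny effect needs an enormous number of trials, and why a slightly biased generator would look exactly the same. It assumes that "2 or 3 flips out of 10,000" means the hit rate of a nominally fair binary device shifts from 0.5 to about 0.50025; the function name and the trial counts are illustrative choices, not PEAR's.

```python
import math

def z_score(n_flips, excess_per_10000):
    """z-score of an observed excess over chance for a nominally fair coin."""
    p_hat = 0.5 + excess_per_10000 / 10000
    std_err = math.sqrt(0.25 / n_flips)  # standard error of a proportion at p = 0.5
    return (p_hat - 0.5) / std_err

# The same tiny excess is invisible in small samples and "significant" in huge ones.
for n in (10_000, 1_000_000, 100_000_000):
    print(f"{n:>11,} flips: z = {z_score(n, 2.5):.2f}")
```

The calculation cannot tell a mental influence of 2.5 flips per 10,000 from a hardware bias of the same size, which is the point of the suspicion about the generators.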

In his 2000 book, Voodoo Science: The Road from Foolishness to Fraud, Robert L. Park described Jahn's results as "a large number of trials with a tiny statistical deviation from pure chance, and apparently no way to increase the strength of the effect." Park asked, "Why not just use your psychokinetic powers to deflect a microbalance?...The reason, of course, is that the microbalance stubbornly refuses to budge. That may explain why statistical studies are so popular in parapsychological research: they introduce all sorts of opportunities for uncertainty and error."

Jahn is finally packing it in, realizing that no respectable journal will publish his assertions; "If people don't believe us after the results we've produced, then they never will." According to the NYT, "One editor famously told Dr. Jahn that he would consider a paper 'if you can telepathically communicate it to me.'"

### Discussion

1. According to the NYT, "Princeton made no official comment" concerning the closing of PEAR. Speculate why Princeton issued no statement.

2. According to the NYT, "The culture of science at its purest, is one of freedom in which any idea can be tested regardless of how far-fetched it might seem." Why then does Park say, "Science has a substantial amount of credibility, but this is the kind of thing that squanders it"?

3. A theme running throughout Park's book is that even though the scientific community is overwhelmingly united regarding ESP, astrology, divining rods, the second law of thermodynamics, etc., the media often presents an issue as if there were a legitimate disagreement among equals. Relate this to how global warming is portrayed.

Submitted by Paul Alper

## A Stirling Approach to the Central Limit Theorem (CLT)

By Bill Roach and Robert Kerchner, Washburn University, Topeka, KS 66621

Key words:
Central Limit Theorem, Laplace, Stirling’s Approximation, error propagation, Excel, De Moivre

Abstract: Many applied statistics courses do not review a proof of the Central Limit Theorem; they rely on simulations like Galton’s Quincunx and/or sampling distributions to acquaint students with the Bell Curve. The Bell Curve is there, but students are left asking: (1) Where did the π come from? (2) How did a power function based on e get into the formula? The short answer to both questions is “Stirling’s Formula for n!.” Looking at the accuracy of Stirling’s Formula can give students some useful insights into the De Moivre-Laplace (binomial distribution) version of the Central Limit Theorem (CLT).
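The abstract's point can be checked numerically. The sketch below is not the authors' Excel worksheet; it is a small Python illustration, with function names of our own choosing, that compares n! with Stirling's formula sqrt(2πn)(n/e)^n and then compares the central binomial probability for 2n fair coin tosses with the normal density at its mean, 1/sqrt(πn). Both the π and the e enter through Stirling's formula.

```python
import math

def stirling(n):
    """Stirling's approximation to n!: sqrt(2*pi*n) * (n/e)**n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

def binom_center(n):
    """Exact P(X = n) for X ~ Binomial(2n, 1/2)."""
    return math.comb(2 * n, n) / 4 ** n

def normal_center(n):
    """Normal density at the mean when the variance is 2n * 1/4 = n/2."""
    return 1 / math.sqrt(math.pi * n)

for n in (1, 5, 10, 50):
    ratio = stirling(n) / math.factorial(n)
    print(n, round(ratio, 4), round(binom_center(n), 5), round(normal_center(n), 5))
```

The ratio column creeps up toward 1 as n grows, and the last two columns agree more and more closely, which is the De Moivre-Laplace result at the center of the distribution.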

## How to detect voting fiddles

Election forensics, The Economist, Feb 22nd 2007.

Walter Mebane and his team at Cornell University claim to have devised a new method to detect fraud by using statistics. It is based on a mathematical curiosity known as Benford's law.

This law states that in certain long lists of numbers, such as tables of logarithms or the lengths of rivers, the first digit of each number is not evenly distributed between one and nine. Instead, there are far more numbers beginning with one (about a third of the total) and far fewer starting with nine. For example, a 2km stream is twice as long as a 1km stream; by contrast, a 10km stream is only 11% longer than a 9km stream. So you will find more streams measuring between 1km and 2km than between 9km and 10km.
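The article does not give the formula, but the standard statement of Benford's law is that a first digit d occurs with probability log10(1 + 1/d). A short sketch of the resulting table:

```python
import math

# Benford's law for the first significant digit d = 1, ..., 9.
first_digit = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in first_digit.items():
    print(d, f"{p:.3f}")
```

The probability for a leading one is about 0.30, in line with "about a third of the total", and the nine probabilities sum to one.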

(This topic has been discussed previously in Chance News: Following Benford's law, or looking out for No. 1, in Chance News 7.07, Benford's distribution in the Dow-Jones, in Chance News 6.01 and Chance News 4.10. However, those articles deal with the first significant digit only.)

Dr Mebane is concerned with the second, rather than the first, digit of lists of election results from different precincts of a constituency, where he also observes a non-uniform distribution of possible digits. The effect is far more subtle, with zero occurring about 12% of the time and nine turning up on 9% of occasions.
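For comparison, here is a sketch of the second-digit distribution that follows from Benford's law itself, obtained by summing log10(1 + 1/(10*d1 + d2)) over the possible first digits d1. This is the textbook second-digit law, not necessarily the exact model Dr Mebane fits to precinct counts; the function name is our own.

```python
import math

def second_digit_prob(d2):
    """P(second significant digit = d2) under Benford's law."""
    return sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))

for d2 in range(10):
    print(d2, f"{second_digit_prob(d2):.3f}")
```

Zero comes out near 12% and nine near 8.5%, so the spread is far narrower than for the first digit, which is why the effect is described as subtle.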

A quoted example concerns an analysis of the last three elections in Bangladesh.

The 1991 election showed no strange results. For the 1996 election some 2% of results were problematic. And fully 9% of the results in 2001 failed the test. The 2001 election was fiercely contested. Yet monitors from the Carter Centre and the European Union found the election to be acceptably, if not entirely, free and fair. Tests like Dr Mebane's could provide monitors with quantitative estimates of exactly how free and fair an election has been, on which to base their qualitative judgment of whether that is indeed acceptable.

It is a very simple but not foolproof test for fraud that can be easily applied to data. The author admits that his method sometimes fails to detect a discrepancy in a vote that is known to have been problematic, and occasionally detects fiddling where there was none.

The author claims to have developed a mathematical model that explains the distribution of the second digits, putting what might appear to be a statistical oddity on a more solid footing.

### Questions

- Why do you think the author focuses on the second digit rather than the first digit, when anomalies in the distribution of the second digit, relative to the first, are more difficult to detect?
- Do you think similar results might hold for the third or fourth significant digits?

### Further reading

- Election Forensics: Vote Counts and Benford's Law, Walter Mebane Jr., 2006.
- Benford's Law and Zipf's Law.
- SigFigDistbGen, repeatedly multiplies a number of your choice by a factor of your choice and plots the significant figure.
- Interestingly, the first digit of the sequence 2^n (i.e. 2, 4, 8, 16, 32, 64, ... yielding the digits 2, 4, 8, 1, 3, 6, ...) follows Benford's Law. Can you offer an intuitive explanation for this?
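Readers who want to check the claim about powers of 2 before looking for an explanation can tabulate the first digits directly. This is only an empirical check; the cutoff `N = 1000` is an arbitrary choice.

```python
import math
from collections import Counter

N = 1000  # number of powers of 2 to examine (arbitrary)

# First digit of 2**n for n = 1, ..., N, using exact integer arithmetic.
counts = Counter(int(str(2 ** n)[0]) for n in range(1, N + 1))

for d in range(1, 10):
    observed = counts[d] / N
    benford = math.log10(1 + 1 / d)
    print(d, f"{observed:.3f}", f"{benford:.3f}")
```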

Submitted by John Gavin.

## Why so much medical research is rot

Signs of the times,
The Economist, Feb 22nd 2007.

Peter Austin of the Institute for Clinical Evaluative Sciences in Toronto explains why so many health claims that look important when they are first made are not substantiated in later studies.

The confusion arises because each result is tested separately to see how likely, in statistical terms, it was to have happened by chance. If that likelihood is below a certain threshold, typically 5%, then the convention is that an effect is 'real'. And that is fine if only one hypothesis is being tested. But if, say, 20 are being tested at the same time, then on average one of them will be accepted as provisionally true, even though it is not.
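The arithmetic behind "on average one of them" is easy to sketch. The calculation below assumes all 20 hypotheses are in fact false and that the tests are independent. The Bonferroni threshold at the end is one common correction, shown only as an example; the article does not say which adjustment Dr Austin recommends.

```python
alpha = 0.05   # conventional significance threshold
m = 20         # number of hypotheses tested at once

# Expected number of false positives when every null hypothesis is true.
expected_false_positives = m * alpha

# Chance of at least one false positive, assuming independent tests.
p_at_least_one = 1 - (1 - alpha) ** m

# Bonferroni correction: test each hypothesis at alpha / m instead.
bonferroni_threshold = alpha / m

print(expected_false_positives)
print(round(p_at_least_one, 3))
print(bonferroni_threshold)
```

So with 20 tests there is roughly a 64% chance that at least one spurious "effect" turns up.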

The author warns that many researchers looking for risk factors for diseases are not aware that they need to modify their statistics when they test multiple hypotheses. Consequently, a lot of observational health studies cannot be reproduced by other researchers, specifically those that go trawling through databases rather than relying on controlled experiments.

Previous work by Dr Ioannidis (discussed in Chance News Sept-Oct 2005), on six highly cited observational studies, showed that conclusions from five of them were later refuted. In new work, he looked systematically at the causes of bias in such research and reconfirmed that the results of observational studies are likely to be completely correct only 20% of the time. If such a study tests many hypotheses, the likelihood its conclusions are correct may drop as low as one in 1,000—and studies that appear to find larger effects are likely, in fact, simply to have more bias.

The Economist article finishes with a warning:

So, the next time a newspaper headline declares that something is bad for you, read the small print. If the scientists used the wrong statistical method, you may do just as well believing your horoscope.

### Further reading

- Just how reliable are scientific papers? Chance News, September-October 2005.
- Most Published Research Findings False, Evisa, Sept 2005.

Submitted by John Gavin.

## A curious discrepancy

We received the following comments from Norton Starr.

A curious discrepancy between two claims for the same datum caught my eye recently. The conflict in data appeared in the following places: William Holstein's "Saturday Interview", B3, NYTimes, Feb. 17 and in the January AARP Bulletin, p.12. ("Hospital-acquired infection is the fourth leading cause of death in the United States" and "Nationwide, hospital infections are the eighth-leading cause of death", respectively.) When I saw differing claims for the same cause of death, within days of each other, I wondered how they could have arisen.

The New York Times article claim was traced back (thanks to someone at Cardinal Health) to the following source (see numbered item 1):

The executive editor of AARP Bulletin referred us to the work of Wenzel and Edmond, which we found here. We read:

This last says "Nosocomial bloodstream infections are a leading cause of death in the United States. If we assume a nosocomial infection rate of 5%, of which 10% are bloodstream infections, and an attributable mortality rate of 15%, bloodstream infections would represent the eighth leading cause of death in the United States." That is, bloodstream infections are ranked eighth. Those strike me as somewhat specialized infections, and I note that pneumonia and influenza are ranked fourth. Perhaps a major proportion of these latter two are hospital-derived infections and thus serve as the source for the Times claim.

It's interesting that ranks differing by a factor of two would be described in the same terms. This raises various questions: Were the data from sources at different years and thus are both correct? If so, that would suggest a variability sufficiently large that making public (or private) policy to deal with the hazard becomes problematic. If not, then how is the discrepancy explained? Noting that Wenzel and Edmond are the likely source and observing that the word "bloodstream" was omitted, explains the discrepancy, and seems to resolve the matter. Such oversights are common in the literature, and this is a good example because the two ranks were published so close in time to each other. For me the primary lesson is that in using data for public policy, it's always good to check the original source.

## A Car Talk puzzle

Here is the Car Talk puzzle for January 29, 2007

RAY: This puzzler is mathematical in nature. Imagine if you will, three gentlemen, Mr. Black, Mr. Brown and Mr. White, who so detest each other that they decided to resolve their differences with pistols. It's kind of like a duel, only a three-way duel. And unlike the gunfights of the old West, where the participants would simultaneously draw their guns and shoot at each other, these three gentlemen have come up with a rather more civilized approach.

Mr. White is the worst shot of the three and hits his target one time out of three. Mr. Brown is twice as good and hits his target two times out of three. Mr. Black is deadly. He never misses. Whomever he shoots at is a goner.

To even the odds a bit, Mr. White is given first shot. Mr. Brown is next, if he's still alive. He's followed by Mr. Black, if he's still alive.

They will continue shooting like this, in this order, until two of them are dead.

Here's the question: Mr. White is the first shooter. Remember, he's the worst shot. At whom should he aim his first shot to maximize his chances of surviving?

You should try to solve this puzzle before looking at the answer.

## Solution to the Car Talk problem

The solution given by the Car Talk boys has the right idea, but they struggle a bit with the mathematics. Also, the problem is not well defined. However, this is a well-known problem. It is problem 20 in Mosteller's famous book "Fifty Challenging Problems in Probability with Solutions". Here is how Mosteller would describe the Car Talk duel.

A, B, and C are to fight a three-cornered pistol duel. All know that A's chance of hitting his target is 1/3, B's is 2/3, and C never misses. They are to fire at their choice of target in succession in the order A, B, C, cyclically (but a hit man loses further turns and is no longer shot at) until only one dueler is left. What should A’s strategy be?

Both Mosteller and the Car Talk boys suggest that A should make his first shot at the sky and then the duelers should shoot at the most skilled of the duelers still alive. Of course, suggesting that A's first shot should be at the sky is counterintuitive and this is what makes the problem interesting. Neither Mosteller nor the Car Talk boys, when describing the game, say that shooting at the sky is an option; but Mosteller writes:

In discussing this with Thomas Lehrer, I raised the question whether that (shooting at the sky) was an honorable solution under the code duello. Lehrer replied that the honor involved in three-cornered duels has never been established and so we are on safe ground to allow A a deliberate miss.

If we assume that the duelers use the strategy suggested by Mosteller, we can compute the probability that each of the three duelers is the lone survivor. Under his strategy, the possible paths the duel could take are shown in the following tree diagram.

The nodes give the remaining duelers. The bar indicates the dueler who shoots next. The branches have the probabilities for the possible outcomes of a shot. All but one of these probabilities are obvious from the skill of the dueler. The one that is not obvious arises when A has only B to shoot at. In this case, we have a two-person duel with A shooting first and then the two duelers alternating shots until one is killed. Let p be the probability that A survives and q the probability that B survives this two-person duel. Then A survives if he hits B with his first shot (probability 1/3). If they both miss on their first shots (probability 2/3*1/3 = 2/9), the duel starts over and A will survive with probability p. Thus p = 1/3 + (2/9)p. Solving for p we have p = 3/7 and so q = 4/7.

Now from the tree diagram we can calculate the probability that each dueler survives. Note that there is only one branch resulting in C surviving. He survives with probability 1/3*2/3 = 2/9 = .222. There is also only one branch where B survives, so he survives with probability 2/3*4/7 = 8/21 = .381. So A wins with probability 1-8/21 - 2/9 = 25/63 = .397.
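These fractions can be verified with exact arithmetic. The sketch below follows the strategy described above (A fires in the air, then everyone shoots at the best shooter still alive) and assumes C never misses; the helper name `duel_first_shooter_wins` is our own.

```python
from fractions import Fraction

def duel_first_shooter_wins(x, y):
    """Two-person duel: chance the first shooter (hit prob x) beats the other (hit prob y)."""
    # Solves p = x + (1 - x)(1 - y) p, the same equation used for p in the text.
    return x / (1 - (1 - x) * (1 - y))

a, b = Fraction(1, 3), Fraction(2, 3)   # hit probabilities of A and B; C always hits

p = duel_first_shooter_wins(a, b)       # A against B with A shooting first

# A fires in the air, then B shoots at C.
surv_B = b * (1 - p)            # B hits C, then wins the duel with A
surv_C = (1 - b) * (1 - a)      # B misses, C kills B, A misses C, C kills A
surv_A = b * p + (1 - b) * a    # A wins the duel with B, or hits C with his one shot

print(p, surv_A, surv_B, surv_C)
```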

From this we see that the worst shooter has the best probability of surviving, the second best shooter has the second best chance of surviving and the best shooter has the least chance of surviving. Thus we have survival of the weakest!

A three-person duel is also called a truel. Truels have been studied in many different forms depending on the rules for shooting. Applications to other fields including games, voting and government policies have been discussed. In a paper on truels, Donald Knuth remarks:

Note that in some of these cases the weakest player has the best chance, while the strongest player has the worst, even when he is a "sure shot"! This has an obvious moral for international politics, since you may decrease your chances for survival when you increase your firepower.

Truels also provide a good example of a Nash Equilibrium. In a game between two or more players a set of strategies, with the property that no player can benefit by changing his strategy while the other players keep their strategies unchanged, is called a Nash Equilibrium. Nash received his Nobel prize for his proof that a finite game with two or more players has such an equilibrium.

Truels of the type we have considered are called "sequential firing" truels. It has been shown that for these truels there are only two possible Nash Equilibria. One requires that duelers always shoot at the most skillful dueler still alive. Call this type 1. The other requires that the worst shooter shoot at the sky for his first shot and that from then on duelers shoot at the best shooter still alive. Call this type 2.

Which Nash Equilibrium applies depends only on the skill probabilities. The strategy we described for the Car Talk example is a Nash Equilibrium of type 2. Recall that for this example the hit probabilities were a = 1/3, b = 2/3 and c = 1. If we change these probabilities to a = .2, b = .3, c = 1 the Nash Equilibrium is of type 1.
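A quick numerical comparison of A's two first-shot choices for both sets of skills. This is a sketch under simplifying assumptions: C never misses, B and C always shoot at the best shooter still alive, and only "shoot at the sky" and "shoot at C" are compared. It checks A's own incentive, not the full equilibrium, and the function names are ours.

```python
from fractions import Fraction

def duel(x, y):
    """Two-person duel: chance the first shooter (hit prob x) beats the other (hit prob y)."""
    return x / (1 - (1 - x) * (1 - y))

def a_survival(a, b):
    """A's survival chance when firing first at the sky, or first at C."""
    # Sky: B shoots at C; either A then duels B (A first) or A gets one shot at C.
    sky = b * duel(a, b) + (1 - b) * a
    # At C: a hit leaves A facing B with B shooting first; a miss is the same as sky.
    at_c = a * (1 - duel(b, a)) + (1 - a) * sky
    return sky, at_c

for a, b in [(Fraction(1, 3), Fraction(2, 3)), (Fraction(1, 5), Fraction(3, 10))]:
    sky, at_c = a_survival(a, b)
    print(f"a={a}, b={b}: sky={float(sky):.4f}, at C={float(at_c):.4f}")
```

With the Car Talk skills the deliberate miss is better for A, while with a = .2 and b = .3 aiming at C comes out slightly ahead, consistent with the two equilibrium types.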

For further information about Truels we recommend "Distribution of winners in truel games", by R. Toral and P. Amengual.

### Questions

(1) As the tree diagram suggests, we could also find the probability that each dueler survives by forming a Markov Chain whose states are the nodes of the tree. What would the transition matrix be?

(2) For the Car Talk brothers example, dueler A does better by shooting in the sky than by shooting at dueler C. Show that this is not the case for the example with a = .2, b = .3, c = 1.

Submitted by Laurie Snell with assistance from Jeanne Albert.