# Chance News 34

## Quotation

One more fagot of these adamantine bandages is the new science of Statistics.

Ralph Waldo Emerson
Fate, from The Conduct of Life (1860, rev. 1876)

## Forsooth

The following Forsooth is from the February 2008 issue of RSS NEWS.

Twenty-six new cases of the inflammatory lung disease sarcoidosis [were seen amongst rescuers] in the first five years after 9/11. Five or fewer rescuers got sarcoidosis annually before 9/11.
New York Daily News
21 September 2007.

This Forsooth was suggested by Norton Starr.

Strokes have tripled in recent years among middle-aged women in the U.S., an alarming trend doctors blame on the obesity epidemic. Nearly 2-percent of women ages 35 to 54 reported suffering a stroke in the most recent federal health survey, from 1999 to 2004. Only about half a percent did in the previous survey, from 1988 to 1994.

ABCNews Strokes Among Middle-Aged Women Triple

By Marilynn Marchione
AP Medical Writer

New Orleans Feb 20, 2008 (AP)

We searched Google for "Strokes Among Middle-Aged Women Triple" and found 24,000 hits, including a large number of newspapers, but no evidence that anyone had noticed the error in the percentages. We asked the author of the study, Dr. Amytis Towfighi, if she knew who made the mistake. She responded:

The error resulted from the fact that the numbers were 0.6% and 1.8%. The reporters rounded down 0.6 to 0.5 and rounded up 1.8 to 2 but still said "tripled."

In other words, the Associated Press feels that it must avoid such difficult numbers as 0.6 and 1.8.
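A minimal check of the arithmetic in the explanation above, using only the survey percentages it quotes: the unrounded figures give a threefold rise, while the rounded figures that appeared in the story would actually imply a fourfold rise.

```python
def fold_change(before, after):
    """Return how many times larger the later rate is than the earlier one."""
    return after / before

# Percent of women aged 35 to 54 reporting a stroke, as quoted above
actual = fold_change(0.6, 1.8)    # unrounded survey figures
reported = fold_change(0.5, 2.0)  # figures as rounded in the news story

print(f"unrounded: {actual:.1f}x, rounded: {reported:.1f}x")
# prints "unrounded: 3.0x, rounded: 4.0x"
```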

The next two forsooths were suggested by Paul Alper.

Much of the data on overweight people and obesity are limited, equivocal and compromised.
Patrick Basham and John Luik in BMJ, Volume 336, page 244, 2 February 2008
The adverse effects of obesity on health are well established, serious, and causal.
R.W. Jeffery and N.E. Sherwood in BMJ, Volume 336, page 245, 2 February 2008

"I didn't major in math," Huckabee said to the Conservative Political Action Conference meeting, according to the Associated Press. "I majored in miracles, and I still believe in them."

## Telomeres Tell A Lot

Conventional wisdom, indeed wisdom of any form, indicates that physical activity, a.k.a. regular exercise, is good for you. In particular, intuition would imply that the risk factors for age-related diseases such as diabetes, cancer, hypertension, obesity and osteoporosis would be reduced if people were engaged in physical activity. To make a direct connection between ageing and physical activity, consider a paper in the Archives of Internal Medicine (Vol.168, No. 2, January 28, 2008), “The Association Between Physical Activity in Leisure Time and Leukocyte Telomere Length” by Cherkas, et al.

“Telomeres consist of tandemly repeated DNA sequences that play an important role in the structure and function of chromosomes.” Leukocyte telomere length (LTL) is used as a proxy for one’s biological age, as opposed to one’s chronological age: the longer one’s telomeres, the younger one is biologically; conversely, the shorter the telomeres, the older.

This study measured the telomeres of 2401 twins, who were put into four mutually exclusive categories of physical activity: “Inactive,” “Light,” “Moderate,” and “Heavy,” corresponding to “16 minutes, 36 minutes, 102 minutes and 199 minutes” of physical activity per week, respectively. After adjusting for “Age, sex, and extraction year,” the “LTL of the most active subjects (group 4) was an average 200 (SE, 79) nt [nucleotides] longer than that of the inactive subjects (group 1),” with a p-value of .006. The biological implication is “that the most active subjects had telomeres the same length as sedentary individuals up to 10 years younger, on average. This difference suggests that inactive subjects may be biologically older by 10 years compared with more active subjects.” When the analysis was restricted to subjects with complete information on BMI (body mass index), smoking and SES (socioeconomic status), the number of subjects fell from 2401 to 1531; the LTL difference increased to 213 nt and the p-value increased to .02. Below are a summary table and Figure 1:

http://www.dartmouth.edu/~chance/forwiki/alper1.gif
http://www.dartmouth.edu/~chance/forwiki/alper2.gif
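As a rough illustration of what the reported estimate and standard error imply, the sketch below assumes an approximately normal sampling distribution and computes the z statistic and a 95% confidence interval for the 200 nt difference. The paper's p-value of .006 comes from the authors' own adjusted analysis, so this back-of-the-envelope calculation is not meant to reproduce it.

```python
from statistics import NormalDist

def z_and_ci(estimate, se, level=0.95):
    """z statistic and normal-approximation confidence interval for an estimate."""
    z = estimate / se
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return z, (estimate - crit * se, estimate + crit * se)

# Group 4 minus group 1 difference in LTL, in nucleotides, as reported above
z, (lo, hi) = z_and_ci(200, 79)
print(f"z = {z:.2f}, 95% CI = ({lo:.0f}, {hi:.0f}) nt")
# prints "z = 2.53, 95% CI = (45, 355) nt"
```

Under this approximation the interval runs from roughly 45 to 355 nucleotides, so the data are compatible with a much smaller (or larger) difference than the 200 nt point estimate.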

### Discussion

1. The article states, “The results of this study can be extrapolated to other white individuals (men and women) of North European origin.” Find a biologist or a helpful librarian to determine whether it is suspected that non-whites have different telomere lengths and/or have a different distribution. If so, what does this imply about telomere length and ageing?

2. There were about nine times as many women in the study as men. Why might this be a concern?

3. Something important is missing in Figure 1 and its absence serves to magnify the average difference. What is it?

4. The subjects in the study were twins and therefore attracted extra attention in the lay media. Six of the ten authors are affiliated with King's College London. From the King's College website: “Comparing the telomere lengths of twins who were raised together but take different amounts of exercise, reduces the effect of genetic and environmental variation and so provides a more powerful test of the hypothesis.” Obtain the article and its reference #21 to determine why using twins, as opposed to non-twins, as subjects is more or less beside the point.

5. A “discordant twin-pair analysis” was performed “as a further confirmation of the larger analysis.” Figure 2 displays a paired two-tailed t test for 67 twin pairs whose activity levels differed by at least two categories. What defect does it share with Figure 1? Why is it even more misleading, given that a paired t test is being done?

http://www.dartmouth.edu/~chance/forwiki/alper3.gif

6. The article states, “A limitation of this type of study is that physical activity level was self-reported.” Why might this be a limitation?

7. Assume there is a positive association between LTL and physical activity. Give an alternative explanation to physical activity causing greater telomere length. Give another alternative explanation.
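Discussion question 5 turns on what a paired t test actually uses. The sketch below, with made-up telomere lengths (in kilobases) for four hypothetical twin pairs, computes the paired t statistic from the within-pair differences and, for contrast, the unpaired (Welch) statistic that treats the same numbers as two independent groups. It is not the authors' analysis; it only shows that the two approaches can give quite different answers from the same data.

```python
import math
import statistics

def paired_t(x, y):
    """Paired t statistic and degrees of freedom, based on within-pair differences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1

def welch_t(x, y):
    """Unpaired (Welch) t statistic that ignores the pairing."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

# Hypothetical LTL values (kb): more active twin vs. less active co-twin
active = [7.1, 6.8, 7.4, 7.0]
inactive = [6.9, 6.7, 7.0, 6.9]

t_paired, df = paired_t(active, inactive)
t_unpaired = welch_t(active, inactive)
print(f"paired t = {t_paired:.2f} (df = {df}), unpaired t = {t_unpaired:.2f}")
```

With these made-up numbers the paired statistic (about 2.83) is roughly twice the unpaired one (about 1.43), because pairing removes the variation between families; a plot of group means alone hides exactly the information the paired test relies on.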

Submitted by Paul Alper

## Modeling of Diabetes

Intuition can be deceiving. Obvious examples: the earth is flat and at the center of the solar system; Saddam must have had nuclear weapons; bootstrapping can't possibly be valid; earth, air, fire, water and that's it. An intuitive medical model of type 2 diabetes, according to an article by Rob Stein in the Washington Post of February 6, 2008, is "that the lower the blood sugar the better, and that lowering blood-sugar levels to normal saves lives." But the results of the ACCORD (Action to Control Cardiovascular Risk in Diabetes) trial, involving 10,251 randomly assigned patients, turned out to "inject an element of uncertainty into what has been dogma." In the stronger words of Dr. Richard Grimm Jr., who helped design the study, "very surprising, shocking."

Surprising and shocking because "257 patients receiving the intensive treatment [lowering the blood sugar level to that of a person who did not have diabetes] had died compared to 203 receiving the standard treatment [lowering the blood sugar level to that of the average person with diabetes]." This result "prompted federal health officials to abruptly stop one part of the trial so thousands of the type 2 diabetes patients in the study could be notified and switched to less risky treatment."

### Discussion

Assume that approximately half of the 10,251 patients were in the intensive treatment group and half were in the standard treatment group.

1. Why would the researchers do a one-tail test rather than a two-tail test?

2. Here is a Minitab run for the data given in the article:

| Sample | X | N | Sample p |
|---|---|---|---|
| 1 | 257 | 5125 | 0.050146 |
| 2 | 203 | 5126 | 0.039602 |

Difference = p (1) - p (2)

Estimate for difference: 0.0105443

95% upper bound for difference: 0.0172689

Test for difference = 0 (vs < 0): Z = 2.58 P-Value = 0.995

Fisher's exact test: P-Value = 0.996

Why is the P-Value so ridiculously high?
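The Minitab figures above can be reproduced with a short two-proportion z test. This sketch assumes the unpooled standard error, which matches the Z value and upper bound shown; the `alternative` argument is our own hypothetical parameter name for the direction of the alternative hypothesis, so readers can see how that choice drives the P-value.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2, alternative="less"):
    """Two-sample z test for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = diff / se
    cdf = NormalDist().cdf(z)
    if alternative == "less":          # H1: p1 - p2 < 0
        p_value = cdf
    elif alternative == "greater":     # H1: p1 - p2 > 0
        p_value = 1 - cdf
    elif alternative == "two-sided":
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    upper_95 = diff + NormalDist().inv_cdf(0.95) * se  # one-sided 95% upper bound
    return diff, upper_95, z, p_value

# Deaths / patients: intensive (sample 1) vs. standard (sample 2) treatment
diff, upper, z, p = two_proportion_z(257, 5125, 203, 5126, alternative="less")
print(f"diff = {diff:.7f}, upper bound = {upper:.7f}, Z = {z:.2f}, P = {p:.3f}")
```

Rerunning with `alternative="greater"` or `"two-sided"` is a quick way to explore the discussion question.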

Submitted by Paul Alper

## How a statistical formula won the war

Gavyn Davies does the maths, Gavyn Davies, The Guardian (UK), July 20 2006.

This article relates how statisticians were called on to estimate the number of enemy tanks before the Allied attack on the western front in 1944.

The statisticians had one key piece of information: the serial numbers on a few captured tanks. Assuming that the tanks were numbered consecutively, in the order in which they were produced, the statisticians could estimate the total number of tanks produced up to any given moment from the highest serial number in the sample and the sample size.

Suppose the tanks were numbered 1 to N, where N was the total number of tanks produced, and that five tanks had been captured with serial numbers 20, 31, 43, 78 and 92, say. From a sample of size S = 5 and a maximum serial number M = 92, it was deduced that a good estimator of the number of tanks would be (M-1)(S+1)/S. In this example that gives (92-1)(5+1)/5 = 109.2.

In the actual wartime analysis, the estimate was 245 tanks per month, and after the war it was confirmed that the actual number was 246; intelligence estimates had been far higher.
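The formula printed in the article, (M-1)(S+1)/S, is close to, but not identical with, the textbook minimum-variance unbiased estimator M + M/S - 1, which gives 109.4 for the example sample. The sketch below compares the two by simulation under the article's assumptions (serials run from 1 to N, and the captured tanks are a random sample without replacement); the true total of 250 tanks is a hypothetical value chosen for illustration.

```python
import random

def article_estimate(m, s):
    """Estimator printed in the article: (M - 1)(S + 1) / S."""
    return (m - 1) * (s + 1) / s

def mvue_estimate(m, s):
    """Textbook minimum-variance unbiased estimator: M + M/S - 1."""
    return m + m / s - 1

def simulate(n_true, s, trials=20_000, seed=1944):
    """Average each estimator over many random captures of s distinct serials from 1..n_true."""
    rng = random.Random(seed)
    total_article = total_mvue = 0.0
    for _ in range(trials):
        m = max(rng.sample(range(1, n_true + 1), s))  # highest captured serial
        total_article += article_estimate(m, s)
        total_mvue += mvue_estimate(m, s)
    return total_article / trials, total_mvue / trials

print(article_estimate(92, 5), mvue_estimate(92, 5))  # the article's example sample
mean_article, mean_mvue = simulate(250, 5)            # hypothetical true total of 250
print(f"average estimates: article {mean_article:.1f}, unbiased {mean_mvue:.1f}")
```

Algebraically the article's formula equals M + M/S - 1 - 1/S, so it always sits 1/S below the unbiased estimator: 0.2 tanks when S = 5, a negligible difference here. Varying `s` in the simulation gives a feel for how robust the estimate is.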

### Questions

• What assumptions are involved in the formula given in the article?
• How robust is the estimate?
• Should the serious consequence of the estimation (launching an invasion) have any influence on the way the estimation is performed?
• Can you think of any other information that might have helped to solve the problem?

Submitted by John Gavin.

## Can statistics determine if Clemens used steroids?

Report Backing Clemens Chooses Its Facts Carefully
New York Times, Feb. 10, 2008

The authors of this article are professors at the University of Pennsylvania’s Wharton School.

The Times article begins:

Last week, Roger Clemens made the rounds on Capitol Hill to rebut charges by Brian McNamee, his former trainer, that he used steroids and human growth hormone late in his career. In addition, Clemens’ agents from Hendricks Sports Management have provided a report loaded with numbers — 45 pages, 18,000 words and 38 charts — to support his position. You can find the report here.

The article goes on to say:

The report hinges on a critical question: Was Clemens’s late-career success highly unusual? If so, an unusual late-career improvement lends credence to the Mitchell report’s assertion that he used performance-enhancing drugs at various times from 1998 onward. The Clemens report tries to dispel this issue by comparing him with Nolan Ryan, who retired in 1993 at 46. In this comparison, Clemens does not look atypical — both enjoyed great success well into their 40s. Similar conclusions can be drawn when comparing Clemens with two contemporaries, Randy Johnson and Curt Schilling.

The Clemens report itself does not refer to the issue of drugs at all but rather gives a very detailed account of the ups and downs of Clemens' pitching throughout his career, using earned run average (ERA) for each year as a measure of success. (ERA is the average number of earned runs a pitcher gives up per nine innings pitched.)

The authors criticize this report, arguing that ERA is not a good measure because it is affected by factors, such as the field, that have nothing to do with the ability of the pitcher. They also argue that comparing Clemens with three hand-picked star pitchers amounts to selection bias. The authors then describe their own attempt at a rigorous statistical study. They write:

A better approach to this problem involves comparing the career trajectories of all highly durable starting pitchers. We have analyzed the progress of Clemens as well as all 31 other pitchers since 1968 who started at least 10 games in at least 15 seasons, and pitched at least 3,000 innings. For two common pitching statistics, earned run average and walks-plus-hits per innings pitched, we fitted a smooth curve to all the data from these 31 pitchers and compared it with those for Clemens’s career.
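The ERA definition given earlier, and the walks-plus-hits per inning figure (usually called WHIP) that the Wharton authors also use, are simple ratios. This sketch spells them out; the innings-pitched helper handles the box-score convention in which the digit after the point counts outs (so 200.1 means 200 1/3 innings), and the season numbers in the example are hypothetical.

```python
def era(earned_runs, innings_pitched):
    """Earned run average: earned runs allowed per nine innings pitched."""
    return 9 * earned_runs / innings_pitched

def whip(walks, hits, innings_pitched):
    """Walks plus hits allowed per inning pitched."""
    return (walks + hits) / innings_pitched

def innings_from_notation(ip):
    """Convert box-score notation such as '200.1' (200 innings plus one out) to 200.333..."""
    whole, _, outs = ip.partition(".")
    outs = int(outs) if outs else 0
    if outs not in (0, 1, 2):
        raise ValueError("the digit after the point counts outs and must be 0, 1 or 2")
    return int(whole) + outs / 3

# Hypothetical season: 60 earned runs, 50 walks and 150 hits in 200 innings
ip = innings_from_notation("200.0")
print(f"ERA {era(60, ip):.2f}, WHIP {whip(50, 150, ip):.2f}")
# prints "ERA 2.70, WHIP 1.00"
```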

The article provides the following graphics to show the authors' results:

http://graphics8.nytimes.com/images/2008/02/10/sports/10score_GFX2.jpg

The authors go on to say:

Our reading is that the available data on Clemens’ career strongly hint that some unusual factors may have been at play in producing his excellent late-career statistics.

In any analysis of his career statistics, it is impossible to say whether this unusual factor was performance-enhancing drugs.

The Clemens report argues that his longevity “was due to his ability to adjust his style of pitching as he got older, incorporating his very effective split-finger fastball to offset the decrease in the speed of his regular fastball caused by aging.” While this may be true, it is also just speculation: there is not a single number in the report quantifying the evolution of Clemens’s pitch selection.

More details, including more graphics, for the Wharton study can be found in Justin Wolfers' article "Analyzing Roger Clemens: A step-by-step guide". This article is followed by comments from 57 readers, many of whom want to see more details of the study. Wolfers tells us that the authors' full paper should be available from his website in a few days.

Two interesting critiques of the Wharton study are given by Phil Birnbaum on his Sabermetric Research Blog. These are his February 10 posting: "Clemens Report" criticism misses the point, and his February 11 posting: "The Wharton Clemens Report criticism -- Part II".

Another article is "Roger Clemens, Barry Bonds, Performance-Enhancing Drugs, and Hypothesis Testing: A Case Study in Baseball and Hypothesis Testing" by Phillip Mayfield. Mayfield explains what a hypothesis test is and how it is carried out, and illustrates this using two drug controversies, one involving Roger Clemens and the other Barry Bonds. In the Clemens tests, there was no significant increase in his pitching ability during the period in which he was said to be using performance-enhancing drugs, but in the Bonds study there was a significant increase in his hitting ability over such a period. This would be an interesting article to discuss in an elementary statistics class.

And finally: "Learning Unethical Practices from a Co-worker: The Peer Effect of Jose Canseco" by Eric D. Gould (Hebrew University) and Todd R. Kaplan (Haifa University). From their abstract we read:

This paper examines the issue of whether workers learn productive skills from their coworkers, even if those skills are unethical. Specifically, we estimate whether Jose Canseco, one of the best baseball players in the last few decades, affected the performance of his teammates. In his autobiography, Canseco claims that he improved the productivity of his teammates by introducing them to steroids.

## It's a staggering bet

Eight horses and 50p make punter a millionaire
Telegraph, Feb. 24, 2008, by Andrew Alderson

This story was suggested by Bob Drake.

It started with a horse called Isn't That Lucky and ended with one called A Dream Come True - a run of eight winners that turned a 50p stake into Britain's first million pound betting-shop pay-out.

An unnamed small-time gambler, believed to be in his sixties, backed eight horses to win races in a multiple bet called an accumulator, at combined odds of nearly 2.8 million to one. If just one had lost, he would not have received a penny.

David Hood, a spokesman for William Hill, said: "It is a staggering bet, and earns him a place in history as the world's first betting shop millionaire. Even a scriptwriter couldn't have dreamt this one up."

Note: Later accounts identified the winner as Freddie Craggs and reported that he was the third betting-shop millionaire. An article in the Mirror says that Freddie was even luckier than he thinks. They say:

His eight-horse accumulator, including three runners at Nad Al Sheba in Dubai, took him to the maximum betting-shop pay-out of £1m - but bookies Hills would have been within their rights to pay him only £100,000, their limit when selections in overseas horse-racing are included in a bet. They were happy to stump up £1m...possibly because of the timing of Freddie's flutter. It came bang in the middle of a court case in which a punter is taking Hills to court for allowing him to lose £2million when he'd allegedly asked to be banned.
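In an accumulator the winnings from each leg are staked on the next, so the decimal odds of the legs multiply (fractional odds of a-to-1 correspond to decimal odds of a + 1). The sketch below uses eight hypothetical decimal odds, not the actual prices of Freddie Craggs's horses, and applies the £1m pay-out cap mentioned in the Mirror.

```python
from math import prod

def accumulator(stake, decimal_odds, max_payout=None):
    """Combined decimal odds and pay-out of an accumulator; every leg must win."""
    combined = prod(decimal_odds)  # each leg's return is rolled into the next leg
    payout = stake * combined
    if max_payout is not None:
        payout = min(payout, max_payout)
    return combined, payout

# Eight hypothetical legs, as decimal odds (e.g. 5 means 4-to-1)
legs = [6, 5, 8, 4, 7, 5, 6, 13]
combined, payout = accumulator(0.50, legs, max_payout=1_000_000)
print(f"combined decimal odds {combined:,}; pay-out £{payout:,.0f}")
# prints "combined decimal odds 2,620,800; pay-out £1,000,000"
```

At the reported combined odds of nearly 2.8 million to one, a 50p stake would be worth roughly £1.4m before any cap, so the £1m maximum would have applied.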

### Discussion

(1) How does this compare with winning the Powerball lottery jackpot?

(2) How do you think the odds were determined for the horse race?

## Understanding Uncertainties

Plus Magazine is a mathematics magazine that describes itself as follows:

Plus magazine opens a door to the world of maths, with all its beauty and applications, by providing articles from the top mathematicians and science writers on topics as diverse as art, medicine, cosmology and sport. You can read the latest Mathematical news on the site every week, subscribe to our fortnightly email newsletter, read our online magazine published four times a year, and browse our archive containing all past issues and news items.

Issue 45 of Plus Magazine introduced a new column called "risk and uncertainty" written by David Spiegelhalter, Winton Professor for the Public Understanding of Risk at the University of Cambridge. Mike Pearson provided the animations for the first two columns.

These two columns deal with what the British call League Tables and what we would call rankings. The first column appeared in Issue 45 and the second in Issue 46.

In the first column the authors give an example of a League Table in which the outcomes should be due to pure chance but at first sight seem inconsistent with pure chance. In the second column they discuss a League Table in which the outcomes should depend on skill but chance also plays a role.

For the first column the authors look at the results of the UK National Lottery from its start on the 19th of November 1994 to the 20th of October 2007. In this lottery six balls are drawn at random from a set of 49 numbered balls. If you buy a ticket and correctly predict the six numbers, you get a share of the jackpot, which is usually a large amount of money. During this period the lottery officials made 1240 such draws of six numbers. An animation shows the winning balls as they were drawn, building up the following histogram of the number of times each of the 49 numbers occurred in the draws.

http://www.dartmouth.edu/~chance/forwiki/histogram.jpg

From this it appears that 38 was a lucky number, and the authors ask whether this could happen by chance. If the winning numbers were chosen at random, the number of times a given number appears among the winning numbers would have a binomial distribution with mean np, where n = 1240 and p = 6/49, and variance np(1-p). Thus, if the numbers are chosen randomly, the expected number of times a number occurs is 151.837, the variance is 133.244 and the standard deviation is 11.5432. From the histogram we see that the number 38 occurred 180 times, which is about 2.4 standard deviations more than the expected number and fairly unlikely if the numbers were chosen randomly. However, the authors then give evidence that the draws are consistent with randomness, comparing the observed distribution of counts with the theoretical distribution:

http://www.dartmouth.edu/~chance/forwiki/clt.jpg
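The binomial calculation for the number 38 can be checked in a few lines. This sketch assumes, as the text does, that each of the 1240 draws includes a given number with probability 6/49, independently of the other draws. It computes the mean, variance, standard deviation and z-score for a number drawn 180 times, plus the exact binomial probability that one pre-specified number is drawn 180 or more times.

```python
import math

def binomial_summary(n, p, observed):
    """Mean, variance, SD of Binomial(n, p) and the z-score of an observed count."""
    mean = n * p
    var = n * p * (1 - p)
    sd = math.sqrt(var)
    return mean, var, sd, (observed - mean) / sd

def binomial_upper_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for j in range(k, n + 1):
        log_pmf = (log_n_fact - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * log_p + (n - j) * log_q)
        total += math.exp(log_pmf)
    return total

n_draws, p_in_draw = 1240, 6 / 49
mean, var, sd, z = binomial_summary(n_draws, p_in_draw, observed=180)
tail = binomial_upper_tail(n_draws, p_in_draw, 180)
print(f"mean {mean:.3f}, variance {var:.3f}, sd {sd:.4f}, z {z:.2f}, P(X >= 180) {tail:.4f}")
```

The tail probability for a single, pre-specified number is on the order of 1%; whether that is still surprising once all 49 numbers are considered is a different question.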

The authors also look for the longest gap between successive appearances of a number among the winning numbers. This was a gap of 72 draws, which occurred when number 17 appeared in draw 435 on the 23rd of February 2000 but did not appear again until draw 508 on the 4th of November 2000. The authors estimate the probability of such a gap at about .000082, which again would suggest that the winning numbers were not chosen randomly.

Referring again to the question: were the winning numbers chosen randomly? The authors write:

But what about that maximum gap of 72, which we worked out to be extremely unlikely? It turns out that we asked slightly the wrong question, namely: "The number 17 has just been drawn, what is the chance that it will not be drawn within the next 72 draws?" In reality, though, we are looking at the results of 1240 draws, rather than just 72, and we are not interested specifically in the number 17. What we should have asked is: "After 1240 draws, what is the chance that any of the gaps between two draws of the same number is greater than 72?".

The authors carry out the calculations and find that the answer to this question is about 50%, so here randomness wins.
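Both gap calculations can be sketched in Python. The first line reproduces the single-number figure: the chance that a given number misses each of the next 72 draws is (43/49)^72. The simulation then estimates the chance that, somewhere in 1240 simulated draws, some number goes at least 72 consecutive draws without appearing. It assumes independent draws of 6 balls from 49 and also counts runs at the very start and end of the sequence, so it approximates the authors' question rather than reproducing their calculation.

```python
import random

single = (43 / 49) ** 72  # chance a given number is absent from the next 72 draws

def longest_absence(n_draws=1240, n_balls=49, k=6, rng=None):
    """Longest run of consecutive draws in which some ball fails to appear."""
    if rng is None:
        rng = random.Random()
    last_seen = [-1] * n_balls  # most recent draw containing each ball (-1 = not yet)
    longest = 0
    for t in range(n_draws):
        for ball in rng.sample(range(n_balls), k):
            gap = t - last_seen[ball] - 1  # draws missed since the previous appearance
            if gap > longest:
                longest = gap
            last_seen[ball] = t
    for t_last in last_seen:  # runs still open after the final draw
        longest = max(longest, n_draws - t_last - 1)
    return longest

def prob_gap_at_least(threshold=72, trials=200, seed=2008):
    """Fraction of simulated lottery histories whose longest absence reaches threshold."""
    rng = random.Random(seed)
    hits = sum(longest_absence(rng=rng) >= threshold for _ in range(trials))
    return hits / trials

estimate = prob_gap_at_least(72, trials=200)
print(f"single number: {single:.6f}; some number, some time: about {estimate:.2f}")
```

With only a couple of hundred simulated histories the estimate is rough; raising `trials` tightens it at the cost of run time.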

The authors go on to describe more sophisticated statistical methods for testing the hypothesis that the numbers were randomly chosen.

Of course, there are examples showing that lottery draws are not always random. One of the most famous is the 1970 draft lottery. A nice discussion of this lottery is given by Norton Starr in the Journal of Statistics Education, v.5, n.2 (1997). In the abstract we read:

The 1970 draft lottery for birthdates is reviewed as an example of a government effort at randomization whose inadequacy can be exhibited by a wide variety of statistical approaches. Several methods of analyzing these data -- which were of life-and-death importance to those concerned -- are given explicitly and numerous others are cited. In addition, the corresponding data for 1971 and for 1972 are included, as are the alphabetic lottery data, which were used to select draftees by the first letters of their names. Questions for class discussion are provided. The article ends with a survey of primary and secondary sources in print.

We will discuss the second "Understanding Uncertainties" column in the next issue of Chance News.

### Discussion

(1) What is the probability that at least one number occurs 180 or more times?

(2) How does one decide whether a question about the outcomes is a reasonable test of randomness?

Submitted by Laurie Snell

As a follow-up to the previous article on Plus Magazine, the latest edition offers a section that may be of interest to Chance News readers: "Teacher package: Statistics and probability theory".

This teacher package gathers all Plus articles on statistics and probability theory into three categories:

• Fun and games: the articles in this category use stats and probability to understand games of chance, sports, and strange coincidences.
• Understanding life: this category explores how stats and probability are used to understand all aspects of life, from death and disease to fraud.
• Lies, damn lies: this category focuses on those pitfalls and conundrums that sometimes contribute to our mistrust of stats.

Submitted by John Gavin.

## Perception can be everything

More Expensive Placebos Bring More Relief, Benedict Carey, The New York Times, March 5, 2008.
The Nocebo Effect: Placebo's Evil Twin, Brian Reid, The Washington Post, April 30, 2002.

This short article claims that a higher price can create the impression of higher value, which may explain the popularity of some high-cost drugs over cheaper alternatives.

One of the study's authors, Dan Ariely, a behavioral economist at Duke, says:

It’s all about expectations. When you’re expecting pain relief, you’re secreting your own opioids. And when you get it on discount, you doubt it, and your body doesn’t react as well.

The article is based on a paper in The Journal of the American Medical Association in which 82 men and women rated the pain caused by electric shocks before and after taking a pill. Half had read that the pill, described as a newly approved prescription pain reliever, cost $2.50, and half were told that it cost $0.10; in fact, both were dummy pills. After correcting for each person's tolerance of pain, the pills had a strong placebo effect in both groups, but 85 percent of those taking the expensive pills reported significant pain relief, compared with 61 percent of those on the cheaper pills.

Previous studies have shown that pill size and color also affect people’s perceptions of effectiveness. In one, people rated black and red capsules as 'strongest' and white ones as 'weakest'. But a Dutch study found that most people considered red and orange pills to be stimulating, with blue and green pills more likely to have a depressant effect. In Italy, meanwhile, blue placebos made excellent sleeping pills for women but had the opposite effect on men. (The apparent reason? The Italian national football team's color is blue.)

Other information, such as the country where a drug was manufactured, can also affect perceptions.

### Questions

• How might you adjust an experiment to control for potential placebo-related influences such as the colour of the pill or where it is manufactured?
• Can you think of any other factors that might influence people's perceptions/expectations?
• How can you be sure that you have accounted for the major influences?
• Most people don't (and wouldn't) steal a pencil from a store but they do (or would) take a pencil from their workplace. So perhaps the issue is not about the number of pencils but rather the context where the pencil is to be had. How might a variable like 'context' (or emotions or social norms) be incorporated into a statistical analysis? What criteria might be required before any of these factors could be incorporated into a statistical analysis, such as having objective, measurable and verifiable prior expectations?
• The opposite of placebo is nocebo (no-SEE-bo): a substance that produces harmful effects in someone because it is believed to be harmful, although in reality it is harmless. People presume the worst, health-wise, and that's just what they get. Do you expect there to be a nocebo effect in patients?
• Would it be measured in the same way as a placebo effect?
• Are the ethical considerations for nocebos different to those for placebos?
• If a nocebo effect is credible, what is the difference between it and voodoo, where a curse is placed on a person?
• Previous marketing studies have shown that it is possible to change people's reports of how good an experience is by changing their beliefs about the experience. For example, moviegoers will report liking a movie more when they hear beforehand how good it is. What are the effects of expectation, if any, on medical research not involving human subjects?
• If mice are given a placebo by lab technicians who believe they’re giving a drug, do the mice respond as if given the drug?
• Conversely, if the mice show no improvement with a placebo, might the lab technicians, thinking they’d administered a real drug, perceive improvement?