Chance News 34
One more fagot of these adamantine bandages is the new science of Statistics.
Ralph Waldo Emerson
Fate, from The Conduct of Life (1860, rev. 1876)
The following Forsooths are from the February 2008 issue of RSS NEWS.
Twenty-six new cases of the inflammatory lung disease sarcoidosis [were seen amongst rescuers] in the first five years after 9/11. Five or fewer rescuers got sarcoidosis annually before 9/11.
New York Daily News, 21 September 2007
Actually, I like the Poles' second idea even better. Instead of re-enacting a battle, they suggested, the summiteers should re-sit advanced level mathematics. Voting weights should be based on the square roots of the member states' populations. (Pocket calculators allowed.)
The next two Forsooths were suggested by Paul Alper.
Much of the data on overweight people and obesity are limited, equivocal and compromised.
Patrick Basham and John Luik in BMJ, Volume 336, page 244, 2 February 2008
The adverse effects of obesity on health are well established, serious, and causal.
R.W. Jeffery and N.E. Sherwood in BMJ, Volume 336, page 245, 2 February 2008
"I didn't major in math," Huckabee said to the Conservative Political Action Conference meeting, according to the Associated Press. "I majored in miracles, and I still believe in them."
Telomeres Tell A Lot
Conventional wisdom, indeed wisdom of any form, indicates that physical activity, a.k.a. regular exercise, is good for you. In particular, intuition would imply that the risk factors for age-related diseases such as diabetes, cancer, hypertension, obesity and osteoporosis would be reduced if people engaged in physical activity. For a direct connection between ageing and physical activity, consider a paper in the Archives of Internal Medicine (Vol. 168, No. 2, January 28, 2008), “The Association Between Physical Activity in Leisure Time and Leukocyte Telomere Length” by Cherkas et al.
“Telomeres consist of tandemly repeated DNA sequences that play an important role in the structure and function of chromosomes.” Leukocyte telomere length (LTL) is a proxy variable for one’s biological age as opposed to one’s chronological age. That is, the longer one’s telomeres, the younger one actually is. Conversely, the shorter the telomeres, the more aged.
This study measured the telomeres of 2401 twins who were put into four mutually exclusive categories of physical activity: “Inactive,” “Light,” “Moderate,” and “Heavy,” corresponding to “16 minutes, 36 minutes, 102 minutes and 199 minutes” of physical activity per week, respectively. The result after adjusting for “Age, sex, and extraction year” was that the “LTL of the most active subjects (group 4) was an average 200 (SE, 79) nt [nucleotides] longer than that of the inactive subjects (group 1),” producing a p-value of .006. The biological implication is “that the most active subjects had telomeres the same length as sedentary individuals up to 10 years younger, on average. This difference suggests that inactive subjects may be biologically older by 10 years compared with more active subjects.” When the analysis was restricted to subjects with more complete information on BMI (body mass index), smoking and SES (socioeconomic status), the number of subjects fell from 2401 to 1531; the LTL difference increased to 213 nt and the p-value increased to .02. Below are a summary table and Figure 1.
1. The article states, “The results of this study can be extrapolated to other white individuals (men and women) of North European origin.” Find a biologist or a helpful librarian to determine whether it is suspected that non-whites have different telomere lengths and/or have a different distribution. If so, what does this imply about telomere length and ageing?
2. There were about nine times as many women in the study as men. Why might this be a concern?
3. Something important is missing in Figure 1 and its absence serves to magnify the average difference. What is it?
4. The subjects in the study were twins and therefore attracted extra lay media attention. Six of the ten authors are affiliated with King's College London. From the King's College website, “Comparing the telomere lengths of twins who were raised together but take different amounts of exercise, reduces the effect of genetic and environmental variation and so provides a more powerful test of the hypothesis.” Obtain the article and reference #21 to determine why having twins as subjects, as opposed to non-twins, is sort of beside the point.
5. There was a “discordant twin-pair analysis” performed “as a further confirmation of the larger analysis.” A paired 2-tailed t test for 67 twin pairs, separated by at least a two category difference is displayed in Figure 2. What defect does it share with Figure 1? Why is it even more misleading given that a paired t test is being done?
6. The article states, “A limitation of this type of study is that physical activity level was self-reported.” Why might this be a limitation?
7. Assume there is a positive association between LTL and physical activity. Give an alternative explanation to physical activity causing greater telomere length. Give another alternative explanation.
Submitted by Paul Alper
Modeling of Diabetes
Intuition can be deceiving. Obvious examples: the earth is flat and at the center of the solar system; Saddam must have had nuclear weapons; bootstrapping can't possibly be valid; earth, air, fire, water and that's it. An intuitive medical model of type 2 diabetes, according to an article by Rob Stein in the Washington Post of February 6, 2008, is "that the lower the blood sugar the better, and that lowering blood-sugar levels to normal saves lives." But the results of the ACCORD (Action to Control Cardiovascular Risk in Diabetes) trial, involving 10,251 randomly assigned patients, turned out to "inject an element of uncertainty into what has been dogma." In the stronger words of Dr. Richard Grimm Jr., who helped design the study, "very surprising, shocking."
Surprising and shocking because "257 patients receiving the intensive treatment [lowering the blood sugar level to that of a person who did not have diabetes] had died compared to 203 receiving the standard treatment [lowering the blood sugar level to that of the average person with diabetes]." This result "prompted federal health officials to abruptly stop one part of the trial so thousands of the type 2 diabetes patients in the study could be notified and switched to less risky treatment."
Assume that approximately half of the 10,251 patients were in the intensive treatment group and half were in the standard treatment group.
1. Why would the researchers do a one-tail test rather than a two-tail test?
2. Here is a Minitab run for the data given in the article:
Difference = p (1) - p (2)
Estimate for difference: 0.0105443
95% upper bound for difference: 0.0172689
Test for difference = 0 (vs < 0): Z = 2.58 P-Value = 0.995
Fisher's exact test: P-Value = 0.996
Why is the P-Value so ridiculously high?
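For readers who want to check the Minitab run by hand, here is a minimal Python sketch of a two-proportion z test. It rests on assumptions not stated in the article: the split of 5,126 intensive and 5,125 standard patients is simply one reading of "approximately half," and the unpooled standard error is a guess at how the output was produced. With a different split the figures will differ slightly from those above.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_z(x1, n1, x2, n2):
    """Return (difference, standard error, z) for p1 - p2, unpooled SE."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, se, diff / se


# ASSUMPTION: group sizes are not given in the article; this is one
# way of splitting 10,251 patients "approximately in half".
n_intensive, n_standard = 5126, 5125

# Deaths: 257 (intensive) versus 203 (standard), as reported.
diff, se, z = two_proportion_z(257, n_intensive, 203, n_standard)

p_lower = normal_cdf(z)    # P(Z <= z): alternative "difference < 0"
p_upper = 1.0 - p_lower    # P(Z >= z): alternative "difference > 0"

# One-sided 95% upper confidence bound for the difference.
upper_bound = diff + 1.6449 * se

print(f"difference  = {diff:.5f}")
print(f"z           = {z:.2f}")
print(f"upper bound = {upper_bound:.5f}")
print(f"P-value, alternative 'difference < 0': {p_lower:.3f}")
print(f"P-value, alternative 'difference > 0': {p_upper:.3f}")
```

Comparing the two tail probabilities with the direction of the alternative shown in the Minitab output is a useful starting point for question 2.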
Submitted by Paul Alper
How a statistical formula won the war
Gavyn Davies does the maths, Gavyn Davies, The Guardian (UK), July 20 2006.
This article relates how statisticians were called on to estimate the number of enemy tanks prior to the Allied attack on the western front in 1944.
The statisticians had one key piece of information: the serial numbers on a few captured tanks. The assumption that the tanks were numbered sequentially, in the order in which they were produced, was enough to let the statisticians estimate the total number of tanks produced up to any given moment, based on the highest serial number in the sample and the sample size.
Suppose the tanks were numbered 1 to N, where N was the total number of tanks produced and that five tanks had been captured with serial numbers 20, 31, 43, 78 and 92, say. From a sample of S = 5 and a maximum serial number M = 92, it was deduced that a good estimator of the number of tanks would be (M-1)(S+1)/S. In the example given, this translates to (92-1)(5+1)/5, which equals 109.2.
In reality, the estimated number was 245 per month and, after the war, it was confirmed that the actual number was 246, whereas intelligence estimates were incorrectly far higher.
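The worked example above can be checked with a short Python sketch. The function implements the estimator exactly as quoted from the article, (M-1)(S+1)/S. The simulation that follows is an illustration only: the "true" total of 246, the seed and the number of trials are all hypothetical choices, and the captured tanks are assumed to be a simple random sample of serial numbers 1 to N.

```python
import random


def estimate_total(serials):
    """Estimator quoted in the article: (M - 1)(S + 1) / S."""
    m = max(serials)   # M: highest serial number observed
    s = len(serials)   # S: number of tanks captured
    return (m - 1) * (s + 1) / s


# The example from the text: five captured tanks.
captured = [20, 31, 43, 78, 92]
print(estimate_total(captured))  # 109.2

# ASSUMPTION (illustration only): a hypothetical true total N, with
# captures drawn uniformly at random, without replacement, from 1..N.
rng = random.Random(34)   # arbitrary seed, for repeatability
true_total = 246          # hypothetical value of N
sample_size = 5
trials = 20_000

estimates = [
    estimate_total(rng.sample(range(1, true_total + 1), sample_size))
    for _ in range(trials)
]
mean_estimate = sum(estimates) / trials
print(f"mean of {trials} simulated estimates: {mean_estimate:.1f}")
```

The simulation only shows how the estimator behaves when its assumptions hold; it says nothing about what happens if serial numbers skip values or if captures are not random, which is what the questions below ask about.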
- What assumptions are involved in the formula given in the article?
- How robust is the estimate?
- Should the serious consequence of the estimation (launching an invasion) have any influence on the way the estimation is performed?
- Can you think of any other information that might have helped to solve the problem?
Submitted by John Gavin.
Report Backing Clemens Chooses Its Facts Carefully
Eric Bradlow, Shane Jensen, Justin Wolfers and Adi Wyner, New York Times, Feb. 10, 2008.
This article begins:
Last week, Roger Clemens made the rounds on Capitol Hill to rebut charges by Brian McNamee, his former trainer, that he used steroids and human growth hormone late in his career. In addition, Clemens’s agents from Hendricks Sports Management have provided a report loaded with numbers (45 pages, 18,000 words and 38 charts) to support his position. You can find the report here.
The article goes on to say:
The report hinges on a critical question: Was Clemens’s late-career success highly unusual? If so, an unusual late-career improvement lends credence to the Mitchell report’s assertion that he used performance-enhancing drugs at various times from 1998 onward. The Clemens report tries to dispel this issue by comparing him with Nolan Ryan, who retired in 1993 at 46. In this comparison, Clemens does not look atypical — both enjoyed great success well into their 40s. Similar conclusions can be drawn when comparing Clemens with two contemporaries, Randy Johnson and Curt Schilling.
The report itself does not refer at all to the issue of drugs but rather gives a very detailed account of the ups and downs of Clemens's pitching throughout his career, using earned run average (ERA) for each year as a measure of success. (The ERA is the average number of earned runs a pitcher gives up per nine innings pitched, that is, 9 × earned runs / innings pitched.)
There is no statistical analysis, but the comparison with other famous pitchers was clearly meant to suggest that he did not have to use drugs to have such success late in his career.
to be continued