Chance News 53
From Bayesian statistics for experimental scientists written by Hal Stern, Department of Statistics, University of California, Irvine:
If they would only do as he did and publish posthumously
we should all be saved a lot of trouble
in reference to followers of Rev. T. Bayes
If your experiment needs Bayesian statistics,
you ought to have done a better experiment
N. Gilbert (Biometrical Interpretation, 1973)
Submitted by Paul Alper
From a Wall Street Journal article describing the results of a number of research studies about victims of investment fraud:
The typical investment-scam victim is an optimistic married man in his later 50s who has a higher-than-average knowledge of financial matters and deep confidence in his own judgment ….
[A man told] an FTC fraud forum that he preferred speaking with a man because "you can lather him up and push all the green buttons." Women were more cautious and asked too many questions, he said, prompting an office maxim, "Don't pitch to the b—."
For Today's Graduate, Just One Word: Statistics
The article covers the Joint Statistical Meetings, held in Washington this week, and the booming job market for statisticians.
A brief Wall Street Journal article summarized a JAMA report about two recent studies of the relationship between the Mediterranean Diet and cognitive decline. Apparently an (undated) older study had suggested the existence of a relationship.
One of the two recent studies involved 2,000 elderly people and found that “those who adhered more closely to a Mediterranean diet … had less risk of developing [dementia].” The other study, from France, involved 1,410 people and found that “[a]dherence to the diet didn’t change the risk of dementia.”
JAMA’s editorial conclusion:
[A]ll told, there is “moderately compelling evidence that adherence to the Mediterranean-type diet is linked to less late-life cognitive impairment.”
Kuklo's Fellow Infuse Worker
From The Pioneer Press we learn that there is more to the Kuklo story. "Dr. David Polly, the University of Minnesota spine surgeon ... received nearly $1.2 million in consulting fees from medical device giant Medtronic over a five-year period." The details "of Polly's billing records were released this week by Sen. Charles Grassley, R-Iowa, as an attachment to a letter to University of Minnesota President Robert Bruininks. The letter raised questions about how the U polices conflicts of interest among doctors."
Polly's recordkeeping was indeed detailed:
Download CDs from meeting, 15 minutes, $125
Dinner meeting, 240 minutes, $2,000
E-mail Medtronic employee, five minutes, $49.48
Conference call, 90 minutes, $890.63
Teach at scoliosis meeting, 330 minutes, $2,750
"I've not seen anybody bill the way he did," said Rosen, of the University of California-Irvine, who acknowledged that he doesn't do paid consulting work with the device industry.
"In my opinion, it sounds more like an investment banker," he said of the detailed billing. "It doesn't sound like someone in medicine."
Submitted by Paul Alper
Defining a clunker
“When Precision Is Only 92.11567% Accurate”
by Charles Forelle, The Wall Street Journal, August 5, 2009
Temporarily substituting for Carl Bialik (The Numbers Guy), Forelle reports about the government’s cash-for-clunkers program and critiques the EPA’s recently revised definition of a clunker.
The EPA stated that “more precise” data calculated “to four decimal places” led it to revise its miles-per-gallon cutoff figures.
"It is ludicrous to suggest that you can get fuel-consumption accuracy anywhere past the first decimal place, let alone the second," says … an independent U.K. auto tester.
Forelle discusses the “faux precision” of estimates that are often based on sampling, but reported as final counts or measurements without their sometimes large margins of error, as in the case of population or unemployment-rate estimates.
He cites another source of misleadingly precise estimates: failure to follow the conventional rules for significant figures in arithmetic.
The principle is simple: When combining measured numbers, the final answer is only as precise as the least-precise piece of data that went into it; you can't just add a tail of decimal places, even if they show up on the calculator. So a room that's 2.5 meters (two significant digits) by 3.87 meters (three) has an area of 9.7 square meters, though the two numbers multiply to 9.675.
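The room-area arithmetic in the quotation can be reproduced with a short sketch. The helper names round_sig and multiply_measured are hypothetical, chosen here for illustration, and the rule coded is the usual classroom convention for multiplication (keep as many significant figures as the least precise factor), not anything attributed to the EPA or to the article.

```python
from math import floor, log10

def round_sig(x, sig):
    """Round x to `sig` significant figures (hypothetical helper)."""
    if x == 0:
        return 0.0
    # Number of decimal places needed so that `sig` significant digits survive.
    decimals = sig - 1 - floor(log10(abs(x)))
    return round(x, decimals)

def multiply_measured(a, sig_a, b, sig_b):
    """Multiply two measurements, keeping the smaller significant-figure count."""
    return round_sig(a * b, min(sig_a, sig_b))

# Room from the quotation: 2.5 m (two significant digits) by 3.87 m (three).
raw_area = 2.5 * 3.87
reported_area = multiply_measured(2.5, 2, 3.87, 3)
print(raw_area, reported_area)
```

The raw product carries more digits than either measurement supports; the reported value keeps only two.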
Fuel mileage calculations are apparently based upon tailpipe emissions of carbon dioxide, because released carbon dioxide from burning fuel provides a more accurate measure of gas consumption than direct measurement of consumed fuel. Not only does the EPA believe that the results of two lab tests on each car must be recorded to four decimal places by law, but it also added tests that were not done on older cars, and “created a formula that estimated from the old data what would happen had the new tests been run,” this despite the different precision levels of numbers that went into the formulas. An EPA spokesman said, "Repeatability and accuracy is something we spend a lot of time on."
1. What’s the difference between accuracy and precision?
3. A blogger commented, “Regards your room area example, if both length and width are measured by a person who makes the same direction of error on each measurement -- so that both are either too high or too low -- then the area will not only have almost twice the percentage error of either measurement, it will, on average be too high.” Do you agree with all, or part of, this statement?
4. The author described a 1991 court case in which an Alaskan man failed a bar exam and “missed by 0.5 point the threshold needed for a re-evaluation of his test.” The man claimed that, since the essays were graded with integers, his score should have been rounded up to the next integer. Although the man lost the case, the Alaska Supreme Court found his argument “convincing from a purely mathematical standpoint.” A blogger argued that there are an infinite number of significant digits in counts (e.g., 1.000…), because “the error in the value of these numbers is ZERO,” and so arithmetic results “can be rounded to as many sig figures as you want to.” Do you agree with the Alaskan man or with the blogger?
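One way to explore question 3 numerically is sketched below. It assumes, purely for illustration, that length and width are each off by the same fraction e (for example 1%), either both too high or both too low; the function name area_relative_error is hypothetical. Under that assumption the measured area is (1 + e)^2 times the true area, so its relative error is 2e + e^2.

```python
def area_relative_error(e):
    """Relative error in area when length and width are both off by fraction e.

    Assumes the same fractional error e on each side (an illustrative
    simplification): measured area = true area * (1 + e)**2.
    """
    return (1 + e) ** 2 - 1

e = 0.01  # assumed 1% measurement error on each side
too_high = area_relative_error(e)    # both measurements 1% too high
too_low = area_relative_error(-e)    # both measurements 1% too low
average = (too_high + too_low) / 2   # equally likely high or low

print(f"both high: {too_high:+.4%}")
print(f"both low:  {too_low:+.4%}")
print(f"average:   {average:+.4%}")
```

The average of the two cases works out to e^2, which is positive but very small relative to e itself; readers can vary e to see how the blogger's two claims hold up.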
All that jazz?
“Can Jazz Be Saved?”
by Terry Teachout, The Wall Street Journal, August 9, 2009
Based on the National Endowment for the Arts’ 2008 Survey of Public Participation in the Arts, popular interest in jazz on the part of adult Americans appears to be experiencing a serious decline. The study was conducted “in participation with” the U.S. Census Bureau.
Two causes for concern about the future of jazz are a decline in the share of adults who attended at least one jazz performance per year (down from about 11% to 8% over 2002-2008) and an increase in the median age of those who do attend (up from 29 to 46 years over 1982-2008).
Supplementary materials include a brochure, trend tables (1982-2008), survey instrument, data user’s guide with information on the survey design and procedures, and raw data file.
Irresponsible data mining
“Data Mining Isn’t a Good Bet For Stock-Market Predictions”
by Jason Zweig, The Wall Street Journal, August 8, 2009
Columnist Jason Zweig discusses “quantitative money manager” David Leinweber’s new book, Nerds on Wall Street: Math, Machines and Wired Markets (Wiley, June 2009).
In his book, Leinweber “dissects the shoddy thinking that underlies most of [the quantitative] techniques” in use today, and he refers to data-mined numbers as “one of the leading causes of the evaporation of money.”
Zweig describes how Leinweber decided to satirize data mining with an example, meant to be a joke. He found that annual butter production in Bangladesh “explained” 75% of the variation in the annual returns of the S&P 500-stock index over a 13-year period.
By tossing in U.S. cheese production and the total population of sheep in both Bangladesh and the U.S., Mr. Leinweber was able to "predict" past U.S. stock returns with 99% accuracy.
Leinweber has advice for avoiding “falling into a data mine”: (a) Check that the results make sense; (b) Check that the claim still holds for smaller subsets of the data; (c) Check the results after costs, fees, and taxes are subtracted; (d) Wait to see if the claim continues to hold true as time goes by.
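The flavor of the butter-production joke can be imitated with simulated data. The sketch below is not Leinweber's analysis: it uses made-up random numbers in place of real returns and real commodity series, and the names pearson_r and mine_best_predictor are hypothetical. It searches many unrelated random series for the one that best "explains" 13 years of random "returns".

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mine_best_predictor(n_years=13, n_candidates=2000, seed=0):
    """Return the largest R^2 found between random 'returns' and random candidates.

    Every series is independent noise, so any apparent fit is pure chance.
    """
    rng = random.Random(seed)
    returns = [rng.gauss(0, 1) for _ in range(n_years)]
    best_r2 = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_years)]
        best_r2 = max(best_r2, pearson_r(candidate, returns) ** 2)
    return best_r2

best_r2 = mine_best_predictor()
print(f"best in-sample R^2 among 2000 unrelated series: {best_r2:.2f}")
```

With only 13 observations, a search over enough candidates will usually turn up a series that appears to explain a large share of the variation, which is why the advice to recheck a claim on other subsets and on later data matters.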
Students may enjoy a 4-minute video of Jason Zweig interviewing Nerds author David Leinweber. Or they may be interested in YouTube videos (8 minutes, in four parts) of a lecture by David Leinweber.
The article contained a chart showing the “Correlation of Super Bowl wins by original NFL teams with positive return[s] for the S&P 500.” The bar chart shows the S&P 500 return for each year 1967 through August 6, 2009, with 32 blue bars for years in which a “correlation held” and 11 red bars for years in which it did not.
1. What do you think the chart’s author meant by stating that a “correlation held”? What else would you like to know about his/her “correlation”?
2. Suppose there were a positive correlation, even a relatively high one, between NFL Super Bowl team wins and positive S&P 500 returns. Would you be surprised that Super Bowl wins did not predict positive S&P returns for every one of the years?