Chance News 39
Steve Simon suggests the following classic quote:
The glitter of the t table diverts attention from the inadequacies of the fare. -- Sir Austin Bradford Hill. Source: Austin Bradford Hill, "The Environment and Disease: Association or Causation?" Proceedings of the Royal Society of Medicine, 58 (1965), 295-300. The full text of this paper is reproduced at Edward Tufte's website and is well worth reading. This quote is cited in many places including the excellent article "The missed lessons of Sir Austin Bradford Hill" Phillips CV, Goodman KJ. Epidemiol Perspect Innov. 2004; 1: 3. doi: 10.1186/1742-5573-1-3.
Paul Alper suggested the following quotations from Chapter 15 of The Black Swan: The Impact of the Highly Improbable by Nassim Nicholas Taleb.
"THE BELL CURVE, THAT GREAT INTELLECTUAL FRAUD." Heading of Chapter 15, page 229.
Forget everything you heard in college statistics or probability theory. If you never took such a class, even better. Page 229.
If you ever took a (dull) statistics class in college, did not understand much of what the professor was excited about, and wondered what 'standard deviation' meant, there is nothing to worry about. Page 239.
The bell curve satisfies the reductionism of the deluded. Page 239.
The following forsooths are from the September 2008 RSS News:
Researchers at Cardiff University School of Social Science claim errors made by the Hawk-Eye line-calling technology can be greater than 3.6mm - the average error quoted by the manufacturers.
12 June 2008
There are now more overweight people in America than average-weight people. So overweight people are now average. Which means you've met your New Year's resolution.
Jay Leno reported in the Cork Evening Echo
8 September 2007
Ministers define child poverty as children living on less than 60% of median income, adjusted for composition of the household. The median is the halfway point between the nation's highest and lowest income.
11 December 2007
The following is a graphical forsooth. It has circulated on several email lists including edstat-l.
Submitted by Steve Simon
Two quantities were almost equal on average, according to Christine Garver-Apgar, the study’s lead author: the fraction of MHC genes shared, and the woman’s number of extra partners. In other words, if the man and woman had half the genes in common, the woman would have on average nearly half a lover on the side.
Genes may help predict infidelity, study reports
Special to World Science, Nov. 30, 2006
Submitted by Paul Alper
This quote is from a report on dual career couples in academia (also called academic partners).
Women are more likely than men to have academic partners (40% of female faculty in our sample versus 34% of male faculty). In fact, rates of dual hiring are higher among women respondents than among men respondents (13% versus 7%). This means that couple hiring becomes a particularly relevant strategy for the recruitment and retention of female faculty.
Submitted by Steve Simon
Here are a number of Forsooths from Stephen Senn's book Statistical Issues in Drug Development (Statistics in Practice):
"Most Trials are unethical because they are too large." Page 178.
"Small trials are unethical." Page 179.
"A significant result is more meaningful if obtained from a large trial." Page 179.
"A given significant P-value is more indicative of the efficacy of a treatment if obtained from a small trial." Page 180.
"For a given P-value, the evidence against the null hypothesis is the same whatever the size of the trial." Page 182.
"Warning: Trials in homeopathy are extremely dangerous. If patients forget to take their medicine they are likely to overdose." Page 288.
"Pharmaco-economist: one who asks, not only if the treatment for dysentery was effective, but also after the price of toilet paper." Page 361.
1. Stephen Senn, like many statisticians, loves to make jokes. Sometimes the humor escapes the reader. Look up homeopathy in Google to see why his statement about overdosing when the patient neglects to take the homeopathic medicine is hilarious.
Submitted by Paul Alper
A quantitative approach to art history
A Textbook Example of Ranking Artworks, Patricia Cohen, The New York Times, August 4, 2008.
An economist has offered a surprisingly forthright opinion about the art world.
Ask David Galenson to name the single greatest work of art from the 20th century, and he unhesitatingly answers “Les Demoiselles d’Avignon,” a 1907 painting by Picasso.
The ranking was based on a purely quantitative criterion: how often the artwork was reproduced in books about art (28 times, more than any other artwork).
His statistical approach has led to what he says is a radically new interpretation of 20th-century art, one he is certain art historians will hate. It is based in part on how frequently an illustration of a work appears in textbooks. “Quantification has been almost totally absent from art history,” he said. “Art historians hate markets.”
Previous work in art by Mr. Galenson also had a quantitative bent.
In 2002 Mr. Galenson discussed his theories about creativity in the book “Painting Outside the Lines.” Then, two years ago, he published “Old Masters and Young Geniuses: The Two Life Cycles of Artistic Creativity,” arguing that young innovators have a flash of inspiration that upends the existing order in an instant. There are old geniuses too, he said, but their approach is vastly different. They are what he labeled “experimentalists,” who develop their work gradually through years of trial and error. His theory of creativity was based in part on examining auction prices. His approach was hailed by some as a breakthrough, and this spring he was awarded a Guggenheim fellowship to pursue his research.
Auction prices, although clearly quantitative, do have a problem.
Since many of the most important individual works rarely, if ever, come to market, he decided to use art history textbooks to value each piece. He tallied the number of illustrations of each piece in the 33 textbooks he found that were published between 1990 and 2005, on the assumption that the most important works merited the most illustrations.
This effort has received both praise and skepticism.
Michael Rushton, who teaches the economics of art at Indiana University, said that Mr. Galenson was on to something; in science or art, he said, “innovation really requires a market.”
Art experts, not surprisingly, are more skeptical. “The economic notion of artists is interesting for art historians to have to grapple with,” John Elderfield, chief curator emeritus at the Museum of Modern Art, said when Mr. Galenson’s theory was described to him. “These are works in the histories that we tell of modern art. They seem to be milestones, and that’s fair enough.” But he cautioned that this approach could only go so far. “There are great, great things being made which are not reducible to statistics.”
What are the other great works of art, according to this statistical criterion?
Vladimir Tatlin’s “Monument to the Third International” (1919-20), a plan for a celebratory tower, came in second with 25 illustrations.
“Spiral Jetty,” a gigantic earthwork coil that Robert Smithson planted in the Great Salt Lake in Utah in 1970, came in third with 23,
followed by Richard Hamilton’s “Just What Is It That Makes Today’s Homes So Different, So Appealing?,” a 1956 collage widely considered to be the first Pop Art, with 22.
Umberto Boccioni’s 1913 bronze sculpture “Unique Forms of Continuity in Space”
tied Picasso’s “Guernica” (1937) with 21.
Marcel Duchamp’s 1917 “Fountain” — a white urinal — was seventh with 18 illustrations,
and his 1912 painting “Nude Descending a Staircase, No. 2” was eighth with 16.
The article mentions the following book:
The $12 Million Stuffed Shark: The Curious Economics of Contemporary Art. Don Thompson (2008). Palgrave Macmillan. ISBN-10: 0230610226.
1. Does producing a quantitative measure of something like art make sense? Does it enhance our understanding of art or grossly oversimplify it?
2. How do Mr. Galenson's efforts compare to other quantitative measures of success, such as frequency of citation in the peer-reviewed literature?
3. Recent publications (such as this one or the book Freakonomics) seem to imply that all of life's difficult questions can be understood from an economic perspective. To what extent does it help or hurt to incorporate economic values into areas ostensibly outside of economics?
4. What (if any) other "great, great things being made are not reducible to statistics"?
Submitted by Steve Simon
Death and taxes - 2009 edition
On the theme of visualising data in an interesting way, graphic designer Jesse Bachman has updated his death and taxes graph with the latest figures from the US President's official budget request and the comptroller of the Department of Defense. This on-line, interactive graph neatly visualizes how the US federal government spends its income taxes.
Clicking on this image will open up a larger but static version of the graph. The interactive website version, by contrast, allows users to zoom in on different parts and to pan around the graph at its full resolution, which is not possible with the static graphs associated with this article.
This is a large representational graph and poster of the federal budget. It contains over 500 programs and departments and almost every program that receives over 200 million dollars annually. The data is straight from the president's 2009 budget request and will be debated, amended, and approved by Congress to begin the fiscal year. All of the item circles are proportional in size to their spending totals and the percentage change from 2008 is included to spot trends and disproportion.
- For an example of a drilldown within this graph, see Bush’s new Alternative Energy is powered by smoke and mirrors, which offers an overview of the Department of Energy as imaged in the poster.
- The author also offers some entertaining ways to visualize one billion dollars.
- On a related topic, this film (4.7 MB, QuickTime 5) by Nigel Holmes visualizes the relative sizes of the US surplus and debt.
- This topic was discussed in more detail in a previous Chance News article Death and taxes - 2007 edition.
Submitted by John Gavin.
Princeton Meta-Analysis of State Polls for 2008
In Chance News 13.3 we described a new method, developed by Samuel Wang at Princeton, for predicting the outcome of the 2004 presidential election.
In a recent e-mail Wang wrote us about his method applied to the coming election.
I'm pleased to announce the re-launch of my Meta-Analysis of State Polls for 2008. It's now automated and provides a current snapshot of all recent polls as seen through the lens of the Electoral College. It provides a precise measure of where the Presidential race stands, and has a fraction of the uncertainty of any available poll (or even an average of polls). It's available here.
The first step in Wang's Electoral College Meta-Analysis is to estimate, for each of the 50 states, the probability that each candidate will win that state, using the results of the state polls over about a week's time period. Wang assumes that the true value of the Obama-McCain margin (number of Obama votes - number of McCain votes) is a random variable normally distributed with mean and standard deviation estimated from the state polls. From this he estimates the probability that each candidate will win: Obama wins the state when the margin is positive, and McCain wins when it is negative.
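The first step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Wang's actual code: the poll margins are made-up numbers, the function names are hypothetical, and using the standard error of the mean as the standard deviation of the true margin is our assumption about how the spread is estimated.

```python
from math import erf, sqrt
from statistics import mean, stdev


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def win_probability(margins):
    """Estimate P(true margin > 0) from a list of poll margins.

    margins: Obama minus McCain, in percentage points, one per poll.
    Assumption: the true margin is normal with mean equal to the
    average poll margin and standard deviation equal to the standard
    error of that average.
    """
    m = mean(margins)
    se = stdev(margins) / sqrt(len(margins))
    if se == 0:
        # All polls agree exactly; no spread to model.
        return 1.0 if m > 0 else (0.0 if m < 0 else 0.5)
    return normal_cdf(m / se)


# Hypothetical margins from five polls of one state (not real data).
hypothetical_margins = [3.0, 1.0, 4.0, -1.0, 2.0]
p_obama = win_probability(hypothetical_margins)
print(f"Estimated probability Obama wins the state: {p_obama:.3f}")
```

The probability that McCain wins the state is one minus this value, since ties have probability zero under a continuous normal model.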
Wang then uses his estimated probabilities that each candidate will win to calculate the probability of each sequence of possible winners for the 50 states. Here he assumes independence. For 50 states and the District of Columbia the total number of combinations is 2^51 = 2,251,799,813,685,248 (nearly 2.3 quadrillion). Using these results Wang calculates the distribution for the number of electoral votes Obama will win. He chooses the median of this distribution as his estimate of the number of electoral votes Obama will win.
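Under the independence assumption, the distribution of electoral votes does not require listing all 2^51 outcomes one by one. A minimal sketch of one standard shortcut follows: add the states one at a time and update the probability of each running total. We do not claim this is exactly how Wang's program is written; the function names and the four-state example with its win probabilities are hypothetical.

```python
def electoral_vote_distribution(states):
    """Return a list dist where dist[k] = P(candidate wins exactly k votes).

    states: list of (electoral_votes, win_probability) pairs.
    Assumes the state outcomes are independent.
    """
    dist = [1.0]  # before any state is counted, the total is 0 for sure
    for votes, p in states:
        new = [0.0] * (len(dist) + votes)
        for total, prob in enumerate(dist):
            new[total] += prob * (1.0 - p)      # candidate loses this state
            new[total + votes] += prob * p      # candidate wins this state
        dist = new
    return dist


def median_votes(dist):
    """Smallest total whose cumulative probability reaches one half."""
    cumulative = 0.0
    for total, prob in enumerate(dist):
        cumulative += prob
        if cumulative >= 0.5:
            return total
    return len(dist) - 1


# Hypothetical four-state example (illustrative numbers only).
toy_states = [(3, 0.9), (10, 0.5), (55, 0.95), (27, 0.4)]
dist = electoral_vote_distribution(toy_states)
print("Median electoral votes in the toy example:", median_votes(dist))
```

With 51 contests and 538 electoral votes in total, this takes on the order of 51 x 538 simple updates rather than quadrillions of cases.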
On the right side of their homepage you can find graphics related to the current predictions.
Finally you will find "Interactive maps"; by choosing "current probabilities" you will see a map indicating, for the current day and for each state, whether Obama is predicted to win the electoral vote (red states), McCain is predicted to win the electoral vote (green), or it is too close to call (orange).
Here is the graphic for August 17, 2008.
You can obtain the current map by going here and from "INTERACTIVE MAP" choosing "Current Probabilities".
Sam mentions that there are other methods for making predictions similar to his. He writes:
Many of you are fans of Poblano's excellent site, which combines data with his own detailed methodological judgements. There is also an Ur-polling site. Purists may prefer Pollster.com or RealClearPolitics for straight-up polling numbers. What I provide is an approach that is purely poll-based, but simpler to follow than a river of data.
You might also like to look at Andrea Moro's presidential forecasts. He uses simulation to estimate the relevant probabilities.
Look up the other approaches that Sam mentioned and see how their results compare with his and comment on pros and cons of the different methods.
Submitted by Laurie Snell
More on DNA in the courts
In Chance News 37 we discussed an article by Jason Felch and Maura Dolan in the Los Angeles Times that examined the problem of using DNA in the courts in terms of a specific case. Patrick O'Beirne suggested a more recent article by Felch and Dolan and a more detailed analysis by Michael Black and Steve Schafer in the Risks Digest, Volume 25, Issue 28 (http://catless.ncl.ac.uk/Risks/25.28.html).
Prejudice, Penalties and Premiums
Discrimination on the basis of race, religion, creed, age or gender is more or less outlawed in most of the United States. Nevertheless, as The Economist print edition of December 19, 2007 (Biblically entitled “To Those that have shall be given”) points out, “The ugly are one of the few groups against whom it is still legal to discriminate.” It refers to the work of Daniel Hamermesh, a University of Texas economist, who showed “that when all other things are taken into account, ugly people earn less than average incomes, while beautiful people earn more than the average.”
The Economist article speculates that these results and others could be because “beauty is a real marker for other underlying characteristics such as health, good genes and intelligence. It is what biologists call an unfakeable signal.”
Stephen J. Dubner in the New York Times of May 26, 2008 writes about the premiums and penalties of good teeth. He refers to the work of Glied and Neidell, who “found that women who grew up drinking fluoridated water earn about 4 percent more than women who didn’t, although they found no effect for men.” Glied and Neidell further state, “the effect is almost exclusively concentrated amongst women from families of low socioeconomic status.” Dubner also refers to the work of Case and Paxson (NBER Working Paper No. 12466, issued in August 2006), which says, “On average, taller people earn more because they are smarter. As early as age 3 — before schooling has had a chance to play a role — and throughout childhood, taller children perform significantly better on cognitive tests. The correlation between height in childhood and adulthood is approximately 0.7 for both men and women.”
1. All of the above numbers refer to averages. Explain which average (mean, median or mode) is most appropriate. What important statistical quantity is entirely absent?
2. The Economist’s example of an unfakeable signal was “the deep roar of a big, rutting stag that smaller adolescents are physically incapable of producing.” Name some other unfakeable signals.
3. Dubner poses the question: “So what does all this research mean if you are a short, not-so-attractive person with bad teeth?” His answer may be found in his article, but what is yours?