Chance News 39
Steve Simon suggests the following classic quote:
The glitter of the t table diverts attention from the inadequacies of the fare. -- Sir Austin Bradford Hill. Source: Austin Bradford Hill, "The Environment and Disease: Association or Causation?" Proceedings of the Royal Society of Medicine, 58 (1965), 295-300. The full text of this paper is reproduced at Edward Tufte's website and is well worth reading. This quote is cited in many places including the excellent article "The missed lessons of Sir Austin Bradford Hill" Phillips CV, Goodman KJ. Epidemiol Perspect Innov. 2004; 1: 3. doi: 10.1186/1742-5573-1-3.
Paul Alper suggested the following quotations from Chapter 15 of The Black Swan: The Impact of the Highly Improbable by Nassim Nicholas Taleb.
"THE BELL CURVE, THAT GREAT INTELLECTUAL FRAUD." Heading of Chapter 15, page 229.
Forget everything you heard in college statistics or probability theory. If you never took such a class, even better. Page 229.
If you ever took a (dull) statistics class in college, did not understand much of what the professor was excited about, and wondered what 'standard deviation' meant, there is nothing to worry about. Page 239.
"The bell curve satisfies the reductionism of the deluded." Page 239.
The following is a graphical forsooth. It has circulated on several email lists including edstat-l.
Submitted by Steve Simon
Two quantities were almost equal on average, according to Christine Garver-Apgar, the study’s lead author: the fraction of MHC genes shared, and the woman’s number of extra partners. In other words, if the man and woman had half the genes in common, the woman would have on average nearly half a lover on the side.
Genes may help predict infidelity, study reports. Special to World Science, Nov. 30, 2006
Submitted by Paul Alper
Here are a number of Forsooths from Stephen Senn's book Statistical Issues in Drug Development (Statistics in Practice)
"Most Trials are unethical because they are too large." Page 178.
"Small trials are unethical." Page 179.
"A significant result is more meaningful if obtained from a large trial." Page 179.
"A given significant P-value is more indicative of the efficacy of a treatment if obtained from a small trial." Page 180.
"For a given P-value, the evidence against the null hypothesis is the same whatever the size of the trial." Page 182.
"Warning: Trials in homeopathy are extremely dangerous. If patients forget to take their medicine they are likely to overdose." Page 288.
"Pharmaco-economist: one who asks, not only if the treatment for dysentery was effective, but also after the price of toilet paper." Page 361.
1. Stephen Senn, like many statisticians, loves to make jokes. Sometimes the humor escapes the reader. Look up homeopathy in Google to see why his statement about overdosing when the patient neglects to take the homeopathic medicine is hilarious. Submitted by Paul Alper
A quantitative approach to art history
A Textbook Example of Ranking Artworks, Patricia Cohen, The New York Times, August 4, 2008.
An economist has offered a surprisingly forthright opinion about the art world.
Ask David Galenson to name the single greatest work of art from the 20th century, and he unhesitatingly answers “Les Demoiselles d’Avignon,” a 1907 painting by Picasso.
The ranking was based on a purely quantitative criterion: how often the artwork was reproduced in books about art (28 times, more than any other artwork).
His statistical approach has led to what he says is a radically new interpretation of 20th-century art, one he is certain art historians will hate. It is based in part on how frequently an illustration of a work appears in textbooks. “Quantification has been almost totally absent from art history,” he said. “Art historians hate markets.”
Previous work in art by Mr. Galenson also had a quantitative bent.
In 2002 Mr. Galenson discussed his theories about creativity in the book “Painting Outside the Lines.” Then, two years ago, he published “Old Masters and Young Geniuses: The Two Life Cycles of Artistic Creativity,” arguing that young innovators have a flash of inspiration that upends the existing order in an instant. There are old geniuses too, he said, but their approach is vastly different. They are what he labeled “experimentalists,” who develop their work gradually through years of trial and error. His theory of creativity was based in part on examining auction prices. His approach was hailed by some as a breakthrough, and this spring he was awarded a Guggenheim fellowship to pursue his research.
Auction prices, although clearly quantitative, do have a problem.
Since many of the most important individual works rarely, if ever, come to market, he decided to use art history textbooks to value each piece. He tallied the number of illustrations of each piece in the 33 textbooks he found that were published between 1990 and 2005, on the assumption that the most important works merited the most illustrations.
This effort has received both praise and skepticism.
Michael Rushton, who teaches the economics of art at Indiana University, said that Mr. Galenson was on to something; in science or art, he said, “innovation really requires a market.”
Art experts, not surprisingly, are more skeptical. “The economic notion of artists is interesting for art historians to have to grapple with,” John Elderfield, chief curator emeritus at the Museum of Modern Art, said when Mr. Galenson’s theory was described to him. “These are works in the histories that we tell of modern art. They seem to be milestones, and that’s fair enough.” But he cautioned that this approach could only go so far. “There are great, great things being made which are not reducible to statistics.”
What are the other great works of art, according to this statistical criterion?
Vladimir Tatlin’s “Monument to the Third International” (1919-20), a plan for a celebratory tower, came in second with 25 illustrations.
“Spiral Jetty,” a gigantic earthwork coil that Robert Smithson planted in the Great Salt Lake in Utah in 1970, came in third with 23,
followed by Richard Hamilton’s “Just What Is It That Makes Today’s Homes So Different, So Appealing?,” a 1956 collage widely considered to be the first Pop Art, with 22.
Umberto Boccioni’s 1913 bronze sculpture “Unique Forms of Continuity in Space”
tied Picasso’s “Guernica” (1937) with 21.
Marcel Duchamp’s 1917 “Fountain” — a white urinal — was seventh with 18 illustrations,
and his 1912 painting “Nude Descending a Staircase, No. 2” was eighth with 16.
The article mentions the following book:
The $12 Million Stuffed Shark: The Curious Economics of Contemporary Art. Don Thompson (2008). Palgrave Macmillan. ISBN-10: 0230610226.
1. Does producing a quantitative measure of something like art make sense? Does it enhance our understanding of art or grossly oversimplify it?
2. How do Mr. Galenson's efforts compare to other quantitative measures of success, such as frequency of citation in the peer-reviewed literature?
3. Recent publications (such as this one or the book Freakonomics) seem to imply that all of life's difficult questions can be understood from an economic perspective. To what extent does it help or hurt to incorporate economic values into areas ostensibly outside of economics?
4. What (if any) other "great, great things being made are not reducible to statistics"?
Submitted by Steve Simon
Death and taxes - 2009 edition
On the theme of visualising data in an interesting way, graphic designer Jesse Bachman has updated his death and taxes graph with the latest figures from the US President's official budget request and the comptroller of the Department of Defense. This online, interactive graph neatly visualizes how the US federal government spends its income taxes.
Clicking on this image will open a larger but static version of the graph. The interactive website version, by contrast, allows users to zoom in on different parts and to pan around the graph at full resolution, which the static graphs associated with this article do not offer.
This is a large representational graph and poster of the federal budget. It contains over 500 programs and departments and almost every program that receives over 200 million dollars annually. The data is straight from the president's 2009 budget request and will be debated, amended, and approved by Congress to begin the fiscal year. All of the item circles are proportional in size to their spending totals and the percentage change from 2008 is included to spot trends and disproportion.
- For an example of a drilldown within this graph, see Bush’s new Alternative Energy is powered by smoke and mirrors, which offers an overview of the Department of Energy as imaged in the poster.
- The author also offers some entertaining ways to visualize one billion dollars.
- On a related topic, this film (4.7 MB, QuickTime 5) by Nigel Holmes visualizes the relative sizes of the US surplus and debt.
- This topic was discussed in more detail in a previous Chance News article Death and taxes - 2007 edition.
Submitted by John Gavin.
Princeton Meta-Analysis of State Polls for 2008
In Chance News 13.3 we described a new method of predicting the outcome of the 2004 presidential election, developed by Samuel Wang at Princeton.
In a recent e-mail, Wang wrote to us about his method as applied to the coming election.
I'm pleased to announce the re-launch of my Meta-Analysis of State Polls for 2008. It's now automated and provides a current snapshot of all recent polls as seen through the lens of the Electoral College. It provides a precise measure of where the Presidential race stands, and has a fraction of the uncertainty of any available poll (or even an average of polls). It's available here.
The first step in Wang's Electoral College Meta-Analysis is to estimate, for each of the 50 states, the probability that each candidate will win that state, using the results of the state polls over about a week's time period. Wang assumes that the true value of the Obama-McCain margin in a state (number of Obama votes - number of McCain votes) is a normally distributed random variable with mean and standard deviation estimated from the state polls. From this he estimates the probability that each candidate will win the state.
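The calculation in this first step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Wang's actual code: the function name win_probability and the example numbers are hypothetical, and the sketch simply takes the estimated mean and standard deviation of the margin as given. Under the normal model, the probability that Obama wins a state is the probability that the margin is positive.

```python
import math

def win_probability(mean_margin, sd_margin):
    """Return P(margin > 0) when margin ~ Normal(mean_margin, sd_margin)."""
    if sd_margin <= 0:
        raise ValueError("sd_margin must be positive")
    z = mean_margin / sd_margin
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical numbers (not real poll data): Obama leads by 3 points
# and the estimated standard deviation of the margin is 2 points.
p_obama = win_probability(3.0, 2.0)
p_mccain = 1.0 - p_obama
print(round(p_obama, 3), round(p_mccain, 3))
```

A margin of zero gives each candidate a probability of one half, and the probability moves toward 0 or 1 as the lead grows relative to the standard deviation.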
Wang then uses his estimated probabilities that each candidate will win to calculate the probability of each possible sequence of winners across the states. Here he assumes that the state outcomes are independent. For 50 states and the District of Columbia, the total number of combinations is 2^51 = 2,251,799,813,685,248 (nearly 2.3 quadrillion). Using these results, Wang calculates the distribution of the number of electoral votes Obama will win. He chooses the median of this distribution as his estimate of the number of electoral votes Obama will win.
On the right side of their homepage you can find graphics related to the current predictions.
Finally, you will find "Interactive maps"; by choosing "current probabilities" you will see a graphic of the states indicating, for the current day, whether Obama is predicted to win each state's electoral votes (red), McCain is predicted to win them (green), or the state is too close to call (orange).
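Under the independence assumption, the distribution of electoral votes does not require listing all 2^51 combinations one by one. The Python sketch below shows one way to get it, by adding one state at a time to a running distribution. It is an illustration only, and the article does not say whether Wang's own calculation works this way. The function names and the three example "states" are hypothetical and are not real estimates.

```python
def electoral_vote_distribution(states):
    """states: list of (electoral_votes, p_win) pairs, assumed independent.

    Returns a list dist where dist[k] is the probability that the
    candidate wins exactly k electoral votes.
    """
    total = sum(ev for ev, _ in states)
    dist = [1.0] + [0.0] * total  # before any state: 0 votes with certainty
    for ev, p in states:
        new = [0.0] * (total + 1)
        for k, mass in enumerate(dist):
            if mass == 0.0:
                continue
            new[k] += mass * (1.0 - p)  # candidate loses this state
            new[k + ev] += mass * p     # candidate wins this state
        dist = new
    return dist

def median_votes(dist):
    """Smallest k whose cumulative probability reaches one half."""
    cumulative = 0.0
    for k, mass in enumerate(dist):
        cumulative += mass
        if cumulative >= 0.5:
            return k
    return len(dist) - 1

# Hypothetical (electoral votes, P(win)) pairs, for illustration only.
example_states = [(3, 0.9), (10, 0.6), (5, 0.2)]
dist = electoral_vote_distribution(example_states)
print(median_votes(dist))
```

With 51 contests and 538 electoral votes in total, this takes on the order of 51 x 539 small updates, compared with the 2^51 separate combinations.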
Here is the graphic for August 17, 2008.
You can obtain the current map by going here and, from "INTERACTIVE MAP," choosing "Current Probabilities."
Sam mentions that there are other methods for making predictions similar to his. He writes:
Many of you are fans of Poblano's excellent site, which combines data with his own detailed methodological judgements. There is also an Ur-polling site. Purists may prefer Pollster.com or RealClearPolitics for straight-up polling numbers. What I provide is an approach that is purely poll-based, but simpler to follow than a river of data.
You might also like to look at Andrea Moro's presidential forecasts. He uses simulation to estimate the relevant probabilities.
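To give a feel for what a simulation approach involves, here is a minimal Monte Carlo sketch in Python. It is not Moro's method, whose details are not described here. It assumes independent state outcomes with known win probabilities, and the function name, the example "states," and the vote threshold used in the example are all hypothetical.

```python
import random

def simulate_win_probability(states, votes_needed, n_sims=10_000, seed=0):
    """Estimate P(total electoral votes >= votes_needed) by simulation.

    states: list of (electoral_votes, p_win) pairs, assumed independent.
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    wins = 0
    for _ in range(n_sims):
        # Each state is won with its own probability in this simulated election.
        total = sum(ev for ev, p in states if rng.random() < p)
        if total >= votes_needed:
            wins += 1
    return wins / n_sims

# Hypothetical (electoral votes, P(win)) pairs; a real run would use all
# 51 contests and a threshold of 270 of the 538 electoral votes.
example_states = [(3, 0.9), (10, 0.6), (5, 0.2)]
estimate = simulate_win_probability(example_states, votes_needed=10)
print(estimate)
```

The estimate varies with the number of simulated elections and the random seed, which is one point of contrast with an exact calculation.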
Look up the other approaches that Sam mentioned, see how their results compare with his, and comment on the pros and cons of the different methods.
Submitted by Laurie Snell