Chance News 54
- 1 Quotations
- 2 Forsooths
- 3 Maynard Keynes' game and the Efficient Market Hypothesis
- 4 Measuring Emotion on the Web
- 5 Assigning points to books
- 6 Earthquake probability maps
- 7 Conditional probability in search and rescue
- 8 Confidence in hurricane predictions
- 9 Short-term probability in lawsuit
- 10 Conditional entropy in text analysis
- 11 Cable news polls a la The Daily Show
Quotations
Do not put your faith in what statistics say until you have carefully considered what they do not say.
Forsooths
The probability of Bernanke being reappointed to the Fed is near zero.
Interview on CNBC's August 11 "Squawk Box"
The probability of President Barack Obama visiting the Somerset County Fair this year is low.
Maynard Keynes' game and the Efficient Market Hypothesis
Jeff Norman told us about an interesting game and its relation to investment theory. The game was described in terms of professional investment by the famous British economist John Maynard Keynes in his 1936 book The General Theory of Employment, Interest and Money. There he writes:
Professional investment may be likened to those newspaper competitions in which the competitors have to pick out the six prettiest faces from a hundred photographs, the prize being awarded to the competitor whose choice most nearly corresponds to the average preference of the competitors as a whole; so that each competitor has to pick, not those faces which he himself finds prettiest, but those which he thinks likeliest to catch the fancy of the other competitors, all of whom are looking at the problem from the same point of view. It is not a case of choosing those which, to the best of one’s judgment, are really prettiest, nor even those which average opinion genuinely thinks the prettiest. We have reached the third degree where we devote our intelligences to anticipating what average opinion expects the average opinion to be. And there are some, I believe, who practice the fourth, fifth and higher degrees.
Keynes used this game in his argument against the efficient-market hypothesis (EMH), which is defined at Answers.com as:
An investment theory that states that it is impossible to "beat the market" because stock market efficiency causes existing share prices to always incorporate and reflect all relevant information. According to the EMH, this means that stocks always trade at their fair value on stock exchanges, and thus it is impossible for investors to either purchase undervalued stocks or sell stocks for inflated prices. Thus, the crux of the EMH is that it should be impossible to outperform the overall market through expert stock selection or market timing, and that the only way an investor can possibly obtain higher returns is by purchasing riskier investments.
The efficient market hypothesis is a controversial subject and is discussed on many websites. We can see this in an article by John Mauldin, who is president of Millennium Wave Advisors, LLC, a registered investment advisor. Here you will also see more about Keynes' game and its relation to the EMH.
We read here: Keynes' game can be easily replicated by asking people to pick a number between 0 and 100, and telling them the winner will be the person who picks the number closest to two-thirds the average number picked. The chart below shows the results from the largest incidence of the game that I have played - in fact the third largest game ever played, and the only one played purely among professional investors.
The highest possible correct answer is 67. To go for 67 you have to believe that every other muppet in the known universe has just gone for 100. The fact we got a whole raft of responses above 67 is more than slightly alarming.
You can see spikes which represent various levels of thinking. The spike at fifty reflects what we (somewhat rudely) call level zero thinkers. They are the investment equivalent of Homer Simpson, 0, 100, duh 50! Not a vast amount of cognitive effort expended here!
There is a spike at 33 - of those who expect everyone else in the world to be Homer. There's a spike at 22, again those who obviously think everyone else is at 33. As you can see there is also a spike at zero. Here we find all the economists, game theorists and mathematicians of the world. They are the only people trained to solve these problems backwards. And indeed the only stable Nash equilibrium is zero (two-thirds of zero is still zero). However, it is only the 'correct' answer when everyone chooses zero.
The final noticeable spike is at one. These are economists who have (mistakenly...) been invited to one dinner party (economists only ever get invited to one dinner party). They have gone out into the world and realised the rest of the world doesn't think like them. So they try to estimate the scale of irrationality. However, they end up suffering the curse of knowledge (once you know the true answer, you tend to anchor to it). In this game, which is fairly typical, the average number picked was 26, giving a two-thirds average of 17. Just three people out of more than 1000 picked the number 17.
I play this game to try to illustrate just how hard it is to be just one step ahead of everyone else - to get in before everyone else, and get out before everyone else. Yet despite this fact, it seems to be that this is exactly what a large number of investors spend their time doing.
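The levels of thinking described above can be sketched in a few lines of Python. This is only an illustration: the function names and the choice of 50 as the level-zero anchor are assumptions made for the sketch, following the description of the spikes at 50, 33 and 22.

```python
def level_k_guess(k, anchor=50.0, factor=2 / 3):
    """Guess of a level-k thinker: apply the two-thirds rule k times to the level-0 anchor."""
    return anchor * factor ** k


def winning_target(guesses, factor=2 / 3):
    """Number that wins the game: two-thirds of the average guess."""
    return factor * sum(guesses) / len(guesses)


# Level 0 picks the midpoint, level 1 assumes everyone else is level 0, and so on.
print([round(level_k_guess(k)) for k in range(4)])  # [50, 33, 22, 15]

# If every player picked 100, the target would be the highest possible one.
print(round(winning_target([100, 100, 100])))  # 67

# Zero is the only guess that stays put when everyone makes it.
print(winning_target([0, 0, 0]))  # 0.0
```

Each extra level of reasoning shrinks the guess by another factor of two-thirds, so the sequence heads toward zero. With an average pick of 26, as reported above, the target is two-thirds of 26, or about 17, which lies between level-two and level-three thinking.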
(1) Efficient Market Hypothesis on Trial: A Survey by Philip S. Russel and Violet M. Torbey
(2) A Mathematician Plays the Stock Market by John Paulos
John Paulos wrote us: I discussed Keynes' game and the 80% (or 66.66%) game in my book, A Mathematician Plays the Stock Market. I also wrote about the efficient market paradox, a kind of market analogue of the liar paradox: The Efficient Market Hypothesis is true if and only if a sufficient number of investors believes it to be false.
Submitted by Laurie Snell
Measuring Emotion on the Web
"Mining the Web for Feelings, Not Facts"
by Alex Wright, The New York Times, August 23, 2009
There's a lot of data on the web, but it isn't data in the numeric sense.
The rise of blogs and social networks has fueled a bull market in personal opinion: reviews, ratings, recommendations and other forms of online expression.
There are serious reasons to sift through this data.
For many businesses, online opinion has turned into a kind of virtual currency that can make or break a product in the marketplace. Yet many companies struggle to make sense of the caterwaul of complaints and compliments that now swirl around their products online.
A new methodology, sentiment analysis, attempts to summarize the positive and negative emotions associated with these reviews and ratings.
Jodange, based in Yonkers, offers a service geared toward online publishers that lets them incorporate opinion data drawn from over 450,000 sources, including mainstream news sources, blogs and Twitter. Based on research by Claire Cardie, a Cornell computer science professor, and her students, the service uses a sophisticated algorithm that not only evaluates sentiments about particular topics, but also identifies the most influential opinion holders.
In a similar vein, The Financial Times recently introduced Newssift, an experimental program that tracks sentiments about business topics in the news, coupled with a specialized search engine that allows users to organize their queries by topic, organization, place, person and theme. Using Newssift, a search for Wal-Mart reveals that recent sentiment about the company is running positive by a ratio of slightly better than two to one. When that search is refined with the suggested term “Labor Force and Unions,” however, the ratio of positive to negative sentiments drops closer to one to one.
This work isn't easy.
Translating the slippery stuff of human language into binary values will always be an imperfect science, however. "Sentiments are very different from conventional facts," said Seth Grimes, the founder of the suburban Maryland consulting firm Alta Plana, who points to the many cultural factors and linguistic nuances that make it difficult to turn a string of written text into a simple pro or con sentiment. "'Sinful' is a good thing when applied to chocolate cake," he said.
The simplest algorithms work by scanning keywords to categorize a statement as positive or negative, based on a simple binary analysis ("love" is good, "hate" is bad). But that approach fails to capture the subtleties that bring human language to life: irony, sarcasm, slang and other idiomatic expressions. Reliable sentiment analysis requires parsing many linguistic shades of gray.
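A minimal sketch of the keyword approach, in Python, shows both how it works and how it goes wrong. The word lists and the function name are hypothetical, chosen only for illustration; they do not come from Jodange, Newssift, or any other product mentioned in the article.

```python
import re

# Hypothetical keyword lists; a real system would use a much larger lexicon.
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "bad", "sinful"}


def keyword_sentiment(text):
    """Label text by counting positive and negative keywords."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(keyword_sentiment("I love this phone"))              # positive
print(keyword_sentiment("I hate waiting on hold"))         # negative
# The chocolate cake example: a compliment is scored as a complaint.
print(keyword_sentiment("This chocolate cake is sinful"))  # negative
```

The last call shows the weakness the article describes: the scorer sees only the word, not the context that turns "sinful" into praise.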
Submitted by Steve Simon
1. No algorithm is going to be perfect, but some may provide sufficient accuracy to be useful. How would you measure the accuracy of a sentiment algorithm? How would you decide whether the accuracy was sufficient for your needs?
Assigning points to books
"Reading by the Numbers"
by Susan Straight, The New York Times, August 27, 2009
There's a program in many schools to encourage reading. But some people don't like it.
At back-to-school night last fall, I was prepared to ask my daughter’s eighth-grade language arts teacher about something that had been bothering me immensely: the rise of Accelerated Reader, a 'reading management' software system that helps teachers track student reading through computerized comprehension tests and awards students points for books they read based on length and difficulty, as measured by a scientifically researched readability rating. When the teacher announced during the class presentation that she refused to use the program, I almost ran up and hugged her.
The problem, according to Ms. Straight, is that the system does not give enough credit for the classics.
Many classic novels that have helped readers fall in love with story, language and character are awarded very few points by Accelerated Reader. My Antonia is worth 14 points, and Go Tell It on the Mountain 13. The previous school year, my daughter had complained that some of her reading choices that I thought were pretty audacious — long, well-written historical novels like Libba Bray’s Great and Terrible Beauty and Lisa Klein’s Ophelia, recommended by her college-age sister — were worth only 14 points each. Sense and Sensibility is worth 22.
Indeed, the article has a clever graphic showing a handwritten equation "Sense + Sensibility = 22". Rather than rewarding the classics, the system assigns its heaviest point totals to the Harry Potter books.
Harry Potter and the Order of the Phoenix topped out at 44 points, while Harry Potter and the Deathly Hallows and Harry Potter and the Goblet of Fire were worth 34 and 32.
The points are assigned using a formula system (ATOS).
ATOS employs the three statistics that researchers have found to be most predictive of reading difficulty: the number of words per sentence, the number of characters per word, and the average grade level of the words in the book.
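The first two of these statistics are easy to compute, as the rough Python sketch below suggests. It is not the ATOS formula: the way ATOS combines the statistics is not given here, and the third statistic needs a graded vocabulary list that is not reproduced. The function name and the simple rules for splitting sentences and words are assumptions made for illustration.

```python
import re


def text_statistics(text):
    """Return (words per sentence, characters per word) for a passage."""
    # Treat runs of ., ! or ? as sentence boundaries (a simplification).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat runs of letters and apostrophes as words.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    words_per_sentence = len(words) / len(sentences)
    chars_per_word = sum(len(w) for w in words) / len(words)
    return words_per_sentence, chars_per_word


wps, cpw = text_statistics("The cat sat. The dog ran fast.")
print(wps)            # 3.5
print(round(cpw, 2))  # 3.14
```

Nothing in these counts looks at plot, theme, or narrative structure, so a book written in short sentences with short words scores as easy whatever its content.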
This formula does have its problems. A Wikipedia article notes that
The Accelerated Reader's method for determining grade level is critically flawed. Lord of the Flies is considered 5th grade level and James Joyce's Ulysses is considered 7th grade level. There is more to a grade level of a book than the word count and length. Lord of the Flies and Ulysses score at very younger audience levels in the simplistic AR rubric, yet most 5th and 7th graders would not and probably could not read these books because of the story and narrative structure which is much more mature than the AR mathematical reduction of word count and length. 
This issue is acknowledged in an article written by the company that produces the Advanced Learning System, republished at a school web site.
Advances in technology and statistical analysis have led to improvements in the science of readability, but there are still some things that readability formulas cannot do—and will never be able to do. All readability formulas produce an estimate of a book’s difficulty based on selected variables in the text, but none analyzes the suitability of the content or the literary merit for individual readers. This decision is up to educators and parents, who know best what content is appropriate for each student. 
Submitted by Steve Simon
1. Is it wrong to assign more points to a Harry Potter book than a Jane Austen book?
2. Could the ATOS formula be adapted to give greater weight to the classics? How?
Earthquake probability maps
"Geology News - Earth Science Current Events" refers interested readers to a website created by the U.S. Geological Survey, which enables people to create earthquake probability maps for specific regions. ASCII files of raw data used in creating the maps are also available.
Conditional probability in search and rescue
“Coast Guard looks for lessons in Matagorda miracle”
by Jennifer Latson, Houston Chronicle, September 1, 2009
Three missing fishermen were found by a recreational yachtsman after 45 flights, 250 hours, 6 days, and 86,000 square miles of searching by the U.S. Coast Guard. The Coast Guard began the search on Saturday, August 22 and suspended it on Friday, August 28; the yachtsman found the fishermen on Saturday, August 29.
The Coast Guard's search and rescue system, built on probability statistical models, lists the average probability of being detected from a plane as 78 percent. … [T]he crew made the difficult decision to suspend the search on Friday. A strict reading of the probability models might have called off the search as early as Wednesday. But searchers were hopeful, swayed by the pleas of relatives who said the men were skilled survivalists: outdoorsmen, fishers and hunters.
1. What do you think the "average probability of being detected from a plane" refers to – one flight, one time period, one geographic region, one rescue mission, one person, one boat, …?
2. Suppose that the Coast Guard only measures the relative frequency of finding people or boats after it receives reports of alleged missing boaters. Can you see any benefit to including, in their calculations, simulations with people or boaters randomly placed in bodies of water?
3. Based on the Matagorda experience, would you recommend that the Coast Guard break its “average probability of being detected from a plane” into two conditional probabilities? What would those be?
4. Do you think that the Coast Guard could, would, or should have extended its search even more, if the probability of detecting the fishermen had been higher, based on the fact that the men were “skilled survivalists”?
Confidence in hurricane predictions
“NOAA: 2009 Atlantic Hurricane Season Outlook Update”
Issued August 6, 2009
The National Oceanic and Atmospheric Administration (NOAA) produced these predictions for the 2009 Atlantic hurricane season.
…. This combination of climate factors indicates a 50% chance of a near-normal hurricane season for 2009, and a 40% chance of a below normal season. An above-normal season is not likely ....
The outlook indicates a 70% probability for each of the following seasonal ranges: 7-11 named storms, 3-6 hurricanes, 1-2 major hurricanes ….
These predicted ranges have been observed in about 70% of past seasons having similar climate conditions to those expected this year. They do not represent the total range of activity seen in those past seasons.
1. How likely is an above-normal season?
2. Statistically speaking, would you call the 70% figure a probability? What would you have called it?
3. Based on the given seasonal range of named storms, can you estimate the (arithmetic) mean number of named storms and the standard error of the measurement?
Short-term probability in lawsuit
“... Filing of Class Action Lawsuit Against … ProShares Fund”
Reuters, August 31, 2009
A law firm filed a class action lawsuit in Maryland on behalf of shareholders in the UltraShort Financials ProShares Trust Fund (SKF). In the following excerpts from the claim, DJFI refers to the Dow Jones Financial Index.
… Defendants failed to disclose the following risks: (a) the mathematical probability that SKF's performance will fail to track the performance of the DJFI over any period longer than a single trading day; … (c) that SKF is not a directional play on the performance of U.S. financial stocks, but dependent on the volatility and path the DJFI takes over any time period greater than a single day; … (f) that based upon the mathematics of compounding, the volatility of the DJFI and probability theory [it can be inferred that] SKF was highly unlikely to achieve its stated investment objectives over time periods longer than a single trading day.
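The path dependence described in parts (c) and (f) can be illustrated with a small Python sketch. It assumes a fund that delivers exactly minus two times the index's return each day; that multiple is suggested by the name UltraShort but is not stated in the excerpt, and the two-day index path is invented for illustration.

```python
def compound(daily_returns):
    """Total return from compounding a sequence of daily returns."""
    total = 1.0
    for r in daily_returns:
        total *= 1.0 + r
    return total - 1.0


def inverse_fund_return(index_daily_returns, leverage=-2.0):
    """Total return of a fund that earns leverage times the index return each day.

    The leverage of -2.0 is an assumption, not a figure from the complaint.
    """
    return compound([leverage * r for r in index_daily_returns])


# Hypothetical volatile path: the index rises 10% one day and falls 10% the next.
index_path = [0.10, -0.10]

index_total = compound(index_path)            # (1.10)(0.90) - 1, about -1%
fund_total = inverse_fund_return(index_path)  # (0.80)(1.20) - 1, about -4%
naive_total = -2.0 * index_total              # about +2% if the multiple held over both days

print(round(index_total, 4), round(fund_total, 4), round(naive_total, 4))
```

Here the fund matches its daily target on both days, yet over the two days it loses about 4% while the index loses only about 1%. A steadier path with the same two-day index return would give a different fund result, which is the sense in which the outcome depends on volatility and path.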
Consider part (f) of the complaint. Suppose that the probability of SKF’s performance tracking the DJFI over a single trading day was as high as, say, 75%.
1. How do you think that either the “mathematics of compounding” or “probability theory,” by themselves, would make it highly unlikely that SKF could track the DJFI over more than a single trading day, say 5 trading days? What assumption(s) are you making?
2. How do you think that including the “volatility of the DJFI” as a condition could affect your answers to the preceding question?
Conditional entropy in text analysis
“Decoding the Ancient Script of the Indus Valley”
by Ishaan Tharoor, TIME, September 1, 2009
The Indus Valley refers to a 300,000-square-mile region in modern-day Pakistan and northwestern India. Its urban culture is estimated to be at least 4500 years old.
The group examined hundreds of Harappan texts and tested their structure against other known languages using a computer program. Every language, they suggest, possesses what is known as "conditional entropy": the degree of randomness in a given sequence. In English, for example, the letter "t" can be found preceding a whole variety of other letters, but instances of "tx" or "tz" are far more infrequent than "th" or "ta." … Quantifying this principle through computer probability tests, they determined the Harappan script had a similar measure of conditional entropy to other writing systems, including English, Sanskrit and Sumerian. If it mathematically looked and acted like writing, they concluded, then surely it is writing.
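To make the idea concrete, here is a small Python sketch that estimates the conditional entropy of the next symbol given the current one, using counts of adjacent pairs. It is an illustration only, not the researchers' program; the function name and the toy strings are assumptions.

```python
from collections import Counter
from math import log2


def conditional_entropy(sequence):
    """Estimate H(next symbol | current symbol) in bits from adjacent pairs."""
    pairs = list(zip(sequence, sequence[1:]))
    pair_counts = Counter(pairs)
    first_counts = Counter(a for a, _ in pairs)
    n = len(pairs)
    entropy = 0.0
    for (a, _b), count in pair_counts.items():
        p_pair = count / n                   # P(current = a, next = b)
        p_next_given = count / first_counts[a]  # P(next = b | current = a)
        entropy -= p_pair * log2(p_next_given)
    return entropy


# Rigid order: "a" is always followed by "b" and "b" by "a", so nothing is uncertain.
print(conditional_entropy("abababab"))  # 0.0

# Here each symbol is followed by "a" or "b" equally often: one full bit of uncertainty.
print(conditional_entropy("aabba"))     # 1.0
```

A perfectly rigid sequence scores near zero and a sequence with no structure scores near the maximum. As the article describes it, the argument is that the Harappan script scores about as other writing systems do.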
Cable news polls a la The Daily Show
Jon Stewart, The Daily Show, August 17, 2009
Jon Stewart opened his August 17 show by commenting:
I believe that we are now more united than ever, and I have got mathematical proof to back me up. The more you watch cable news the more you see how unified Americans are.
Stewart showed three video clips from FOX News:
(a) “93% of you who know how to text say no, that we should not be talking to moderate factions of the Taliban.”
(b) “93% said yes,” the Republican Party is better off without Arlen Specter.
(c) “100% of you say yes, the town halls are making a difference.”
and four video clips from Lou Dobbs on CNN:
(d) “96% are outraged that big business and socio-ethnocentric special interest groups are trying to kill the most effective program in the fight against illegal immigration.”
(e) “97% of you say that it’s more important for the federal government to enforce our immigration laws than to count illegal aliens.”
(f) “94% of you say you are outraged that you are expected to tighten your belt.”
(g) “98% of you say it’s time illegal aliens said, ‘thank you’ for all the help and support they get in this country.”
He also showed two video clips from Ed Schultz’ polls on MSNBC.
These numbers seem a tad on the high side. Are they trustworthy? Is it possible these polls don’t reflect public opinion so much as the ability of those shows’ viewers to repeat the opinions they have just heard the host that they’re watching express?
At the end of the show, Stewart showed two contradictory poll results about paying for the increased cost of a reformed health care system:
(h) “92% said no, we don’t need a tax hike," on FOX.
(i) “94% said yes, it’s fair to tax the top 1.2% of the richest Americans," on MSNBC.
The FOX, CNN, and MSNBC program hosts asked viewers to text their responses to on-air questions, each of which had two possible answer choices.
1. Why is it important to see the exact questions and answer choices in each poll?
2. Result (c) was practically instantaneous after the question was posed. Can you suggest some reason(s) for the 100% response in this result?
3. Identify some problems with the method these programs used, in terms of providing reliable information about the attitudes of the American people in general, or even of a program’s viewers in particular.
4. Would it make sense to compute an estimate of the standard errors in these poll percentages? Why or why not?