Chance News 55
Revision as of 16:43, 25 September 2009
Contents
 1 Quotations
 2 Forsooths
 3 Breaking News
 4 Amazon River at age 1,000,003 years
 5 Gompertz Law of human mortality
 6 Things that go bump
 7 The Bulgarian Toto 6 of 42 lottery
 8 Baby, it’s cold outside
 9 U.S. Census: 2008 sampling results released
 10 Tennis challenges are underused
 11 Netflix data mining contest
Quotations
Populism, in its latest manifestation, celebrates ignorant opinion and undifferentiated rage. .... The typical opinion poll … doesn’t trouble to ask whether the respondent knows the first thing about the topic being opined upon, and no conventional poll disqualifies an answer on the ground of mere total ignorance. The premise of opinion polling is that people are, and of right ought to be, omniopinionated – that they should have views on all subjects at all times – and that all such views are equally valid. …. So, given the prominence of polls in our political culture, it’s no surprise that people have come to believe that their opinions on the issues of the day need not be fettered by either facts or reflection. …. Now there’s the intellectual free lunch: I’m entitled to vociferous opinions on any subject, without having to know, or even think, about it.
We live in a world of real dangers and imagined fears. …. We are hounded by what I call “psychofacts”: beliefs that, though not supported by hard evidence, are taken as real because their constant repetition changes the way we experience life. …. We act as if there’s a constitutional right to immortality and that anything that raises risk should be outlawed. ….
In a September 18 Statesman Journal story, “Ducks’ defense faces tough challenge”, a coach was quoted:
The only statistic that counts is winning and losing …. We don't get caught up in that. .... How many yards and those things.
A retiring associate professor of math at BVU was described in a Storm Lake Pilot Tribune article [1] of September 17:
His love for math outweighed his love of sports by a few percentage points.
The plain fact is that 70 years ago Ronald Fisher gave scientists
a mathematical machine for turning baloney into breakthroughs,
and flukes into funding. It is time to pull the plug.
Robert Matthews, commenting on medical studies that have a low p-value and are thus statistically significant but subsequently turn out to be duds when extended to the general population.
Submitted by Paul Alper.
Forsooths
Responding to a Canadian viewer who pointed out that "life expectancy in Canada under our health system is higher than the USA," Fox's Bill O'Reilly on 7/27/09 said,
Well, that's to be expected, Peter, because we have 10 times as many people as you do. That translates to 10 times as many accidents, crimes, down the line.
According to a September 18 FOX8 News WVUE-TV story, “Chance for rain”, the following information was published in a cover story in an early 2009 bulletin of the American Meteorological Society:
[Researchers at the University of Washington] found people in Seattle didn't have much of a grasp for what the probability forecast [of rain] really means, but found the numbers helpful in planning their day.
Hanna Karp, in “What’s the Point of Cheerleading?”, The Wall Street Journal, September 17, 2009, states:
Risk-assessment experts say it’s hard to get a handle on the perils of cheerleading.
An advertisement in The Wall Street Journal of September 22, 2009, contained a chart with an interesting legend. See the chart “Effectiveness of virtual vs. in-person meetings,” in “The Return on Investment of U.S. Business Travel”, prepared by Oxford Economics USA, September 2009, document page 21/pdf page 20.
Students might find it challenging to describe in one sentence what it says. They also might be asked to recreate the chart so that it would convey the message more effectively, that is, pass the interocular trauma test.
Breaking News
The Wall Street Journal of September 8, 2009 reports on a study in the Journal of Bone and Joint Surgery: “The researchers compared the outcomes of patients who underwent surgery between 6 a.m. and 4 p.m. for fractures of the femur or tibia to those who had comparable surgeries for similar fractures outside those normal hours.”
Sample                 Reoperations Needed   Sample Size   Sample Proportion
Outside Normal Hours   28                    82            0.3415
Within Normal Hours    12                    70            0.1714
The results are:
Difference = p(1) - p(2)
Estimate for difference: 0.170035
95% CI for difference: (0.0346494, 0.305420)
Test for difference = 0 (vs not = 0): Z = 2.37, P-Value = 0.018
Fisher's exact test: P-Value = 0.026
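The output above can be reproduced, at least approximately, with a short script. The sketch below uses only the Python standard library; the function names are illustrative and not part of the original analysis. It assumes the confidence interval uses the unpooled standard error while the Z statistic uses the pooled one (an assumption that matches the reported figures), and that the two-sided Fisher P-value sums the probabilities of all tables no more likely than the observed one.

```python
import math
from math import comb

def two_proportion_z(x1, n1, x2, n2):
    """Normal-approximation comparison of two sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Unpooled standard error for the 95% confidence interval
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    half_width = 1.959964 * se_unpooled
    ci = (diff - half_width, diff + half_width)
    # Pooled standard error for the test of "difference = 0"
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    # Two-sided P-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, ci, z, p_value

def fisher_exact_two_sided(x1, n1, x2, n2):
    """Two-sided Fisher exact P-value for a 2x2 table with fixed margins."""
    total = x1 + x2
    N = n1 + n2

    def prob(k):
        # Hypergeometric probability of k "successes" in the first sample
        return comb(n1, k) * comb(n2, total - k) / comb(N, total)

    observed = prob(x1)
    lo, hi = max(0, total - n2), min(n1, total)
    # Sum over all tables that are no more likely than the observed one
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= observed * (1 + 1e-9))

if __name__ == "__main__":
    diff, ci, z, p = two_proportion_z(28, 82, 12, 70)
    print("difference:", round(diff, 6))
    print("95% CI:", tuple(round(v, 6) for v in ci))
    print("Z:", round(z, 2), "P-value:", round(p, 3))
    print("Fisher exact P-value:", round(fisher_exact_two_sided(28, 82, 12, 70), 3))
```

Comparing the two P-values side by side is one way into Discussion question 1: the Z test relies on a large-sample approximation, while Fisher's test works directly with the exact distribution of the table.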
Discussion
1. Why is the Fisher exact test P-Value (0.026) to be preferred to the other P-Value mentioned (0.018)?
2. The Wall Street Journal mentioned several caveats “making it difficult to determine the underlying reasons for the after-hours patients’ poor outcomes.” List a few practical significance hedges to the statistically significant result.
Submitted by Paul Alper.
Amazon River at age 1,000,003 years
“Metrics mania: Are Americans too reliant on numbers?”
by John Yemma, The Christian Science Monitor, September 16, 2009
The author first reminds readers of an old joke:
A guy strikes up a conversation with another guy on a long plane flight to South America. They are over the Amazon.
Guy 1: “Did you know that the Amazon is 1,000,003 years old?”
Guy 2: “Really? How can you be so precise?”
Guy 1: “I was on this same flight three years ago, and a geologist told me the Amazon was a million years old.”
He then discusses the difficulty with “metrics-based management” efforts, but concludes, in a hopeful vein, with a formula and some encouragement:
Metrics + Grain of Salt = Somewhat Useful Information.
Still, even if we can’t trust data absolutely, we can extract meaning. We may not know how old the Amazon really is, but we know one thing for certain: It is three years older than when Guy 1 first flew over it.
A blogger comments [2],
So true. I am an European who has lived in the US for almost 20 years. I am constantly amazed at the ‘number obsession’ that seems to rule all areas of society. It may be because this country is so big, that a common measure can only be found in quantities, not qualities.
Gompertz Law of human mortality
“You’re Likely to Live!”
by “Freakonomics,” The New York Times, September 14, 2009
This very brief article describes the “Gompertz Law of human mortality,” provides some statistics about the different chances of dying at different ages, and refers readers to three websites:
(a) Article with Gompertz Law details and graphs: “Your body wasn’t built to last: a lesson from human mortality rates”, "gravity and levity" blog, July 8, 2009.
(b) Applet that gives life expectancy at user-selected age: “Death Probability Calculator”, undated.
(c) TED video of songs, the first of which relates to aging: “Time is marching on”, March 2007.
Things that go bump
“Bumped Passengers Learn a Cruel Flying Lesson”
by Scott McCartney, The Wall Street Journal, September 17, 2009
This article discusses the recent spike in the rates of passenger-bumping by airlines, despite the increased penalties that the federal government requires the airlines to pay bumped-but-ticketed passengers. Although bumping affects fewer than 2 passengers out of every 10,000, that rate rose by 40% in the second quarter of 2009 over the rate for the second quarter of 2008.
"It's pretty simple: It's just because planes are more full than last year," says [a US Airways official, whose airline] had the highest bumping rate among major airlines, at 1.88 passengers per 10,000 in the second quarter.
This summer, the nine major airlines filled 85.5% of their seats, up from 84.1% last summer. The peak was July, with 86.7% of seats filled.
Federal rules allow airlines to overbook in order to compensate for no-shows. The recent increase in bumping rates may be explained by the reduced demand for air travel, especially by business customers.
The [Department of Transportation] says it isn't concerned about the rise in bumping because the rates are still lower than historical highs. During the 1970s and 1980s, bumping rates were routinely four times as high as today's rate.
Discussion
Suppose that, on average, 85% of ticketholders show up for their flights. Assume that the number of ticketholders who show up follows a binomial distribution (in particular, that every ticketholder has the same chance of showing up, independently of the others) and that a ticketholder is bumped only due to lack of a seat.
1. For each n tickets sold, or oversold, for a 200-seat plane, find the number of ticketholders an airline could expect to show up, on average.
(a) n = 200 (b) n = 210 (c) n = 220 (d) n = 230 (e) n = 240 (f) n = 250.
2. It appears that the airline would not have to bump any ticketholders for some values of n. Is that a statistically correct inference, based on your understanding of expected value? Even if those expected values always “came true,” what problem would remain for the airline?
3. For each n tickets sold, or oversold, find the probability of at least one ticketholder being bumped off the 200-seat plane.
(a) n = 200 (b) n = 210 (c) n = 220 (d) n = 230 (e) n = 240 (f) n = 250.
4. For which value(s) of n would you have a negligible risk of being bumped? Under what circumstances might any risk be too great?
5. The more tickets an airline sells, the more likely it is to fill the plane and thus maximize its revenue for a flight. However, at some point, the increased revenue may be offset by losses of future dollars from angry ticketholders and compensation payouts to increasing numbers of bumped ticketholders. What other information would you want/need to know before deciding how many tickets to sell for a 200-seat plane?
6. Do you agree with the "pretty simple" reason given for the increased rate of bumping?
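For readers who want to check their answers to questions 1 and 3 numerically, here is a minimal sketch under the stated binomial model. The 85% show-up rate and the 200 seats come from the discussion; the function names are illustrative, and the independence of ticketholders is an assumption of the model, not a fact about real passengers.

```python
from math import comb

SEATS = 200        # seats on the plane (from the discussion)
SHOW_PROB = 0.85   # assumed chance that any one ticketholder shows up

def expected_showups(n, p=SHOW_PROB):
    """Mean of a Binomial(n, p) count of ticketholders who show up."""
    return n * p

def prob_at_least_one_bumped(n, seats=SEATS, p=SHOW_PROB):
    """P(more than `seats` of the n ticketholders show up)."""
    if n <= seats:
        return 0.0  # nobody can be bumped if tickets do not exceed seats
    # Upper tail of the binomial distribution: k = seats + 1, ..., n
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(seats + 1, n + 1))

if __name__ == "__main__":
    for n in (200, 210, 220, 230, 240, 250):
        print(n, expected_showups(n), prob_at_least_one_bumped(n))
```

The contrast between the two columns is the point of question 2: an expected number of show-ups below 200 does not mean the chance of bumping someone is zero.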
The Bulgarian Toto 6 of 42 lottery
The Bulgarian Toto 6 of 42 lottery was the subject of an investigation after the same set of six numbers {4, 15, 23, 24, 35, 42} was drawn in two successive lotteries on September 6 and September 10, 2009. The article [3] cites a mathematician as stating that the probability of picking the same six numbers twice in a row is 4,200,000:1. We wondered how he arrived at this number. What is the probability that a specified set of six numbers will repeat consecutively?
There are <math>{42 \choose 6} = 5245786</math> different sets of six numbers and the probability that a SPECIFIED set will occur in the next two consecutive draws is <math>1/5245786^2</math>. Because the sets involve disjoint events, the probability that SOME set will occur in the next two consecutive draws is <math>5245786 \times 1/5245786^2 = 1/5245786</math>.
But now, suppose the lottery has been running continuously for <math>m</math> draws and we ask what the chance is that during this period there were consecutive draws of the same set. As before, first consider a fixed set of six numbers.
There are <math>m-1</math> opportunities for this set to be drawn twice in succession (beginning with the second drawing). The probability that this will happen is then the probability of the union <math>P(A) = P(\cup_i A_i A_{i+1})</math>, where <math>A_i</math> is the event that this set of numbers is drawn on the <math>i</math>th draw.
Bonferroni's first degree upper bound is <math>P(A) \le \sum_i P(A_i A_{i+1})</math> while the second degree lower bound is <math>P(A) \ge \sum_i P(A_i A_{i+1}) - \sum_{1 \le i < j \le m-1}P(A_i A_{i+1} A_j A_{j+1}).</math>
We assume (!) that the events <math>A_i</math> are independent, each with probability <math>p = 1/5245786</math>. As long as <math>mp</math> is small, the second sum in the lower bound can be ignored, giving <math>P(A) \approx (m-1)/5245786^2.</math>
It appears that the draws are held twice per week, so for one year <math>m = 104</math>, giving the probability <math>3.74 \times 10^{-12}</math> that a specified set of numbers will be drawn twice in succession. According to a spokeswoman the lottery has been taking place for 52 years [4]. Using <math>m = 104 \times 52 = 5408</math>, the probability that a specified set of numbers will be drawn twice in succession over this period is <math>1.96 \times 10^{-10}</math>, still very small.
But now let's ask the question, not for a fixed set of numbers but for some set of numbers. After all, in discussing this coincidence the repeated set arises by chance alone and is not specified in advance.
In <math>m</math> drawings, what is the probability that SOME set of six numbers will be repeated in consecutive draws?
There are 5245786 possible sets of numbers that could be repeated. Enumerate the sets by integers <math>1 \le k \le 5245786</math> with <math>E_k</math> the event that set <math>k</math> repeats consecutively sometime during these <math>m</math> drawings. The probability of the union <math>P(\cup E_k)</math> is needed. Each of the 5245786 events <math>E_k</math> has probability <math>(m-1)/5245786^2</math> and if they were independent we could evaluate the probability using complements as <math>P(\cup E_k) = 1 - (1 - (m-1)/5245786^2)^{5245786} \approx 1 - e^{-(m-1)/5245786}</math>. However, they are dependent, but as long as <math>mp</math> is small Bonferroni's bounds can once again be used to estimate <math>P(\cup E_k) \approx (m-1)/5245786.</math> For <math>m = 5408</math> this is 0.0010307. (Note that assuming independence gives 0.0010302.)
This probability relates to one lottery. Suppose we consider all lotteries worldwide and ask for the probability that in some lottery, somewhere, some set of numbers will be repeated consecutively. All lotteries are variants of Toto with different numbers involved. Each lottery will have had its own cumulative number of drawings. In order to gauge the magnitude of the probability wanted, assume that there are <math>x</math> lotteries, each one sharing the same numerical characteristics as the Bulgarian one.
This time we can use independence. The probability that some set will be repeated is 1 minus the probability that in no lottery is a set of numbers selected on two consecutive drawings <math>= 1 - (1 - (m-1)/5245786)^x</math>. For <math>x = 50</math> this is 0.0503 while for <math>x = 100</math> the probability is 0.0980. (An approximation to one significant digit for this range of values of interest is <math>x(m-1)/5245786.</math>)
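The estimates in this section are easy to recompute. The following is a minimal sketch, not the author's own calculation; the function names are illustrative, and the twice-weekly schedule and 52-year history are the same assumptions made in the text.

```python
from math import comb, log1p, expm1

N = comb(42, 6)  # number of distinct six-number sets in a 6-of-42 lottery

def specified_set_repeat(m):
    """Approximate chance that one fixed set is drawn twice in a row within m draws."""
    return (m - 1) / N**2

def some_set_repeat(m):
    """Bonferroni-style estimate that SOME set is drawn twice in a row within m draws."""
    return (m - 1) / N

def some_set_repeat_if_independent(m):
    """Same question, treating the N per-set events as if they were independent."""
    q = (m - 1) / N**2
    # 1 - (1 - q)**N, computed in a numerically stable way
    return -expm1(N * log1p(-q))

def some_lottery_repeat(m, x):
    """Chance that at least one of x identical, independent lotteries sees a repeat."""
    return 1 - (1 - (m - 1) / N) ** x

if __name__ == "__main__":
    m_year, m_all = 104, 104 * 52
    print("sets:", N)
    print("fixed set, one year:", specified_set_repeat(m_year))
    print("fixed set, 52 years:", specified_set_repeat(m_all))
    print("some set, 52 years:", some_set_repeat(m_all))
    print("some set, independence:", some_set_repeat_if_independent(m_all))
    for x in (50, 100):
        print("some lottery, x =", x, ":", some_lottery_repeat(m_all, x))
```

Varying `m` and `x` shows how quickly a "one in millions" coincidence becomes unremarkable once many draws and many lotteries are taken into account.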
For a different problem that discusses "very big numbers" see the article about double lottery winners [5].
Questions.
1. Can you verify both assertions concerning the Bonferroni lower bound?
2. How many years would the Bulgarian lottery need to be running in order to have the same probability that some set of numbers will appear three times in succession?
3. Instead of demanding that the same set of numbers appear twice in succession, what is the probability that some set of numbers will repeat during <math>m</math> drawings? (This is simpler and is the famous birthday problem.)
Submitted by Fred Hoppe
Baby, it’s cold outside
“New Light on the Plight of Winter Babies”
by Justin Lahart, The Wall Street Journal, September 22, 2009
Two Notre Dame economists “may have uncovered an overlooked explanation for why season of birth matters” with respect to the often reported poor test results, less healthiness, reduced longevity, and lower school completion rates and earnings of children born in the winter. See “Season of Birth and Later Outcomes: Old Questions, New Answers”, by Kasey Buckles and Daniel Hungerman, National Bureau of Economic Research, December 2008.
Working independently, Hungerman found that “children in the same families tend to be born at the same time of year,” and Buckles found a “tendency that less educated mothers were having children in winter.” They put their heads together and concluded that:
A key assumption of much of [the previous] research is that the backgrounds of children born in the winter are the same as the backgrounds of children born at other times of the year.
Some previous explanations for seasonal birth differences were school attendance laws, the amount of sunshine available in a season, or the level of pesticides in the water in a season. With respect to the first explanation, economists Joshua Angrist of MIT and Alan Krueger of Princeton posited in 1991 that, since winter babies can drop out of school earlier because they reach their 16th birthdays earlier, those babies have lower education levels that, in turn, lead to lower earnings.
Upon examination of CDC birth-certificate data for virtually all 52 million children born during the period 1989-2001, the Notre Dame researchers noted:
The percentage of children born to unwed mothers, teenage mothers and mothers who hadn't completed high school kept peaking in January every year. Over the 13-year period, for example, 13.2% of January births were to teen mothers, compared with 12% in May - a small but statistically significant difference, they say.
A Columbia University economist comments about how striking the Notre Dame results are: "You can take a look at those graphs and see the clear pattern and that it's remarkably stable over time." See graphs [6] of January and May births with respect to birth mother’s marital status, age, and education.
Angrist disagrees, stating "The bottom line is a slight change in the estimate. …. It hardly overturns our finding."
Buckles and Hungerman are now working on finding an explanation of why a mother’s socioeconomic status is related to a child’s birth month.
(As of September 24, there were 298 blogs [7] responding to this article!)
U.S. Census: 2008 sampling results released
The U.S. Census Bureau has released the 2008 results of its ongoing "American Community Survey".
Tennis challenges are underused
Challenge, Anyone? Paul Kedrosky, The New York Times, September 20, 2009
It seems that economists have an opinion about just about everything. Paul Kedrosky, described in the New York Times as "senior fellow at the Kauffman Foundation, a center for economic research," has advice for professional tennis players: challenge the line judges more often.
As in American football, tennis players can challenge a judge's ruling.
Here’s how challenges work. Major tennis tournaments (like the U.S. Open) have multiple cameras arrayed around the court. This permits a simulated replay of a shot, showing to the millimeter where a ball landed on the court. So instead of futilely shouting, "You can't be serious!" at linesmen and umpires, players can raise their hand immediately after a call and ask for a replay.
There are limits to how often you can challenge.
The rules allow three incorrect challenges per player per set. In a best-of-five-sets match (which is normal for men), that means at least 18 available challenges per match, none of which carry over from set to set. In other words, use ’em or lose ’em. A player can get an additional challenge if the match goes into a tiebreaker, or if a fifth set goes overtime.
But most tennis players don't come even close to using all of their available challenges. And they should, according to Kedrosky. Here's one scenario he proposes:
For example, the No. 10 seed at the Open, Fernando Verdasco of Spain, averaged 0.4 challenges per set and had a sparkling 43 percent success rate. If he challenged once per set, like Federer, and his challenge success rate fell to a similar 30 percent, it could mean one more point to him in a three-set match. If his success rate didn’t fall as much, however, and he challenged twice per set it might mean as many as three more points in a five-set match. Either way, it could be the difference between winning and losing.
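The arithmetic behind this scenario can be sketched as an expected count of overturned calls: challenges per set, times success rate, times number of sets. The 0.4, 43 percent and 30 percent figures are from the quoted passage; the 35 percent rate for the two-challenges-per-set case is purely a hypothetical stand-in, since the article does not say how far the success rate would fall.

```python
def expected_overturns(challenges_per_set, success_rate, sets):
    """Expected number of calls overturned in a match of the given length."""
    return challenges_per_set * success_rate * sets

# Figures from the quoted scenario (three-set match)
current_3 = expected_overturns(0.4, 0.43, 3)
once_per_set_3 = expected_overturns(1.0, 0.30, 3)

# Five-set match; 0.35 is an assumed rate, not a number from the article
current_5 = expected_overturns(0.4, 0.43, 5)
twice_per_set_5 = expected_overturns(2.0, 0.35, 5)

if __name__ == "__main__":
    print("three sets, current habits:", current_3)
    print("three sets, one challenge per set:", once_per_set_3)
    print("five sets, current habits:", current_5)
    print("five sets, two challenges per set (assumed 35%):", twice_per_set_5)
```

Under these assumptions the expected gains are fractions of a point to a few points per match, so the article's "one more point" and "as many as three more points" read as rounded, somewhat generous figures.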
There are other factors at work, such as embarrassment when a challenge does not go your way, but Kedrosky thinks players should ignore this.
Submitted by Steve Simon
Questions
1. Suppose you wanted to run a computer simulation about close calls in tennis and vary the rate at which players challenge close calls. What are some of the random variables that you would have to account for in such a simulation?
2. How much of an edge would one more point mean in a match between two players who are otherwise evenly matched?
Netflix data mining contest
A $1 Million Research Bargain for Netflix, and Maybe a Model for Others Steve Lohr, The New York Times, September 21, 2009.
Netflix Awards $1 Million Prize and Starts a New Contest Steve Lohr, Bits Blog, The New York Times, September 21, 2009.
Netflix just awarded a million dollar prize in a contest to build a data mining model that could predict what movies its customers would like to see. There was a tight competition between two teams. The winning team, BellKor’s Pragmatic Chaos, was a group of seven "statisticians, machine-learning experts and computer engineers from the United States, Austria, Canada and Israel". The second-place team, Ensemble, was "a global alliance with some 30 members."
The losing team, as it turned out, precisely matched the performance of the winner, but submitted its entry 20 minutes later, just before the final deadline expired. Under contest rules, in the event of a tie, the first team past the post was the winner. “That 20 minutes was worth a million dollars,” Reed Hastings, chief executive of Netflix, said at a news conference in New York.
Netflix has already improved its system to help customers pick movies and thinks the million dollar investment was well spent.
Thousands of teams from more than 100 nations competed in the Netflix prize contest. And it was a good deal for Netflix. “You look at the cumulative hours and you’re getting Ph.D.’s for a dollar an hour,” Mr. Hastings said in an interview.
The losing team was not too upset.
Yet the scientists and engineers on the second-place team, and the employers who gave many of them the time and freedom to compete in the contest, were hardly despairing. Arnab Gupta, chief executive of Opera Solutions, a consulting company that specializes in data analytics, based in New York, took a small group of his leading researchers off other work for two years. "We've already had a $10 million payoff internally from what we’ve learned," Mr. Gupta said. Working on the contest helped the researchers come up with improved statistical analysis and predictive modeling techniques that his firm has used with clients in fields like marketing, retailing and finance, he said. "So for us, the $1 million prize was secondary, almost trivial."
The data set itself was the true prize to many team members.
Win or lose, researchers agreed that they entered the contest in good part to get access to the Netflix data. "It was incredibly alluring to work on such a large, high-quality data set," said Joe Sill, an independent consultant and machine-learning expert who was a member of the Ensemble.
Submitted by Steve Simon