# Chance News 35


## Revision as of 14:36, 24 April 2008

## Contents

- 1 Forsooth
- 2 In this lottery it was better to win third than second place
- 3 Intuitive and non-intuitive medical news
- 4 Discussion
- 5 Vytorin is in the news again
- 6 Discussion
- 7 Blood Pressure
- 8 Blood Pressure Again
- 9 The Deathless Monty Hall Problem!
- 10 Monkeys and Monty Hall
- 11 Joe DiMaggio's Streak

## Forsooth

This Forsooth is from the April 2008 issue of the RSS News.

Probability - the chance, or likelihood, of a certain/particular event occurring which can be expressed as a quantitative description, often ranging from 0 (rare event) to 1 (common event).

Learning lessons from the 2007 floods:

an independent review by Sir Michael Pitt

Cabinet Office, UK

December 2007

USA Today has come out with a new survey: Apparently three out of four people make up 75 percent of the population.

David Letterman,

04/12/1947

Paul Alper suggested this as a bad English forsooth:

VYTORIN is a medicine used to lower levels of LDL (bad) cholesterol in addition to a healthy diet.

Vytorin ad

Star Tribune sports pages

April 4, 2008

Paul also suggested this Forsooth:

Having a husband creates an extra seven hours a week of housework for women, according to a University of Michigan study of a nationally representative sample of U.S. families. For men, the picture is very different: A wife saves men from about an hour of housework a week.

ScienceDaily

April 3, 2008

Our first item was suggested by Fred Hoppe at McMaster University, whose research is in probability and statistics and who has a hobby of lottery problems.

## In this lottery it was better to win third than second place

The Lotto 6/49 in Ontario, Canada asks you to choose six numbers from 1 to 49 on up to 10 boards (each board costs $2), or to ask for a Quick Pick, in which case the lottery terminal randomly selects your numbers. The Lotto officials randomly draw 6 numbers from 1 to 49 and a bonus number from 1 to 49. The payoffs are

47% of LOTTO 6/49 draw sales is dedicated to the Prize Fund. The $5 and $10 prizes are paid from the Prize Fund, and the balance of the fund (the Pools Fund) is then allocated among the 4/6, 5/6, 5/6 + Bonus and 6/6 prize categories as indicated in the table above. Any amount not won in the 6/6 or 5/6 + Bonus prize categories is added to the 6/6 Jackpot prize for the next draw. Here is how this came out for the March 19 Lotto 6/49:

Fred writes:

In the March 19, 2008 Lotto 6/49, the numbers drawn were 23 - 40 - 41 - 42 - 44 - 45 and the bonus was 43.

Can you imagine the consternation of the poor folks who, against the odds, matched 5/6 numbers and the bonus number, then found their excitement turned to dismay upon learning their share was only $1,193.70 because of the 239 who matched likewise. The third place winners (match 5/6 only) each took home $2,223.40.
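The gap between second and third place can be made concrete with a short calculation. This is a minimal sketch of our own, not Fred's: it assumes every board is equally likely to be any of the C(49, 6) combinations (true for Quick Picks, not for numbers people choose themselves) and that the bonus number comes from the 43 numbers left after the six winning numbers are drawn.

```python
from math import comb

TOTAL = comb(49, 6)  # 13,983,816 equally likely boards


def match_probabilities():
    """Chance that a single random board lands in each prize category.

    Assumes six winning numbers plus one bonus number drawn from the
    43 numbers that are not among the six winners.
    """
    return {
        "6/6": 1 / TOTAL,
        # five of the six winners; the sixth number on the board is the bonus
        "5/6 + Bonus": comb(6, 5) * 1 / TOTAL,
        # five of the six winners; the sixth is one of the 42 other non-winners
        "5/6": comb(6, 5) * 42 / TOTAL,
        # exactly four winners plus any two of the 43 non-winning numbers
        "4/6": comb(6, 4) * comb(43, 2) / TOTAL,
    }


if __name__ == "__main__":
    probs = match_probabilities()
    for category, p in probs.items():
        print(f"{category:<12} 1 in {1 / p:,.0f}")
    # a random board is 42 times as likely to win third place as second
    print("5/6 boards per 5/6 + Bonus board:", round(probs["5/6"] / probs["5/6 + Bonus"]))
```

With random boards, then, one expects roughly 42 third-place winners for every second-place winner, so the second-place pool is normally split among far fewer people than the third-place pool; 239 second-place winners is far outside that pattern.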

## Discussion

It would be interesting to estimate the probability that there will be more third place winners than second place winners, assuming all boards are Quick Picks. Can you see how this might be done?
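One possible approach, sketched under simplifying assumptions of our own: if n boards are all Quick Picks, the numbers of second- and third-place winners are approximately Poisson with means n·6/C(49,6) and n·252/C(49,6), and we treat the two counts as independent (strictly they are not, since one board cannot win both). The number of boards `n_boards` is a hypothetical input; the March 19 sales figure is not given here.

```python
from math import comb, exp, lgamma, log

TOTAL = comb(49, 6)
P_SECOND = 6 / TOTAL   # 5/6 + Bonus
P_THIRD = 252 / TOTAL  # 5/6 without the bonus


def poisson_pmf(k, lam):
    """Poisson probability of exactly k events, computed in log space."""
    return exp(k * log(lam) - lam - lgamma(k + 1))


def prob_more_third_than_second(n_boards):
    """Approximate P(#third-place winners > #second-place winners).

    Assumes n_boards > 0 independent, uniformly random (Quick Pick) boards,
    Poisson counts, and independence between the two counts.
    """
    lam2 = n_boards * P_SECOND
    lam3 = n_boards * P_THIRD
    # beyond this many second-place winners the Poisson tail is negligible
    upper = int(lam2 + 10 * lam2 ** 0.5 + 50)
    total = 0.0
    cdf3 = 0.0  # running P(third <= k)
    for k in range(upper + 1):
        cdf3 += poisson_pmf(k, lam3)
        total += poisson_pmf(k, lam2) * max(0.0, 1.0 - cdf3)
    return total


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000, 5_000_000):  # hypothetical board counts
        print(f"{n:>9,} boards: {prob_more_third_than_second(n):.6f}")
```

For any realistic number of boards the approximation gives a probability very close to 1. The model covers Quick Picks only; it says nothing about numbers that players choose by hand.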

jls

## Intuitive and non-intuitive medical news

"The study of more than 6,000 people found the more fat they had in their guts in their early to mid-40s the greater their chances of becoming forgetful or confused or showing other signs of senility as they aged. Those who had the most impressive midsections faced more than twice the risk of the leanest." So says Rob Stein of The Washington Post on March 26, 2008. He is referring to the publication "Central obesity and increased risk of dementia more than three decades later" published on March 26, 2008 in Neurology. Even if an individual is not obese or even overweight, "A large belly independent of total weight is a potent predictor of dementia," comments the lead author of the study. According to Stein, "The researchers used a complicated method for measuring fat known as sagittal abdominal diameter (SAD). Those with a SAD score above 25 had the biggest bellies and the greatest risk. That is roughly equivalent to a waist of at least 39 inches. Of the 6583 participants who had their SAD measured between 1964 and 1973, 1059 (15.9%) were eventually diagnosed with dementia.

Nevertheless, a Dutch study ("Lifetime Medical Costs of Obesity: Prevention No Cure for Increasing Health Expenditure," Public Library of Science) claims that preventing obesity in general, and presumably a large SAD as well, turns out to be expensive, because it costs more to care for healthy people who live years longer, long enough to contract diseases that are truly high-priced. The lead researcher says, "Lung cancer is a cheap disease to treat because people don't survive very long. But if they are old enough to get Alzheimer's one day, they may survive longer and cost more." According to an AP report on the article in February 2008, over a lifetime "the thin and healthy group cost the most, about $417,000," while the "cost of care for obese people was $371,000, and for smokers, about $326,000."

Turning now to a much smaller study and a much shorter measurement period, Josephine Marcotty in the Minneapolis Star Tribune of March 27, 2008, writes about a before-and-after investigation of the recent Minnesota statewide smoking ban. "Dr. Dorothy Hatsukami, who heads the university's tobacco research center, recruited 24 nonsmokers from around the state who worked at bars, restaurants and bowling alleys that permitted smoking. Before the ban went into effect, she tested their urine for nicotine and a carcinogen. Then she tested them again after the ban. The study will be submitted for publication in the journal Cancer Epidemiology, Biomarkers & Prevention." Marcotty writes, "On average, the levels of nicotine and the carcinogen dropped by more than 80 percent."

Speaking of lung cancer, Gardiner Harris in the New York Times of March 26, 2008 wrote, "In October 2006, Dr. Claudia Henschke of Weill Cornell Medical College jolted the cancer world with a study saying that 80 percent of lung cancer deaths could be prevented through widespread use of CT scans." Harris reported that "after screening 31,567 people from seven countries, CT scans uncovered 484 lung cancers, 412 of them at a very early stage. Three years later, most of those patients were still alive, and she projected that 80 percent would be alive after 10 years and assumed that they would have died without the screens." The study was published in the prestigious New England Journal of Medicine, which failed to note that Henschke's research "was underwritten almost entirely by $3.6 million in grants from the parent company of the Liggett Group, maker of Liggett Select, Eve, Grand Prix, Quest and Pyramid cigarette brands." As the chief medical officer of the American Cancer Society put it, "If you're using blood money, you need to tell people you're using blood money." A former editor of the New England Journal of Medicine reasoned that the tobacco companies "want to show that lung cancer is not so bad as everybody thinks because screening can save people; and that's outrageous." Furthermore, it was "recently reported that" Dr. Henschke, the biggest advocate of screening for lung cancer, "had failed to disclose in articles and educational lectures a patent and 10 pending patents related to CT screening and follow-up."

## Discussion

1. The SAD measurement was one of a number of measurements made on each participant. What statistical problem does this introduce?

2. The SAD study started with 8664 participants but the final sample size was only 6583. Why might that be an issue?

3. Presumably the Dutch study of obesity dealt with Dutch data concerning costs. Why might things be different when dealing with the United States?

4. The Minnesota study dealt with the effect of passive smoking and was funded by ClearWay, an advocacy group for banning smoking. How is funding from ClearWay different from, or similar to, funding from the Liggett Group?

5. CT scans of the lung, just like mammograms, which are commonly used to detect breast cancer, may have false positives. To distinguish a true positive from a false positive, biopsies and/or surgery are often done afterward. What makes the follow-up procedure in the former riskier than in the latter?

Submitted by Paul Alper

## Vytorin is in the news again

History plays out in strange ways. In mid January, 2008, Vytorin's failure in the ENHANCE study to reduce plaque and heart attacks (over Zocor by itself) hit the news to the point that a commentary appeared even in Chance News 33 "Cholesterol Significance". Then suddenly, because of a medical conference in Chicago at the end of March, the media found Vytorin again even though there was nothing new to report during the two-month interval.

An AP dispatch by Marilynn Marchione on March 30, 2008 reported that "The results show the drug had 'no result - zilch. In no subgroup, in no segment, was there any added benefit' for reducing plaque, said Dr. John Kastelein, the Dutch scientist who led the study. Kastelein said the data were far more consistent than anticipated and ample to show that the drug simply did not work." An interesting observation from another leading cardiologist lends credence to the discipline of statistics: "but the reason we do research is so we don't have to rely on our 'beliefs' - we can rely on data." In this instance, the belief is that combining two drugs that have different mechanisms ought intuitively to be synergistic. The clinical trial proved otherwise.

## Discussion

1. An editorial in the Minneapolis Star Tribune criticizes the "$200 million ad campaign" of the manufacturers of Vytorin. The Star Tribune ran a full two-page ad for four days in January and a similar full two-page ad for two days at the end of March. Determine how much the newspaper received for those ads.

2. Vytorin did reduce LDL in a statistically significant manner but the criticism has to do with practical significance because LDL is an inexact marker or indicator of future heart problems. Name some other inexact markers in medicine. Name some other inexact markers for education, piety, and basketball success.

3. From here we read, "The American College of Cardiology recommends that major clinical decisions not be made on the basis of the ENHANCE study alone." From here we read, "since 2003 the ACC has received nearly $5 million from Merck, $1 million from Schering-Plough and more than $5 million from the companies' joint venture that sells cholesterol drugs Vytorin and Zetia." Defend and criticize a financial relationship between the pharmaceutical industry and the medical profession.

Submitted by Paul Alper

## Blood Pressure

Not everyone has high blood pressure, but from scanning the news it would seem that way. From the AP we find that in a randomized clinical trial of "11,462 people in the United States and Nordic countries," a combination of a calcium channel blocker plus benazepril (an ACE inhibitor) did better than a diuretic plus benazepril. The study was "stopped early so the surprising benefits could be made known."

## Discussion

1. There were 531 "heart-related problems or strokes" among the 5,721 people who took the combination of a calcium channel blocker plus benazepril, whereas there were "653 events among the 5,741 others." Perform a two-sample test for a difference in proportions to demonstrate that the results are "statistically significant."

2. "The study was paid for by Novartis, which sells Lotrel, the combo that proved better and [Kenneth] Jamerson consults for the company." In addition, the study has not yet appeared in a peer-reviewed journal; it has been presented only at a conference of cardiologists in March 2008. How does this information affect your attitude toward the conclusion?

3. Determine the monthly cost for Lotrel under various health plans and how it compares with a diuretic plus benazepril.
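For question 1, here is a minimal sketch of a pooled two-sample z-test for proportions using only Python's standard library. The counts are the ones quoted from the AP story; using the pooled standard error under the null hypothesis is our choice (the unpooled version gives nearly the same answer here).

```python
from math import erfc, sqrt


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test of H0: p1 == p2.

    Returns the two sample proportions, the z statistic and the p-value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p1, p2, z, p_value


if __name__ == "__main__":
    # Lotrel combination arm vs. the diuretic + benazepril arm
    p1, p2, z, p = two_proportion_z_test(531, 5721, 653, 5741)
    print(f"event rates {p1:.4f} vs {p2:.4f}, z = {z:.2f}, two-sided p = {p:.5f}")
```

The statistic comes out near z = -3.7, with a two-sided p-value well below 0.001.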

## Blood Pressure Again

High blood pressure in those over 80 years of age is of especial concern. From an NHS summary of an article in the NEJM of March 31, 2008, we find that "3,845 people who were eligible were randomly assigned to receive either the diuretic indapamide (1.5mg sustained release pills) or an inactive placebo." There were 1933 patients in the active (treatment) arm and 1912 in the placebo arm. The study was stopped in July 2007, when it was seen that the active group had 39% fewer strokes and 21% fewer deaths from any cause.

A quick summary of some of the results is as follows:

| Outcome | Active group | Placebo group | P value |
|---|---|---|---|
| Fatal or nonfatal stroke | 51 | 69 | .06 |
| Death from stroke | 27 | 42 | .046 |
| Adverse events | 358 | 448 | .001 |
| Death, any cause | 196 | 235 | .02 |

## Discussion

1. "The study was funded by the British Heart Foundation and the Institut de Recherches Internationales Servier." Why is this better than being funded by the makers of indapamide?

2. The study appears in the NEJM. Why is this better than being presented only at a conference?

3. The study took place at "195 centres in Europe, China, Australasia and Tunisia" over a four-year period. Thus, it is important to take into account that events do not take place uniformly, because of deaths, dropouts, refusals, closing of centers by the data monitoring committee, etc. Ignoring any lack of uniformity and considering what might be called "a final snapshot," a simple two-sample test for proportions on "Death, any cause" would yield

Test and CI for Two Proportions

| Sample | X | N | Sample p |
|---|---|---|---|
| 1 | 236 | 1912 | 0.123431 |
| 2 | 196 | 1933 | 0.101397 |

Difference = p (1) - p (2)

Estimate for difference: 0.0220342

95% CI for difference: (0.00207287, 0.0419955)

Test for difference = 0 (vs not = 0): Z = 2.16 P-Value = 0.031

Fisher's exact test: P-Value = 0.032

which is fairly close to the P Value of .02 reported in the study. Carry out the same calculation for the other three rows in the table and compare results of the table with this snapshot approach.

4. Notice that "Fatal or nonfatal stroke" "did not quite reach statistical significance" but was still deemed important. Translate that sentence into English.

5. One caveat of this study is that the patients in this sample were healthier than normal for their age. Why is this a caveat?
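The snapshot calculation in question 3 can be reproduced, and extended to the other rows of the table, with a short script. This sketch is ours: it uses the unpooled standard error, which reproduces the Minitab interval and Z shown above, and it does not compute Fisher's exact test. Note that the Minitab run enters 236 placebo deaths, while the table lists 235.

```python
from math import sqrt
from statistics import NormalDist

N_PLACEBO, N_ACTIVE = 1912, 1933
Z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def snapshot(x_placebo, x_active, n_placebo=N_PLACEBO, n_active=N_ACTIVE):
    """Difference in proportions (placebo minus active) with an unpooled
    standard error: returns (difference, 95% CI, z, two-sided p-value)."""
    p1, p2 = x_placebo / n_placebo, x_active / n_active
    se = sqrt(p1 * (1 - p1) / n_placebo + p2 * (1 - p2) / n_active)
    diff = p1 - p2
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, (diff - Z975 * se, diff + Z975 * se), z, p_value


# (outcome, active count, placebo count, reported P value) from the table
TABLE = [
    ("Fatal or nonfatal stroke", 51, 69, ".06"),
    ("Death from stroke", 27, 42, ".046"),
    ("Adverse events", 358, 448, ".001"),
    ("Death, any cause", 196, 235, ".02"),
]

if __name__ == "__main__":
    # the Minitab run, which used 236 placebo deaths
    diff, (lo, hi), z, p = snapshot(236, 196)
    print(f"Minitab check: diff={diff:.6f} CI=({lo:.6f}, {hi:.6f}) Z={z:.2f} P={p:.3f}")
    for outcome, active, placebo, reported in TABLE:
        diff, _, z, p = snapshot(placebo, active)
        print(f"{outcome:<26} snapshot P={p:.3f}  reported P={reported}")
```

Comparing the two P value columns is exactly the exercise in question 3; the snapshot ignores the timing of events that the study's own analysis accounts for.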

## The Deathless Monty Hall Problem!

And behind door No. 1, a fatal flaw.

New York Times, 8 April 2008

John Tierney

This article describes an application of the infamous Monty Hall Problem to an experiment in cognitive psychology.

As a supplement, the New York Times web site has posted an interactive simulation of the game show. It is certainly not the first of its kind, but it has nice graphics, along with an intelligible, nontechnical discussion of why it makes sense to trade doors.

Submitted by Bill Peterson, based on a suggestion from Priscilla Bremser.

Here is more about this from our anonymous contributor.

## Monkeys and Monty Hall

Even casual readers of Chance News are familiar with the original Monty Hall problem: three doors, two goats, one car and an omniscient host (Monty Hall). A quick check with Google reveals the connection with Marilyn vos Savant’s correct solution, which is to switch to the remaining door (because that door has a probability of 2/3, not ½, of having the car behind it), despite some mathematicians insisting she was wrong.
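A quick Monte Carlo check of the 2/3 answer, in the spirit of the Times simulation but not its code. The sketch assumes the host knows where the car is, always opens a goat door the contestant did not pick, and chooses at random when he has two such doors.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty knows where the car is and opens a goat door the contestant did not pick
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # move to the one door that is neither the original pick nor the opened door
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=1):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(f"stay:   {win_rate(switch=False):.3f}")  # close to 1/3
    print(f"switch: {win_rate(switch=True):.3f}")   # close to 2/3
```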

http://www.dartmouth.edu/~chance/forwiki/MontyHall3.jpg

You win! You got the fancy car (or at least a picture of one).

John Tierney of the New York Times has written several interesting columns in April 2008 dealing with other aspects of the problem. According to M. Keith Chen of Yale University, the Monty Hall Problem may explain, in a purely mathematical/statistical way, how cognitive dissonance purportedly turns up in psychology experiments. The diagram below illustrates the connection.

The Monty Hall Problem

http://www.dartmouth.edu/~chance/forwiki/MontyHall1.jpg

Tierney also has Craig Fox of UCLA present another variant of the Monty Hall problem: “Suppose that Andy (A), Ben (B), and Chris (C) are three men selected at random from the telephone book. If you learn that Andy is taller than Chris, then what is the probability that Andy is the tallest of the three?” Incorrect reasoning leads to: “Learning that Andy is taller than Chris eliminates Chris from consideration, apparently leaving two remaining possibilities that seem equally likely: either Andy is tallest or Ben is tallest.” In fact, “the correct answer is 2/3” because “learning that Andy is taller than Chris” leaves three equally likely orderings from tallest to shortest, {ABC, ACB, BAC}, since the other possibilities {BCA, CAB, CBA} no longer exist.
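Fox's answer can be checked by brute force: list the six height orderings (tallest first), keep those in which Andy is taller than Chris, and count how often Andy comes first. The sketch assumes, as the puzzle does, that all six orderings are equally likely and that there are no ties.

```python
from fractions import Fraction
from itertools import permutations


def prob_andy_tallest_given_taller_than_chris():
    """Enumerate height orderings, tallest first, all assumed equally likely."""
    orders = list(permutations("ABC"))
    # keep only the orderings in which Andy is ahead of (taller than) Chris
    consistent = [o for o in orders if o.index("A") < o.index("C")]
    andy_tallest = [o for o in consistent if o[0] == "A"]
    return Fraction(len(andy_tallest), len(consistent)), consistent


if __name__ == "__main__":
    prob, consistent = prob_andy_tallest_given_taller_than_chris()
    print(["".join(o) for o in consistent], prob)  # ['ABC', 'ACB', 'BAC'] 2/3
```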

Fox devised another version which is worth quoting in full: “Jonathan [Levav of Columbia University] shuffled five cards that included one ace, then dealt two cards to the participant and three to himself. He told participants that if their hand contained an ace at the end of the game then they would receive $1. Jonathan told one group of participants that he would first look at his cards then offer them a chance to switch hands. Most of these participants correctly saw that there was a 3/5 chance that the ace was in Jonathan's hand, and most opted to switch hands. Jonathan told a second group of participants that he would look at his cards then turn up two cards that are not the ace, then offer them a chance to switch hands. Note that no matter how he dealt the cards, Jonathan could always find two available non-ace cards, and turning them up does not change the probability that the ace was dealt to Jonathan's hand. Nevertheless, participants couldn't help but edit out those possibilities and treat the remaining cards as equally likely to be the ace. Thus, most said the chances that Jonathan's hand contained the ace were now 1/3 and very few wanted to switch hands with him.”
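A simulation of the second version of Levav's game makes the point numerically. The sketch is ours: it assumes a well-shuffled five-card deck and that Jonathan, after looking, turns up two non-aces from his three cards (picking at random when he holds three non-aces). It then asks how often his single remaining face-down card is the ace.

```python
import random


def dealer_keeps_ace(rng):
    """One round: True if Jonathan's remaining face-down card is the ace."""
    deck = ["ace", "x", "x", "x", "x"]  # one ace and four non-aces
    rng.shuffle(deck)
    jonathan = deck[2:]  # participant gets deck[:2], Jonathan the other three
    # He can always turn up two non-aces, since at most one of his cards is the ace.
    non_aces = [i for i, card in enumerate(jonathan) if card != "ace"]
    turned_up = rng.sample(non_aces, 2)
    face_down = next(i for i in range(3) if i not in turned_up)
    return jonathan[face_down] == "ace"


def estimate(trials=100_000, seed=5):
    rng = random.Random(seed)
    return sum(dealer_keeps_ace(rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(f"P(ace is Jonathan's face-down card) ~ {estimate():.3f}")  # near 3/5, not 1/3
```

Turning up the cards never changes whether the ace was dealt to Jonathan, so his one face-down card carries the whole 3/5, and each of the participant's two cards keeps 1/5.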

Links to the work of Keith Chen, John Tierney and additional readings can be found in the New York Times article.

## Discussion

1. There is an instance in the history of statistics which is analogous to explaining cognitive dissonance, a supposed physical phenomenon, as a purely statistical happenstance. Francis Galton plotted height of offspring vs. (mid) height of parents and found that the kids of parents of above-average height were on average not quite so tall. Likewise, kids of parents of below-average height were on average not quite so short. This led him to assert that natural selection would never lead to improvement of the human race. He interchanged the x and y axes and found to his surprise that tall kids had, on average, not quite so tall parents. What is the name given to this mathematical phenomenon and why does it happen?

2. Another instance of explaining a physical phenomenon as a purely statistical happenstance occurs in before and after testing. Those who did very well before coaching often don’t do so well on average after, while those who did poorly before coaching often do better on average after coaching. It is then wrongly concluded that the coaching is not useful. What might be happening here?

3. Tierney sometimes interchanges the word “probability” with the word “odds.” This can be dangerous. Suppose there are two types of people. In the long run, the first group loses (as in gets a death sentence) 19 times out of 20 and the other loses 99 times out of 100. Determine the ratio of the probabilities and the ratio of the odds. If you were a lawyer for a defendant in the latter group, which ratio would you choose to emphasize prejudice against your client?

4. The mathematical underpinning of these problems is conditional probability, a concept that is made more difficult by the vagueness of the English language. How do you interpret this sentence: “Given that 10% of old men and women who were contacted said ‘yes,’ what is the probability an individual is from the east?”

5. As another illustration of the vagueness of the English language, Tierney looks at another famous problem that often engenders wrong answers: “Mr. Smith has two children, at least one of whom is a boy. What is the probability that the other is a boy?” Although it sounds the same, the answer is different from, “Given that Mr. Smith has a boy, what is the probability that his next child is a girl?” Answer each of these.

6. Bayes' theorem is at the heart of conditional probability. Yet the solutions presented by Tierney avoid using Bayes' theorem. Why?
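For question 3, the arithmetic is quick to check with exact fractions, reading "loses 19 times out of 20" as a probability of 19/20 of a death sentence (and likewise 99/100 for the second group).

```python
from fractions import Fraction


def odds(p):
    """Convert a probability to odds in favor: p / (1 - p)."""
    return p / (1 - p)


first_group = Fraction(19, 20)    # loses 19 times out of 20
second_group = Fraction(99, 100)  # loses 99 times out of 100

prob_ratio = second_group / first_group
odds_ratio = odds(second_group) / odds(first_group)

print(f"ratio of probabilities: {prob_ratio} = {float(prob_ratio):.3f}")  # 99/95 = 1.042
print(f"ratio of odds:          {odds_ratio} = {float(odds_ratio):.3f}")  # 99/19 = 5.211
```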

## Joe DiMaggio's Streak

Journey to Baseball's Alternate Universe

Samuel Arbesman and Steven Strogatz

Op-Ed, March 30, 2008

In 1941 Joe DiMaggio had a 56-game hitting streak, a feat that has not even come close to being matched. In 1988, in an article for The New York Review of Books, Stephen J. Gould wrote:

My colleague Ed Purcell, Nobel laureate in physics has done a comprehensive study of all baseball streak and slump records. His firm conclusion is easily and swiftly summarized. Nothing ever happened in baseball above and beyond the frequency predicted by coin-tossing models. The longest runs of wins or losses are as long as they should be, and occur about as often as they ought to. There is one major exception, and absolutely only one—one sequence so many standard deviations above the expected distribution that it should not have occurred at all. Joe DiMaggio's fifty-six–game hitting streak in 1941.

Arbesman and Strogatz say:

In a fit of scientific skepticism, we decided to calculate how unlikely Joltin'Joe's achievement really was. Using a comprehensive collection of baseball statistics from 1871 to 2005, we simulated the entire history of baseball 10,000 times in a computer. In essence, we programmed the computer to construct an enormous set of parallel baseball universes, all with the same players but subject to the vagaries of chance in each one.

Here’s how it works. Think of baseball players’ performances at bat as being like coin tosses. Hitting streaks are like runs of many heads in a row. Suppose a hypothetical player named Joe Coin had a 50-50 chance of getting at least one hit per game, and suppose that he played 154 games during the 1941 season. We could learn something about Coin’s chances of having a 56-game hitting streak in 1941 by flipping a real coin 154 times, recording the series of heads and tails, and observing what his longest streak of heads happened to be.

Our simulations did something very much like this, except instead of a coin, we used random numbers generated by a computer. Also, instead of assuming that a player has a 50 percent chance of hitting successfully in each game, we used baseball statistics to calculate each player’s odds, as determined by his actual batting performance in a given year.

To carry out their coin simulations, the authors need to estimate the probability that a player gets a hit when he comes to the plate. For this they use a data set covering all games from 1871 to 2005. From these data they find, for each player in each season, the total number of hits, the number of games played, and the number of plate appearances, and use them to estimate the probability of getting a hit when the player comes to bat. For example, in 1941 DiMaggio had 193 hits in 139 games and came to the plate 621 times. So he came to the plate 621/139 = 4.47 times a game and got a hit in 193/621 = 31.1 percent of his plate appearances, with no hit in the other 68.9 percent. Treating his plate appearances as independent, the probability of going hitless in a game is .689^(4.47) = .19, and the probability of getting at least one hit is .81. Using these per-game probabilities, the authors carry out their simulation 10,000 times to estimate the length of the longest run.
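The per-game calculation and the coin-flip simulation for a single player-season can be sketched as follows. This is our illustration, not the authors' code: it replays only DiMaggio's 1941 season (139 games at his estimated per-game probability), whereas the authors repeat the exercise for every player and season from 1871 to 2005. The function names and the random seed are our own choices.

```python
import random
from statistics import mean


def per_game_hit_prob(hits, opportunities, games):
    """P(at least one hit in a game), treating each trip to the plate as an
    independent trial with the player's season success rate."""
    p_hit = hits / opportunities
    trips_per_game = opportunities / games  # fractional, e.g. 4.47
    return 1 - (1 - p_hit) ** trips_per_game


def longest_streak(p_game, games, rng):
    """Longest run of games with at least one hit in one simulated season."""
    best = run = 0
    for _ in range(games):
        if rng.random() < p_game:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def simulate_seasons(p_game, games, trials=10_000, seed=1941):
    rng = random.Random(seed)
    return [longest_streak(p_game, games, rng) for _ in range(trials)]


if __name__ == "__main__":
    p = per_game_hit_prob(hits=193, opportunities=621, games=139)  # DiMaggio, 1941
    streaks = simulate_seasons(p, games=139)
    print(f"P(hit in a game) = {p:.2f}")
    print(f"average longest streak: {mean(streaks):.1f}")
    print(f"seasons with a 56-game streak: {sum(s >= 56 for s in streaks)} of {len(streaks)}")
```

A single season at .81 very rarely produces a 56-game streak; the authors' conclusion comes from letting every player in every season have a chance at it.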

Note that they use plate appearances rather than at bats. The official definitions of these are quite complicated and can be found here. At bats are used to calculate a player's batting average and are more restrictive than plate appearances: for example, a walk counts as a plate appearance but not as an at bat. If, in the DiMaggio example, you use at bats instead of plate appearances, you find that the estimated probability that he would get at least one hit in a game is .82 instead of .81.
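The comparison in the last sentence can be reproduced as below. The 193 hits, 139 games and 621 plate appearances are from the text; the 541 at bats is DiMaggio's official 1941 total, which the text does not give (it is consistent with his .357 batting average, 193/541). Fewer opportunities per game at a higher success rate happen to give a slightly higher per-game probability here.

```python
def per_game_hit_prob(hits, opportunities, games):
    """P(at least one hit in a game) if each opportunity is an independent trial."""
    return 1 - (1 - hits / opportunities) ** (opportunities / games)


HITS, GAMES = 193, 139     # DiMaggio, 1941 (from the text)
PLATE_APPEARANCES = 621    # from the text
AT_BATS = 541              # not in the text: his official 1941 at bats

by_pa = per_game_hit_prob(HITS, PLATE_APPEARANCES, GAMES)
by_ab = per_game_hit_prob(HITS, AT_BATS, GAMES)
print(f"using plate appearances: {by_pa:.2f}")  # 0.81
print(f"using at bats:           {by_ab:.2f}")  # 0.82
```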

The authors have the computer carry out the same calculation and simulation we did for DiMaggio for all players and all seasons between 1871 and 2005. They call this "simulating the entire history of baseball".

They illustrate the outcome with the following graphic.

http://www.dartmouth.edu/~chance/forwiki/dimaggio.jpg

The authors write:

More than half the time the record for the longest hitting streak exceeded 53 games. Two-thirds of the time, the best streak was between 50 and 64 games.

In other words, streaks of 56 games or longer are not at all an unusual occurrence. Forty-two percent of the simulated baseball histories have a streak of DiMaggio’s length or longer. You shouldn’t be too surprised that someone, at some time in the history of the game, accomplished what DiMaggio did.

## Additional reading

In Defense of Joe DiMaggio. The Numbers Guy.

Carl Bialik on DiMaggio streak probabilities Sabermetric Research

## Discussion

(1) Why do you think Gould was so confident that DiMaggio's record would not be broken?

(2) If you were doing this, to estimate the probability of getting a hit, would you use at bats or plate appearances? What difference would it make?

(3) What factors not considered here might affect a player's longest hitting streak?