Chance News May-June 2005

From ChanceWiki
Revision as of 18:17, 1 September 2005 by Jls (talk | contribs)

Forsooth


As the stakes increase, Prime-Number theory Moves Closer to Proof
Wall Street Journal, Science Journal, April 8, 2005, B1
Sharon Begley

http://www.dartmouth.edu/~chance/wikividios/primeWSJ.jpg

Follow the points to find a Super Bowl champ
New York Times, 223 January, 2005, p 11
Aaron Schatz

The explanation rests in a mathematical formula created by the baseball analyst Bill James and introduced in the 1980 Baseball Abstract. James determined that the record of a baseball team could be approximated by taking the square of team runs scored and dividing it by the square of team runs scored plus the square of team runs allowed. Because of its similarity to the geometric method for determining the sum of the angles in a right triangle, he called it the Pythagorean theorem.

DISCUSSION QUESTION:

How close is the Pythagorean theorem to the theorem that the sum of the angles in a triangle is 180 degrees?

P.S. Norton Star provided this picture, observed by a student, Tosin, while walking in New York. Evidently New Yorkers are determined not to forget the quadratic formula!

http://www.dartmouth.edu/~chance/wikivideos/quadformula.jpg

Vermont pays heavy war burden

The price they paid: By several measures, Vermont bears heavy war burden.
Valley News, January 30, 2005
Jodie Tillman.

The Valley News is the local paper covering a region in New Hampshire and Vermont that includes Dartmouth College. Their writers often consult Dartmouth faculty. For this article, the writer Jodie Tillman consulted Greg Leibon from the Dartmouth Mathematics Department.

Tillman obtained data to see whether Vermont soldiers and Marines deployed to Afghanistan and Iraq are subject to greater risk than those from other states. She asked Greg to help her analyze the data. The article is available here. Links to her data are at the end of the article. Her data included both deaths per capita and deaths per deployment. We will use only the data related to deaths per deployment. This data set gives, for each state, the number of soldiers and Marines deployed to Afghanistan or Iraq from the beginning of the Iraq war in March 2003 to Oct. 31, 2004. This data, with the computations we use, is available here. We will discuss some of Greg's analysis, but we encourage readers to also read his more complete analysis. His analysis can be found here.

From the data, we see that Vermont had 1,613 soldiers and Marines deployed in the period under consideration and had 9 casualties during this period, giving it the highest death rate of all the states. Let's consider how we might design a test to determine if the high death rate in Vermont is just bad luck. For this test the null hypothesis is that the casualties are independent and the probability that a particular soldier or Marine is killed is the same for all those deployed. With this null hypothesis, the number of casualties in a particular state has a binomial distribution B(n,p) with n the number deployed from the state and p the proportion of casualties among those deployed in all the states.

Greg calls this the naive test. This is because it would be equally newsworthy if any other state had an apparently unusually high death rate. So we now consider a test to see if at least one of the 50 states has more casualties than could be explained by chance. For our first attempt we use the same null hypothesis and do a test for each state just the way we did for Vermont. Then we reject the null hypothesis if any of the individual states, tested as in our previous test for Vermont, would reject the null hypothesis.

But if we do that, and the null hypothesis is true, the probability that we reject the null hypothesis is <math>1-(1-.05)^{50} = .92</math>, which makes this a ridiculous test. A more reasonable procedure is to choose a smaller significance level for each state, chosen so that the significance level for the overall test is .05. For this we need to choose the significance level <math>\alpha</math> for the individual states to satisfy the equation <math>1 - (1 - \alpha)^{50} = .05.</math>

Asking Mathematica to solve this we obtain <math>\alpha</math> = .00102534. Thus we will choose the significance level for each state to be .001.
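The two numbers in this step can be checked without Mathematica. The following is a minimal Python sketch, assuming (as the formula in the text does) that the 50 state tests are independent; the function names are ours and purely illustrative.

```python
def familywise_error(alpha: float, m: int) -> float:
    """Chance that at least one of m independent tests rejects a true null
    hypothesis when each test is run at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def per_test_level(overall: float, m: int) -> float:
    """Level to use for each of m independent tests so that the chance of
    at least one false rejection equals `overall`."""
    return 1.0 - (1.0 - overall) ** (1.0 / m)


# Testing each of the 50 states at the .05 level:
fwer = familywise_error(0.05, 50)

# Per-state level that brings the overall level back down to .05:
alpha = per_test_level(0.05, 50)

print(f"overall false-rejection probability: {fwer:.2f}")
print(f"per-state level: {alpha:.8f}")
```

The closed form used in `per_test_level` comes from solving the equation for the per-state level directly, so no numerical root finder is needed.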

We have seen that, under the null hypothesis, the probability that Vermont has 9 or more casualties is .0033. Since this is greater than .001, Vermont by itself does not lead to rejecting the null hypothesis. Consider now Massachusetts, the state with the second highest death rate. Massachusetts had 7,146 deployed and 28 casualties. Making the same kind of computation we did for Vermont, we find that, under the null hypothesis, the probability that Massachusetts has 28 or more casualties is .0002. This is less than our significance level .001, so for this more general test we can also reject the null hypothesis.
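The two tail probabilities above can be reproduced approximately with a short Python sketch. The pooled death rate p is not quoted directly in this article, so the sketch assumes it can be inferred from the Florida figures given in the discussion questions (an expected 113.87 casualties among 62572 deployed). The helper names are ours.

```python
from math import exp, lgamma, log, log1p

# Assumption: pooled death rate inferred from the Florida figures quoted in
# the discussion questions (expected 113.87 casualties, 62572 deployed).
P_POOLED = 113.87 / 62572


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p), computed in log space to avoid overflow."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log1p(-p))


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ B(n, p)."""
    return 1.0 - sum(binom_pmf(j, n, p) for j in range(k))


# Vermont: 1,613 deployed, 9 casualties.
p_vermont = upper_tail(9, 1613, P_POOLED)

# Massachusetts: 7,146 deployed, 28 casualties.
p_massachusetts = upper_tail(28, 7146, P_POOLED)

print(f"Vermont:       P(X >= 9)  = {p_vermont:.4f}")
print(f"Massachusetts: P(X >= 28) = {p_massachusetts:.4f}")
```

Under this assumed value of p, the Vermont probability falls above the per-state level of .001 and the Massachusetts probability falls below it, in line with the conclusions in the text.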

Incidentally, one occasionally sees a medical study, for example a study to test if a new drug is more effective than placebo, that starts off with a single test and a 5% significance level, and along the way the researchers find other tests that can be used to test the effectiveness of the drug. They then report the drug to be effective if any of the individual tests rejects the null hypothesis, without changing the significance level. As we have seen, this can give them a much better chance of rejecting the null hypothesis (showing the drug is effective) when in fact this is not the case.

We might think we have shown that we cannot explain the death rates as the result of chance. But Greg also points out that the assumption in the null hypothesis that the casualties are independent is probably not a good one since, for example, there might be incidents where several soldiers are killed, all of whom are from the same National Guard unit and hence from the same state. In his Commentary Greg discusses models that can take this into account. This is an interesting discussion and we encourage our readers to read it in his [http://www.dartmouth.edu/~chance/ForWiki/GregComentary.pdf Commentary].

Of course the Valley News article did not include any of this technical stuff. We read:

Gregory Leibon, a visiting professor in Dartmouth College's mathematics department who reviewed the Valley News findings, said the numbers of soldiers killed or injured is too small to draw broad conclusions, including whether Vermont soldiers are more likely to die. He noted that the addition or subtraction of a few deaths or injuries could change rankings.

"On statistical grounds, you could not reject the notion that it's not just bad luck," said Leibon.

DISCUSSION QUESTIONS:

(1) What does this last line really say? How do you think readers interpreted this statement? Do you think Greg was quoted correctly?

(2) Looking at the data we see that Florida had 62572 deployed in the period considered and only 54 casualties. The expected number of casualties under the null hypothesis is 113.87. We note that 54 casualties is 5.62 standard deviations below the expected value. Mathematica tells us that, under the null hypothesis, the probability of 54 or fewer casualties is 2.924099646x10^(-10). What do we make of that?

(3) A study carried out by Robert Cushing and reported by Bob Bishop in the Austin American Statesman, 12 October 2003, showed that rural populations had a higher death rate per capita than urban populations. How might this be explained?
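Readers who want to check the Florida figures in question (2) for themselves can use the following Python sketch. It assumes the pooled death rate is the one implied by the quoted expected count (113.87 out of 62572), and it only reproduces the arithmetic; what to make of the result is left to the reader. The variable and function names are ours.

```python
from math import exp, lgamma, log, log1p, sqrt

n_florida = 62572   # number deployed, from question (2)
observed = 54       # casualties, from question (2)
expected = 113.87   # expected casualties under the null hypothesis

# Assumption: the pooled rate p is whatever makes n * p equal the quoted
# expected count; it is not stated directly in the text.
p = expected / n_florida

# Standard deviation of B(n, p) and distance of the observed count from
# the expected value, measured in standard deviations.
sd = sqrt(n_florida * p * (1.0 - p))
z = (observed - expected) / sd


def binom_pmf(k: int, n: int, prob: float) -> float:
    """P(X = k) for X ~ B(n, prob), computed in log space."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(prob) + (n - k) * log1p(-prob))


# P(X <= 54): probability of 54 or fewer casualties under the null hypothesis.
lower_tail = sum(binom_pmf(k, n_florida, p) for k in range(observed + 1))

print(f"standard deviations from expected: {z:.2f}")
print(f"P(X <= {observed}) = {lower_tail:.3e}")
```

Because the expected count is rounded to two decimals in the text, the tail probability from this sketch may differ slightly in its later digits from the Mathematica value quoted above.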