Chance News 21
Quotations
I think you're begging the question, said Haydock, and I can see looming ahead one of those terrible exercises in probability where six men have white hats and six men have black hats and you have to work it out by mathematics how likely it is that the hats will get mixed up and in what proportion. If you start thinking about things like that, you would go round the bend. Let me assure you of that!
Agatha Christie
From the Probability Web Quotations
The Mirror Crack'd
Forsooth
The first two Forsooths are from the October RSS NEWS.
Long-term, serious smokers have a 50% chance of dying.
Guardian Weekend
1 April 2006, p32.
The IOC Coordination Commission were told that 80 per cent of the land had already been acquired. London Mayor Ken Livingstone added that he was hoping that, by the time the public enquiry starts at the end of next month, four-fifths of the land would have been acquired.
Radio Oxford news report
20 April 2006
Estimating the diversity of dinosaurs
Proceedings of the National Academy of Sciences
Published online before print September 5, 2006
Steve C. Wang and Peter Dodson
Fossil hunters told: Dig deeper
Philadelphia Inquirer, September 5, 2006
Tom Avril
Steve Wang is a statistician at Swarthmore College and Peter Dodson is a paleontologist at the University of Pennsylvania. Their study was widely reported in the media. You can find references to the media coverage and comments by Steve here.
In their paper the authors provided the following description of their results. Here are a few definitions that might be helpful. Genera (singular: genus): groups of closely related species. Nonavian: excluding birds. Fossiliferous: containing fossils. Rock outcrop: the part of a rock formation that appears above the surface of the surrounding land.
Despite current interest in estimating the diversity of fossil and extant groups, little effort has been devoted to estimating the diversity of dinosaurs. Here we estimate the diversity of nonavian dinosaurs at 1,850 genera, including those that remain to be discovered. With 527 genera currently described, at least 71% of dinosaur genera thus remain unknown. Although known diversity declined in the last stage of the Cretaceous, estimated diversity was steady, suggesting that dinosaurs as a whole were not in decline in the 10 million years before their ultimate extinction. We also show that known diversity is biased by the availability of .. Finally, by using a logistic model, we predict that 75% of discoverable genera will be known within 60-100 years and 90% within 100-140 years. Because of nonrandom factors affecting the process of fossil discovery (which preclude the possibility of computing realistic confidence bounds), our estimate of diversity is likely to be a lower bound.
In this problem we have a sample of dinosaurs that lived on the earth. These dinosaurs are classified into groups called genera. We can count the number of specimens of each genus in our sample. From this we want to estimate the total number of dinosaur genera that have roamed the earth. Many different methods for doing this have been developed, and the authors of this study use one of the newer methods. We have discussed other examples of this problem in previous issues of Chance News, and it might help to review these briefly.
One of the first methods was proposed by R.A. Fisher and illustrated in terms of determining the number of species of Malayan butterflies. His method is described in the paper 'The Relation Between the Number of Species and the Number of Individuals in a Random Sample of an Animal Population', R.A. Fisher; A. Steven Corbet; C.B. Williams, The Journal of Animal Ecology, Vol. 12, No. 1, pp. 42-58. (Available from JSTOR).
Corbet provided the following data from his sampling of the Malayan butterflies:
n | observed | expected number
1 | 118 | 156.44
2 | 74 | 74.52
3 | 44 | 47.33
4 | 24 | 33.82
5 | 29 | 25.77
6 | 22 | 20.46
7 | 20 | 16.71
8 | 19 | 13.93
9 | 20 | 11.79
10 | 15 | 10.11
11 | 12 | 8.76
12 | 14 | 7.65
13 | 6 | 6.73
14 | 12 | 5.95
15 | 6 | 5.29
16 | 9 | 4.73
17 | 9 | 4.24
18 | 6 | 3.81
19 | 10 | 3.44
20 | 10 | 3.11
21 | 11 | 2.83
22 | 5 | 2.57
23 | 3 | 2.34
24 | 3 | 2.14
In this table n is the number of times a species occurs in the sample. The second column gives the number of species that occur n times in the sample. So we see that 118 species occurred once in the sample, 74 twice and 44 three times. The third column gives the expected number of species that occur n times using Fisher's model, which we will explain next. Thus the expected numbers for n = 1, 2, 3 are 156.44, 74.52 and 47.33.
Fisher's model assumes that the number of times a species occurs in a sample has a Poisson distribution:
<math>P(n) = \frac{e^{-m} m^n}{n!}, \quad n = 0, 1, 2, \ldots</math>
For a given species, m is the expected number of individuals of that species that will occur in a sample. Since m can be expected to vary among the species, Fisher treats it as a random variable. He chooses a distribution for m that leads him to estimate the expected number of species that appear n times in a random sample as
<math>\frac{\alpha x^n}{n}</math>
Here <math>\alpha</math> and x are parameters. If S is the number of species observed and N is the sample size, <math>\alpha</math> and x can be determined as the values that satisfy the following two equations:
<math>S = -\alpha \ln(1 - x), \qquad N = \frac{\alpha x}{1 - x}</math>
From our data we find that S = 501 and N = 3306. Using these values we find that x = .95268 and <math>\alpha = 164.21.</math> These do not agree with the values obtained by the authors, but we believe them to be correct.
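The values of x and α quoted above can be checked numerically. The following is a minimal sketch, not the authors' own procedure: it assumes the standard log-series relations S = -α ln(1 - x) and N = αx/(1 - x), under which S/N depends on x alone, and solves for x by bisection. The function names `fit_log_series` and `expected_species` are hypothetical, chosen only for this illustration.

```python
import math


def fit_log_series(num_species, sample_size, tol=1e-12):
    """Solve S = -alpha*ln(1-x) and N = alpha*x/(1-x) for (alpha, x).

    Dividing the first equation by the second gives
    S/N = -(1 - x) * ln(1 - x) / x, which involves x only and
    decreases from 1 to 0 on (0, 1), so bisection applies.
    """
    target = num_species / sample_size

    def ratio(x):
        return -(1.0 - x) * math.log(1.0 - x) / x

    lo, hi = 1e-9, 1.0 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if ratio(mid) > target:
            lo = mid  # ratio still too large, so the root lies to the right
        else:
            hi = mid
    x = (lo + hi) / 2.0
    alpha = sample_size * (1.0 - x) / x  # rearranged from N = alpha*x/(1-x)
    return alpha, x


def expected_species(alpha, x, n):
    """Expected number of species seen exactly n times: alpha * x**n / n."""
    return alpha * x ** n / n


alpha, x = fit_log_series(501, 3306)
print("x =", round(x, 5), "alpha =", round(alpha, 2))
for n in (1, 2, 3):
    print(n, round(expected_species(alpha, x, n), 2))
```

Run on S = 501 and N = 3306, the fitted parameters should land close to the values given in the text, with small differences in the last digit possible because the text rounds x before computing α.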
Probability theory is not all that useful
Don’t box yourself in when making decisions, John Kay, Financial Times, 22 August 2006.
In this article, John Kay, a weekly columnist for the Financial Times, outlines a variation on the Monty Hall problem to highlight that human minds are not well adapted to dealing with issues of probability.
Suppose there are only two boxes and one contains twice as much money as the other. When you choose one, you are shown that it contains £100. Will you stick with your original choice, or switch to the other box?
Kay shows that it is easy to apply this problem to real situations:
Anyone who has changed jobs, bought a house or planned a merger has encountered a version of the two-box game; keep what you know, or go for an uncertain alternative.
In this game, players can lose only £50 but might gain £100 and they have no way of judging whether the £50 loss is more or less likely than the £100 gain.
Decision theory predicts an expected gain of £25 from an equal chance of winning £100 or losing £50. But many people dislike the prospect of losing £50 more than they like the prospect of gaining £100. Reflecting his economic background, Kay goes on to speculate that this irrationality may explain why the equity premium in finance is so high – volatile assets need to show much higher returns to compensate for the pain of frequently seeing small losses.
Kay then outlines what he calls the 'fallacy of large numbers':
If you accepted 100 gambles like this, you are virtually certain to end up with a substantial gain. But, you may say, I am not playing this game 100 times. I am only playing it once and you cannot guarantee a gain in a single trial. That is true, but it illustrates “the fallacy of large numbers”. On the 100th trial, you are in the same position as someone who is offered the chance to do it once. So you should not do it the 100th time. But then you should not do it the 99th time, or the 98th – or the first.
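The arithmetic behind the single gamble and the 100 repeated gambles can be made concrete. The sketch below assumes, as the decision-theory framing above does, that the £100 gain and the £50 loss are equally likely and that repeated plays are independent; Kay's own point is that the player has no way of knowing this. The names `P_WIN` and `prob_net_gain` are hypothetical.

```python
from math import comb

WIN, LOSS = 100, -50  # payoffs in pounds, taken from the article
P_WIN = 0.5           # assumption: gain and loss are equally likely

# Expected gain from a single play under the equal-chance assumption.
expected_gain = P_WIN * WIN + (1 - P_WIN) * LOSS


def prob_net_gain(plays, p_win=P_WIN):
    """Exact probability that the total over `plays` independent gambles is positive."""
    total = 0.0
    for wins in range(plays + 1):
        net = wins * WIN + (plays - wins) * LOSS
        if net > 0:
            # Binomial probability of exactly `wins` winning plays.
            total += comb(plays, wins) * p_win ** wins * (1 - p_win) ** (plays - wins)
    return total


print("expected gain per play:", expected_gain)
print("P(net gain) after 1 play:", prob_net_gain(1))
print("P(net gain) after 100 plays:", prob_net_gain(100))
```

With these assumptions a single play comes out ahead half the time, while 100 plays come out ahead with probability very close to 1, because the player needs only 34 or more wins out of 100. The calculation says nothing about whether the equal-chance assumption is justified.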
He concludes that probability theory works well for a limited class of problems, but the real world is much more open-ended and there is usually fundamental uncertainty about both the nature of the outcomes and the process that gives rise to them.
Questions
- Kay claims that 'the message of both the original Monty Hall problem and of this one is that, even in very simple cases, it is impossible to be certain that a particular mathematical representation of a real problem is a correct description. ... For people in business who rely on models and for people in financial services who must choose between boxes with uncertain contents every day, that is a disturbing conclusion.' Do you agree with his views? If yes, what does it imply for the teaching of probability?
- Repeating the game many times increases the chance of achieving a positive payoff. What hidden assumptions are being made here? (If a few more zeros were added to the payoffs, would your attitude be different?)
- Do you agree with his 'fallacy of large numbers'? What is it about the 99th attempt that makes it different from the first attempt? (Perhaps think about it as a statistical problem instead of a pure probability issue.)
- To benefit from the irrationality of the equity premium, Kay suggests that you stop looking at share prices so often, so that, in the long run, you get the benefit of the higher return without the pain of observing volatility. Do you think there is any merit in this suggestion? Does it suggest that rational investors, facing the same problem but with different time horizons, can logically reach very different conclusions?
Submitted by John Gavin.
Intuition is better than maths
The maths may be simple but intuition is more use, John Kay, Financial Times 29 August 2006.
The two-box puzzle (see the 'Probability theory is not all that useful' article above) offers you a choice of two boxes, one containing more money than the other. Once you have made a decision, you are shown what is in your preferred box. Do you stick with your original choice, or switch?
Kay claims a knowledge of probabilities seems to be a hindrance rather than a help because, with no other information, it seems no rational decision can be made. In this follow-up article, Kay advocates a strategy that seems better than always switching or always sticking, one that beats random choice even in a situation of almost total ignorance.
Before the game starts, focus on a sum of money. It does not matter what the amount is – say, £100. The 'threshold strategy' is to switch if the box you choose contains less than £100 and to stick if it contains more. The threshold strategy gives you a better-than-even chance of getting the larger sum. It does so for any value of the threshold you choose.
If both boxes have less than £100, or more than £100, then the probability that you get the larger sum from your random choice remains one-half. But if one box has less than £100 and the other has more than £100, adopting the threshold strategy makes sure you get the larger sum. Since there is at least a possibility that the amounts in the boxes lie in this range, the threshold strategy must increase your chance of winning.
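The case analysis above can be checked by enumerating the two equally likely initial choices. This is a minimal sketch: the function name `win_probability` and the example box contents are hypothetical, and it assumes the player sticks when the observed amount exactly equals the threshold, a case the article does not address.

```python
from fractions import Fraction


def win_probability(smaller, larger, threshold):
    """Exact chance of ending with the larger sum under the threshold strategy.

    The first box is chosen at random (probability 1/2 each). The player
    switches if the amount seen is below the threshold and sticks otherwise.
    """
    half = Fraction(1, 2)
    prob = Fraction(0)
    for seen, other in ((smaller, larger), (larger, smaller)):
        final = other if seen < threshold else seen
        if final == larger:
            prob += half
    return prob


# Hypothetical box contents (in pounds), one pair for each case in the text.
pairs = [(10, 20), (50, 200), (200, 300)]
for small, large in pairs:
    print((small, large), win_probability(small, large, threshold=100))

# If the three pairs were equally likely, the overall chance of winning would be:
average = sum(win_probability(s, l, 100) for s, l in pairs) / len(pairs)
print("average:", average)
```

Pairs that lie entirely below or entirely above the threshold give one-half, the pair that straddles it gives certainty, and so any mixture that includes a straddling pair averages above one-half. How much above depends on how likely the straddling case is, which is where a sensible choice of threshold matters.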
Kay suggests that there is little benefit to choosing a threshold that is very high or very low, claiming that if you have some idea, however vague, about the range of possible contents, you can tweak the threshold strategy to your advantage. He also suggests choosing a threshold in the range of sums of money that would make a real difference to you. For example, if £20,000 would not transform your life but £50,000 would, then go for £50,000.
While admitting that the logic behind this solution is not straightforward, he claims that, intuitively, it makes sense. He claims that in real-life problems we typically adopt the principle of being realistic while looking for something that will make a difference.
In the version of the two-box problem discussed in the previous Chance News item, where you had the additional information that one box contained twice as much money as the other, there seemed always to be an argument for switching: the potential gain is always twice the seemingly equally likely potential loss. But this conclusion is wrong. Once more, intuition runs ahead of our mathematical understanding, he concludes.
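One way to see why the always-switch argument fails is to fix a concrete prior and enumerate every case. The sketch below is an illustration under an invented assumption, not Kay's argument: the smaller amount is taken to be equally likely to be £25, £50, £100 or £200 (the list `SMALLER_AMOUNTS` is hypothetical), and the other box holds twice that.

```python
from fractions import Fraction

# Hypothetical prior: the smaller amount is equally likely to be any of these.
SMALLER_AMOUNTS = [25, 50, 100, 200]


def switching_gain_by_observation(smaller_amounts):
    """For each amount that can be seen, return (probability of seeing it,
    expected gain from switching given that it was seen)."""
    weight = Fraction(1, 2 * len(smaller_amounts))  # each (pair, first pick) case
    prob = {}
    gain_sum = {}
    for s in smaller_amounts:
        for seen, other in ((s, 2 * s), (2 * s, s)):
            prob[seen] = prob.get(seen, 0) + weight
            gain_sum[seen] = gain_sum.get(seen, 0) + weight * (other - seen)
    return {seen: (prob[seen], gain_sum[seen] / prob[seen]) for seen in sorted(prob)}


table = switching_gain_by_observation(SMALLER_AMOUNTS)
for seen, (p, gain) in table.items():
    print(f"see {seen}: probability {p}, expected gain from switching {gain}")

# Averaged over everything that can be observed, always switching gains nothing.
overall = sum(p * gain for p, gain in table.values())
print("overall expected gain from always switching:", overall)
```

Under this prior, switching looks attractive after seeing a middling amount such as £100, but the gains there are exactly cancelled by the large loss incurred when the largest possible amount is seen. The "gain is twice the equally likely loss" argument implicitly assumes the two cases are equally likely for every observed amount, which no proper prior can deliver.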
Questions
- Do you agree with the solution outlined in this article?
- Can his solution be rephrased in a Bayesian context, incorporating prior knowledge about thresholds that are not too high or too low?
- Do you think Kay is right to be so skeptical about probability?
Submitted by John Gavin.