# Chance News 36

## Quotation

Come, said Slartibartfast, you are to meet the mice. Your arrival on the planet has caused considerable excitement. It has already been hailed, so I gather, as the third most improbable event in the history of the Universe. What were the first two? Oh, probably just coincidences.

The Hitchhiker's Guide to the Galaxy

Douglas Adams

## Forsooth

## Sex and Cereal

A sure way to obtain column inches in the media for biological/medical research is to study sex selection and relate it to something unexpected. In this instance, 740 pregnant women were asked to fill out a survey which dealt with their physical characteristics, habits and most of all, daily dietary intakes.

The Independent: "Big breakfast is most important meal -- if you want a baby boy."

Reuters: "Skipping breakfast may mean your baby is a girl."

New Scientist: "Breakfast cereals boost chances of conceiving boys."

CNN.com: "Study shows bananas make baby boys."

New York Times: "Boy or Girl? The Answer May Depend on Mom's Eating Habits."

Choosing a provocative title doesn't hurt: "You are what your mother eats: evidence for maternal preconception diet influencing foetal sex in humans." According to the lead author, "If you want a boy, eat a healthy diet with a high calorie intake, including breakfast." From the New Scientist, "When the researchers divided the women into groups with high, medium and low intake of energy, they found that 56% of women in the high-energy group had boys, compared with 45% in [the] lowest group." Further, "Cereal intake had a bigger effect," producing 59% boys when eating one or more bowlfuls per day, "compared with only 43% who bore boys in the group eating less than a bowlful per week." The researchers tested many foods and found only cereal "significantly associated with infant sex."

### Discussion

1. Here is a wiki which looks at a different study, on mice, which also claims that nutrition affects the percentages of males and females. Which of the two is an experimental study and which is an observational study?

2. The current study was done in England and of the 740 mothers-to-be, 301 (approximately 40%) said they currently were smokers. Why would this fact cast doubt on the conclusions being applied to the United States?

3. Eating cereal for breakfast is a very American habit, duplicated in few countries; even those other countries, such as England where cereal is eaten for breakfast, have nowhere near the selection possibilities obtainable in the United States. Many industrialized countries eat little or no breakfast at all. What then should the male/female ratio be for these countries?

4. It is often said that many cereals are really candies in disguise. If so, should the mother-to-be "cut to the chase" and just have a candy bar for breakfast? If not, why not?

5. Instead of the customary .05 level, the researchers chose a threshold of p < .01 for determining statistical significance. Why did they lower the threshold?

6. The researchers keep referring to a "bowl of cereal." Why is this an exceedingly inexact measure?
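
Question 5 touches on multiple testing: when many foods are examined, some will look "significant" by chance alone. The following is a minimal sketch of that arithmetic, assuming independent tests in which no food has any real effect. The numbers of foods used are hypothetical, since the article above does not say how many were tested.

```python
def family_wise_error_rate(num_tests: int, alpha: float) -> float:
    """Chance of at least one false positive among independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** num_tests


# Hypothetical numbers of foods tested; the source does not give the count.
for num_foods in (1, 10, 50, 100):
    at_05 = family_wise_error_rate(num_foods, 0.05)
    at_01 = family_wise_error_rate(num_foods, 0.01)
    print(f"{num_foods:>3} tests: alpha=.05 -> {at_05:.3f}, alpha=.01 -> {at_01:.3f}")
```

Under these assumptions a stricter threshold reduces, but does not eliminate, the chance of a spurious finding once the number of tests is large.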

Submitted by Paul Alper

## Monty Hall Psychology

Professors Craig R. Fox and Jonathan Levav have written an interesting paper, Partition-Edit-Count: Naïve Extensional Reasoning in Judgment of Conditional Probability. They conducted several experiments to see how Duke University students interpreted and performed on probability problems, especially when given alternative phrasing. For example, in a distant version of the Monty Hall problem, they considered three pharmaceutical companies, A, B, and C. Half the students were told "the FDA will publish a report in which it will reveal which of the three drugs is most effective." The other half were told "the FDA will publish a report in which it will rank the three drugs from the most effective to least effective." All the students were then informed that an independent lab definitively found that A is more effective than C.

The first group of students was asked to find "the probability that the FDA will identify A as the most effective of the three." The second group was asked to find "the probability that the FDA's rankings will list A ahead of both B and C." The correct answer is 2/3 rather than 1/2 irrespective of the wording. In the first group, 10% of 67 obtained the correct answer; in the second group, 23% of 62 obtained the correct answer. Fox and Levav present an explanation of the thought processes at work based on partitioning, editing and counting.
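
The 2/3 figure can be checked by brute force. The sketch below assumes that, before the lab result, all six rankings of the three drugs are equally likely. It then carries out the partition, edit and count steps on the six-fold partition and repeats the calculation with Bayes' theorem. The variable names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

# Partition: all six rankings (first position = most effective),
# assumed equally likely before the lab result is known.
rankings = list(permutations("ABC"))

# Edit: keep only rankings consistent with the lab result (A ahead of C).
consistent = [r for r in rankings if r.index("A") < r.index("C")]

# Count: the share of the remaining rankings that put A first.
a_first = [r for r in consistent if r[0] == "A"]
prob_a_best = Fraction(len(a_first), len(consistent))

# The same answer from Bayes' theorem:
# P(A best | A beats C) = P(A beats C | A best) * P(A best) / P(A beats C)
prob_bayes = Fraction(1) * Fraction(1, 3) / Fraction(1, 2)

print(consistent)
print(prob_a_best, prob_bayes)
```

The naive answer of 1/2 comes from editing the three-fold partition instead: striking "C most effective" leaves "A most effective" and "B most effective", which are then wrongly treated as equally likely.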

### Discussion

1. Near the end of the paper the authors state: "Moreover, despite the fact that participants could have solved all three puzzles computationally by invoking Bayes theorem or the definition of conditional probability, a very small proportion of these respondents seemed to attempt a computational answer, and none of the participants who explicitly invoked a formula arrived at the correct solution." Use Bayes theorem to obtain the correct answer.

2. The allegation is that this particular problem is "a distant version of the Monty Hall problem." Show how A, B, and C relate to the goats, doors and car.

3. Fox and Levav offered prize money for participation in these problems. In particular, an MBA student was offered $20 for the above problem. For some other problems, $1 was offered to anyone in the Duke University student center. Explain the discrepancy.

4. The authors claim that it makes sense that the second wording, the one with the word "rank," would more likely lead to a correct six-fold partitioning (ordering of events such as ABC, ACB, etc.) and easy editing and counting. The first wording emphasizes "most effective," which suggests a three-fold partitioning (A most effective, B most effective, C most effective). Edit and count the second wording to come up with the correct 2/3. Edit and count the first wording to come up with the wrong answer, 1/2.

5. Comparing the difference in the proportion of successes for the two different wordings, the authors claim via a chi-square test that the value of chi-square is 3.5 leading to a p-value of about .06. Perform a chi-square test to duplicate their result. Perform a difference of proportions test using Fisher's exact test and show that the p-value is closer to .093. Why is the authors' p-value result of .06 incorrect?
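
For question 5, here is a minimal sketch that uses only the Python standard library. It assumes the reported percentages correspond to 7 correct answers out of 67 in the first group and 14 out of 62 in the second; these counts are inferred from the rounded percentages and are not given explicitly above. The chi-square statistic is computed without a continuity correction, and the two-sided Fisher p-value sums the probabilities of all tables no more likely than the observed one, which is one common convention.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value: total probability of all tables with the
    same margins that are no more likely than the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Assumed counts: (correct, incorrect) for each wording.
first_wording = (7, 60)    # roughly 10% of 67
second_wording = (14, 48)  # roughly 23% of 62

chi2, p_chi2 = chi_square_2x2(*first_wording, *second_wording)
p_fisher = fisher_exact_two_sided(*first_wording, *second_wording)
print(f"chi-square = {chi2:.2f}, p = {p_chi2:.3f}")
print(f"Fisher exact two-sided p = {p_fisher:.3f}")
```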

## Brain Exercise

A currently popular medical model is that physical exercise by the elderly may help to prevent Alzheimer's disease. From the Minneapolis Star Tribune, April 17, 2008 we learn that "Each year about 15 percent of the people with MCI [mild cognitive impairment] develop Alzheimer's, compared with 1 to 2 percent of all people age 65 and older." The article discusses a Mayo Clinic study of 868 randomly chosen "people ages 70 to 89 [who] were asked to record their exercise habits when they were between 50 and 65." There were 740 normal people, 20% of whom "said they exercised one to two times per week" and 128 who had MCI, 13.4% of whom said they exercised one to two times per week.

### Discussion

1. Assume that 13.4% of 128 is 17. Do a chi-square test to obtain a p-value. Ignoring Fisher's exact test, do a test for the difference of proportions with and without assuming pooling. Compare the three p-values obtained and explain the discrepancies.

2. Fisher's exact test yields yet a different p-value. Is the customary "statistical significance" obtained? Why is this test better than the chi-square or the naïve difference of proportions with or without pooling?

3. What lurking (hidden) variables might exist in this study?

4. The results of this study were presented at a conference of the American Academy of Neurology but not in a peer reviewed journal. How does this affect your view of the worthiness of the paper? If the study were related to a commercial product promoting exercise, how would this affect your view of the worthiness of the study?
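
The calculations asked for in question 1 can be sketched as follows, using only the Python standard library. The counts are assumptions: 148 exercisers among the 740 normal subjects (20% of 740) and 17 among the 128 with MCI, as the question stipulates. The Yates-corrected chi-square is included only because many statistics packages apply that correction by default for 2x2 tables; the article does not say which version was intended.

```python
import math


def two_sided_normal_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Assumed counts of people who exercised one to two times per week.
n_normal, x_normal = 740, 148  # 20% of 740
n_mci, x_mci = 128, 17         # taken as 13.4% of 128

p_normal, p_mci = x_normal / n_normal, x_mci / n_mci
diff = p_normal - p_mci

# Difference of proportions with a pooled standard error.
pooled = (x_normal + x_mci) / (n_normal + n_mci)
se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_normal + 1 / n_mci))
z_pooled = diff / se_pooled
p_pooled = two_sided_normal_p(z_pooled)

# Difference of proportions with an unpooled standard error.
se_unpooled = math.sqrt(p_normal * (1 - p_normal) / n_normal
                        + p_mci * (1 - p_mci) / n_mci)
z_unpooled = diff / se_unpooled
p_unpooled = two_sided_normal_p(z_unpooled)

# Pearson chi-square on the 2x2 table, without and with Yates' correction.
a, b, c, d = x_normal, n_normal - x_normal, x_mci, n_mci - x_mci
n = a + b + c + d
denominator = (a + b) * (c + d) * (a + c) * (b + d)
chi2 = n * (a * d - b * c) ** 2 / denominator
chi2_yates = n * (abs(a * d - b * c) - n / 2) ** 2 / denominator
p_chi2 = math.erfc(math.sqrt(chi2 / 2))
p_yates = math.erfc(math.sqrt(chi2_yates / 2))

print(f"pooled z = {z_pooled:.3f}, p = {p_pooled:.4f}")
print(f"unpooled z = {z_unpooled:.3f}, p = {p_unpooled:.4f}")
print(f"chi-square = {chi2:.3f}, p = {p_chi2:.4f}")
print(f"chi-square (Yates) = {chi2_yates:.3f}, p = {p_yates:.4f}")
```

The uncorrected chi-square statistic is the square of the pooled z statistic, so those two p-values agree; the unpooled test and the Yates-corrected test differ from them in opposite directions.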

Submitted by Paul Alper

## When a lower prize was bigger than the Jackpot in the UK 6/49 Lottery

A comment on the item in Chance News 35, about when 106 people matched 5 numbers only, yet 239 matched 5 + Bonus in the Canadian 6/49 lottery, resulting in a larger prize for the lower tier winners. The winning numbers were 23, 40, 41, 42, 44, 45 with Bonus 43, so we can all see how this came about!

In the UK 6/49 Lottery, there have been two occasions when the Match5+Bonus prize has exceeded the Jackpot prize: first, on Saturday 29 June 1999, the winning numbers were 2, 17, 18, 23, 30, 40, Bonus 43; here 46 shared the jackpot, each winning £152,431, while only 13 shared the Bonus prize, each getting £165,961.

Second, on Wednesday 30 August 2006, the winning numbers were 19, 21, 22, 38, 44, 49, Bonus 45; here 4 shared the jackpot, each winning £669,219, while ONE person scooped the entire Bonus pool, winning £862,395.

I offer separate explanations for both of these unusual phenomena: for the first, a very relevant piece of extra information is that for many years, Britain's bookmakers have operated their own "lottery", again of 6/49 format, but punters are offered fixed odds for matching 1, 2, 3 etc numbers. And just three days before 29 June 1999, the six winning numbers in the bookmakers' draw were EXACTLY the same as those 29 June winning numbers. Plainly, too many UK punters thought that, by choosing that set of winning numbers from the bookmakers' lottery, they were making a suitable "random" choice for the following Lotto!

For the second, I believe this is pure random chance at work! Sales are lower on Wednesdays than on Saturdays, allowing a little more scope for such random effects. On average, 6 times as many tickets will share the Bonus prize as share the Jackpot, but given enough data, with smallish numbers (the average number sharing a Wednesday jackpot is 1-2), random chance will occasionally throw up more jackpot winners than Bonus winners - and sometimes, so many more that this "prize anomaly" may arise. If you wait long enough, even rare events are sure to happen!

(The same thing nearly happened on Wed 14 July 1999, when just two tickets shared the jackpot, but NO-ONE won the Bonus prize. The Bonus Pool of £1,327,021 got added to the original jackpot pool of £4,312,821 to give a total of £5,639,842, so both jackpot winners got £2,819,921, some £663,510 more because of this rule that rewards the already fortunate! To date, there have been more tickets sharing the Jackpot than the Bonus just 11 times, 7 of them on Wednesdays.)
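
The "6 times as many" figure follows from counting combinations, and the "pure random chance" argument can be given a rough number. The sketch below does both. The Poisson model is an assumption made purely for illustration: it treats ticket choices as uniformly random, which the bookmakers' episode above shows is not always true. The mean of 1.5 jackpot winners is a hypothetical value picked from the "1-2" range quoted for Wednesdays.

```python
import math

# Counting 6/49 tickets: 6 main numbers are drawn, then 1 Bonus from the other 43.
total_tickets = math.comb(49, 6)
jackpot_ways = 1
five_plus_bonus_ways = math.comb(6, 5) * 1                 # 5 main numbers + the Bonus
five_only_ways = math.comb(6, 5) * math.comb(42, 1)        # 5 main + 1 of the 42 others

print(total_tickets, five_plus_bonus_ways, five_only_ways)


def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def prob_more_jackpot_than_bonus(mean_jackpot, max_count=60):
    """P(jackpot winners > Bonus winners), assuming independent Poisson counts
    with the Bonus mean six times the jackpot mean (an illustrative model)."""
    mean_bonus = five_plus_bonus_ways * mean_jackpot
    total = 0.0
    bonus_below = 0.0  # running value of P(Bonus winners < j)
    for j in range(1, max_count):
        bonus_below += poisson_pmf(j - 1, mean_bonus)
        total += poisson_pmf(j, mean_jackpot) * bonus_below
    return total


# Hypothetical Wednesday mean of 1.5 jackpot winners per draw.
p_anomaly = prob_more_jackpot_than_bonus(1.5)
print(f"P(more jackpot than Bonus winners) is roughly {p_anomaly:.4f} per draw")
```

A small per-draw probability of this kind, accumulated over hundreds of draws, is in keeping with the event being seen only a handful of times.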

Submitted by John Haigh

## The Isle Royale predator prey study

The Ecological Study of Wolves on Isle Royale, now in its 50th year, is the longest running large mammal predator-prey study in the world. The researchers go to the Island every winter, when no one else is there, to estimate the number of wolves and moose currently on the Island and their findings are presented in their annual Moose-Wolf Report. Their 2007-8 Report has just been made available here. You will find their estimate of the current moose-wolf populations and graphics of the history of the wolf-moose populations. The authors also give an explanation of the statistical method used to estimate the number of moose and much more. This study provides a good example to test the Lotka-Volterra Predator-prey model. You will also find here an educational video of the history of the Isle Royale Moose-Wolf study that is fun to see and would be nice to show in a classroom.
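
For readers who have not met the Lotka-Volterra model, here is a minimal simulation sketch. The parameter values and starting populations are hypothetical, chosen only so that the numbers are of a plausible size for moose and wolves; they are not fitted to the Isle Royale data. The equations are the classical ones, in which prey grow at rate alpha and are eaten at rate beta per predator, while predators die at rate gamma and grow at rate delta per prey. Integration uses a hand-written fourth-order Runge-Kutta step.

```python
import math


def lotka_volterra_rates(prey, predators, alpha, beta, delta, gamma):
    """Classical Lotka-Volterra right-hand side."""
    d_prey = alpha * prey - beta * prey * predators
    d_predators = delta * prey * predators - gamma * predators
    return d_prey, d_predators


def simulate(prey, predators, params, dt=0.01, steps=5000):
    """Integrate with fixed-step fourth-order Runge-Kutta; returns the trajectory."""
    history = [(prey, predators)]
    for _ in range(steps):
        k1 = lotka_volterra_rates(prey, predators, *params)
        k2 = lotka_volterra_rates(prey + 0.5 * dt * k1[0], predators + 0.5 * dt * k1[1], *params)
        k3 = lotka_volterra_rates(prey + 0.5 * dt * k2[0], predators + 0.5 * dt * k2[1], *params)
        k4 = lotka_volterra_rates(prey + dt * k3[0], predators + dt * k3[1], *params)
        prey += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        predators += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        history.append((prey, predators))
    return history


def conserved_quantity(prey, predators, alpha, beta, delta, gamma):
    """Quantity that stays constant along exact Lotka-Volterra orbits."""
    return (delta * prey - gamma * math.log(prey)
            + beta * predators - alpha * math.log(predators))


# Hypothetical yearly rates (alpha, beta, delta, gamma) and starting populations.
params = (0.3, 0.012, 0.0002, 0.2)
trajectory = simulate(prey=800.0, predators=20.0, params=params)

for year in range(0, 51, 10):
    moose, wolves = trajectory[year * 100]  # 100 steps of 0.01 per year
    print(f"year {year:2d}: prey {moose:7.1f}, predators {wolves:5.1f}")
```

The model predicts perpetual closed cycles, so comparing a run like this with the historical graphs in the report is one way to see where the real populations depart from it.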

Submitted by Laurie Snell

## Ovulation

It is hard to believe what statistical studies you will find if you read what catches the eye of journalists. The following comes from "Ovulatory cycle effects on tip earnings by lap dancers: economic evidence for human estrus?" by Miller, Tybur and Jordan, Evolution and Human Behavior, 28 (2007) 375-381. They begin with reference to the results of Haselton, M.G., et al, "Ovulatory shifts in human female ornamentation: Near ovulation, women dress to impress," Hormones and Behavior, 51, 41-45. Thirty young women wore their "self-chosen clothing once during estrus (as confirmed by hormonal assay) and once during a lower fertility (luteal) cycle phase."

"Then 42 mixed-sex raters made a forced-choice judgment ('In which photo is the person trying to look more attractive?') between the two photos of each woman (with faces obscured, leaving only body and clothing cues). They chose the woman when she was in estrus about 60% of the time-- modestly but significantly above chance. This result confirmed that both male and female observers are perceptually sensitive to women's choice of more conspicuous and fashionable clothes during estrus."

Miller's research goes further, much further in determining how estrus influences attractiveness. They examined "tip earnings of professional lap dancers working in gentlemen's clubs. Eighteen dancers recorded their menstrual periods, work shifts, and tip earnings for 60 days on a [confidential] study web site." There were "296 work shifts (representing about 5300 lap dances)." The dancers not on the pill "earned about US$335 per 5-h shift during estrus, US$260 per shift during the luteal phase, and US$185 per shift during menstruation. By contrast, participants using contraceptive pills showed no estrous earnings peak."

### Discussion

1. A theme running throughout both studies is that female humans are not that different from other mammals in that estrus (increased "female sexual receptivity, proceptivity, selectivity and attractiveness") still exists despite evolutionary movement over time. Further, males can sense the ovulatory cycle. What do biology textbooks say? What do you say?

2. Haselton's study concluded with the words "modestly but significantly above chance." Translate that into statistics type lingo.

3. The lap dancers were asked "to log in to the web site every day for 60 days. Each day, they were to report their mood, work hours, work location, and tip earnings" and "were offered a payment of US$30 upon completion of the study." How might this detract from the worthiness of the study?

4. The lap dancer study focuses on economics although the authors are psychologists. A startling statement amidst the statistics and biology is: "Indeed, it seems that the optimal strategy for obtaining tips is to focus on men who are profligate, drunk and gullible rather than those who are intelligent, handsome and discerning." How does this strategy relate to a world wider than just lap dancing?

Submitted by Paul Alper

## More on Ovulation

The previous wiki looked at how female humans may not be that different from other mammals in that estrus (increased "female sexual receptivity, proceptivity, selectivity and attractiveness") still exists despite evolutionary movement over time. The clues during ovulation supposedly were visually sensed by males. Another study by Pipitone and Gallup, "Women's voice attractiveness varies across the menstrual cycle," Evolution and Human Behavior, 2008, looks at clues sensed aurally.

Thirty-eight women "had their voices recorded at four different times during their menstrual cycle"; the "subject's voices were only recorded while they counted from 1 to 10." There were "17 naturally cycling females and 21 females using hormonal contraceptives." Thirty males and thirty females rated the "level of voice attractiveness on a 100-point unlabelled scale with 1 being the least attractive and 100 the most attractive."

"Results showed a significant increase in voice attractiveness ratings as the risk of conception increased across the menstrual cycle in naturally cycling women. There was no effect for women using hormonal contraceptives."

### Discussion

1. This study took place at SUNY Albany and "The study was approved by the university institutional Review Board." Why was a Review Board necessary?

2. The study reports various t values and the associated p-values, presumably based on the 100-point level of voice attractiveness. What is the fundamental flaw in that methodology?

3. Especially if you are a female, record your own voice counting 1 to 10 during the course of a month and determine if you hear any difference. Ask some friends for their opinions.

4. The previous wiki on ovulation focused on the earnings of lap dancers who presumably also converse with their customers. Speculate how much of their earning power is due to their voice as opposed to their appearance.

Submitted by Paul Alper

## Real world vs Abstract in teaching statistics

A big issue in the teaching profession in the United States is how general (abstract, generic) and how specific (concrete) the approach should be. When it comes to probability and statistics presented to those not majoring in mathematics or statistics, it is generally agreed that particular examples from the real world are preferred. The article, "The Advantage of Abstract Examples in Learning Math," by Kaminski, Sloutsky and Heckler in Science, 320, 454 (2008), begs to differ, contending that the transfer of knowledge to a similar situation is more likely to occur when the training is done abstractly.

For the so-called experiment #1, they randomly divided 80 Ohio State University students into four groups; one group received information using abstract (that is, meaningless) symbols, and each of the other three groups received information that was represented with concrete symbols (such as (A) combining liquids in a cup, (B) pizza slices on a dish, or (C) tennis balls in a container). See the figure below where the left-most column refers to the learning phase using abstract symbols and the middle column refers to the learning phase using one of the three concrete symbols, liquids in a cup:

The underlying mathematical concept was the commutative mathematical group of order three but this was not mentioned. The subjects were (pre) tested via 24 questions to make sure they understood how to handle the instantiation in which they were assigned. Then, after this learning phase came a transfer phase where each group was (post) tested via 24 questions on a different situation, one which still utilized the commutative mathematical group of order three, but disguised as a children's game, the last column:

According to the authors, "Participants in the Generic 1 condition performed markedly higher than participants in each of the three concrete conditions (F(3,68) = 11.9, P < .001)." [The "3" arises because there were 4 groups, and the "68" is 80 minus 4 minus the 8 subjects who were "eliminated from the analysis for failing to learn as evidenced by learning score(s) not above chance."]

### Discussion

1. As per usual, the media got it all wrong. For example, Kenneth Chang in the New York Times [1] began his discussion of the article with the headline: "Study Suggests Math Teachers Scrap Balls and Slices," followed by a non sequitur about trains leaving stations 400 miles apart, in which the reader is asked to determine the time the trains pass each other. Why might Chang have introduced this irrelevancy?

2. The supporting online material for this study may be found at http://www.sciencemag.org/cgi/content/full/320/5875/454/DC1 and is well worth reading. In it, you will find details about the generic and the three concrete examples for explaining the commutative mathematical group of order three. Which of the four do you find the easiest to understand? Take the learning phase of your choice and see how well you do. Then, take the "Test of Transfer Domain" on page 26 and see how well you do on this phase.

3. If you have looked at the supporting material you will note that the questions are multiple choice, some of which have only two choices. Explain whether this might have any bearing on the calculations and conclusions.

4. Chang writes, "Though the experiment tested college students, the researchers suggested that their findings might also be true for math education in elementary through high school, the subject of decades of debates about the best teaching methods." Relate this comment to the introduction of the so-called "new math" of several decades ago.

5. The 80-student sample came from OSU undergraduates. What is the unspoken population to which this study is supposed to apply?

6. Eight of the 80 participants were eliminated from analysis because in the learning phase they failed "to learn as evidenced by learning score(s) not above chance." Chance, according to the authors, was "9 out of 24 or 37.5%." From your answer to #2, did you score above chance?

7. Because of the many comparisons between the groups there is a multitude of t and P-values in this article but no box plots or confidence intervals. What would box plots and confidence intervals add to the discussion that t and P-values ignore?

8. The authors performed other versions of the basic experiment to see if a combination of generic and concrete might be better for transfer than generic (the winner in experiment #1) alone. So-called experiment #4 started with 40 OSU subjects, but seven ("one Generic and six Concrete-then-Generic") "were removed for failing to learn the material." In this experiment, the transfer phase for the Generic-only group (of the remaining 19) yielded a mean of 83.3 with a standard deviation of 10.6; the transfer phase for the Concrete-then-Generic group (of the remaining 14) yielded a mean of 65.5 with a standard deviation of 26.2. Use a statistics package to verify the authors' results: t(31) > 2.69, p < .012. Does this show that Generic by itself is better than an augmentation of (possibly irrelevant) Concrete when it comes to transferring knowledge?

9. Compare the learning phase symbols for the generic with the symbols for the children's game. Why might this account for greater transferal from the generic?
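
For question 8, the following sketch recomputes the test from the quoted summary statistics using only the Python standard library. It assumes the authors used the ordinary pooled-variance two-sample t test, which is what 31 degrees of freedom suggests. A Welch (unequal-variance) version is shown alongside, because the two standard deviations differ considerably. The p-values come from a simple Simpson's-rule integration of the t density and not from a statistics package.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_t_p(t, df, intervals=2000):
    """Two-sided p-value, integrating the density over [0, |t|] by Simpson's rule."""
    upper = abs(t)
    h = upper / intervals
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


# Summary statistics quoted in question 8.
n_generic, mean_generic, sd_generic = 19, 83.3, 10.6
n_mixed, mean_mixed, sd_mixed = 14, 65.5, 26.2
diff = mean_generic - mean_mixed

# Pooled-variance t test (assumes equal population variances).
df_pooled = n_generic + n_mixed - 2
pooled_var = ((n_generic - 1) * sd_generic ** 2
              + (n_mixed - 1) * sd_mixed ** 2) / df_pooled
t_pooled = diff / math.sqrt(pooled_var * (1 / n_generic + 1 / n_mixed))
p_pooled = two_sided_t_p(t_pooled, df_pooled)

# Welch t test (does not assume equal variances).
var_g, var_m = sd_generic ** 2 / n_generic, sd_mixed ** 2 / n_mixed
t_welch = diff / math.sqrt(var_g + var_m)
df_welch = (var_g + var_m) ** 2 / (var_g ** 2 / (n_generic - 1)
                                   + var_m ** 2 / (n_mixed - 1))
p_welch = two_sided_t_p(t_welch, df_welch)

print(f"pooled: t({df_pooled}) = {t_pooled:.2f}, p = {p_pooled:.4f}")
print(f"Welch:  t({df_welch:.1f}) = {t_welch:.2f}, p = {p_welch:.4f}")
```

Whether the pooled version is appropriate, given standard deviations of 10.6 and 26.2, is itself worth discussing.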