Chance News 28

Quotation

Upon this gifted age, in its dark hour,
Rains from the sky a meteoric shower
Of facts…they lie unquestioned, uncombined.
Wisdom enough to leech us of our ill
Is daily spun; but there exists no loom
To weave it into fabric….
Edna Saint Vincent Millay, Sonnet 137,
Huntsman, What Quarry? (New York: Harper, 1934), p. 697.

## Does social rank determine IQ?

Big Brothers Are Smarter Than Younger Ones, Kate Shellnutt, Bloomberg.
Families' Eldest Boys Do Best on Tests, Randolph E. Schmid, AP Science Writer, FoxNews, 21 June, 2007.
To the First-Born Go the Smarts, Steven Reinberg, HealthDay.
Research Finds Firstborns Gain the Higher I.Q., Benedict Carey, NY Times, 22 June 2007.

Older brothers are smarter than their younger siblings, but the cause is predominantly their social rank in the family rather than their birth order or genetics, according to recent papers in two journals, Science and Intelligence, by a team led by Petter Kristensen, a doctor at Norway's National Institute of Occupational Health in Oslo.

Most large birth-order studies are across-family, comparing the IQs of all oldest siblings surveyed to those of all youngest siblings surveyed. But such results have been criticized for failing to take account of the differences in upbringing between families. This large-scale, within-family study attempts to settle more than a half-century of scientific debate about the relationship between I.Q. and birth order. (With such lofty claims being made and because we have previously discussed this topic in Chance News - The more the merrier? First born do better at school, this Chance News article is rather long.)

Researchers have long had evidence that firstborns tended to be more dutiful and cautious than their siblings, and some previous studies found significant I.Q. differences. Until now, no one had a clear explanation for this negative association between birth order and intelligence level. Among competing explanations, two have stood out:

• one emphasizes interactions within the family for intellectual stimulation of the low birth-order siblings
• another cites prenatal or gestational factors.

For example, psychologist Robert Zajonc claims that older siblings consolidate and organize their knowledge in their natural roles as tutors to their junior siblings, knowledge which influences their IQ score. These lessons, in short, benefit the teacher more than the student. Some studies claim that both the older and younger siblings tend to describe the firstborn as more disciplined, responsible and high-achieving, while other studies suggest that younger siblings, to distinguish themselves, often develop other skills, like social charm, to compensate for lower IQ. This might explain evidence that younger siblings often live more adventurous and less conventional lives than their older brother or sister. For example, they are more likely to participate in dangerous sports and are more likely to travel to exotic places.

Some cite the example of Nobel Prize winners: firstborns have won more Nobel Prizes in science than younger siblings but, allegedly, often by advancing current understanding rather than overturning it. In contrast, Charles Darwin was the fifth of six children, Nicolaus Copernicus grew up the youngest of four and René Descartes was the youngest of three.

The authors anticipated that men who had a biological rank different from their social rank, in their family, would score better than males of similar birth order who had not experienced the early loss of elder siblings, if the social interaction hypothesis was right; whereas similar scores would support the gestational hypothesis.

The authors based their findings on the IQs of Norwegian military draftees, as measured at enlistment, and then compared the numbers with those for the draftees' brothers. They had data on birth order, health status and I.Q. scores of a quarter of a million 18 and 19 year old men, in the largest study ever to compare the intellect of children within families. (This study ignores women.) The authors allowed for factors such as parents’ education level, maternal age at birth and family size. But still eldest children scored an average of 103.2, about 3 percent higher than second children (100.3) and 4 percent higher than thirdborns (99.0). The scientists then looked at I.Q. scores in 64,000 pairs of brothers, and found the same results. Differences in household environments could not explain elder siblings’ higher scores.

Three points on an I.Q. test may not sound like much but some experts claim that it could be the difference between a high B average and a low A, for example. Its cumulative effect, over time, could be the difference between admission to an elite private school and a less exclusive public one.
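To put the three-point gap in perspective, the quoted means can be restated as percentage differences and as fractions of a standard deviation. The sketch below does that arithmetic. The standard deviation of 15 is the customary IQ scaling, and the last function additionally assumes normally distributed scores and treats the two brothers as independent draws. That is an illustrative simplification, not something the study reports.

```python
import math

# Adjusted mean scores quoted in the text above.
FIRST, SECOND, THIRD = 103.2, 100.3, 99.0
IQ_SD = 15.0  # assumption: customary IQ standard deviation


def percent_higher(a, b):
    """How much higher a is than b, as a percentage of b."""
    return 100.0 * (a - b) / b


def sd_units(a, b, sd=IQ_SD):
    """Difference between two means in standard-deviation units."""
    return (a - b) / sd


def prob_first_exceeds(a, b, sd=IQ_SD):
    """P(X > Y) for independent normal scores with means a, b (assumption)."""
    z = (a - b) / (sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(round(percent_higher(FIRST, SECOND), 1))   # first vs second born, %
print(round(percent_higher(FIRST, THIRD), 1))    # first vs third born, %
print(round(sd_units(FIRST, SECOND), 2))         # gap in SD units
print(round(prob_first_exceeds(FIRST, SECOND), 3))
```

Under these assumptions a firstborn would outscore a second-born only a little more often than a coin flip would suggest, which is one way to see why an average difference says little about any single family.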

Remarkably, the paper also claims that when an older brother dies, the next-oldest one scores better on subsequent tests. So it is family dynamics, not biological factors like prenatal environment, that are the critical factor. Birth order made no difference after accounting for social order.

Conscripts of first rank in social terms, no matter their biological rank, scored equal to firstborn men.

To test whether the difference could be due to biological factors, the researchers examined the scores of young men who became the eldest in the household after an older sibling had died. Their scores came out the same, on average, as those of biological firstborns.

This is quite firm evidence that the biological explanation is not true,

Dr. Kristensen claims.

Relation between birth order and IQ score. Mean IQ scores for male conscripts, first, second and third born to Norwegian mothers with single births only and first birth between 1967 and 1976, according to birth order and number of elder siblings who died in infancy (age < 1 year). Scores are adjusted for parental education level, maternal age at birth, sibling size, birth weight and year of conscription. Error bars show 95% confidence intervals. Source: Kristensen and Bjerkedal, Science, 22 June 2007.

This image shows that conscripts of first rank in social terms, no matter what their biological rank, scored the same as firstborn men. Men of birth order three who grew up as the second eldest child had IQ scores close to those of secondborns with no elder sibling loss. Adjusted effects of birth order and social order on mean IQ scores with corresponding 95% confidence limits, from multiple linear regression, showed that these associations were stronger in adjusted models.

Frank J. Sulloway at the University of California, Berkeley, who was not involved in the study but wrote an editorial accompanying it, said:

There was some room for doubt about this effect before, but that room has now been eliminated.

In contrast, Joseph Lee Rodgers, a psychologist at the University of Oklahoma, was skeptical:

Past research included hundreds of reported birth order effects that were not legitimate. I’m not sure whether the patterns in the Science article are real or not; more description of methodology is required.

### Questions

• How accurate do you think IQ scoring is? Would you expect IQ scores to vary over a person's lifetime? Does it depend on which test(s) you take? Might results be biased by cultural issues? How might you expect the IQ figures quoted in this study to be adjusted for such sources of variation?
• Why do you think the study ignored women? Might we expect identical results for women, a priori?
• Does it matter that the data is based only on 18 and 19 year olds, excluding all other age groups? What is the justification for extrapolating to other age groups?
• The Nobel prize winners example is surely not reliable given the small sample size. Do you agree?
• Are you persuaded that three points on an I.Q. test may be material, either for the average across a large group or for an individual if cumulated over time?
• If someone's IQ score can improve following the death of an older sibling, does this invalidate the merits of IQ tests as a standardized test of intelligence?
• How do you think the authors deal with cases involving remarriage or adoption?
• What do you think Rodgers meant by more description of (the) methodology is required?

## The game of Two-up

Owen Dearricott is from Australia, and when he taught the Dartmouth probability course he could not resist telling the students about Australia's famous coin tossing game Two-up.

In the game of Two-up, the adjudicator places two coins on a block of wood called a kip, and the player, or spinner, tosses the coins by throwing them at least 3 feet above his head and letting them drop to the ground. The objective is for the spinner to get three double heads while he is in the ring. If he spins two heads, he has "headed them" and gets to spin again. If he spins a head and a tail, he has "oned them": this does not count as a success, but he gets to spin again. If he spins two tails, he has "tailed them": he loses and passes the kip on to the next player. If he manages to head them three times, he has won.

Owen asked his students to find the probability of winning at Two-up and the expected number of spins in a winning game of Two-up.
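One way to check an answer to Owen's question is to simulate the game as described above. The following is a minimal sketch, assuming two fair, independent coins; the function names are made up for illustration. It also includes a short exact calculation of the winning probability, based on the observation that spins showing a head and a tail only delay the outcome, so each decisive spin is either two heads or two tails.

```python
import random
from fractions import Fraction


def play_two_up(rng, heads_needed=3):
    """Play one game; return (won, number_of_spins)."""
    heads = 0
    spins = 0
    while True:
        spins += 1
        coins = (rng.random() < 0.5, rng.random() < 0.5)  # True means head
        if all(coins):            # two heads: "headed them"
            heads += 1
            if heads == heads_needed:
                return True, spins
        elif not any(coins):      # two tails: "tailed them", spinner loses
            return False, spins
        # one head and one tail: no progress, spin again


def simulate(n_games=100_000, seed=1):
    """Estimate P(win) and the mean number of spins among winning games."""
    rng = random.Random(seed)
    wins = 0
    spins_in_wins = 0
    for _ in range(n_games):
        won, spins = play_two_up(rng)
        if won:
            wins += 1
            spins_in_wins += spins
    return wins / n_games, spins_in_wins / wins


def exact_win_probability(heads_needed=3):
    """Exact P(win): every decisive spin is two heads with probability 1/2."""
    p_heads = Fraction(1, 4)
    p_tails = Fraction(1, 4)
    return (p_heads / (p_heads + p_tails)) ** heads_needed
```

Calling `simulate()` returns the two estimates as a pair, and `exact_win_probability()` gives a fraction to compare against the first of them.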

Two-up was played by Australia's soldiers during World War I.

http://www.dartmouth.edu/~chance/forwiki/2p.jpg

Members of the 22nd Battalion, mostly C Company men who had just come out of the Ville-sur-Ancre attack enjoying relaxation playing Two-up, a popular gambling game amongst the Australian

After World War I, Two-up could only be played legally on ANZAC Day, Australia's day to honor its soldiers. However, we read in Wikipedia:

As time passed, increasingly elaborate illegal "Two-up schools" grew around Australia to the consternation of authorities but with the backing of corrupt police. The legendary Thommo's Two-up School, which operated at various locations in Sydney from the early years of the 20th century until well after World War II, was one of Australia's first major illegal gambling operations.

Legal Two-up arrived with its introduction as a "table" game at the new casino in Hobart in 1973 and it is still offered in some Australian casinos. "Two-up schools" in the Outback have also been legalised. The Wizard of Odds describes here how Two-up is played in casinos today.

### Discussion

(1) Do the students' homework.

(2) Some versions of Two-up also have the spinner losing if there are five "oned thems" in a row. How would this affect the answers to the students' homework?
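For the variant in question (2), an exact calculation is short enough to sketch with rational arithmetic. The code assumes fair coins and reads the rule as "the spinner loses on the fifth consecutive head-and-tail spin"; both the reading of the rule and the function name are assumptions made for illustration. Because the run of mixed spins resets after every pair of heads, the game splits into independent stages, each ending when the spinner next heads them.

```python
from fractions import Fraction


def two_up_variant(heads_needed=3, max_odds_in_row=5):
    """Return (P(win), expected spins given a win) as exact fractions.

    A stage is the run of spins needed to add one more pair of heads.
    It succeeds after j mixed spins (j = 0 .. max_odds_in_row - 1)
    followed by two heads; anything else loses the game.
    """
    half = Fraction(1, 2)      # P(one head and one tail)
    quarter = Fraction(1, 4)   # P(two heads)
    # Probability that a stage succeeds after exactly j mixed spins.
    stage = [half ** j * quarter for j in range(max_odds_in_row)]
    p_stage = sum(stage)
    # Mean stage length (j mixed spins plus the heads spin), given success.
    spins_stage = sum((j + 1) * p for j, p in enumerate(stage)) / p_stage
    return p_stage ** heads_needed, heads_needed * spins_stage


p_win, mean_spins = two_up_variant()
print(p_win, float(p_win))
print(mean_spins, float(mean_spins))
```

Passing a very large `max_odds_in_row` makes the extra rule irrelevant, so the result should approach the answers for the original game.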

## Another View of Social Rank and IQ

Richard Feynman's IQ, according to a Google search, is variously reported as 122, 123, 124 or 125; his sister's IQ is often publicized as one point higher than his. Based on no data whatever, I guess that the average IQ of contributors to Chance News (and possibly even the general readers of Chance News) is higher. With this in mind, perhaps it is time to call a moratorium on IQ reporting. Researchers, journalists and the general public never seem to get the picture right.

For example, in the New York Times of June 21, 2007, "The study found that eldest children scored about three points higher on I.Q. tests than their closest sibling. The difference was an average, meaning that it showed up in most families, but not all of them." The last sentence is a gratuitous comment to anyone who is statistically literate; the fact that the New York Times writer felt compelled to state it for his MSM audience speaks volumes about the level of statistical literacy in the United States.

Then there is a tendency in the press to emphasize that these Norwegian results--the three-point difference in favor of the firstborn--are not merely significant but statistically significant, as if the adjective "statistically" were an intensifier. It illustrates that, try as we might, teaching journalists the distinction between statistical significance and practical significance is a losing battle.

But the major error made in the comments about the Science article and indeed, in almost everything written about IQ, is the conflation of IQ with intelligence, as if the former is in one-to-one correspondence with the latter. The very first sentence of the Science article is "We have established a database in order to study the relations between perinatal factors and intelligence in adult age."

The readers of Chance News would do well to read Stephen Jay Gould's Mismeasure of Man for the unseemly history of IQ measurements from craniometry to the present-day tests. Along the way come the claim that females must be inferior because their heads are smaller than those of males, Galton's use of physiological measurements and reaction times as indicators of intelligence, and Terman's perversion of Binet's effort to identify children who needed help and to provide that assistance. Terman introduced the notion of dividing a score by the person's age and thus, the "Q" for quotient. His followers, similar to a religious cult, truly believe that not only does IQ measure that vague concept known as intelligence, but moreover, strict rank order is in effect. An individual's weight, cholesterol, blood pressure or time to complete a marathon may vary but somehow the general public feels that one's IQ is fixed and immutable.

Notwithstanding the tenuous connection between IQ and intelligence, various explanations for the three-point difference are given both in the Science article and in reviews of the article. The best one may be found in one of Steve Sack's cartoons for the Minneapolis Star Tribune for June 2007 (number 6). The top of the cartoon reads "NEWS ITEM: SCIENTISTS PUZZLED AS TO WHY OLDER SIBLINGS SHOW HIGHER IQ SCORES..." The cartoon shows an older brother choking his younger brother while banging on his head. In obvious pain, the younger brother gasps, "I HAVE A THEORY..."

### Discussion

1. The following is from a 1922 article by Walter Lippmann who at that time was America's dean of political commentators.

The danger of the intelligence tests is that in a wholesale system of education, the less sophisticated or the more prejudiced will stop when they have classified and forget that their duty is to educate.... Readers who have not examined the literature of mental testing may wonder why there is reason to fear such an abuse of an invention that has many practical uses. The answer, I think, is that most of the more prominent testers have committed themselves to a dogma which must lead to just such abuse. They claim not only that they are really measuring intelligence, but that intelligence is innate, hereditary and predetermined. They believe that they are measuring the capacity of a human being for all time and that this capacity is fatally fixed by the child's heredity. Intelligence testing in the hands of men who hold this dogma could not but lead to an intellectual caste system in which the task of education had given way to the doctrine of predestination...If the intelligence test really measured the unchangeable hereditary capacity of human beings as so many assert, it would inevitably evolve from an administrative convenience into a basis for hereditary caste.

In this age of continual testing, have Lippmann's fears come true? Or, are we more sophisticated in the 85 years since?

2. Returning to the Science article, it should be noted that none of the 240,000 Norwegians took an actual IQ test. The Norwegians took a different test, mean of 5, standard deviation of 2, and then a transform took place to deliver the customary mean of 100, standard deviation of 15. The authors justify this because the historical correlation between the test administered to the Norwegian conscripts and the WAIS IQ test has been .73. How good is a ".73" correlation coefficient?

3. Do a Google search to determine how many different IQ tests exist.

4. The Science article deals with Norwegian males. Comment on whether or not any inference can or should be made to males in other countries which are culturally less homogeneous and/or may have a different family structure.

5. On the basis of no data, speculate on what conclusions could be reached regarding female rank and IQ.
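Discussion item 2 describes a rescaling from a test with mean 5 and standard deviation 2 to the customary IQ scale, and asks how good a correlation of .73 is. The sketch below works through both pieces of arithmetic. It assumes the rescaling is a simple linear transform, which the text above does not spell out, and it uses the usual bivariate-normal formula for the spread of WAIS scores around their predicted value; the function names are illustrative only.

```python
import math


def to_iq_scale(score, mean=5.0, sd=2.0, iq_mean=100.0, iq_sd=15.0):
    """Linear rescaling (assumed): keep the z-score, change the units."""
    return iq_mean + iq_sd * (score - mean) / sd


def shared_variance(r):
    """Fraction of variance in one test accounted for by the other."""
    return r ** 2


def prediction_sd(r, sd=15.0):
    """SD of the target score around its linear prediction."""
    return sd * math.sqrt(1.0 - r ** 2)


R = 0.73  # correlation with the WAIS quoted in the text

print(to_iq_scale(5), to_iq_scale(7))   # the mean, and one SD above it
print(to_iq_scale(6) - to_iq_scale(5))  # IQ points per raw test point
print(round(shared_variance(R), 4))
print(round(prediction_sd(R), 2))
```

Read this way, one point on the conscript test spans several IQ points, so a three-point IQ gap corresponds to well under one raw point, and roughly half of the variance in WAIS scores is left unaccounted for by the conscript test.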

## Understanding suicide rates

Elusive, but not always unstoppable, The Economist, 21 June 2007.

According to The Economist suicide rates can be correlated with other social and economic indicators. Their claim is based on a report by The Organisation for Economic Co-operation and Development, a Paris-based think-tank for rich countries, which says

the same range of factors explains cross-country differences in people's declared degree of contentment with life, and suicide rates. So four-fifths of the variance in suicide rates across 50 countries can be explained by differences in the rates of divorce and unemployment, in quality of government, religious beliefs, trust in other people and membership of non-religious groups.

This may explain why so many ex-communist countries have high suicide rates (over 13 per 100,000) and so many Latin American countries have low ones (under 6.5). Some differences are nonetheless striking. Among rich countries, the high rates of Hungary, Japan, Belgium and Finland stand out, whereas most Mediterranean countries score low (below five). Ireland has a significantly higher rate than its neighbour, Britain.

The report claims that some differences can be explained by

• The easy availability of poisons. For example, China is one of the few countries in which more women kill themselves than men - among Chinese under 45, the female rate is twice the rate among males. In fact, over half the world's female suicides are Chinese. Part of the explanation clearly lies in the high rate among rural women, which in turn may be partially explained by the ready availability of poisons (weed killers and pesticides) and the absence of any effective treatment.
• Fashion. Many Sri Lankans kill themselves by eating the seeds of the yellow oleander, a common shrub. Intentional self-poisoning with these seeds was almost unheard of in Sri Lanka before 1980, but in that year two girls committed suicide by eating them. Inadvertently, they started a trend. Similar fashions often follow the suicide of a celebrity such as Michael Hutchence, an Australian pop star who apparently took his own life in 1997 or M.J. Nee, a Taiwanese actor who hanged himself in 2005.

### Questions

• Suicide rates are typically presented as the rate per 100,000 population. How might this metric be misleading for countries with small populations?
• The data typically takes two to four years to compile. Do you care what caused these delays? Is it possible that these causes could become part of the statistical analysis to explain differences between countries?
• Although the OECD rates are based on real death certificates signed by legally authorized personnel, suicide can be a taboo subject in some cultures. So should all suicide estimates be adjusted for under-reporting?
• Historically, in Ireland, the actual cause of death was reported, but the suicide aspect overlooked. This practice has changed, and Ireland has recorded a steep rise in suicides in the last decade or so. Can you think of any adjustments that might cater for this trend? How might a model adjustment for cultural changes differ to an adjustment for the introduction of easily available poisons or the suicide of someone famous?
• Assuming the purpose of collating suicide rates is suicide prevention, why isn't the method used to commit suicide a more prominent part of the analysis? Would information about unsuccessful suicide attempts be helpful when trying to predict how best to prevent suicides?
• The OECD report claims that,
in the year 2000, approximately one million people died from suicide, and 10 to 20 times more people attempted suicide worldwide. This suggests that more people are dying from suicide than in all of the several armed conflicts around the world and, in many places, about the same or more than those dying from traffic accidents.
Does such a claim seem plausible to you, a priori? Where would you look to find estimates of annual deaths from all wars? Are such comparisons valid given the uncertainty and nature of the two variables? The OECD report also claims
In all countries, suicide is now one of the three leading causes of death among people aged 15-34 years.
Which of these two quotes are you more inclined to believe?
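The first question above, about rates per 100,000 in small countries, can be explored with a little arithmetic. The sketch below assumes that the yearly number of deaths follows a Poisson distribution, so its standard deviation is the square root of its mean. The two population sizes are hypothetical round numbers chosen for illustration, not figures from the OECD report.

```python
import math


def rate_sd_per_100k(rate_per_100k, population):
    """Chance variation (SD) of an observed yearly rate per 100,000.

    Assumes Poisson-distributed death counts, so SD(count) = sqrt(mean).
    """
    expected_deaths = rate_per_100k * population / 100_000
    return math.sqrt(expected_deaths) * 100_000 / population


# Same underlying rate, two hypothetical population sizes.
for population in (300_000, 60_000_000):
    sd = rate_sd_per_100k(10.0, population)
    print(f"population {population:>11,}: rate 10.0 +/- {sd:.2f} per 100,000")
```

With the same underlying rate, the small country's published figure can swing by a couple of points from year to year through chance alone, while the large country's barely moves.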

## Sloppy animal research

The Trouble With Animal Models. Why did human trials fail? By Andrea Gawrylewski. The Scientist, Volume 21, Issue 7, Page 44.

Extrapolating research results from animals to humans is a perilous task at best. But it appears that the process is getting worse rather than better. More and more therapies that appear to be promising in animal trials end up failing to work when they are tested on humans. It appears that sloppy research may be partly to blame.

"People don't report if studies are randomized," says Ian Roberts, professor of epidemiology at the London School of Hygiene and Tropical Medicine. How animals are selected, or whether assessments were blind, are rarely included in the methods and thus create a potential for bias. "Imagine a cage of 20 rats, and you've got a treatment for some," explains Roberts. "So you stick your hand in a cage, and pull out a rat. The rats that are the most vigorous are hardest to catch, so when you pull out 10 rats, they're the sluggish ones, the tired ones, they're not the same as the ones still in the cage, and they're the control. Immediately there's a difference between the two groups."

Animal experiments are frequently published without sufficient information to allow researchers to critically evaluate the results. This incomplete documentation appears to be more than just an oversight, and may represent a surrogate marker for the lack of overall care in the conduct of the research. A research study published in Academic Emergency Medicine showed that animal experiments with no information on randomization and blinding are five times more likely to report a positive result than research where these are clearly defined.
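Roberts's cage of 20 rats is easy to mimic in a small simulation. In the sketch below each rat gets a made-up "vigor" score between 0 and 1, and the chance of catching a rat by hand falls as its vigor rises. The catchability weights, the vigor scale and the function names are all assumptions chosen for illustration. The simulation compares the two groups' average vigor before any treatment is given, once with hand-catching and once with random assignment.

```python
import random


def catch_by_hand(vigor, n_catch, rng):
    """Pull rats out one at a time; sluggish rats are easier to catch."""
    remaining = list(range(len(vigor)))
    caught = []
    for _ in range(n_catch):
        # Assumed catchability: higher vigor means lower weight.
        weights = [1.05 - vigor[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(pick)
        caught.append(pick)
    return caught, remaining


def assign_at_random(vigor, n_catch, rng):
    """Randomized allocation: shuffle the rats and split the list."""
    ids = list(range(len(vigor)))
    rng.shuffle(ids)
    return ids[:n_catch], ids[n_catch:]


def mean_vigor_gap(assign, n_trials=2000, n_rats=20, n_catch=10, seed=0):
    """Average of (caught-group mean vigor - cage-group mean vigor)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        vigor = [rng.random() for _ in range(n_rats)]
        caught, left = assign(vigor, n_catch, rng)
        total += (sum(vigor[i] for i in caught) / len(caught)
                  - sum(vigor[i] for i in left) / len(left))
    return total / n_trials


print("hand-caught:", round(mean_vigor_gap(catch_by_hand), 3))
print("randomized: ", round(mean_vigor_gap(assign_at_random), 3))
```

With hand-catching the two groups differ in vigor before the experiment starts, whereas the randomized split leaves the average gap close to zero.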

There are other problems. Publication bias is an issue in animal studies. The stress that rats experience living in small cages may skew the results. There is also an excessive reliance on a small number of inbred strains of mice and rats.

### Questions

1. Animal research is tightly regulated in most countries. Why has this regulation not led to better quality publications?

2. Should animal testing be scaled back? What problems might occur if testing of new drugs bypassed the animal phase?

## How satisfied are you with your life?

Where money seems to talk, The Economist, 12th Jul 2007.
Happiness and politicians don't always mix, Hamish McRae, Business, Independent Newspapers (UK).
What makes us happy, Sydney Morning Herald, September, 2006.

Happiness is enjoying a boom, with an increasing number of academic papers and books on the subject. But, over the last fifty years, while Western societies have got richer, happiness, as measured in surveys, has not increased. For example, Americans were no more likely to describe themselves as happy in the 1990s than in the 1940s. And even the Japanese, who went from near-poverty in the 1950s to affluence in the 1980s, did not become happier.

Most governments have typically focused on expanding the economy, and creating wealth. They build the infrastructure of a 'good' society: jobs, schools, hospitals, security and rising living standards. But this perceived lack of happiness is a problem for governments - unhappy people will tend to vote them out of office. So how could governments go about remedying the situation? Should they remove obstacles to individual creativity and entrepreneurship, or should they provide enough social and other supports so that we don't have to worry about the basics of life? In other words, should they focus on economic or social wellbeing? UK politician David Cameron has called for the focus of society to shift from GNP to well-being, from greater wealth to greater happiness.

Two recent polls argue that growth and income play a part in boosting people's satisfaction with life and their attitude to the future. One survey from Gallup even claims to be the first genuine 'world poll', covering 130 countries. Such surveys try to establish links between happiness, income and optimism, sometimes with conflicting results.

Governments increasingly sample the preferences of their citizens before making policy decisions. Bobby Duffy at Ipsos MORI, the other surveying company, claims:

Happiness is a big issue for government. People have quite clear ideas about what they want.

Interviewers asked a standard question: how satisfied are you with your life, on a scale of zero to ten? In rich countries, most people say they are happy; in poor countries, people say that they are not happy.

Looking at the levels of satisfaction that people in different countries feel right now, Princeton University's Angus Deaton claims there is a good fit between Gallup's satisfaction score and national income based on purchasing-power parities:

a map of the results looks like an income plot of the world.

The report asserts that declared levels of happiness are correlated with wealth and the pattern also seems to hold true within countries, as well as between them. That is, rich Americans or Brazilians are happier than poor ones.

The survey also asked questions about confidence in the future. Regardless of countries' current income, there was a close correlation between GDP growth and optimism. For example, China, India and Russia are most optimistic; France, Germany and Italy are the least. If both polls are right, the Chinese are pretty miserable now but they expect a dramatic turn for the better.

Money can make people happy, but not as much as you think. Such surveys tend to conclude that once people climb out of poverty, the link between money and feeling good is weak. The average American, for example, is much richer than the average Icelander or Dane, but also less happy. It appears that money's influence is swamped by non-pecuniary factors. Being married, for example, is more important than being rich. And having friends is more critical than a pay rise.

Worldwide research shows that in rich societies, what really affects happiness is the quality of personal relationships. At the top is family, and friends. In particular, people compare themselves with those nearest to them, such as their colleagues at work and others in their own family. And it's a rat race. For example, in east Germany, the living standards of those in work have soared since 1990, but their level of happiness has plummeted.

The east Germans now compare themselves with west Germans rather than with other countries in the old Soviet bloc.

says author Professor Richard Layard. The unhappiest people, it is claimed, are the unemployed, the mentally ill, and to a lesser extent the separated and divorced.

The Economist says that evidence for this comes from surveys in most rich countries, such as America's general social survey. These show that happiness has been flat for decades, even though incomes have risen sharply.

The Economist goes on to suggest that contradictions between these surveys could be due to a number of issues:

• Definitional problems may provide part of the explanation. These are self-reported polls and people mean different things by “happiness”. Cultural biases are likely to be much greater when 130 states are involved. For example, some countries are intrinsically happier than others, or say they are, at least.
• Perhaps 'happiness' is really a proxy for something else, such as health. Perhaps the main point is that money mitigates poor health, so the rich are happier than the poor mainly because they feel healthier.
• Lastly, as the Ipsos and Gallup polls show, happiness and optimism are not just different, they can be contradictory. The Chinese are dissatisfied but upbeat; Europeans are happy now but dread tomorrow.

### Questions

• The typical World Poll survey in a country consists of 1,000 completed questionnaires. Why are surveys often based on this seemingly magic number? In some countries, oversamples may be collected in major cities. For example, the MORI poll collected an additional 500 interviews in Moscow. Why do you think that they might have done this?
• The surveys list possible sources of error such as nonresponse, measurement error associated with the questionnaire (for example, translation issues) and coverage error (where parts of the target population have a zero probability of being selected for the survey). Are such sources of error likely to be more or less material than the error from the subjective nature of measuring happiness?
• Given that the surveys say people seek happiness over wealth, should governments establish a happiness auditor? Should the national statistics offices construct a happiness index? In the latter case, how might the necessary 'happiness data' be sourced?
• One of the reports says that a survey is only a survey, and that you don't want to take the detail of this one overly seriously. To what do you think the reporter is referring? Why should the results of this survey be treated less seriously than those of any other survey?
• There is also evidence of a correlation between religious beliefs and happiness. What do family, work and religion have in common? How realistic is it to condition on a difficult-to-measure variable like religion? Can you think of any other factors that might correlate with happiness and how realistic is it to incorporate them in a survey's analysis?
• If the surveys are correct - people value happiness above health - should a government's prime objective be the happiness of its people, rather than increasing wealth?
• Can you think of any public policy implications from this quest for happiness?
• How might you test the hypothesis that although people might say that they value happiness above money, they don't act accordingly?
• Is it scientifically plausible to assert that there is sufficient evidence in such surveys to say that governments should help us in our search for a meaningful life? If not, what is their primary purpose?
• The Gallup World Poll uses two primary methodological designs:
• A Random-Digit-Dial (RDD) telephone survey design is used in countries where 80% or more of the population has landline phones.
• In the developing world, an area frame design is used for face-to-face interviewing.
• Are you familiar with both methods? Why do you think different designs are needed for different countries? How might the results be adjusted to ameliorate any biases?
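On the question of why 1,000 questionnaires is such a common target, the standard margin-of-error formula for a proportion gives some insight. The sketch below assumes simple random sampling and the worst case of a 50/50 split, which real polls with clustering or weighting only approximate. The sample sizes other than 1,000 are included purely for comparison.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Assumes a simple random sample of size n; p = 0.5 is the worst case.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


for n in (500, 1000, 1500, 4000):
    print(f"n = {n:>5}: +/- {100 * margin_of_error(n):.1f} percentage points")
```

A sample of 1,000 gives a margin of about three percentage points, and halving that margin would require four times as many interviews, which is one reason pollsters rarely go much higher. A subgroup of 500, such as a city oversample, can only be reported with a noticeably wider margin.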