Chance News 4
August 1, 2005 to August 31, 2005
What women want
What women want
The New York Times, May 24, 2005, A 25
Are men more competitive than women?
In this op-ed piece, Tierney discusses and draws conclusions from a recent study that explores gender differences in competitive environments. Two researchers, Muriel Niederle of Stanford and Lise Vesterlund of the University of Pittsburgh, ran an experiment in which women and men had to choose to participate in either a competitive or a non-competitive task. They found, among other things, that women chose to compete less often than they should have, while men chose to compete more often than they should have. The researchers apparently anticipated this result, so Niederle and Vesterlund also designed their experiment to explore potential reasons for this difference.
Specifically, participants were paid to add up as many sets of five two-digit numbers as they could in five minutes. The experiment consisted of four tasks, each with a potentially different scheme for how participants would be paid. After the tasks were completed, one of the four tasks was chosen at random for actual payment. Here is a brief description of the four tasks. (A more detailed description and analysis may be found at Muriel Niederle's website, in the draft article "Do Women Shy away from Competition? Do Men Compete too Much?")
1. (Piece-rate) Each participant calculates the given sums and is paid 50 cents per correct answer.
2. (Tournament) The participants compete within four-person teams consisting of two women and two men. The person who completes the most correct sums receives $2.00 per sum; the other members of the group receive nothing.
3. (Tournament Entry Choice) Each participant is given a choice of payment scheme: either by piece-rate or a tournament scheme in which the participant is paid $2.00 per correct sum if and only if she completes more correct sums than were completed by the other members of her group in task 2. (Thus it is possible for more than one member of the group to "win" the tournament.)
4. (Tournament Submission Choice) No new sums are calculated. Instead, each participant is given a choice: either receive the same piece-rate payment as was generated in task 1, or submit one's task 1 performance to a tournament in which the participant receives $2.00 per correct sum if and only if she completed more correct sums in task 1 than did the other members of her group. (Again, it is possible for more than one member of the group to "win" the tournament.)
At the end of each task, each participant is told only her own performance on the task, and thus her decision to enter a tournament (in tasks 3 and 4) is not based on relative-ranking information. Also, after the tasks were completed, each subject was asked to guess the rank of her task 1 and task 2 performances. The main goal of the study is to determine whether men and women of the same ability on a task choose to compete at different rates, and if so, why.
Here are some highlights of their findings:
- Women and men performed equally well on both tasks 1 and 2.
- Women and men performed significantly better on task 2 (tournament) than on task 1 (piece-rate), and the size of the increase was independent of gender.
- Of the 20 tournaments in task 2, women won 11, men 9.
- 43% of the women, versus 75% of the men, ranked themselves first in their group.
- 35% of the women chose the tournament in task 3, versus 75% of the men.
- Women's task 2 performance does not predict tournament entry in task 3, and only does so marginally for men. In fact, women in the highest performance quartile for task 2 were less likely to enter the tournament than men in the lowest quartile.
- Approximately 27% of the gender difference in task 3 tournament entry can be explained by women and men forming different beliefs about their relative ranking. The remaining difference comes from a mix of general factors (e.g. risk aversion) and tournament-specific factors (e.g. bias in estimating future performance).
- Approximately 70% of women whose expected gain under a tournament scheme is favorable do not choose the tournament (tasks 3 and 4), while approximately 63% of men elect a tournament when it is unfavorable to them (task 3).
- 25% of the women submitted their task 1 results to a tournament in task 4, versus 55% of the men. Virtually all of this difference can be explained by men's over-confidence in their relative ranking.
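To make "favorable" and "unfavorable" expected gain concrete, here is a minimal sketch of the payoff comparison a participant faces. It assumes a risk-neutral participant who would solve the same number of sums under either scheme; both assumptions are ours, not the researchers'. The function names and the example values (12 sums, win probabilities) are hypothetical.

```python
PIECE_RATE = 0.50        # dollars per correct sum under the piece-rate scheme
TOURNAMENT_RATE = 2.00   # dollars per correct sum, paid only if the participant wins


def expected_payoffs(correct_sums, p_win):
    """Return (piece-rate payoff, expected tournament payoff) in dollars.

    Assumes the same number of correct sums under both schemes (an
    illustrative simplification, not a claim from the study).
    """
    piece = PIECE_RATE * correct_sums
    tournament = TOURNAMENT_RATE * correct_sums * p_win
    return piece, tournament


def break_even_probability():
    """Win probability at which both schemes pay the same in expectation."""
    return PIECE_RATE / TOURNAMENT_RATE


if __name__ == "__main__":
    print(break_even_probability())      # 0.25
    print(expected_payoffs(12, 0.10))    # tournament unfavorable
    print(expected_payoffs(12, 0.50))    # tournament favorable
```

Under these assumptions the tournament is the better bet only for a participant whose chance of winning exceeds one in four, whatever her score.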
1. Why do you think the task 3 and task 4 tournaments were designed the way they were?
2. In discussing the above gender differences, Tierney writes, "You can argue that this difference is due to social influences, although I suspect it's largely innate, a byproduct of evolution and testosterone. Whatever the cause, it helps explain why men set up the traditional corporate ladder as one continual winner-take-all competition-- and why that structure no longer makes sense." What do you think?
3. The researchers determined the probability of winning the tournament in Task 2 by "randomly creat[ing] four-person groups from the observed performance distributions." How exactly would one do this? They also determined, for each performance level (e.g., 15 correct sums) and each gender, the probability of winning a tournament with that score. How would this be done?
4. Niederle and Vesterlund also briefly discuss the cost to women for under-entry into tournaments and the costs to men for over-entry. They write, "While the magnitude of the costs is sensitive to the precise assumptions we make, the qualitative results are the same. The total cost of under-entry is higher for women, while the total cost of over-entry is higher for men. Since over-entry occurs for participants of low performance and under-entry for those with high performance, by design the cost of under entry is higher than that of over entry." Explain and comment.
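As a starting point for question 3, the following is one possible way to estimate the probability of winning with a given score by simulation. It is a sketch, not the researchers' procedure: it assumes opponents are drawn with replacement from the observed scores of each gender, that a group has two women and two men, and that ties are broken at random. The score lists are made up for illustration.

```python
import random

# Hypothetical task 2 scores (number of correct sums); NOT data from the study.
WOMEN_SCORES = [8, 10, 11, 12, 13, 15]
MEN_SCORES = [7, 9, 11, 12, 14, 16]


def win_probability(score, gender, women=WOMEN_SCORES, men=MEN_SCORES,
                    trials=20000, seed=0):
    """Estimate the chance that `score` wins a random two-women, two-men group.

    gender is "F" or "M". The participant fills one slot of her or his own
    gender; the other three members are resampled from the observed scores.
    """
    rng = random.Random(seed)
    same, other = (women, men) if gender == "F" else (men, women)
    wins = 0
    for _ in range(trials):
        opponents = [rng.choice(same), rng.choice(other), rng.choice(other)]
        best = max(opponents)
        if score > best:
            wins += 1
        elif score == best:
            # Assumption: ties are split evenly among the tied members.
            tied = 1 + opponents.count(best)
            if rng.random() < 1 / tied:
                wins += 1
    return wins / trials


if __name__ == "__main__":
    for s in (10, 13, 16):
        print(s, win_probability(s, "F"), win_probability(s, "M"))
```

Repeating the calculation for every observed score and each gender gives a table of win probabilities by performance level, which is the kind of quantity the question asks about.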
Rules of engagement - modelling conflict
The mathematics of warfare - Scientists find surprising regularities in war and terrorism
The mathematics of warfare
The Economist July 23, 2005 (Available from Lexis Nexis)
Is terrorism the next format for war?
Nature July 12, 2005
Academics Neil Johnson from the University of Oxford and Michael Spagat from Royal Holloway College, London, are using the patterns of casualties to model the development of wars. They are attempting to monitor the casualties of the conflict in Iraq, using data from a database called IraqBodyCount.
The Nature article says:
All wars and conflicts seem to generate a common and distinctive pattern of death statistics. Fifty years ago, the British mathematician Lewis Fry Richardson found that graphs of the number of fatalities in a war plotted against the number of wars of that size follow a relationship called a power law, where all the data points fall on a straight line if plotted logarithmically. This power law encodes the way in which large battles with large numbers of deaths happen very infrequently, and smaller battles happen more often.
The Economist article also gives a nice summary of power-law relationships:
Power-law relationships are characterised by a number called an index.
For each tenfold increase in the death toll, the probability of such an event occurring decreases by a factor of ten raised to the power of this index,
which is how the distributions get their name.
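The Economist's description can be checked with a few lines of code. This is a minimal sketch assuming the simplest power-law form, in which the probability of an event of size x is proportional to x raised to the power of minus the index; the function names and the index values used are hypothetical and are not taken from either article.

```python
import math


def relative_probability(size_ratio, index):
    """How much less likely an event becomes when its size grows by size_ratio."""
    return size_ratio ** (-index)


def log_log_slope(x1, x2, index, scale=1.0):
    """Slope of the power law between two sizes when both axes are logarithmic."""
    y1 = scale * x1 ** (-index)
    y2 = scale * x2 ** (-index)
    return (math.log10(y2) - math.log10(y1)) / (math.log10(x2) - math.log10(x1))


if __name__ == "__main__":
    # With an illustrative index of 2, a tenfold larger death toll is
    # 10**2 = 100 times less likely.
    print(relative_probability(10, 2.0))
    # On log-log axes the points fall on a straight line of slope -index.
    print(log_log_slope(10, 1000, 2.5))
```

The second function shows why the data fall on a straight line when plotted logarithmically: the slope is the same between any two sizes and equals minus the index.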
The Johnson and Spagat paper suggests a difference between conflicts inside and outside G7 countries based on their index value.
A more worrying statistic comes from another paper on the same topic by Clauset and Maxwell at the British Institute of Physics, who suggest that we can expect another attack at least as severe as September 11th within the next seven years.
How people respond to terrorist attacks
The rational response to terrorism
The Economist print edition July 21st 2005, Available from Lexis-Nexis
Nobel laureate Gary Becker from the University of Chicago and Yona Rubinstein from Tel Aviv University examine how the general public responds to the threat posed by suicide-bombers in Fear and the Response to Terrorism: An Economic Analysis.
A first analysis suggests an obvious response. The miles flown by passengers on US domestic airlines fell 30% between August and October 2001 and air travel hadn't regained its 2001 peak even two years after the attack of September 11th. According to Becker and Rubinstein, it is not the risk of physical harm that moves people; it is the emotional disquiet. People respond to fear, not risk.
They give an example of the effect of suicide bombers on bus usage in Tel Aviv. On average there was one attack a month for a year from November 2001, and bus usage fell 30%. But this average masks material differences between types of passengers. Casual users who bought tickets on the day of travel were much more likely to stay away, with usage falling 40% after each attack. Regular passengers who used weekly or monthly tickets were largely undeterred.
The authors claim that the public responds to terrorism in a similar manner to its reaction to rare but deadly diseases, such as BSE or 'mad-cow disease', by avoiding beef en masse even though the probability of infection is very small.
They explain this reaction by saying that people can overcome their fear but they will only do so if it is worth their while. And overcoming their fear is a fixed cost, not a variable one, so people do not fight their fear each time they step on a bus; this only happens on their first journey. Once a person has come to terms with terror, it makes little difference whether he gets the bus twice a day or once a day. This choice may result in slightly higher risk of actual attack but a traveller is not adding anything to his fear of such a catastrophe. And it is fear, not the risk, that influences people.
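The fixed-cost argument can be illustrated with a toy decision rule. This is only a sketch of the reasoning in the paragraph above, not Becker and Rubinstein's actual model; the function names, the monthly time frame, and all of the numbers are hypothetical.

```python
def rides_with_fixed_cost(trips_per_month, benefit_per_trip, fear_cost):
    """Fear is overcome once, so its cost is spread over all trips."""
    return trips_per_month * benefit_per_trip > fear_cost


def rides_with_variable_cost(benefit_per_trip, fear_cost_per_trip):
    """Alternative: fear is fought on every trip, so frequency is irrelevant."""
    return benefit_per_trip > fear_cost_per_trip


if __name__ == "__main__":
    benefit, fear = 3.0, 20.0  # illustrative units of value per trip and of fear
    print(rides_with_fixed_cost(2, benefit, fear))    # casual user stays away
    print(rides_with_fixed_cost(40, benefit, fear))   # regular user keeps riding
```

With a fixed cost, the same per-trip benefit leads a frequent traveller to keep riding and an occasional traveller to stay away, which matches the Tel Aviv pattern described above. A per-trip cost would predict the same response from both groups.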
Can you get fired over the wording of a questionnaire?
Researcher to be sacked after reporting high rates of ADHD
BMJ, Mar 26 2005, 330 (7493); 691
This article is not currently available without a subscription, but will be available to the general public twelve months after the original publication date.
Dr Gretchen LeFever, a researcher who has claimed that attention deficit hyperactivity disorder in children has been overdiagnosed and overmedicated, has been placed on administrative leave with the intent of terminating her employment. Her employer, the Eastern Virginia Medical School, has accused her of scientific misconduct. In the article we read:
Her work has been controversial. She first made headline news in 1999 when she reported that 8% to 10% of elementary school pupils in southeastern Virginia were being prescribed drugs for ADHD, a percentage two to three times the estimated national average (American Journal of Public Health 1999;89:1359-64).
Criticism grew after she published the results of a 2002 study showing that the prevalence of the disorder among children in grades 2 to 5 had risen to 17% (Psychology in the Schools 2002; 39: 63-71).
One of her main critics is Jeffrey Katz, a clinical psychologist in Virginia Beach and the local coordinator of the Children and Adults with Attention-Deficit/Hyperactivity Disorder group. Dr Katz questioned her claim that the condition had been diagnosed in 17% of children in grades 2 to 5.
He said, "When somebody like Dr LeFever makes these claims that are apparently not based on good research, it minimises a very real problem. Parents won't bring their children in for evaluation, because they are afraid that medication will be automatically prescribed. They think it's a bad thing and the sole treatment. But medication can have significant benefits."
An anonymous whistleblower accused her of scientific misconduct based on her 2002 publication. The survey question asked
Does your child have attention or hyperactivity problems, known as ADD or ADHD?
but the publication reported the question as
Has your child been diagnosed with attention or hyperactivity problems known as ADD or ADHD?
She has also been accused of conducting research on children without getting parental approval. The local IRB (Institutional Review Board) had originally determined that, since only parents and teachers filled out surveys about the children, the children were not research subjects. This meant that the study was exempt from the normal parental approval requirements. After the allegations of scientific misconduct were raised, the medical school sought a second opinion from the national experts at the Office of Human Research Protections. This office ruled that the children were indeed research subjects, so the research was not exempt from parental approval requirements.
Allegations of misconduct often degenerate into a "he said/she said" argument, and it is difficult for an outsider to evaluate the evidence objectively. A web search on the name "Gretchen LeFever" will produce a wide range of opinions about her original research and the rationale for her firing.
(1) Is there a serious difference in the reported wording of the questionnaire? Would you expect the first wording to get a higher positive response? Why?
(2) Should a researcher be held responsible for ethical violations for a study that was approved by the local IRB if that approval was later found to be in error?
(3) When parents fill out a survey about their child, are they implicitly giving permission for their child to be part of the research study? If not, what would constitute permission?
(4) Do you believe that Dr. LeFever is guilty of scientific misconduct? What would be an appropriate punishment?
Update: October 17, 2005 According to the BMJ, Dr. LeFever has now been cleared of all charges. (Lenzer 2005, BMJ 2005;331:865 (15 October), doi:10.1136/bmj.331.7521.865-a).
You can't just go on telly and make up statistics, can you?
It seems we can't buy anything unless it has the approval of boffins (US readers may want to know the definition of "boffin"; here is an instructive link). But what does any of it mean? Margaret McCartney examines the suspect science that we swallow, apply and absorb every day in an online GuardianUnlimited article.
When you read or hear something like "8 out of 10 people preferred X to Y", what are the details behind this sample survey result? The article gives Pantene Pro-V as an example. The company has recently been telling us, via shiny spreads in various magazines and TV ads, that its Anti-Breakage Shampoo will lead to "up to 95% less breakage in just 10 days". It transpires that the sample size was just 48 and the survey was not a blind one.
Further investigations reveal that 10 samples of hair were tested three times and the results were "significant". Furthermore, the associated ads were vetted and approved by the Broadcast Advertising Clearing Centre.
In another example, the UK's Advertising Standards Authority (ASA), with a staff of 100, looks at all the major newspapers daily, but with an estimated 30m adverts printed every year in the UK, it is impossible for them to look at them all. In a recent case, a slimming pill advertisement was withdrawn after making claims that were found to be based on a study of just 44 people. The ASA decided that this was too small a study to be valid.
The ASA director of communications said "Talking generally, we may accept a small sample size as reasonable proof, but this would really depend on the statistical significance of whatever tests were done. Conditional claims lead to a host of different claims, especially when 'modal verbs' are used. We might ask them to change 'can' to 'could' if they didn't have 100% proof of the 'can'. But we would also expect them to hold proof relating to the 'could'."
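To see how much uncertainty a sample of this size carries, here is a minimal sketch of a Wilson score interval for a proportion. The count used (38 of 48 respondents agreeing, roughly "8 out of 10") is a hypothetical illustration, not a figure from the article, and the calculation assumes a simple random sample, which an advertiser's survey may well not be.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical: 38 of 48 respondents prefer the product.
    print(wilson_interval(38, 48))
    # The same proportion from a sample ten times larger is pinned down
    # much more tightly.
    print(wilson_interval(380, 480))
```

The interval for 48 respondents is several times wider than the one for 480, which is one way to read the ASA's remark that acceptance of a small sample depends on the statistical significance of the tests.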
The article lists other examples of dubious statistics such as "93% say their skin felt softer, and 79% say their skin was firmer with each application" (of a skin care product) or a more serious example about medicinal benefits such as 'the drug "effectively reduces the risk of a heart attack" by "preventing build up of harmful plaques in your coronary arteries" and "reducing your risk of coronary heart disease"'.
"The key issue is that of evidence. If you don't have evidence to justify claims of benefit, then the whole argument begins to fall apart." says Dr Ike Iheanacho, editor of the Drug and Therapeutics Bulletin, a journal published by Which?.
The article finishes with the warning that marketing and science have got together and bred a weird hybrid form of sales-experiments that have taken over our advertising culture.
The more the merrier? First born do better at school
This article highlights that younger children do less well in terms of overall educational attainment than their older brothers and sisters, regardless of family size or income. Further, the impact of birth order was more pronounced in females in later life. This suggests that parents with limited financial resources may invest more time and money in the education of their eldest child.
The underlying data are based on the entire population of Norway aged 16-74, between the years 1986 and 2000. This unique data set, collated using Norway's personal identity number system, allowed the researchers to look across families and within families to distinguish the causal effect of family size on youngsters' education.
The authors comment that "there's a lot of psychological literature on why first-born children are most successful. The main suggestion is that the eldest child acts as a teacher for the younger children and learns how to organise information and present it to others." The research team followed the children through to adulthood and examined their earnings, full-time employment status and whether the individual had become a teenage parent. The findings are claimed to represent the first comprehensive analysis of the impact of family composition on educational achievement.
"In terms of educational attainment, if you are the fourth born instead of the first, you get almost one year less education, and that is quite a lot," Salvanes, the lead author, told Reuters. "And first-born children tend to weigh more at birth than their younger brothers and sisters, which is a good predictor for educational success. Children alone with two adults also tend to get more intellectual stimulation than children in large families who get less parental attention. First-born children seem to learn from teaching their younger siblings, contrary to the common notion that younger children benefit by learning from their elders", Salvanes said. So does that mean big sisters really are smarter? "Yes. It's hard to admit because I have older sisters," Salvanes said.
The research was carried out by Sandra Black and Paul Devereux in the Dept. of Economics at UCLA and Kjell Salvanes at the Norwegian School of Economics and Business Administration. It will be presented at the 2005 World Congress of the Econometric Society, and published in the Quarterly Journal of Economics. For now, the original paper is available on-line.
Profiling Report Leads to a Demotion
The New York Times, August 24, 2005
Lawrence Greenfeld, head of the Bureau of Justice Statistics, was recently demoted after a dispute over a study of racial profiling.
The flashpoint in the tensions between Mr. Greenfeld and his political supervisors came four months ago, when statisticians at the agency were preparing to announce the results of a major study on traffic stops and racial profiling, which found disparities in how racial groups were treated once they were stopped by the police.
Political supervisors within the Office of Justice Programs ordered Mr. Greenfeld to delete certain references to the disparities from a news release that was drafted to announce the findings, according to more than a half-dozen Justice Department officials with knowledge of the situation. The officials, most of whom said they were supporters of Mr. Greenfeld, spoke on condition of anonymity because they were not authorized to discuss personnel matters.
What, exactly, was in this report?
The April study by the Justice Department, based on interviews with 80,000 people in 2002, found that white, black and Hispanic drivers nationwide were stopped by the police that year at about the same rate, roughly 9 percent. But, in findings that were more detailed than past studies on the topic, the Justice Department report also found that what happened once the police made a stop differed markedly depending on race and ethnicity.
Once they were stopped, Hispanic drivers were searched or had their vehicles searched by the police 11.4 percent of the time and blacks 10.2 percent of the time, compared with 3.5 percent for white drivers. Blacks and Hispanics were also subjected to force or the threat of force more often than whites, and the police were much more likely to issue tickets to Hispanics rather than simply giving them a warning, the study found.
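The size of the disparity in search rates is easier to see as a ratio. The sketch below uses only the three percentages quoted above; the dictionary and function names are our own. It ignores sampling error and any differences in the circumstances of the stops, so it describes the reported figures rather than explaining them.

```python
# Percentage of stopped drivers who were searched, as quoted from the report.
SEARCH_RATES = {"Hispanic": 11.4, "Black": 10.2, "White": 3.5}


def rate_ratio(group, reference="White", rates=SEARCH_RATES):
    """Search rate of `group` divided by the search rate of `reference`."""
    return rates[group] / rates[reference]


if __name__ == "__main__":
    for group in ("Hispanic", "Black"):
        print(group, round(rate_ratio(group), 2))
```

On these figures, stopped Hispanic drivers were searched a little over three times as often as stopped white drivers, and stopped black drivers a little under three times as often.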
It's worth noting that the dispute was about the press release and not about the report itself. The full report is available on the web.
The statistics described in the New York Times article appear in table 9 of the report.
Critics of the Bush administration have accused it of burying the report, but if that was the intent, the publicity has only amplified the attention that this report has received. They also cite this report as evidence that the Bush administration punishes those who publicize bad news.
There has always been concern about the independence of statistical estimates produced by U.S. Federal Agencies. If an administration could manipulate estimates of inflation and/or unemployment, then no one would trust those figures anymore.
On the other hand, politicians have always worried about unelected career government employees who may not be responsive to, or may even be openly hostile towards, the goals of the elected President of the United States. Mr. Greenfeld seems to appreciate the two sides of this issue in some quotes from him in the New York Times article.
Mr. Greenfeld declined to discuss the handling of the traffic report or his departure from the statistics agency. But he emphasized in an interview that his agency's data had never been changed because of political pressure and added that "all our statistics are produced under the highest quality standards."
As a political appointee named to his post by Mr. Bush in 2001, "I serve at the pleasure of the president and can be replaced at any time," Mr. Greenfeld said. "There's always a natural and healthy tension between the people who make the policy and the people who do the statistics. That's there every day of the week, because some days you're going to have good news, and some days you're going to have bad news."
This article has received a lot of coverage in the more liberal blogs. Run a web search on "blog Lawrence Greenfeld racial profiling" to see some examples.
Bob Herbert, a writer on the editorial pages of the New York Times, also commented on the racial profiling article on August 25. He offered his opinions, and then shared the following two anecdotes.
Rachel Ellen Ondersma was a 17-year-old high school senior when she was stopped by the police in Grand Rapids, Mich., on Nov. 14, 1998. She had been driving erratically, the police said, and when she failed a Breathalyzer test, she was placed under arrest.
An officer cuffed Ms. Ondersma's hands behind her and left her alone in the back seat of a police cruiser. What happened after that was captured on a video camera mounted inside the vehicle. And while it would eventually be shown on the Fox television program "World's Wackiest Police Videos," it was not funny.
The camera offered a clear view through the cruiser's windshield. The microphone picked up the sound of Ms. Ondersma sobbing, then the clink of the handcuffs as she began maneuvering to free herself. She apparently stepped through her arms so her hands, still cuffed, were in front of her. Then she climbed into the front seat, started the engine and roared off. With the car hurtling along, tires squealing, Ms. Ondersma could be heard moaning, "What am I doing?" and, "They are going to have to kill me."
She roared onto a freeway, where she was clocked by pursuing officers at speeds up to 80 miles per hour. She crashed into a concrete barrier, and officers, thinking they had her boxed in, jumped out of their vehicles. But Ms. Ondersma backed up, then lurched forward and plowed into one of the police cars.
Gunfire could be heard as the police began shooting out her tires. The teenager backed up, lurched forward and crashed into the cop car again. An officer had to leap out of the way to keep from being struck.
Ms. Ondersma tried to speed away once more, but by then at least two of her tires were flat and she could no longer control the vehicle. She crashed into another concrete divider and was finally surrounded.
As I watched the videotape, I was amazed at the way she was treated when she was pulled from the cruiser. The police did not seem particularly upset. They were not rough with her, and no one could be heard cursing. One officer said: "Calm down, all right? I think you've caused enough trouble for one day."
This is in contrast to a second incident in April 1998 where four young men in a van were pulled over.
They were neither drunk nor abusive. But their van did roll slowly backward, accidentally bumping the leg of one of the troopers and striking the police vehicle.
The troopers drew their weapons and opened fire. When the shooting stopped, three of the four young men had been shot and seriously wounded.
The woman in the first incident was white. In the second incident, three of the men were black and one was Hispanic.
(1) Do statisticians in the various U.S. Government Agencies need greater independence to protect them from political influences? If so, how could this be best achieved?
(2) Read the full report on racial profiling. What are the limitations to this study? How serious are the limitations?
(3) Do you find Mr. Herbert's two anecdotes to be more persuasive than the statistics on racial profiling?