Chance News 86

From ChanceWiki
Jump to navigation Jump to search

Quotations

"Asymptotically we are all dead."

--paraphrase of Keynes, sometimes attributed to Melvin R. Novick

Submitted by Paul Alper


"To err is human, to forgive divine but to include errors in your design is statistical."

--Leslie Kish, in "Chance, Statistics, and Statisticians", 1977 ASA Presidential Address

Submitted by Bill Peterson

Forsooth

Gaydar

The science of ‘gaydar’
by Joshua A. Tabak and Vivian Zayas, New York Times, 3 June 2012

The definition of GAYDAR is the "Ability to sense a homosexual" according to internetslang.com.

In their NYT article, Tabak and Zayas write

Should you trust your gaydar in everyday life? Probably not. In our experiments, average gaydar judgment accuracy was only in the 60 percent range. This demonstrates gaydar ability — which is far from judgment proficiency. But is gaydar real? Absolutely.

At PLoS One is their complete research paper where subjects viewed facial photographs of homosexuals and straights; the subjects then had a short time to decide the sexual orientation. In their first experiment, the faces were shown upright only:

Twenty-four University of Washington students (19 women; age range = 18–22 years) participated in exchange for extra course credit. Data from seven additional participants were excluded from analyses due to failure to follow instructions (n = 4) or computer malfunction (n = 3).

In the second experiment, faces were shown upright and upside-down and the subjects had a short time to decide the sexual orientation:

One hundred twenty-nine University of Washington students (92 women; age range = 18–25 years) participated in exchange for extra course credit. Data from 16 additional participants were excluded from analyses due to failure to follow instructions (n = 12) or average reaction times more than 3 SD above the mean (n = 4).

According to the authors, “there are two components of “accuracy”: the hit rate which is “the proportion of gay faces correctly perceived as gay, and the false alarm rate” which is “the proportion of straight faces incorrectly perceived as gay.” The figure reproduced below (full version here) indicates that the accuracy was better when the target gender was women as opposed to men and the accuracy was better when the faces were upright as opposed to upside down. Presumably, random guessing would produce an “accuracy” of about .5.

TabakZayas fig3.png
Figure 3. Accuracy of detecting sexual orientation from upright and upside-down faces (Experiment 2).
Mean accuracy (A′) in judging sexual orientation from faces presented for 50 milliseconds as a function of the target’s gender and spatial orientation (upright or upside-down; Experiment 2). Judgments of upright faces are based on both configural and featural processing, whereas judgments of upside-down faces are based only on featural face processing. Error bars represent ±1 SEM.

Reproduced below are the detailed results for the second experiment:

TabakZayas table1.png
Table 1. Hit and False Alarm Rates for Snap Judgments of Sexual Orientation in Experiment 2.

The author’s conclude with the following statement:

The present research is the first to demonstrate (a) that configural face processing significantly contributes to perception of sexual orientation, and (b) that sexual orientation is inferred more easily from women’s vs. men’s faces. In light of these findings, it is interesting to note the popular desire to learn to read faces like books. Considering how challenging it is to read a book upside-down, it seems that we read faces better than we read books.

Discussion

1. Gaydar "accuracy" seems to be defined in the paper as hit rate / (hit rate + false alarm rate) or, to use terms common in medical tests, positive predictive value = true positive / (true positive + false positive). The paper makes no mention of negative predictive value = true negative / (true negative + false negative). As is illustrated in the Wikipedia article, legitimate medical tests will tolerate a low positive predictive value because a more expensive test exists in the rare case that the disease is actually present; negative predictive values must be high to avoid potentially deadly false optimism. The situation here is somewhat different because the subjects were exposed to an approximately equal number of gays and straights whereas in medical tests, most people in the population do not have the “disease.”

2. Perhaps the analogy with medical testing is inappropriate. That is, an error is an error and no distinction should be made between the two types of errors. Consider the above table for the case of women and upright spatial orientation. The hit rate is .36 and the false alarm rate is .22. If we assume that the 67 subjects viewed 100 gay faces and 100 straight faces, then we obtain the following table for average values:

Orientation + - Total
Gay 36 64 100
Straight 22 78 100
Total 58 142 200

This leads to Prob (success) = (36 + 78)/200 = .57; Prob (error) = (64 +22)/200 = .43 In effect, the model could be hidden tosses of a coin and the subjects, in an ESP fashion, guess heads or tails before the toss. Of course, a Bayesian would then assume a prior distribution and combine that with the results of the study to obtain a posterior probability and avoid any mention of p-value based on .5 as the null.

3. Why might the following be a useful way of assessing gaydar via a modification of the procedure used in the article? Repeat the second experiment with no gays.

4. Why might the following be a useful way of assessing gaydar via a modification of the procedure used in the article? Repeat the second experiment with no straights.

5. In case the reader feels that Discussion #3 and #4 are deceptive, see this Psychwiki which looks at the history of deception in social psychology:

deception can often be seen in the “cover story” for the study, which provides the participant with a justification for the procedures and measures used. The ultimate goal of using deception in research is to ensure that the behaviors or reactions observed in a controlled laboratory setting are as close to those behaviors and reactions that occur outside of the laboratory setting.

Submitted by Paul Alper

Coin experiments

“The Pleasures of Suffering for Science”, The Wall Street Journal, June 8, 2012

This story, about scientists experimenting on themselves, included a reference to a coin-tossing experiment:

Even mathematics offers an example of physical self-sacrifice, through repetitive stress injury. University of Georgia professor Pope R. Hill flipped a coin 100,000 times to prove that heads and tails would come up an approximately equal number of times. The experiment lasted a year. He fell sick but completed the count, though he had to enlist the aid of an assistant near the end.

A Google search for Prof. Hill turned up the following story at the “Weird Science” website:

If you repeatedly flip a coin, the law of probability states that approximately half the time you should get heads and half the time tails. But does this law hold true in practice?

Pope R. Hill, a professor at the University of Georgia during the 1930s, wanted to find out. But he thought coin-flipping was too imprecise a measurement, since any one coin might be imbalanced, causing it to favor heads or tails.
Instead, he filled a can with 200 pennies. Half were dated 1919, half dated 1920. He shook up the can, withdrew a coin, and recorded its date. Then he returned the coin to the can. He repeated this procedure 100,000 times!

Of the 100,000 draws, 50,145 came out 1920. 49,855 came out 1919. Hill concluded that the law of half and half does work out in practice.

Discussion

1. Do you think that drawing a single coin from among 1919 and 1920 coins - even in a perfectly shaken can - would solve the problem of potential imbalance between heads and tails on any single coin toss? Can you think of any possible imbalance in the former case?
2. In the second story, there is a remarkable relationship between Hill’s final counts. What questions, if any, might it raise in your mind about the experiment?
3. Which is the more accurate expectation from a coin-tossing experiment: (a) “heads and tails would come up an approximately equal number of times” (first story) or (b) “approximately half the time you should get heads and half the time tails” (second story)?

Submitted by Margaret Cibes

New presidential poll may be outlier

“Bloomberg Poll Shows Big But Questionable Obama Lead”
Huffington Post, June 20, 2012

A Bloomberg News national poll shows Obama leading his Republican challenger by a “surprisingly large margin of 53 to 40 percent,” instead of the (at most) single-digit margin shown in other recent polls.

While a Bloomberg representative expressed the same surprise as others, she stated that this result is based on a sample with the same demographics as its previous polls and on its usual methodology.

The article’s author states:

The most likely possibility is that this poll simply represents a statistical outlier. Yes, with a 3 percent margin of error, its Obama advantage of 53 to 40 percent is significantly different than the low single-digit lead suggested by the polling averages. However, that margin of error assumes a 95 percent level of confidence, which in simpler language means that one poll estimate in 20 will fall outside the margin of error by chance alone.

See Bloomberg’s report about the poll here. Submitted by Margaret Cibes