Chance News 81

From ChanceWiki



Revision as of 19:04, 31 January 2012

Quotations

Eminence based medicine—The more senior the colleague, the less importance he or she placed on the need for anything as mundane as evidence. Experience, it seems, is worth any amount of evidence. These colleagues have a touching faith in clinical experience, which has been defined as “making the same mistakes with increasing confidence over an impressive number of years.” The eminent physician's white hair and balding pate are called the “halo” effect.

from Seven alternatives to evidence based medicine, British Medical Journal, 18 December 1999

Submitted by Paul Alper


"Alternative therapists don't kill many people, but they do make a great teaching tool for the basics of evidence-based medicine, because their efforts to distort science are so extreme."

Ben Goldacre, in What eight years of writing the Bad Science column have taught me, Guardian, 4 November 2011

Submitted by Bill Peterson

(Note: With regard to the last two quotations, see Andrew Gelman's recent blog post on Evidence Based Medicine, which links to slides from an overview lecture that was presented at the 2011 Joint Statistical Meetings.)

"Of course, from the quasi-experimental perspective, just as from that of physical science methodology, it is obvious that moving out into the real world increases the number of plausible rival hypotheses. Experiments move to quasi-experiments and on into queasy experiments, all too easily."

Donald T. Campbell, in Methodology and Epistemology for Social Science: Selected Papers, page 322.

Submitted by Steve Simon

Forsooth


<img src="http://www.flickr.com/photos/75633726@N04/6797010149/" alt="FL chart.jpeg">

Question of significance

“Ultrasounds Detect Cancers That Mammograms Missed, Study Finds”
by William Weir, The Hartford Courant, January 13, 2012

A 2009 CT law requires that “all mammogram reports include the patients' breast density information, and that women with greater than 50 percent density be recommended for additional ultrasound testing.” CT is apparently the first state to pass such a law.

For the period October 2009 to 2010, a University of Connecticut Hospital radiologist collected data on more than 70,000 cases, of which about 8,600 involved ultrasound screenings, and she found that the screenings “detected 3.25 cancers per 1,000 women that otherwise would have been overlooked.”

"When you think about it, we find four or five per thousand breast cancers in an overall screening population. So, then you add that extra three on," she said. "I think that's not insignificant."

Note that:

[The radiologist] told state officials that more data was needed to know whether ultrasound tests actually did a better job detecting tumors in breasts with high density. Ultrasounds typically cost patients more than a mammogram (particularly if their insurance has a high deductible), require skilled technologists and take longer to perform than a mammogram. .... [S]he called [the bill] a case of "putting the cart before the horse," [but that] the law presented a "golden opportunity."

The radiologist’s study has been accepted for publication in The Breast Journal.
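As a rough aid to the discussion below, the quoted figures can be turned into an implied count and an approximate uncertainty range. This is only a sketch: it assumes the rate of 3.25 per 1,000 applies to the roughly 8,600 ultrasound screenings and that a simple binomial model is reasonable, neither of which is confirmed by the article. The function name `wilson_interval` is our own.

```python
import math

def wilson_interval(events, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = events / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

screened = 8600          # "about 8,600" ultrasound screenings (approximate)
rate_per_1000 = 3.25     # additional cancers per 1,000 women, as quoted
# Assumption: the quoted rate is per ultrasound-screened woman.
events = round(screened * rate_per_1000 / 1000)

low, high = wilson_interval(events, screened)
print(f"implied additional cancers: {events}")
print(f"rough 95% interval: {1000 * low:.1f} to {1000 * high:.1f} per 1,000")
```

An interval like this describes sampling uncertainty in the detection rate only; it does not address whether the additional detections justify the cost of the extra screening.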

Discussion

1. The radiologist commented that the finding of 3 additional cases of breast cancer per 1000 through the added ultrasound procedure - beyond the 4 or 5 per 1000 found through previous mammograms - was "not insignificant." Statistically speaking, what do you think she meant by that? Do you consider the phrase "not insignificant" equivalent to the term "significant," in a statistical context?
2. Suppose that her finding was statistically significant. Do you think that it was, in a real-life sense, significant enough to justify the costs of an additional ultrasound screening, in time and/or money to a patient, to her insurer, or to a health facility?
3. Do you think that CT was "putting the cart before the horse"?

Submitted by Margaret Cibes

The problems with meta-analyses

I wrote a more mathematical blog entry in May 2009 (referenced in CN 59) describing the logical and mathematical/statistical problems with meta-analyses, but since that time many more meta-analyses have been published, and the public has discussed these results as if they were clinical fact. It is important to understand that the results of a meta-analysis should be presented only as a hypothetical clinical result, to be tested prospectively in a properly designed clinical trial, and not accepted as proven fact (such as the recent suggestion that women who ingest calcium supplements increase their risk of heart disease). In brief, a meta-analysis collects several studies of the same problem, none of which reaches clinical or statistical significance, in the hopes that the sum can be greater than its parts, and that combining non-significant studies can reach a significant result!

Some readily understandable problems with meta-analyses:

  1. You are never told which studies the author rejects as not being acceptable for his/her meta-analysis, so you cannot form your own opinion as to the validity of rejecting those particular studies.
  2. The problem of Simpson's Paradox, or the Yule-Simpson Effect: sometimes every included study points in one direction, but the meta-analysis points in exactly the opposite direction. Numerous illustrations of the paradox have been discussed over the years in Chance News; this post from 2004 demonstrated different ways of calculating Derek Jeter's batting average, with differing results, using the same data in each case.
  3. There are two different statistical models or assumptions by which the analyzer combines the effects of the individual studies: the fixed effects model and the random effects model. Each model makes different assumptions about the underlying statistical distribution of observed data, so each calculation produces different results.
  4. There are two different methods for measuring the effect of the clinical intervention: standardized mean difference or correlation. Each method produces a different end result.
  5. If we look at #3 and #4, we see immediately that there are four possible combinations of analyses, leading to four different conclusions for the same set of studies. No one paper shows all four combinations and all four possible results.
  6. Finally, the choice of what constitutes a "significant" effect in any of the included studies is purely arbitrary. When this question was studied by clinical psychologists, no two analytical scientists reached the same conclusions about what was significant in all the included studies.
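
The reversal described in item 2 can be seen in a small numerical sketch. The two studies and all counts below are hypothetical, invented only to show the effect; they are not taken from any real trial or from the Chance News post mentioned above.

```python
from fractions import Fraction

# Hypothetical (successes, patients) counts, invented for illustration only.
studies = {
    "study 1": {"treatment": (18, 20), "control": (80, 100)},
    "study 2": {"treatment": (30, 100), "control": (4, 20)},
}

def rate(successes, patients):
    """Success rate as an exact fraction."""
    return Fraction(successes, patients)

def pooled_rate(arm):
    """Success rate after naively adding the raw counts across studies."""
    successes = sum(s[arm][0] for s in studies.values())
    patients = sum(s[arm][1] for s in studies.values())
    return Fraction(successes, patients)

for name, arms in studies.items():
    t, c = rate(*arms["treatment"]), rate(*arms["control"])
    print(f"{name}: treatment {float(t):.2f} vs control {float(c):.2f}")

print(f"pooled: treatment {float(pooled_rate('treatment')):.2f} "
      f"vs control {float(pooled_rate('control')):.2f}")
```

In this made-up example the treatment arm does better within each study but worse once the raw counts are added together, because the two arms are very unevenly sized across the studies.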

We therefore see that the result of any meta-analysis is largely dependent on the analyzer, and the reader never has enough data to redo the analysis, so the results have to be taken on faith, which is hardly a scientific result.
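
To illustrate item 3 in the list, here is a minimal sketch of how the two models can give different pooled estimates from the same inputs. The effect sizes and variances are hypothetical. The fixed effects estimate uses inverse-variance weights; the random effects estimate uses the DerSimonian-Laird moment estimator of between-study variance, which is one common choice among several and is our assumption, not something specified in the text.

```python
# Hypothetical per-study effect estimates and their sampling variances.
effects = [0.10, 0.30, 0.80]
variances = [0.01, 0.04, 0.09]

def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

def fixed_effect(effects, variances):
    """Inverse-variance weighted mean: assumes one common true effect."""
    weights = [1.0 / v for v in variances]
    return weighted_mean(effects, weights)

def dersimonian_laird_tau2(effects, variances):
    """Moment estimate of the between-study variance, truncated at zero."""
    weights = [1.0 / v for v in variances]
    pooled = weighted_mean(effects, weights)
    q = sum(w * (y - pooled) ** 2 for y, w in zip(effects, weights))
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    return max(0.0, (q - (len(effects) - 1)) / c)

def random_effects(effects, variances):
    """Inverse-variance mean after adding tau^2 to each study's variance."""
    tau2 = dersimonian_laird_tau2(effects, variances)
    weights = [1.0 / (v + tau2) for v in variances]
    return weighted_mean(effects, weights)

print(f"fixed effects:  {fixed_effect(effects, variances):.3f}")
print(f"random effects: {random_effects(effects, variances):.3f}")
```

With these invented inputs the random effects weights are more nearly equal, so the less precise studies pull the pooled estimate further from the fixed effects value.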

"There are three kinds of lies: Lies, Damn Lies, and Statistics" --Mark Twain

Submitted by Robin Motz

The case of Tamiflu

New questions raised about Tamiflu’s effectiveness
by Andrew Pollack, Prescriptions blog, New York Times, 17 January 2012

This recent news story provides an example of the first concern raised above: the results of a meta-analysis can depend critically on which studies are included. According to claims by its manufacturer, Tamiflu both reduces complications from the flu and helps to prevent transmission. A 2003 meta-analysis of 10 clinical trials appeared to support the first claim, and health agencies have accumulated stocks of Tamiflu for use during a flu pandemic, in hopes that many hospitalizations could be avoided. However, it was later pointed out that only two of these trials had been independently published. In 2009, an analysis focusing only on the published studies did not find evidence that Tamiflu reduced complications.

Now a new study by the Cochrane Collaboration has raised even more questions. It noted that data on 60 percent of the patients in the clinical trials of Tamiflu had never been formally published. Including the unpublished data in their analysis, the investigators concluded that Tamiflu did not reduce hospitalizations. Moreover, the unpublished trials include more reports of side effects than the published ones. See also this BMJ news release, which details some of the difficulties the Cochrane group has encountered in its efforts to obtain data from Roche, the manufacturer of Tamiflu.

Submitted by Bill Peterson

Larry Summers on statistics

What you (really) need to know
by Lawrence H. Summers, New York Times, 22 January 2012.

Nick Horton sent this reference to the Isolated Statisticians list, along with the following excerpt (Summers' sixth point) on the value of statistics:

Courses of study will place much more emphasis on the analysis of data. Gen. George Marshall famously told a Princeton commencement audience that it was impossible to think seriously about the future of postwar Europe without giving close attention to Thucydides on the Peloponnesian War. Of course, we’ll always learn from history. But the capacity for analysis beyond simple reflection has greatly increased (consider Gen. David Petraeus’s reliance on social science in preparing the army’s counterinsurgency manual).

As the “Moneyball” story aptly displays in the world of baseball, the marshalling of data to test presumptions and locate paths to success is transforming almost every aspect of human life. It is not possible to make judgments about one’s own medical care without some understanding of probability, and certainly the financial crisis speaks to the consequences of the failure to appreciate “black swan events” and their significance. In an earlier era, when many people were involved in surveying land, it made sense to require that almost every student entering a top college know something of trigonometry. Today, a basic grounding in probability statistics and decision analysis makes far more sense.

Statistics: theory vs. practice

The 2009 edition of a very reputable introductory statistics text, otherwise full of interesting questions and explanations based on real data, contains the following excerpt:

Is the form of the scatterplot straight enough that a linear relationship makes sense? Sure, you can calculate a correlation coefficient for any pair of variables. But correlation measures the strength only of the linear association, and will be misleading if the relationship is not linear.

Discussion

1. Refer to the formula for a correlation coefficient as the sum of the products of the z-scores of corresponding pairs of values, all divided by (n-1). Theoretically/mathematically, one cannot calculate a correlation coefficient for every pair of variables. Can you think of an example of a data set with two variables for which the correlation coefficient does not exist?

2. Consider data pairs in a real-life, not a theoretical, setting. Explain why the author’s categorical statement about the existence of correlation coefficients is probably more accurate than not, at least in a real-life setting.
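
One way to explore question 1 is to code the z-score formula directly and see where it breaks down. This is a minimal sketch; the function name `correlation` and the choice to return `None` when the coefficient is undefined are our own, and the data are made up.

```python
import math

def correlation(xs, ys):
    """Pearson r as the sum of products of z-scores divided by (n - 1).

    Returns None when either variable has zero standard deviation,
    because the z-scores are then undefined. Requires n >= 2.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if sd_x == 0 or sd_y == 0:
        return None
    return sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
               for x, y in zip(xs, ys)) / (n - 1)

print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly linear data
print(correlation([1, 2, 3, 4], [5, 5, 5, 5]))   # y never varies
```

The second call shows the formula failing when one variable is constant, since dividing by a standard deviation of zero is not defined.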

Submitted by Margaret Cibes

Marilyn's correction on the drug-testing problem

Ask Marilyn: Did Marilyn make a mistake on drug testing?
by Marilyn vos Savant, Parade, 22 January 2012

In a company with 400 employees, 25% are randomly selected every three months for drug testing. What is the chance that a particular employee is selected (at least once) during the year? As discussed in the previous edition of Chance News, Marilyn originally misinterpreted this question, and instead pointed out that the chance of being selected in any particular quarter remains 25% (which was already implicit in the original question). The correct answer to the question as posed, as explained by several readers, is about 68%, which can be calculated as 1 minus the probability of not being selected. In the present column Marilyn provides her version of the solution:

The reasoning works this way: Of the 400 names, 25 percent (100) are selected in the first quarter. Assume “perfect” randomization for the purpose of calculation: Of the 300 that weren’t chosen, 25 percent (75) would be selected in the second quarter. Of the 225 still-unchosen names, 25 percent (about 56) would be selected in the third. And of the 169 remaining unchosen names, 25 percent (about 42) would be selected in the fourth.

So a total of 273 different people (100 + 75 + 56 + 42 = 273) will have been selected—about 68 percent of all the employees. (Many names will have been chosen more than once, so 400 tests are still administered.)

Marilyn's approach amounts to using so-called "natural frequencies" to avoid all the fractions. In a variety of publications over the years, Gerd Gigerenzer has advocated this as a more transparent way for the lay person to comprehend probability statements. See Steven Strogatz's Chances are (from his excellent "Elements of Math" articles in the New York Times), where he reviews Gigerenzer's 2002 book Calculated Risks. One of the examples discussed there uses the natural frequency approach to illuminate the false positive problem in mammography.
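
The two routes to the answer can be compared in a few lines. This sketch assumes, as the column does, independent 25% selections each quarter and rounding to whole people at each step; the variable names are ours.

```python
quarters = 4
p_selected = 0.25
employees = 400

# Exact: 1 minus the probability of being missed in all four quarters.
p_at_least_once = 1 - (1 - p_selected) ** quarters

# Natural frequencies, rounding to whole people each quarter as in the column.
unchosen = employees
newly_chosen = []
for _ in range(quarters):
    picked = round(unchosen * p_selected)
    newly_chosen.append(picked)
    unchosen -= picked

print(f"exact probability: {p_at_least_once:.4f}")
print(f"natural frequencies: {newly_chosen}, total {sum(newly_chosen)}")
```

The small gap between the exact probability and the head count divided by 400 comes from rounding to whole people in the third and fourth quarters.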

Discussion
Do you find that Marilyn's presentation clarifies the solution? What if you apply the natural frequency approach to the complementary event?

Submitted by Bill Peterson

Berlin Numeracy Test

“Measuring Risk Literacy: The Berlin Numeracy Test”
by Edward T. Cokely et al., Judgment and Decision Making, January 2012

This 23-page paper introduces a new instrument that purports to assess statistical literacy more quickly than, yet as effectively as, the most common ones in current use. The authors administered the test to folks from many countries, among diverse, well-educated groups, with a focus on those who are charged with extremely important decisions, such as health professionals. They drew questions from other commonly used tests and developed some of their own. The paper includes lots of statistics about the authors’ results, as well as comparisons with other traditional tests.

The Berlin Numeracy Test was found to be the strongest predictor of comprehension of everyday risks…. The Berlin Numeracy Test typically takes about three minutes to complete and is available in multiple languages and formats…. The online forum[1] also provides interactive content for public outreach and education….

Here are some questions that are included in the paper:

1. Out of 1,000 people in a small town 500 are members of a choir. Out of these 500 members in the choir 100 are men. Out of the 500 inhabitants that are not in the choir 300 are men. What is the probability that a randomly drawn man is a member of the choir? Please indicate the probability in percent.

2a. Imagine we are throwing a five-sided die 50 times. On average, out of these 50 throws how many times would this five-sided die show an odd number (1, 3 or 5)? ______ out of 50 throws.

2b. Imagine we are throwing a loaded die (6 sides). The probability that the die shows a 6 is twice as high as the probability of each of the other numbers. On average, out of these 70 throws how many times would the die show the number 6? ________out of 70 throws.

3. In a forest 20% of mushrooms are red, 50% brown and 30% white. A red mushroom is poisonous with a probability of 20%. A mushroom that is not red is poisonous with a probability of 5%. What is the probability that a poisonous mushroom in the forest is red? ________

Judgment and Decision Making is a free online journal.

Discussion

1. The authors state, “Correct answers are as follows: 1 = 25; 2a = 30; 2b = 20; 4 = 50.” Do you agree with all of these solutions?
2. To get the authors’ answer to question 2a, we must assume that each side of the die is equally probable. However, a five-sided die is not one of the five Platonic solids[2] (tetrahedron, cube, octahedron, dodecahedron, icosahedron). What do you think would be the possibility of constructing a five-sided die with equally probable sides?
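
For discussion question 1, the four test items can be worked with exact fractions. This sketch assumes equally likely faces for the five-sided die and reads the last entry in the quoted answer key as referring to the mushroom question (numbered 3 above); both are our assumptions.

```python
from fractions import Fraction

# Question 1: P(choir | man) from the counts given.
men_in_choir, men_not_in_choir = 100, 300
q1 = Fraction(men_in_choir, men_in_choir + men_not_in_choir)

# Question 2a: expected odd faces in 50 throws, assuming five equally likely faces.
q2a = 50 * Fraction(3, 5)

# Question 2b: P(6) = 2p and each other face has probability p, so 7p = 1.
q2b = 70 * Fraction(2, 7)

# Question 3: Bayes' rule for P(red | poisonous).
p_red = Fraction(20, 100)
p_poison_given_red = Fraction(20, 100)
p_poison_given_not_red = Fraction(5, 100)
p_poison = p_red * p_poison_given_red + (1 - p_red) * p_poison_given_not_red
q3 = p_red * p_poison_given_red / p_poison

print(f"Q1: {float(q1):.0%}  Q2a: {q2a}  Q2b: {q2b}  Q3: {float(q3):.0%}")
```

Working in whole mushrooms gives the same result for question 3: out of 100 mushrooms, 4 are red and poisonous and 4 are non-red and poisonous.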

Submitted by Margaret Cibes