# Chance News 49

## Forsooths

Stephen J. Dubner of the New York Times writes about Bernice Geiger, a person who "never took vacations" for fear of her embezzlement being discovered by a fill-in employee; she "was arrested in 1961 for embezzling more than \$2 million over the course of many years." Eventually, "after prison Geiger went to work for a banking oversight agency to help stop embezzlement."

Geiger's "biggest contribution: looking for employees who failed to take vacation. This simple metric turned out to have strong predictive power in stopping embezzlement."

## Item 1

The article provides a wonderful pun regarding Benford’s Law, “Looking out for number one.” The authors write: “Go and look up some numbers. A whole variety of naturally-occurring numbers will do. Try the lengths of some of the world's rivers, or the cost of gas bills in Moldova; try the population sizes in Peruvian provinces, or even the figures in Bill Clinton's tax return. Then, when you have a sample of numbers, look at their first digits (ignoring any leading zeroes). Count how many numbers begin with 1, how many begin with 2, how many begin with 3, and so on - what do you find? You might expect that there would be roughly the same number of numbers beginning with each different digit: that the proportion of numbers beginning with any given digit would be roughly 1/9. However, in very many cases, you'd be wrong!” Instead, we get

http://www.dartmouth.edu/~chance/forwiki/LeedingDidgit.gif
Figure 1: The proportional frequency of each leading digit predicted by Benford's Law.
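The proportions in Figure 1 follow from Benford's formula, P(d) = log10(1 + 1/d) for leading digit d. A minimal Python check of those predicted frequencies (the formula is standard; nothing here comes from the article's own data):

```python
from math import log10

# Benford's Law: the probability that a number's leading digit is d
benford = {d: log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"digit {d}: {p:.3f}")   # digit 1 -> 0.301, ..., digit 9 -> 0.046

# The nine probabilities telescope to log10(10), i.e. they sum to exactly 1.
assert abs(sum(benford.values()) - 1) < 1e-12
```

Digit 1 leads roughly 30% of naturally occurring numbers while digit 9 leads under 5%, which is exactly the skew the figure displays.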

Should somebody try “to falsify, say, their tax return then invariably they will have to invent some data. When trying to do this, the tendency is for people to use too many numbers starting with digits in the mid range, 5,6,7 and not enough numbers starting with 1. This violation of Benford's Law sets the alarm bells ringing.”

It is a pity that, unlike for accounting data, there is no forensic counterpart to Benford’s Law for determining when a journal article is entirely fraudulent. As stated in “Infuse and Kuklo,” you won’t be able to read [on the JBJS website] the fraudulent article, “Recombinant human bone morphogenetic protein-2 for grade III open segmental tibial fractures from combat injuries in Iraq” by Timothy Kuklo, et al., which appeared in the JBJS in August 2008, because it has been retracted. However, it is available here. The immediate impression is that, as far as statistics is concerned, it looks like any other article in the health field.

The important statistics appear in Tables I and III.

http://www.dartmouth.edu/~chance/forwiki/TablethreeKuklo.jpg

Note that there is no claim that everyone in Group 2 (the group using Infuse) did well or that everyone in Group 1 fared poorly. Further, as in legitimate studies, there is a list of patients who were not included because of an additional problem (head injury) or who were lost to follow-up. The data are there for reviewers and others to do the calculations, which in this paper involve differences in proportions, a standard statistical technique. Small, but not implausibly small, p-values indicate that statistical significance is obtained; detailed discussion of the fractures indicates that practical significance is also realized. The bibliography has 39 entries, only one of which has Kuklo as an author; that same entry includes one of the ghost co-authors of the retracted paper. Nothing statistically or otherwise suspicious whatsoever.

Freudian psychology is currently out of favor, but Freud's notion of a death wish still seems plausible. How else to explain pushing the envelope so far: falsifying data, denying any connection to the manufacturer of Infuse, and forging the signatures of not one, not two, but four ghost co-authors? The aptly titled 1995 book by Feinberg and Tarrant, Why Smart People Do Dumb Things, attributes such behavior to what they deem “the four pillars of stupidity”: hubris, arrogance, narcissism and an unconscious need to fail. The first three are overwhelmingly obvious, but the last-named cause sounds deeply Freudian.

A New York Times update appeared on June 5, 2009, showing how Kuklo forged the signatures: “He used a distinctively different handwriting style for each of them, a form he submitted to the British journal shows.”

Photo caption: Dr. Timothy R. Kuklo and copies of the signatures of other Army doctors on his study that authorities say he forged.

A putative co-author “suspected that Dr. Kuklo had fabricated the comparison groups, because many soldiers had received both Infuse and a bone graft — not one or the other.” This person said, “It was like he was comparing apples and oranges. But there weren’t any apples or oranges to compare.”

Returning to the statistical aspect of the paper, Table III says that 19 of 67 (28%) patients in Group 1 had further surgery while 5 of 62 (8%) in Group 2 (the Infuse group) did. The p-value, presumably from a chi-square test, is listed as .003. Minitab produces the same numerical result of .003 via Fisher's exact test:

```
Sample   X   N  Sample p
1        5  62  0.080645
2       19  67  0.283582

Difference = p (1) - p (2)
Estimate for difference:  -0.202937
95% CI for difference:  (-0.330382, -0.0754923)
Test for difference = 0 (vs not = 0):  Z = -3.12  P-Value = 0.002

Fisher's exact test: P-Value = 0.003
```
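Fisher's exact p-value is simple enough to recompute without Minitab. Below is a minimal sketch in standard-library Python (the function name and the floating-point tie tolerance are my own; the counts are those of Table III):

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    n = a + b + c + d
    r1 = a + b          # row 1 total
    c1 = a + c          # column 1 total

    def prob(x):        # P(top-left cell = x) under fixed margins
        return comb(r1, x) * comb(n - r1, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo = max(0, c1 - (n - r1))
    hi = min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Table III: further surgery in 19 of 67 (Group 1) vs 5 of 62 (Group 2, Infuse)
p = fisher_two_sided(19, 67 - 19, 5, 62 - 5)
print(f"Fisher's exact test: P-Value = {p:.3f}")
```

Plugging in the Table I counts (51/67 vs. 57/62 for unions, 10/67 vs. 2/62 for infections) should likewise reproduce the .017 and .032 Minitab values quoted later in this item.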

Some numerical discrepancies arise, however, for Table I. Table I says 51 of 67 (76%) in Group 1 had a successful “union” while 57 of 62 (92%) in Group 2 (the Infuse group) did. The p-value, presumably from a chi-square test, is listed as .015. Minitab's output shows that, because of the small sample sizes, Fisher's exact test yields .017 instead:

```
Sample   X   N  Sample p
1       57  62  0.919355
2       51  67  0.761194

Difference = p (1) - p (2)
Estimate for difference:  0.158161
95% CI for difference:  (0.0356210, 0.280701)
Test for difference = 0 (vs not = 0):  Z = 2.53  P-Value = 0.011

Fisher's exact test: P-Value = 0.017
```

Table I also says 10 of 67 (14%) in Group 1 had post-operative infections while 2 of 62 (3.2%) in Group 2 (the Infuse group) did. The p-value, presumably from a chi-square test, is listed as .001. Minitab produces the quite different p-value of .032:

```
Sample   X   N  Sample p
1       10  67  0.149254
2        2  62  0.032258

Difference = p (1) - p (2)
Estimate for difference:  0.116996
95% CI for difference:  (0.0210037, 0.212988)
Test for difference = 0 (vs not = 0):  Z = 2.39  P-Value = 0.017

Fisher's exact test: P-Value = 0.032
```
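The Z statistics and normal-approximation p-values in the Minitab output above can also be checked by hand. A sketch of the unpooled two-proportion z-test, which Minitab's output appears to use (the function name is mine; the normal CDF comes from `math.erf`):

```python
from math import sqrt, erf

def two_prop_z(x1, n1, x2, n2):
    """Unpooled two-proportion z-test; returns (difference, Z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se
    # Two-sided p from the standard normal CDF, Phi(x) = (1 + erf(x/sqrt(2)))/2
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p1 - p2, z, p

# Table I infections: 10 of 67 (Group 1) vs 2 of 62 (Group 2, Infuse)
diff, z, p = two_prop_z(10, 67, 2, 62)
print(f"Estimate for difference: {diff:.6f}")
print(f"Z = {z:.2f}  P-Value = {p:.3f}")  # Z = 2.39  P-Value = 0.017
```

This matches Minitab's Z = 2.39 and P = 0.017 for the infection counts, confirming that the discrepancy with the paper's listed .001 is not a transcription quirk of the software.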

However, these discrepancies are hardly in the Benford class. They may merely indicate what happens when a non-statistician medical doctor acts alone.