

An Unusual Episode

Mary Richardson
Department of Statistics
Grand Valley State University
1 Campus Drive
Allendale, MI 49401-9403

Statistics Teaching and Resource Library, March 17, 2003

© 2003 by Mary Richardson, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.

Dawson (1995) presented a data set giving a population at risk and fatalities for an “unusual episode” (the sinking of the ocean liner Titanic) and discussed the use of the data set in a first statistics course as an elementary exercise in statistical thinking, the goal being to deduce the origin of the data. Simonoff (1997) discussed the use of this data set in a second statistics course to illustrate logistic regression. Moore (2000) used an abbreviated form of the data set in a chapter exercise on the chi-square test. This article describes an activity that illustrates contingency table (two-way table) analysis. Students use contingency tables to analyze the “unusual episode” data (from Dawson 1995) and attempt to use their analysis to deduce the origin of the data. The activity is appropriate for use in an introductory college statistics course or in a high school AP statistics course.

Key words: contingency table (two-way table), conditional distribution

## Objectives

After completing the activity, students will understand:

1. How to construct and interpret a contingency table.
2. How to construct and interpret conditional distributions.
3. The usefulness of contingency tables.

The estimated interactive completion time is a one-hour class period.

## Activity description

Prior to completing the activity, students should be familiar with the basics of setting up contingency tables.

To begin the interactive activity, the background of the data is discussed. The sinking of the ocean liner Titanic after colliding with an iceberg on April 15th, 1912 is referred to as an “unusual episode.” The initial data tables (given in the Student’s Version of the activity) give counts for the population at risk and the deaths for the passengers on the Titanic. The 2201 people at risk are categorized by economic status (I, II, III, or Other), age (child or adult), gender (female or male), and survival status (survived or did not survive). Economic status is determined based on the class in which the passengers traveled: first-class (I), second-class (II), third-class (III), or crew member (Other). The goal of completing the activity is to determine the historical mortality episode that produced the data.

Through using two-way tables to analyze the data, students discover interesting characteristics of the data that should help them to determine the nature of the “unusual episode.” To complete the activity, students are asked to answer a series of questions based on the data. Each question is intended to highlight a different characteristic of the data.

## Teacher notes

While working on the activity, students are allowed to ask the instructor questions about the origin of the data. One commonly asked question is “What is the Other group?” The instructor might answer by pointing out that there are no children in the Other group, very few females (only 3 of whom died), and that this group does not completely fit into an economic status characterization. Another commonly asked question is “When did this 'unusual episode' occur?” The instructor might answer by giving the year of the sinking of the Titanic (which will more than likely give away the answer), or might simply say that the “unusual episode” is not a recent event. The instructor might also point out that the “unusual episode” was an isolated incident and that there were only 2201 people at risk (which might eliminate erroneous guesses for which many thousands or even millions of people were at risk).

Some interesting characteristics of the data are:

1. 68% of the people at risk died
2. 92% of the people who died were male
3. The death rate was higher for the lower economic status groups (especially among females)
4. There were no children in the Other economic status group, and only 3 of its 673 deaths were females
5. The only deaths of children were in the third class
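
These characteristics can be checked with a short script. The following is a minimal Python sketch and is not part of the original activity. The cell counts are an assumption: they are the values commonly tabulated for the Dawson (1995) data set and should be verified against the tables in the Student’s Version. The names `COUNTS`, `total`, and `pct` are illustrative only.

```python
# Counts keyed by (economic status, age, gender) -> (at risk, died).
# ASSUMPTION: commonly tabulated Dawson (1995) values; verify them against
# the Student's Version tables before classroom use.
COUNTS = {
    ("I", "adult", "male"): (175, 118),
    ("I", "adult", "female"): (144, 4),
    ("I", "child", "male"): (5, 0),
    ("I", "child", "female"): (1, 0),
    ("II", "adult", "male"): (168, 154),
    ("II", "adult", "female"): (93, 13),
    ("II", "child", "male"): (11, 0),
    ("II", "child", "female"): (13, 0),
    ("III", "adult", "male"): (462, 387),
    ("III", "adult", "female"): (165, 89),
    ("III", "child", "male"): (48, 35),
    ("III", "child", "female"): (31, 17),
    ("Other", "adult", "male"): (862, 670),
    ("Other", "adult", "female"): (23, 3),
    ("Other", "child", "male"): (0, 0),
    ("Other", "child", "female"): (0, 0),
}
STATUSES = ("I", "II", "III", "Other")


def total(status=None, age=None, gender=None):
    """Sum (at risk, died) over every cell matching the given filters."""
    at_risk = died = 0
    for (s, a, g), (n, d) in COUNTS.items():
        if status in (None, s) and age in (None, a) and gender in (None, g):
            at_risk += n
            died += d
    return at_risk, died


def pct(part, whole):
    """Format part/whole as a whole-number percentage."""
    return f"{part / whole:.0%}"


at_risk, died = total()
print("1. Overall death rate:", pct(died, at_risk))

_, male_died = total(gender="male")
print("2. Share of deaths that were male:", pct(male_died, died))

print("3. Death rate by economic status (all / females only):")
for s in STATUSES:
    n_all, d_all = total(status=s)
    n_fem, d_fem = total(status=s, gender="female")
    print(f"   {s:<5} {pct(d_all, n_all):>4} / {pct(d_fem, n_fem):>4}")

n_child, _ = total(status="Other", age="child")
n_fem, d_fem = total(status="Other", gender="female")
print(f"4. Other group: {n_child} children, {n_fem} females at risk, "
      f"{d_fem} female deaths")

child_deaths = {s: total(status=s, age="child")[1] for s in STATUSES}
print("5. Child deaths by economic status:", child_deaths)
```

Dividing each count by its own subgroup total, rather than comparing raw counts, is what makes groups of very different sizes comparable.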

In a typical class with several groups, at least one of the groups will usually correctly guess the origin of the data. Here are some example group responses to the question: “What 'unusual episode' in history do you think this data set describes?”

“This is probably the death stats for the sinking of the Titanic since rich were put on the lifeboats first and women and children took precedence over men. The 'other' could/would be crew explaining why there would be no children in that category.”

“We think this unusual episode is the sinking of the Titanic. We believe this because the ship did consist of men/women and children. The reason for the women’s death being so low is due to the fact that they were the first to be shipped on the safety of the other boats. We also believe that economic status I, II, III and other is the wealth distribution throughout the boat, I consisted of the wealthy, II consisted of the middle class, III consisted of the lower deck which had a hard time escaping because they were so close to the bottom of the ship, and we believe that the other class represents the workers on the ship. (They were the closest to the bottom of the ship as well and last to get off the ship as well.) This 'unusual episode' is the sinking of the Titanic, and that is our educated guess.”

“The data set could be explaining WWI. The rich could buy their way out of the war so they wouldn’t have as many people at risk. Women would be found in hospitals and other non-battle areas so they would be less at risk. And, children would not be present for the most part of the war.”

“We think the set describes the Civil War. Our reasoning is because men fought in the wars and the Civil War is when women started to be nurses for the Army. They were exposed to the battlefield. The children that died could have been at risk due to their age. If the child was near 18 years old they would have gone to fight. If they were not 18 years old they would be considered children still.”

“This unusual episode data could be explaining heart failure. Look at the data it shows that men at a lower economic status die of it. This holds true for heart failure. More men die of heart failure than women and children. Also the lower the economic status you are the less treatment you are able to receive.”

“We initially figured this data was describing the Black Plague, which would describe the differences in deaths in the different social classes. But this wouldn’t support the differences in gender and age. Our best guess is that this data describes the Nazi persecution of the Jews in the 30’s and early 40’s. Higher educated men and women were likely considered either useful or desirable and lower income children very undesirable or useful. The gender differences are probably explained by men being subjected to more harsh conditions because of physical work ability.”

“We believe the unusual episode that is being described is the sinking of the Titanic. First of all we see that a high number of male adults perish, and a formidably smaller amount of adult women and children perished. This would support the 'women and children first' ideal of 1912. Based on economic status we can see that a larger number of high-class citizens (male and female alike) managed to survive. While the highest numerical amount of deaths occurred in the lower two classes. In fact, the only children that perished were lower class ones. We also see by sheer number, there were more men, more lower class citizens, and few children. All of these factors would have been common place in travel (due to society, immigration and other factors) during the era of the tragedy. In general the total number of occupants seems similar to those that would have been aboard, plus the high mortality rate (68%) is common knowledge of the event.”

Through completing the activity, students see an illustration of the usefulness of two-way tables for summarizing two categorical variables. In addition, constructing appropriate conditional distributions illustrates how to informally use two-way tables to determine if two categorical variables may be associated.

After completion of the activity, the instructor might lead a summary discussion. One possible point for discussion is that, taken as a whole, the data set is hard to interpret: there are many classifications, and raw counts cannot be compared directly because the subgroups differ in size. However, by breaking down the data, focusing on two-way tables, and calculating conditional percentages, more useful information can be obtained. We can see that women had a much lower likelihood of death than men, and the rich had a lower likelihood of death than the poor (especially among women). At this point, students quite often comment that the motion picture Titanic (released in 1997) portrays the third-class passengers (whose cabins were in the lower level of the ship) as being prevented from moving to the top level of the ship after the collision with the iceberg (although this portrayal has not been confirmed historically).

A point of caution here is that the activity involves a very informal analysis. In general, collapsing an initial contingency table over some variables without examining the associations among all of the variables at once leaves open the possibility of Simpson’s paradox, in which an association seen in the collapsed table reverses within the subgroups. The instructor should preface the activity by telling students that a more formal analysis of contingency table data can be completed with more sophisticated statistical tools.
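
To make this caution concrete, the following Python sketch (not part of the original activity) compares survival in the third-class and Other groups, first collapsed over gender and then within each gender. The counts are an assumption: they are the commonly tabulated Dawson (1995) values, collapsed over age, and should be verified against the Student’s Version. The names `GROUPS` and `survival_rate` are illustrative only. Under these counts, the Other group has the lower survival rate overall but the higher rate within each gender, which is the kind of reversal Simpson’s paradox describes.

```python
# (economic status, gender) -> (at risk, survived), collapsed over age.
# ASSUMPTION: commonly tabulated Dawson (1995) values; verify before use.
GROUPS = {
    ("III", "male"): (510, 88),
    ("III", "female"): (196, 90),
    ("Other", "male"): (862, 192),
    ("Other", "female"): (23, 20),
}


def survival_rate(status, gender=None):
    """Survival rate for a status group, optionally within one gender."""
    cells = [counts for (s, g), counts in GROUPS.items()
             if s == status and gender in (None, g)]
    at_risk = sum(n for n, _ in cells)
    survived = sum(k for _, k in cells)
    return survived / at_risk


for gender in (None, "male", "female"):
    label = gender or "all"
    print(f"{label:>6}: III {survival_rate('III', gender):.1%}, "
          f"Other {survival_rate('Other', gender):.1%}")
```

The reversal arises because the Other group is almost entirely male, and males had much lower survival rates than females in both groups.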

## Assessment

Students should understand how to construct and interpret a contingency table. In addition, students should understand how to construct and interpret conditional distributions.

The following test question can be used to assess student understanding.

An insurance company has examined a large number of claims resulting from low speed collisions of vehicles and has classified the claims according to type of vehicle and to whether the claim was for more than \$10,000. The data are shown below.

Number of claims by claim amount (rows) and type of vehicle (columns):

| Claim amount | Car | Truck | Sport utility |
| --- | --- | --- | --- |
| > \$10,000 | 147 | 120 | 270 |
| ≤ \$10,000 | 470 | 280 | 330 |

1. The company would like to learn more about the relationship between claim amount and type of vehicle. In particular, the company would like to compare the claim amounts for each type of vehicle. What conditional distributions should the company compute?
2. Provide the conditional distributions stated in part 1.
3. Do you think there is an association between the type of vehicle and the claim amount? Explain.
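
For instructors checking answers, the conditional distributions of claim amount within each type of vehicle can be computed as in the minimal Python sketch below. It is not part of the original test question. The counts come from the table above, and the names `CLAIMS` and `conditional_distribution` are illustrative only.

```python
# Vehicle type -> (claims over $10,000, claims of at most $10,000),
# taken from the table in the test question.
CLAIMS = {
    "Car": (147, 470),
    "Truck": (120, 280),
    "Sport utility": (270, 330),
}


def conditional_distribution(counts):
    """Return each count as a proportion of its vehicle-type total."""
    n = sum(counts)
    return tuple(c / n for c in counts)


for vehicle, counts in CLAIMS.items():
    over, at_most = conditional_distribution(counts)
    print(f"{vehicle:<14} >$10,000: {over:.1%}   <=$10,000: {at_most:.1%}")
```

With these counts, the share of claims above \$10,000 is about 24% for cars, 30% for trucks, and 45% for sport utility vehicles. Because these shares differ noticeably across vehicle types, the comparison suggests an association between the type of vehicle and the claim amount.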

## References

Dawson, Robert J. M. (1995).  The ‘Unusual Episode’ Data Revisited. Journal of Statistics Education [on-line] 3(3).  (http://www.amstat.org/publications/jse/v3n3/datasets.dawson.html).

Moore, David S. (2000).  The Basic Practice of Statistics, 2nd edition. New York: W. H. Freeman and Company.

Simonoff, Jeffrey S. (1997).  The ‘Unusual Episode’ and a Second Statistics Course.  Journal of Statistics Education [on-line] 5(1).  (http://www.amstat.org/publications/jse/v5n1/simonoff.html).