
What is the Significance of a Kiss?

Mary Richardson, Phyllis Curtiss, and John Gabrosek
Department of Statistics
Grand Valley State University
1 Campus Drive
Allendale, MI 49401-9403

Statistics Teaching and Resource Library, June 6, 2002

© 2002 by Mary Richardson, Phyllis Curtiss, and John Gabrosek, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.

This article describes an interactive activity illustrating general properties of hypothesis testing and hypothesis tests for proportions. Students generate, collect, and analyze data. Through simulation, students explore hypothesis testing concepts. Concepts illustrated are: interpretation of p-values, type I error rate, type II error rate, power, and the relationship between type I and type II error rates and power. This activity is appropriate for use in an introductory college or high school statistics course.

Key words: hypothesis test on a proportion, type I and II errors, power, p-values, simulation

## Objective

After completing the "What is the Significance of a Kiss?" activity, students will understand:

- How to perform a hypothesis test on a proportion
- How to interpret the level of significance of a hypothesis test (type I error rate)
- How to interpret the observed level of significance of a hypothesis test (p-value)
- How to interpret the power of a hypothesis test
- How to interpret the type II error rate of a hypothesis test
- The relationship between type I and type II error rates and power

The estimated completion time is one hour.

## Activity description

Students enjoy collecting and analyzing data, especially when chocolate is involved. In this activity, students explore the proportion of base landings for tossed plain HERSHEY’S® KISSES® chocolates. Prior to completing this activity, students should be familiar with the basic mechanics of performing hypothesis tests, including the calculation of test statistics and p-values.

To begin the activity, each student examines a KISSES® chocolate. (Students are told they can eat the candies later.) The class discusses the possible outcomes when a KISSES® chocolate is tossed onto the desktop. There are two possible outcomes: landing completely on the base or not landing completely on the base. Each student is then asked whether he or she believes that a KISSES® chocolate will land completely on its base less than 50% of the time. After students make their conjectures, they are ready to conduct the following experiment.

Each student puts ten KISSES® chocolates into a plastic cup and spills the candies onto the table five times, each time counting the number of candies that land completely on their base. Results are recorded on the student’s activity sheet.

We treat the 50 results for each student as 50 independent trials. Actually, each student has five independent trials of 10 tosses each. We make the assumption that the 10 tosses within a trial are roughly independent to expedite data collection. After data collection is completed, students are informed that in past experiments the percentage of the time that a plain KISSES® chocolate landed completely on its base when tossed was consistently near 35%. When answering the questions, students are to assume that the true proportion of base landings is p = .35.

In question 1, students use their KISSES® data to perform two hypothesis tests of Ho:p=.50 versus Ha:p<.50 with different levels of significance. Each student’s data is a different simulated sample. Under the assumption that the true value of p = .35, the null hypothesis, Ho:p=.50, is false. Since Ho is false, performing these tests provides an opportunity to use simulation to illustrate properties of p-values, type II errors, and power.
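The calculation each student performs can be sketched as follows. This is a minimal illustration, assuming the usual large-sample z test for a proportion (the article does not spell out the test formula); the function name `one_proportion_z_test` and the count of 18 base landings are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alternative="less"):
    """Return (z, p_value) for a large-sample z test of Ho: p = p0."""
    p_hat = successes / n
    # The standard error is computed under Ho, so it uses p0, not p_hat.
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    cdf = NormalDist().cdf
    if alternative == "less":
        p_value = cdf(z)
    elif alternative == "greater":
        p_value = 1 - cdf(z)
    else:  # two-sided
        p_value = 2 * (1 - cdf(abs(z)))
    return z, p_value

# Hypothetical student result: 18 base landings in 50 tosses.
z, p_value = one_proportion_z_test(18, 50, 0.50, "less")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With this hypothetical count the p-value falls below .05, so this student would reject Ho at both levels of significance used in question 1.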

The first test of Ho:p=.50 versus Ha:p<.50 is performed using all of the data from the tosses (n = 50) and level of significance α = .05. The instructor draws stems for a stem-and-leaf plot on the whiteboard (see the student’s version of the activity). Each student writes his/her calculated p-value on a sticky note and places it on the stem-and-leaf plot. Assuming a class size of 30 students, the plot will contain 30 calculated p-values. The p-values are calculated under the assumption that Ho:p=.50 is true (when, in fact, p=.35), so the p-values will tend to be small. The point that small p-values contradict Ho is discussed with students. Some students will not obtain small p-values. On the stem-and-leaf plot, a cut-off value is marked at α = .05. Each p-value falling at or below this cut-off represents a rejection of Ho (a correct decision). Each p-value falling above this cut-off represents a failure to reject Ho (a type II error). Since 30 samples are taken, and 30 tests are performed, students can see that some samples result in a correct decision and other samples result in an incorrect decision (type II error). Students are asked to calculate the fraction of incorrect decisions to obtain a simulated value for β and a simulated value for the power = 1 - β. An explanation is then given of how to interpret a type II error rate (and power) in terms of repeatedly performing the procedure of selecting a sample, then using the data to test a hypothesis about a population parameter when the null hypothesis is false.
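For instructors who want to preview what the class plot might look like, the classroom procedure can be imitated in software. The sketch below is an assumption-laden stand-in for the physical tosses: it treats each toss as an independent trial with p = .35, uses the large-sample z test, and fixes an arbitrary random seed. All names are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

def lower_tail_p_value(successes, n, p0):
    """p-value for Ho: p = p0 versus Ha: p < p0 (large-sample z test)."""
    z = (successes / n - p0) / sqrt(p0 * (1 - p0) / n)
    return NormalDist().cdf(z)

TRUE_P = 0.35      # assumed true proportion of base landings
N_TOSSES = 50      # tosses per student
CLASS_SIZE = 30    # assumed number of students
ALPHA = 0.05

rng = random.Random(2002)  # arbitrary seed so the run is reproducible

p_values = []
for _ in range(CLASS_SIZE):
    # One simulated student: count base landings in 50 tosses.
    landings = sum(rng.random() < TRUE_P for _ in range(N_TOSSES))
    p_values.append(lower_tail_p_value(landings, N_TOSSES, 0.50))

# Ho is false here, so every failure to reject is a type II error.
type_ii_errors = sum(p > ALPHA for p in p_values)
beta_hat = type_ii_errors / CLASS_SIZE
power_hat = 1 - beta_hat
print(f"simulated beta = {beta_hat:.2f}, simulated power = {power_hat:.2f}")
```

With only 30 samples the simulated values vary noticeably from run to run, which mirrors what happens from one class to the next.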

The second test is performed using α = .20. The p-value is the same as for the first test; however, the type I error rate is increased to 20%. On the stem-and-leaf plot of p-values, a new cut-off is marked at α = .20. Each p-value falling at or below this cut-off represents a rejection of Ho (a correct decision). Each p-value falling above this cut-off represents a non-rejection of Ho (a type II error). Students are asked to calculate the fraction of non-rejections of Ho out of the 30 tests to obtain a simulated value for β and a simulated value for the power. In examining the class results, students will note that an increase in the type I error rate results in a decrease in the type II error rate and thus an increase in the simulated power.
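The class results can be compared with the power the test would have in the long run. The following sketch assumes the large-sample z test as the rejection rule and a binomial model with p = .35 for the number of base landings in 50 tosses; the function name `power_lower_tail` is illustrative, not part of the activity.

```python
from math import comb, sqrt
from statistics import NormalDist

def power_lower_tail(n, p0, p_true, alpha):
    """Probability of rejecting Ho: p = p0 in favor of Ha: p < p0
    when the true proportion is p_true (binomial model for the count)."""
    z_crit = NormalDist().inv_cdf(alpha)   # e.g. about -1.645 for alpha = .05
    se0 = sqrt(p0 * (1 - p0) / n)
    # Counts of base landings whose z statistic falls in the rejection region.
    reject_counts = [x for x in range(n + 1) if (x / n - p0) / se0 <= z_crit]
    return sum(comb(n, x) * p_true**x * (1 - p_true)**(n - x)
               for x in reject_counts)

powers = {alpha: power_lower_tail(50, 0.50, 0.35, alpha)
          for alpha in (0.05, 0.20)}
for alpha, power in powers.items():
    print(f"alpha = {alpha:.2f}: power = {power:.3f}, beta = {1 - power:.3f}")
```

Under these assumptions the power is larger at α = .20 than at α = .05, which is the trade-off students observe on the stem-and-leaf plot.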

In question 2, students use their KISSES® data to perform two hypothesis tests of Ho:p=.35 versus Ha:p≠.35 with different levels of significance. Under the assumption that p = .35, performing these tests provides an opportunity to illustrate properties of p-values and type I error.

The first test of Ho:p=.35 versus Ha:p≠.35 is performed using all of the data from the tosses (n = 50) and α = .05. The second test of Ho:p=.35 versus Ha:p≠.35 is performed using α = .20. As before, a stem-and-leaf plot of the class p-values is constructed.

The p-values are calculated under the assumption that Ho:p=.35 is true, so the p-values will tend to be large. The point that large p-values do not contradict Ho is discussed with students. Some students will not obtain large p-values. On the stem-and-leaf plot, a cut-off value is marked at α. Each p-value falling at or below this cut-off represents a rejection of Ho (a type I error). Each p-value falling above this cut-off represents a failure to reject Ho (a correct decision). Since 30 samples are taken, and 30 tests are performed, students can see that some samples result in a correct decision and other samples result in an incorrect decision (type I error). For each of the α values (α = .05 and α = .20), students are asked to calculate the fraction of rejections of Ho out of the 30 tests to obtain a simulated value for α. An explanation is then given of how to interpret a type I error rate in terms of repeatedly selecting a sample, then using the data to test a hypothesis about a population parameter when the null hypothesis is true.
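The repeated-sampling interpretation of the type I error rate can also be illustrated by simulation. The sketch below assumes independent tosses with p = .35, a two-sided large-sample z test, a class of 30, and an arbitrary seed; the 10,000-sample run is an added long-run comparison that is not part of the classroom activity.

```python
import random
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(successes, n, p0):
    """p-value for Ho: p = p0 versus Ha: p != p0 (large-sample z test)."""
    z = (successes / n - p0) / sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_p_values(n_samples, rng, n=50, p_true=0.35, p0=0.35):
    """Simulate n_samples students when Ho is true and return their p-values."""
    p_values = []
    for _ in range(n_samples):
        landings = sum(rng.random() < p_true for _ in range(n))
        p_values.append(two_sided_p_value(landings, n, p0))
    return p_values

rng = random.Random(2002)                         # arbitrary seed
class_p_values = simulate_p_values(30, rng)       # one simulated class
long_run_p_values = simulate_p_values(10_000, rng)

long_run_rates = {}
for alpha in (0.05, 0.20):
    # Ho is true here, so every rejection is a type I error.
    class_rate = sum(p <= alpha for p in class_p_values) / len(class_p_values)
    long_run_rates[alpha] = (sum(p <= alpha for p in long_run_p_values)
                             / len(long_run_p_values))
    print(f"alpha = {alpha:.2f}: class rate = {class_rate:.2f}, "
          f"long-run rate = {long_run_rates[alpha]:.3f}")
```

Because the count of base landings is discrete, the long-run rejection rate will be close to, but generally not exactly equal to, the nominal α.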

## Teacher notes

In this activity, we used the same data set to perform two different hypothesis tests at two different levels of significance. The instructor should emphasize that the level of significance, null hypothesis, and alternative hypothesis should be determined prior to data collection. We use the same data for multiple hypothesis tests to save time. Technically, we should have collected four separate data sets, one for each of the four tests conducted.

In addition, the instructor should stress to students that in reality one would not know the true value of the population parameter p. If the parameter value were known, then there would be no point in utilizing sample data to draw an inference about the parameter. The instructor should stress that we assume knowledge of the parameter in order to investigate the properties of hypothesis testing under different situations.

## Assessment

Students should be able to explain type I error and type II error in a specific problem. Additionally, students should be able to describe the relationship between type I and type II error rates.

The following questions can be used to assess student understanding or as challenge problems for students who complete the activity early.

1. A parachutist has made thousands of successful jumps. His assumption is that when he pulls the rip cord, the parachute will open.

1. Describe a type I error in the context of this problem.
2. Describe a type II error in the context of this problem.
3. Which is a more serious error for this problem?
4. Most parachutes have a back-up in case the rip cord malfunctions. Does this guard against type I or type II errors?
5. Suppose that I pull the rip cord and it does not function. I have time to pull it again or pull the back-up but not both. If I were concerned about a type I error what would I do? Why? If I were concerned about a type II error what would I do? Why? What would you do?

2. Explain the fallacy in reasoning in each of the following statements.

1. “I wanted to reduce the chance of committing a type I error, so I increased the power of the test.”
2. “I don’t like making mistakes so I’m going to set the type I error rate at .0001.”

3. Explain how you would use our class data to simulate the sampling distribution of the proportion of base landings in 50 trials.

4. In order to answer parts 1 and 2 below, suppose that and you wish to test Ho:p=.50 versus Ha:p<.35.

1. Assume that n = 50 and perform this hypothesis test using a 5% level of significance (a=.05).
2. Assume that n = 100 and perform this hypothesis test using a 5% level of significance (a=.05).
3. Give an intuitive justification for why changing the sample size may result in changing the conclusion about a null hypothesis.
4. In general, what is the relationship between the sample size and the absolute value of the test statistic? (Assume that the sample size is changed, but that the value of the sample proportion does not change.)
5. In general, what is the relationship between the sample size and the p-value? (Assume that the sample size is changed, but that the value of the sample proportion does not change.) To answer this question, refer to the standard normal curve.
6. What do you think is the overall relationship between the sample size, the type II error rate, and the power when Ho is false?
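When discussing question 4 with students, an instructor may find it useful to tabulate the test statistic for several sample sizes. The sketch below is only an illustration: the sample proportion of .40 is a hypothetical value chosen for the example, the null value .50 and the lower-tailed alternative are assumptions, and the test is the large-sample z test.

```python
from math import sqrt
from statistics import NormalDist

P0 = 0.50      # assumed null value, Ho: p = .50
P_HAT = 0.40   # hypothetical sample proportion, chosen only for illustration

def lower_tail_test(p_hat, n, p0):
    """Return (z, p_value) for Ho: p = p0 versus Ha: p < p0."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return z, NormalDist().cdf(z)

# Same sample proportion, different sample sizes.
results = {n: lower_tail_test(P_HAT, n, P0) for n in (50, 100, 200)}
for n, (z, p_value) in results.items():
    print(f"n = {n:3d}: z = {z:.2f}, p-value = {p_value:.4f}")
```

With the sample proportion held fixed, the standard error shrinks as n grows, so the absolute value of z increases and the lower-tail p-value decreases.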

## References

Aliaga, M. and Gunderson, B. (1999). Interactive Statistics. New Jersey: Prentice Hall.

The HERSHEY'S® and KISSES® trademarks are used with permission of Hershey Foods Corporation.
