Mary Richardson, Phyllis Curtiss, and John Gabrosek
Department of Statistics
Grand Valley State University
1 Campus Drive
Allendale, MI 49401-9403
Statistics Teaching and Resource Library, June 6, 2002
© 2002 by Mary Richardson, Phyllis Curtiss, and John Gabrosek, all
rights reserved. This text may be freely shared among individuals,
but it may not be republished in any medium without express written
consent from the authors and advance notification of the editor.
This article describes an
interactive activity illustrating general properties of hypothesis
testing and hypothesis tests for proportions. Students generate,
collect, and analyze data. Through simulation, students explore
hypothesis testing concepts. Concepts illustrated are:
interpretation of p-values, type I error rate, type II error rate,
power, and the relationship between type I and type II error rates
and power. This activity is appropriate for use in an introductory
college or high school statistics course.
Key words: hypothesis test on a proportion, type I and II errors,
power, p-values, simulation
Objective
After completing the "What is the Significance of a Kiss?" activity,
students will understand:
 How to perform a hypothesis test on a proportion
 How to interpret the level of significance of a hypothesis test (type I error rate)
 How to interpret the observed level of significance of a hypothesis test (p-value)
 How to interpret the power of a hypothesis test
 How to interpret the type II error rate of a hypothesis test
 The relationship between type I and type II error rates and power
Materials and equipment
Each student needs 10 plain HERSHEY’S® KISSES® chocolates, a
16-ounce plastic cup, a flat table or desktop on which to work, two
sticky notes, and a copy of the student’s version of the activity
(which includes a statistical guide containing relevant notation,
formulas, and definitions).
Time involved
The estimated completion time is
one hour.
Activity description
Students enjoy collecting and
analyzing data, especially when chocolate is involved. In this
activity, students explore the proportion of base landings for
tossed plain HERSHEY’S® KISSES® chocolates.
Prior to completing this activity, students should be familiar
with the basic mechanics of performing hypothesis tests, including
the calculation of test statistics and p-values.
To begin the activity, each student examines a KISSES® chocolate.
(Students are told they can eat the candies later.) The possible
outcomes if a KISSES® chocolate is tossed onto the desktop are
discussed. There are two possible outcomes: landing completely on
the base or not landing completely on the base. Each student is
then asked to decide whether he or she believes that a KISSES®
chocolate will land completely on its base less than 50% of the
time. After students make their conjectures, they are ready to
conduct the following experiment.
Each student puts his or her ten KISSES® chocolates into the
plastic cup and spills the candies onto the table five times, each
time counting the number of candies that land on their base.
Results are recorded on the student’s activity sheet.
We treat the 50 results for each student as 50 independent trials.
Strictly speaking, each student has five independent spills of 10
candies each. To expedite data collection, we assume that the 10
tosses within a spill are roughly independent.
After data collection is completed,
students are informed that in past experiments the percentage of
the time that a plain KISSES® chocolate landed
completely on its base when tossed was consistently near 35%. When
answering the questions, students are to assume that the true
proportion of base landings is p = .35.
In question 1, students use their KISSES® data to perform two
hypothesis tests of H_{o}: p = .50 versus H_{a}: p < .50 with
different levels of significance. Each student’s data is a
different simulated sample. Under the assumption that the true
value of p is .35, the null hypothesis, H_{o}: p = .50, is false.
Since H_{o} is false, performing these tests provides an
opportunity to use simulation to illustrate properties of p-values,
type II errors, and power. The first test of H_{o}: p = .50 versus
H_{a}: p < .50 is performed using all of the data from the tosses
(n = 50) and level of significance α = .05. The instructor draws
stems for a stem-and-leaf plot on the whiteboard (see the student’s
version of the activity). Each student writes his or her calculated
p-value on a sticky note and places it on the stem-and-leaf plot.
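The p-value each student computes can be sketched in a few lines of Python. This is only an illustration under assumptions not stated in the article: it uses the large-sample z test for a proportion, z = (p̂ − p_0) / sqrt(p_0(1 − p_0)/n), and the count of 18 base landings is a hypothetical student result, not data from the activity. The function name `lower_tail_p_value` is likewise invented for this sketch.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_p_value(successes: int, n: int, p0: float) -> float:
    """P-value for H_o: p = p0 versus H_a: p < p0 (large-sample z test)."""
    p_hat = successes / n                      # sample proportion of base landings
    standard_error = sqrt(p0 * (1 - p0) / n)   # standard error computed under H_o
    z = (p_hat - p0) / standard_error          # test statistic
    return NormalDist().cdf(z)                 # area to the left of z


# Hypothetical student: 18 base landings in 50 tosses (assumed, not real data).
p_value = lower_tail_p_value(successes=18, n=50, p0=0.50)
print(f"p-value = {p_value:.4f}")
print("reject H_o at alpha = .05" if p_value <= 0.05 else "fail to reject H_o")
```

A student would compare the printed p-value with the cutoff α = .05 before placing the sticky note on the plot.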
Assuming a class size of 30 students, the plot will contain 30
calculated p-values. The p-values are calculated under the
assumption that H_{o}: p = .50 is true (when, in fact, p = .35), so
the p-values will tend to be small. The point that small p-values
contradict H_{o} is discussed with students. Some students will not
obtain small p-values. On the stem-and-leaf plot, a cutoff value is
marked at α = .05. Each p-value falling at or below this cutoff
represents a rejection of H_{o} (a correct decision). Each p-value
falling above this cutoff represents a failure to reject H_{o} (a
type II error). Since 30 samples are taken, and 30 tests are
performed, students can see that some samples result in a correct
decision and other samples result in an incorrect decision (type II
error). Students are asked to calculate the fraction of incorrect
decisions to obtain a simulated value for β and a simulated value
for the power = 1 − β. An explanation is then given of how to
interpret a type II error rate (and power) in terms of repeatedly
performing the procedure of selecting a sample, then using the
data to test a hypothesis about a population parameter when the
null hypothesis is false.
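For instructors who want to preview what the class plot might look like, the classroom procedure can be imitated on a computer. The sketch below is not part of the original activity: it assumes each toss is an independent Bernoulli trial with p = .35, uses the large-sample z test, and fixes an arbitrary random seed so the run is repeatable. The function name `simulate_class_p_values` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_class_p_values(class_size, n, true_p, p0, rng):
    """Simulate one lower-tail p-value per student for H_o: p = p0 vs H_a: p < p0."""
    p_values = []
    for _ in range(class_size):
        # One student's n tosses; each lands on its base with probability true_p.
        landings = sum(rng.random() < true_p for _ in range(n))
        z = (landings / n - p0) / sqrt(p0 * (1 - p0) / n)
        p_values.append(NormalDist().cdf(z))
    return p_values


rng = random.Random(2002)  # arbitrary seed, chosen only for repeatability
p_values = simulate_class_p_values(class_size=30, n=50, true_p=0.35, p0=0.50, rng=rng)

alpha = 0.05
type_ii_errors = sum(p > alpha for p in p_values)  # failures to reject a false H_o
simulated_beta = type_ii_errors / len(p_values)
simulated_power = 1 - simulated_beta
print(f"simulated beta = {simulated_beta:.2f}, simulated power = {simulated_power:.2f}")
```

With only 30 simulated students the values of β and power vary noticeably from run to run, which mirrors the variability between class sections.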
The second test is performed using α = .20. Each student’s p-value
is the same as for the first test; however, the type I error rate
is increased to 20%. On the stem-and-leaf plot of p-values, a new
cutoff is marked at α = .20. Each p-value falling at or below this
cutoff represents a rejection of H_{o} (a correct decision). Each
p-value falling above this cutoff represents a nonrejection of
H_{o} (a type II error). Students are asked to calculate the
fraction of nonrejections of H_{o} out of the 30 tests to obtain a
simulated value for β and a simulated value for the power. In
examining the class results, students will note that an increase
in the type I error rate results in a decrease in the type II
error rate and thus an increase in the simulated power.
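The simulated values can be compared with an approximate theoretical power. The following sketch rests on assumptions beyond the article: it uses the normal approximation to the sampling distribution of the sample proportion, with no continuity correction, and the function name `approximate_power` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(alpha, n, p0, true_p):
    """Approximate power of the lower-tail z test of H_o: p = p0 when p = true_p."""
    std_normal = NormalDist()
    # Largest sample proportion that still leads to rejecting H_o at level alpha.
    critical_p_hat = p0 + std_normal.inv_cdf(alpha) * sqrt(p0 * (1 - p0) / n)
    # Standard error of the sample proportion under the true value of p.
    se_true = sqrt(true_p * (1 - true_p) / n)
    # Probability that the sample proportion falls at or below the critical value.
    return std_normal.cdf((critical_p_hat - true_p) / se_true)


for alpha in (0.05, 0.20):
    power = approximate_power(alpha, n=50, p0=0.50, true_p=0.35)
    print(f"alpha = {alpha:.2f}: power ~ {power:.2f}, beta ~ {1 - power:.2f}")
```

The larger α moves the rejection cutoff closer to .50, so more samples lead to rejection and the approximate power rises.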
In question 2, students use their KISSES® data to perform two
hypothesis tests of H_{o}: p = .35 versus H_{a}: p ≠ .35 with
different levels of significance. Under the assumption that p
= .35, performing these tests provides an opportunity to
illustrate properties of p-values and type I error.
The first test of H_{o}: p = .35 versus H_{a}: p ≠ .35 is performed
using all of the data from the tosses (n = 50) and α = .05. The
second test of H_{o}: p = .35 versus H_{a}: p ≠ .35 is performed
using α = .20. As before, a stem-and-leaf plot of the class
p-values is constructed.
The p-values are calculated under the assumption that
H_{o}: p = .35 is true, so the p-values will tend to be large. The
point that large p-values do not contradict H_{o} is discussed with
students. Some students will not obtain large p-values. On the
stem-and-leaf plot, a cutoff value is marked at α. Each p-value
falling at or below this cutoff represents a rejection of H_{o} (a
type I error). Each p-value falling above this cutoff represents
a failure to reject H_{o} (a correct decision). Since 30
samples are taken, and 30 tests are performed, students can see
that some samples result in a correct decision and other samples
result in an incorrect decision (type I error). For each of the
α values (α = .05 and α = .20), students are asked to
calculate the fraction of rejections of H_{o} out of the
30 tests to obtain a simulated value for α. An explanation is then
given of how to interpret a type I error rate in terms of
repeatedly selecting a sample, then using the data to test a
hypothesis about a population parameter when the null hypothesis
is true.
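The type I error part of the activity can be imitated in the same way. The sketch below is an illustration, not the authors' procedure: it assumes independent tosses with p = .35, a two-sided large-sample z test, and an arbitrary seed. The helper name `two_sided_p_value` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(successes, n, p0):
    """P-value for H_o: p = p0 versus H_a: p != p0 (large-sample z test)."""
    z = (successes / n - p0) / sqrt(p0 * (1 - p0) / n)
    return 2 * NormalDist().cdf(-abs(z))  # area in both tails beyond |z|


rng = random.Random(35)  # arbitrary seed, chosen only for repeatability
class_size, n, p0 = 30, 50, 0.35

p_values = []
for _ in range(class_size):
    # H_o is true here: each toss lands on its base with probability p0.
    landings = sum(rng.random() < p0 for _ in range(n))
    p_values.append(two_sided_p_value(landings, n, p0))

for alpha in (0.05, 0.20):
    rejections = sum(p <= alpha for p in p_values)  # each rejection is a type I error
    print(f"alpha = {alpha:.2f}: simulated type I error rate = {rejections / class_size:.2f}")
```

Over many repetitions the simulated rate should settle near the chosen α, although with counts out of 50 tosses the attainable p-values are discrete, so the match is only approximate.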
Teacher notes
In this activity, we used the same
data set to perform two different hypothesis tests at two
different levels of significance. The instructor should emphasize
that the level of significance, null hypothesis, and alternative
hypothesis should be determined prior to data collection. We use
the same data for multiple hypothesis tests to save time.
Technically, we should have collected four separate data sets, one
for each of the four tests conducted.
In addition, the
instructor should stress to students that in reality one would not
know the true value of the population parameter p. If the
parameter value were known, then there would be no point in
utilizing sample data to draw an inference about the parameter.
The instructor should stress that we assume knowledge of the
parameter in order to investigate the properties of hypothesis
testing under different situations.
Assessment
Students should be able to explain
type I error and type II error in a specific problem.
Additionally, students should be able to describe the relationship
between type I and type II error rates.
The following
questions can be used to assess student understanding or as
challenge problems for students who complete the activity
early.
1. A parachutist has made thousands of successful jumps. His
assumption is that when he pulls the rip cord, the parachute will
open.
 (a) Describe a type I error in the context of this problem.
 (b) Describe a type II error in the context of this problem.
 (c) Which is a more serious error for this problem?
 (d) Most parachutes have a backup in case the rip cord malfunctions. Does this guard against type I or type II errors?
 (e) Suppose that I pull the rip cord and it does not function. I have time to pull it again or pull the backup but not both. If I were concerned about a type I error what would I do? Why? If I were concerned about a type II error what would I do? Why? What would you do?
2. Explain the fallacy in reasoning in each of the following
statements.
 (a) “I wanted to reduce the chance of committing a type I error, so I increased the power of the test.”
 (b) “I don’t like making mistakes so I’m going to set the type I error rate at .0001.”
3. Explain how you would use our
class data to simulate the sampling distribution of the proportion
of base landings in 50 trials.
4. In order to answer parts
(a) and (b) below, suppose that and you
wish to test H_{o}: p = .50 versus
H_{a}: p < .35.
 (a) Assume that n = 50 and perform this hypothesis test using a 5% level of significance (α = .05).
 (b) Assume that n = 100 and perform this hypothesis test using a 5% level of significance (α = .05).
 (c) Give an intuitive justification for why changing the sample size may result in changing the conclusion about a null hypothesis.
 (d) In general, what is the relationship between the sample size and the absolute value of the test statistic? (Assume that the sample size is changed, but that the value of the sample proportion does not change.)
 (e) In general, what is the relationship between the sample size and the p-value? (Assume that the sample size is changed, but that the value of the sample proportion does not change.) To answer this question, refer to the standard normal curve.
 (f) What do you think is the overall relationship between the sample size, the type II error rate, and the power when H_{o} is false?
References
Aliaga, M. and Gunderson, B.
(1999). Interactive Statistics. New Jersey:
Prentice Hall.
The HERSHEY'S® and KISSES® trademarks are used with permission of
Hershey Foods Corporation.