Mary Richardson, Phyllis Curtiss, and John Gabrosek
Department of Statistics
Grand Valley State University
1 Campus Drive
Allendale, MI 49401-9403
Statistics Teaching and Resource Library, June 6, 2002
© 2002 by Mary Richardson, Phyllis Curtiss, and John Gabrosek, all
rights reserved. This text may be freely shared among individuals,
but it may not be republished in any medium without express written
consent from the authors and advance notification of the editor.
This article describes an
interactive activity illustrating general properties of hypothesis
testing and hypothesis tests for proportions. Students generate,
collect, and analyze data. Through simulation, students explore
hypothesis testing concepts. Concepts illustrated are:
interpretation of p-values, type I error rate, type II error rate,
power, and the relationship between type I and type II error rates
and power. This activity is appropriate for use in an introductory
college or high school statistics course.
Key words: hypothesis test on a proportion, type I and II errors,
power, p-values, simulation
Objective
After completing the "What is the Significance of a Kiss?" activity,
students will understand:
- How to perform a hypothesis test on a proportion
- How to interpret the level of significance of a hypothesis test
(type I error rate)
- How to interpret the observed level of significance of a
hypothesis test (p-value)
- How to interpret the power of a hypothesis test
- How to interpret the type II error rate of a hypothesis test
- The relationship between type I and type II error rates and
power
Materials and equipment
Each student needs 10 plain
HERSHEY’S® KISSES® chocolates, a 16-ounce
plastic cup, a flat table or desktop on which to work, two sticky
notes, and a copy of the student’s version of the activity (which
includes a statistical guide containing relevant notation,
formulas, and definitions).
Time involved
The estimated completion time is
one hour.
Activity description
Students enjoy collecting and
analyzing data, especially when chocolate is involved. In this
activity, students explore the proportion of base landings for
tossed plain HERSHEY’S® KISSES® chocolates.
Prior to completing this activity, students should be familiar
with the basic mechanics of performing hypothesis tests, including
the calculation of test statistics and p-values.
To begin the activity, each student examines a KISSES®
chocolate. (Students are told they can eat the candies later.) The
class discusses the possible outcomes when a KISSES® chocolate is
tossed onto the desktop. There are two: the candy lands completely
on its base, or it does not. Each student is then asked whether he
or she believes that the proportion of the time that a KISSES®
chocolate will land completely on its base is less than 50%. After
students make their conjectures, they are ready to conduct the
following experiment.
Each student puts his/her ten KISSES® chocolates into the plastic
cup and spills the candies onto the table five times, each time
counting the number of candies that land completely on the base.
Results are recorded on the student’s activity sheet.
We treat the 50 results for each student as 50 independent trials.
Strictly speaking, each student has five independent spills of 10
candies each. To expedite data collection, we assume that the 10
tosses within a single spill are roughly independent.

After data collection is completed,
students are informed that in past experiments the percentage of
the time that a plain KISSES® chocolate landed
completely on its base when tossed was consistently near 35%. When
answering the questions, students are to assume that the true
proportion of base landings is p = .35.
In question 1,
students use their KISSES® data to perform two
hypothesis tests of Ho:p=.50 versus
Ha:p<.50 with different levels of significance. Each
student’s data is a different simulated sample. Under the
assumption that the true value of p = .35, the null hypothesis,
Ho:p=.50, is false. Since Ho is false,
performing these tests provides an opportunity to use simulation
to illustrate properties of p-values, type II errors, and power.
The first test of Ho:p=.50 versus
Ha:p<.50 is performed using all of the data from the
tosses (n = 50) and level of significance α = .05. The instructor
draws stems for a stem-and-leaf plot on the whiteboard (see the
student’s version of the activity). Each student writes his/her
calculated p-value on a sticky note and places it on the
stem-and-leaf plot.
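
The calculation each student carries out for this first test can be
sketched as follows. This is a minimal sketch that assumes the usual
large-sample z test for a proportion; the article does not state
which test statistic the statistical guide uses. The function names
and the count of 18 base landings are hypothetical and serve only as
an illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lower_tail_p_value(base_landings, n, p0):
    """Return (z, p-value) for Ho: p = p0 versus Ha: p < p0.

    Assumes the large-sample z test, with the standard error
    computed under the null hypothesis.
    """
    p_hat = base_landings / n
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / standard_error
    return z, normal_cdf(z)


# Hypothetical student result: 18 base landings in 50 tosses.
z, p_value = lower_tail_p_value(18, 50, 0.50)
decision = "reject Ho" if p_value <= 0.05 else "fail to reject Ho"
print(f"z = {z:.3f}, p-value = {p_value:.4f}, decision: {decision}")
```

The p-value printed here is the number the student would write on the
sticky note.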

Assuming a class size of 30
students, the plot will contain 30 calculated p-values. The
p-values are calculated under the assumption that
Ho:p=.50 is true (when, in fact, p=.35), so the
p-values will tend to be small. The point that small p-values
contradict Ho is discussed with students. Some students
will not obtain small p-values. On the stem-and-leaf plot, a
cut-off value is marked at α = .05. Each p-value falling at or
below this cut-off represents a rejection of Ho (a
correct decision). Each p-value falling above this cut-off
represents a failure to reject Ho (a type II error).
Since 30 samples are taken, and 30 tests are performed, students
can see that some samples result in a correct decision and other
samples result in an incorrect decision (type II error). Students
are asked to calculate the fraction of incorrect decisions to
obtain a simulated value for β and a simulated value for
the power = 1-β. An explanation is then given of how to
interpret a type II error rate (and power) in terms of repeatedly
performing the procedure of selecting a sample, then using the
data to test a hypothesis about a population parameter when the
null hypothesis is false.
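
An instructor who wants to preview what the whiteboard might look
like can imitate the class on a computer. The sketch below is an
assumption-laden stand-in for the physical experiment: it treats all
50 tosses as independent with p = .35, uses the large-sample z test,
and fixes an arbitrary random seed. All function names are
hypothetical.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_class_p_values(n_students, n_tosses, true_p, p0, rng):
    """One lower-tail p-value (Ho: p = p0 vs Ha: p < p0) per student."""
    p_values = []
    for _ in range(n_students):
        # Each toss lands on its base with probability true_p.
        base_landings = sum(rng.random() < true_p for _ in range(n_tosses))
        z = (base_landings / n_tosses - p0) / math.sqrt(p0 * (1 - p0) / n_tosses)
        p_values.append(normal_cdf(z))
    return p_values


def simulated_beta(p_values, alpha):
    """Fraction of tests that fail to reject Ho (type II errors)."""
    failures = sum(p > alpha for p in p_values)
    return failures / len(p_values)


rng = random.Random(2002)  # arbitrary seed, for reproducibility only
p_values = simulate_class_p_values(30, 50, 0.35, 0.50, rng)
beta = simulated_beta(p_values, 0.05)
power = 1 - beta
print(f"simulated beta = {beta:.3f}, simulated power = {power:.3f}")
```

With only 30 simulated students the result varies noticeably from
run to run, which is the same variability the class sees on the
stem-and-leaf plot.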
The second test is performed using α = .20. Each student's p-value
is the same as for the first test; however, the
type I error rate is increased to 20%. On the stem-and-leaf plot
of p-values, a new cut-off is marked at α = .20. Each p-value
falling at or below this cut-off represents a rejection of Ho
(a correct decision). Each p-value falling above this
cut-off represents a non-rejection of Ho (a type II
error). Students are asked to calculate the fraction of
non-rejections of Ho out of the 30 tests to obtain a
simulated value for β and a simulated value for the power. In
examining the class results, students will note that an increase
in the type I error rate results in a decrease in the type II
error rate and thus an increase in the simulated power.
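
The class fractions can be compared with the values they estimate.
The sketch below is not part of the activity; it is a cross-check
that assumes the number of base landings in 50 tosses is binomial
with p = .35 and that the decision rule is the large-sample z test.
It adds up the binomial probabilities of every count that leads to
rejection. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)


def exact_power(n, true_p, p0, alpha):
    """P(reject Ho: p = p0 in favor of Ha: p < p0) when p = true_p."""
    power = 0.0
    for x in range(n + 1):
        z = (x / n - p0) / math.sqrt(p0 * (1.0 - p0) / n)
        if normal_cdf(z) <= alpha:  # this count leads to rejection
            power += binomial_pmf(x, n, true_p)
    return power


for alpha in (0.05, 0.20):
    power = exact_power(50, 0.35, 0.50, alpha)
    print(f"alpha = {alpha:.2f}: power = {power:.3f}, beta = {1 - power:.3f}")
```

Under these assumptions the power at α = .20 exceeds the power at
α = .05, matching the pattern students observe on the plot.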
In question 2, students use their KISSES® data to perform
two hypothesis tests of Ho:p=.35 versus Ha:p≠.35
with different levels of significance. Under the assumption that p
= .35, performing these tests provides an opportunity to
illustrate properties of p-values and type I error.
The first test of Ho:p=.35 versus Ha:p≠.35
is performed using all of the data from the tosses (n = 50) and
α = .05. The second test of Ho:p=.35 versus Ha:p≠.35
is performed using α = .20. As before, a stem-and-leaf plot of the
class p-values is constructed.
The p-values are calculated
under the assumption that Ho:p=.35 is true, so the
p-values will tend to be large. The point that large p-values do
not contradict Ho is discussed with students. Some
students will not obtain large p-values. On the stem-and-leaf
plot, a cut-off value is marked at α. Each p-value falling at
or below this cut-off represents a rejection of Ho (a
type I error). Each p-value falling above this cut-off represents
a failure to reject Ho (a correct decision). Since 30
samples are taken, and 30 tests are performed, students can see
that some samples result in a correct decision and other samples
result in an incorrect decision (type I error). For each of the
α values (α = .05 and α = .20), students are asked to
calculate the fraction of rejections of Ho out of the
30 tests to obtain a simulated value for α. An explanation is then
given of how to interpret a type I error rate in terms of
repeatedly selecting a sample, then using the data to test a
hypothesis about a population parameter when the null hypothesis
is true.
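
The question 2 tally can be imitated in the same way. The following
sketch rests on the same assumptions as before (independent tosses
with p = .35 and the large-sample z test, here two-sided); the seed
and function names are arbitrary and hypothetical. As in the
activity, the same p-values are compared with both cut-offs.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_sided_p_value(base_landings, n, p0):
    """p-value for Ho: p = p0 versus Ha: p != p0 (large-sample z test)."""
    z = (base_landings / n - p0) / math.sqrt(p0 * (1.0 - p0) / n)
    return 2.0 * (1.0 - normal_cdf(abs(z)))


def simulate_p_values(n_students, n_tosses, p0, rng):
    """Simulate each student's data with Ho true (true p equals p0)."""
    p_values = []
    for _ in range(n_students):
        base_landings = sum(rng.random() < p0 for _ in range(n_tosses))
        p_values.append(two_sided_p_value(base_landings, n_tosses, p0))
    return p_values


def rejection_rate(p_values, alpha):
    """Fraction of tests that reject Ho (type I errors, since Ho is true)."""
    return sum(p <= alpha for p in p_values) / len(p_values)


rng = random.Random(35)  # arbitrary seed, for reproducibility only
p_values = simulate_p_values(30, 50, 0.35, rng)
for alpha in (0.05, 0.20):
    rate = rejection_rate(p_values, alpha)
    print(f"alpha = {alpha:.2f}: simulated type I error rate = {rate:.3f}")
```

Because the number of base landings is discrete and the class is
small, the simulated rates will usually be close to, but not exactly
equal to, the nominal α values.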
Teacher notes
In this activity, each student's data set is used to perform two
different hypothesis tests, each at two different levels of
significance. The instructor should emphasize
that the level of significance, null hypothesis, and alternative
hypothesis should be determined prior to data collection. We use
the same data for multiple hypothesis tests to save time.
Technically, we should have collected four separate data sets, one
for each of the four tests conducted.
In addition, the
instructor should stress to students that in reality one would not
know the true value of the population parameter p. If the
parameter value were known, then there would be no point in
utilizing sample data to draw an inference about the parameter.
The instructor should stress that we assume knowledge of the
parameter in order to investigate the properties of hypothesis
testing under different situations.
Assessment
Students should be able to explain
type I error and type II error in a specific problem.
Additionally, students should be able to describe the relationship
between type I and type II error rates.
The following
questions can be used to assess student understanding or as
challenge problems for students who complete the activity
early.
1. A parachutist has made thousands
of successful jumps. His assumption is that when he pulls the rip
cord, the parachute will open.
- Describe a type I error in the
context of this problem.
- Describe a type II error in the
context of this problem.
- Which is a more serious error for
this problem?
- Most parachutes have a back-up in
case the rip cord malfunctions. Does this guard against type I
or type II errors?
- Suppose that I pull the rip cord
and it does not function. I have time to pull it again or pull
the back-up but not both. If I were concerned about a type I
error what would I do? Why? If I were concerned about a type II
error what would I do? Why? What would you do?
2. Explain the fallacy in
reasoning in each of the following statements.
- “I wanted to reduce the chance of
committing a type I error, so I increased the power of the
test.”
- “I don’t like making mistakes so
I’m going to set the type I error rate at .0001.”
3. Explain how you would use our
class data to simulate the sampling distribution of the proportion
of base landings in 50 trials.
4. In order to answer parts
(a) and (b) below, suppose that
and you
wish to test Ho:p=.50 versus
Ha:p<.35.
- Assume that n = 50 and perform
this hypothesis test using a 5% level of significance
(a=.05).
- Assume that n = 100 and perform
this hypothesis test using a 5% level of significance
(a=.05).
- Give an intuitive justification
for why changing the sample size may result in changing the
conclusion about a null hypothesis.
- In general, what is the
relationship between the sample size and the absolute value of
the test statistic? (Assume that the sample size is changed, but
that the value of the sample proportion does not change.)
- In general, what is the
relationship between the sample size and the p-value? (Assume
that the sample size is changed, but that the value of the sample
proportion does not change.) To answer this question, refer to
the standard normal curve.
- What do you think is the overall
relationship between the sample size, the type II error rate,
and the power when Ho is false?
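
For instructors preparing a discussion of question 4, the effect of
sample size can be tabulated with a short sketch. It assumes the
large-sample z test for a lower-tailed test of Ho:p=.50. The sample
proportion of 0.40 and the sample size of 200 are hypothetical values
chosen only for illustration; they are not taken from the question.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lower_tail_test(p_hat, n, p0):
    """Return (z, p-value) for Ho: p = p0 versus Ha: p < p0."""
    z = (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)
    return z, normal_cdf(z)


P_HAT = 0.40  # hypothetical sample proportion, held fixed as n changes

results = [(n, *lower_tail_test(P_HAT, n, 0.50)) for n in (50, 100, 200)]
for n, z, p_value in results:
    print(f"n = {n:3d}: z = {z:.3f}, p-value = {p_value:.4f}")
```

With the sample proportion held fixed, the standard error shrinks as
n grows, so the test statistic moves farther from zero and the
p-value decreases.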
References
Aliaga, M. and Gunderson, B.
(1999). Interactive Statistics. New Jersey:
Prentice Hall.
The HERSHEY'S® and
KISSES® trademarks are used with permission of Hershey
Foods Corporation.