Mary Richardson, Phyllis Curtiss, John Gabrosek, and Diann Reischman
Department of Statistics
Grand Valley State University
1 Campus Drive
Allendale, MI 49401-9403
Statistics Teaching and Resource Library, September 1, 2002
© 2002 by Mary Richardson, Phyllis Curtiss, John Gabrosek, and Diann
Reischman, all rights reserved. This text may be freely shared among
individuals, but it may not be republished in any medium without
express written consent from the authors and advance notification of
the editor.
This article describes an
interactive activity illustrating sampling distributions for
means, properties of confidence intervals, properties of
hypothesis testing, confidence intervals for means, and hypothesis
tests for means. Students generate and analyze data and explore these
concepts through simulation. The activity has three parts, which can
be used in sequence or individually as “stand alone” activities,
giving the educator flexibility in how the activity is used.
Part I illustrates the sampling distribution of the
sample mean. Part II illustrates confidence intervals for the
population mean. Part III illustrates hypothesis tests for the
population mean. This activity is appropriate for use in an
introductory college or high school AP statistics
course.
Key words: sampling distribution of a sample mean, confidence interval
for a mean, hypothesis test on a mean, simulation
Objective
After completing the Rectangularity activity, students will understand:
- How to construct and use the sampling distribution for the sample
mean
- How to construct and interpret a confidence interval for a mean
- How to perform a hypothesis test on a mean
- How to interpret the level of significance of a hypothesis test
(type I error rate)
- How to interpret the p-value of a hypothesis test
- How to interpret the type II error rate of a hypothesis test
- How to interpret the power of a hypothesis test
- The relationship between type I and type II error rates and power
Materials and equipment
Each student needs a random number
table or a calculator that generates random numbers, four sticky
notes (for Parts II and III), and a copy of the activity (which
includes statistical guides containing relevant notation,
formulas, and definitions). Included in the student’s version of
the activity is a sheet with a population of 100 rectangles having
different areas. Each square counts as one unit towards a
rectangle’s area.
Time involved
The activity is completed in three
parts. The estimated completion time for each part is one class
period (approximately one hour). The three parts of the activity
can be used in sequence or they can be used individually as “stand
alone” activities. Part I illustrates the sampling distribution of
the sample mean and involves calculations that should be completed
using either a computer software package or a graphing calculator.
Part II illustrates confidence intervals for the population mean.
Part III illustrates hypothesis tests for the population mean.
Activity description - Part I: sampling distribution of the sample mean
To begin, the teacher draws a
histogram of the population distribution of areas on the
whiteboard. The population distribution of areas is skewed to the
right (positively skewed).
Ten groups of two or three
students are formed and the following tasks are assigned to each
group.
- Select two different random samples of n = 5 rectangles (with
replacement)
- Select two different random samples of n = 15 rectangles (with
replacement)
- Select two different random samples of n = 25 rectangles (with
replacement)
Students calculate the average area of the rectangles for each sample
drawn, reinforcing the idea that the sample mean is a random variable.
To complete the data collection sheet, group results are combined to
obtain 20 sample means for each of the sample sizes n = 5, 15, and 25.
After data collection, students
answer a series of questions based on the means and standard
deviations of the sample means for the different sample sizes.
Students discover properties of the distribution of a sample mean;
namely, (i) the distribution of sample mean values is centered at
the population mean, (ii) the distribution of sample mean values
approaches a normal distribution as the sample size increases,
(iii) the distribution of sample mean values has less variability
than the original population, and (iv) the variability of sample
mean values decreases as n increases.
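For instructors who want to preview these four properties before class, the sampling can be mimicked in a few lines of Python. This is only a sketch: the population below is a hypothetical right-skewed set of 100 areas built so that its mean is 6.26, not the rectangle sheet distributed with the activity, and the names `AREA_COUNTS`, `POPULATION`, and `sample_means` are illustrative.

```python
import random
import statistics

# Hypothetical right-skewed population of 100 rectangle areas (area: count).
# It is NOT the authors' sheet; it is built only so that the mean is 6.26.
AREA_COUNTS = {1: 15, 2: 8, 3: 8, 4: 16, 5: 10, 6: 6,
               8: 8, 9: 5, 10: 7, 12: 9, 16: 5, 18: 3}
POPULATION = [area for area, count in AREA_COUNTS.items() for _ in range(count)]


def sample_means(n, reps, rng):
    """Return `reps` sample means, each from n draws with replacement."""
    return [statistics.mean(rng.choices(POPULATION, k=n)) for _ in range(reps)]


rng = random.Random(2002)  # fixed seed so the run is repeatable
mu = statistics.mean(POPULATION)
sigma = statistics.pstdev(POPULATION)
print(f"population: mean = {mu:.2f}, sd = {sigma:.2f}")

results = {}
for n in (5, 15, 25):
    means = sample_means(n, reps=2000, rng=rng)
    results[n] = (statistics.mean(means), statistics.stdev(means))
    print(f"n = {n:2d}: mean of sample means = {results[n][0]:.2f}, "
          f"sd of sample means = {results[n][1]:.2f}, "
          f"sigma/sqrt(n) = {sigma / n ** 0.5:.2f}")
```

The printed standard deviations of the sample means should shrink as n grows and sit close to sigma/sqrt(n), while the means of the sample means stay near the population mean. The class works with 20 means per sample size; the sketch uses 2000 only to make the pattern easier to see.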
Activity description - Part II: confidence interval for the population mean
Each student selects a simple random
sample of 25 rectangles (with replacement). Note that the
population of rectangle areas does not have a normal distribution,
but the t confidence interval procedure may be applied in this
case since the sampling distribution of x̄ is
approximately normal for samples of size 25. First, each student
uses her sample to construct an 80% confidence interval for the
population mean rectangle area. Each student writes her result on
a sticky note and gives it to the instructor. Each student’s
confidence interval is sketched horizontally on an overhead
transparency leaving one blank horizontal line between intervals.
The resulting overhead transparency displays all of the confidence
intervals constructed by the students in the class.
Students see the results of drawing
repeated samples from the same population and calculating 80%
confidence intervals. Some of the confidence intervals will
contain the population mean (6.26) and some will not. After
graphing the class confidence intervals, their meaning is
discussed. We stress that if we claim that we are 80% confident
that a mean lies within the endpoints of a confidence interval, we
are saying that the endpoints of the confidence interval were
calculated by a method that gives correct results in 80% of all
possible random samples. We are not saying that there is an 80%
chance that a calculated interval contains the population mean.
Students are asked to write a statement explaining how an 80%
level of confidence should be interpreted.
Students are
then asked to construct a 99% confidence interval for the
population mean rectangle area. As above, the class confidence
intervals are graphed and the results are discussed. We stress how
to properly interpret a 99% confidence level and ask students to
write a statement explaining how a 99% level of confidence should
be interpreted. Students are asked to write a statement explaining
how increasing the confidence level from 80% to 99% changed the
width of their confidence intervals.
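The "method that gives correct results in 80% of all possible random samples" reading can be checked by simulation. The sketch below again assumes a hypothetical population with mean 6.26 (not the activity's rectangle sheet) and hard-codes the two-sided t critical values for 24 degrees of freedom from a standard t table; `t_interval` and `coverage` are illustrative names.

```python
import random
import statistics

# Hypothetical right-skewed population of 100 areas with mean 6.26
# (an assumption for illustration, not the authors' rectangle sheet).
AREA_COUNTS = {1: 15, 2: 8, 3: 8, 4: 16, 5: 10, 6: 6,
               8: 8, 9: 5, 10: 7, 12: 9, 16: 5, 18: 3}
POPULATION = [area for area, count in AREA_COUNTS.items() for _ in range(count)]
MU = statistics.mean(POPULATION)

# Two-sided t critical values for 24 degrees of freedom (standard t table).
T_CRIT = {0.80: 1.318, 0.99: 2.797}


def t_interval(sample, level):
    """t interval xbar +/- t* s/sqrt(n); valid here only for n = 25."""
    n = len(sample)
    xbar = statistics.mean(sample)
    margin = T_CRIT[level] * statistics.stdev(sample) / n ** 0.5
    return xbar - margin, xbar + margin


def coverage(level, n_students, rng):
    """Fraction of simulated students whose interval contains MU."""
    hits = 0
    for _ in range(n_students):
        low, high = t_interval(rng.choices(POPULATION, k=25), level)
        hits += low <= MU <= high
    return hits / n_students


rng = random.Random(2002)
cover_80 = coverage(0.80, 5000, rng)
cover_99 = coverage(0.99, 5000, rng)
print(f"80% intervals containing the mean: {cover_80:.3f}")
print(f"99% intervals containing the mean: {cover_99:.3f}")
```

For any single sample, the 99% interval is 2.797/1.318, or about 2.1, times as wide as the 80% interval, which is the trade-off students are asked to describe.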
Activity description - Part III: hypothesis test on the population mean
Each student selects a simple
random sample of 25 rectangles (with replacement) or uses the
simple random sample selected for Part II. Note that the
population of rectangle areas does not have a normal distribution,
but the t test may be applied in this case since the sampling
distribution of x̄ is approximately normal for samples of size 25.
In question 1, students use their sample data to perform
two hypothesis tests of H0: μ = 9 versus Ha: μ < 9 with different
levels of significance. Each student’s data is a different simulated
sample. Since the true population mean rectangle area is μ = 6.26,
the null hypothesis H0: μ = 9 is false. Since H0 is false, performing
these tests provides an opportunity to use simulation to
illustrate properties of p-values, type II errors, and power.
The first test of H0: μ = 9 versus Ha: μ < 9 is performed using
level of significance α = .05. The instructor draws stems for a
stem-and-leaf plot on the whiteboard. Each student writes her
calculated p-value on a sticky note and places it on the
stem-and-leaf plot.
Assuming a class size of 30 students, the plot will contain 30
calculated p-values. The p-values are calculated under the assumption
that H0: μ = 9 is true (when, in fact, μ = 6.26), so the p-values
will tend to be small. We discuss with students that small p-values
contradict H0.
Some students will not obtain small p-values. On the stem-and-leaf
plot, a cut-off value is marked at α = .05. Each p-value falling at or
below this cut-off represents a rejection of H0 (a correct decision).
Each p-value falling above this cut-off represents a failure to reject
H0 (a type II error). Since 30 samples are taken, and 30 tests are
performed, students see that some samples result in a correct decision
and other samples result in an incorrect decision (type II error).
Students are asked to calculate the fraction of incorrect decisions to
obtain a simulated value for β, the probability of a type II error,
and a simulated value for the power = 1 - β. An explanation is then
given of how to interpret a type II error rate (and power) in
terms of repeatedly performing the procedure of selecting a
sample, then using the data to test a hypothesis about a
population parameter, when the null hypothesis is false.
The second test is performed using α = .20. Each student’s p-value is
the same as for the first test; however, the type I error rate is
increased to 20%. On the stem-and-leaf plot of p-values, a new cut-off
is marked at α = .20. Each p-value falling at or below this cut-off
represents a rejection of H0 (a correct decision). Each p-value
falling above this cut-off represents a non-rejection of H0 (a type II
error). Students are asked to calculate the fraction of non-rejections
of H0 out of the 30 tests to obtain a simulated value for β and a
simulated value for the power. In examining the class results,
students note that an increase in the type I error rate results in a
decrease in the type II error rate and thus an increase in the
simulated power.
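The classroom tally for question 1 can be imitated with a larger simulated "class". The sketch below is an illustration under stated assumptions: the population is a hypothetical one with mean 6.26 rather than the activity's rectangle sheet, and instead of computing p-values it compares each t statistic with the upper-tail critical value for 24 degrees of freedom from a standard t table, which leads to the same reject or fail-to-reject decision as comparing the p-value with α.

```python
import random
import statistics

# Hypothetical right-skewed population of 100 areas with mean 6.26
# (an assumption for illustration, not the authors' rectangle sheet).
AREA_COUNTS = {1: 15, 2: 8, 3: 8, 4: 16, 5: 10, 6: 6,
               8: 8, 9: 5, 10: 7, 12: 9, 16: 5, 18: 3}
POPULATION = [area for area, count in AREA_COUNTS.items() for _ in range(count)]

# Upper-tail t critical values for 24 degrees of freedom (standard t table).
T_CRIT = {0.05: 1.711, 0.20: 0.857}


def t_statistic(sample, mu0):
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / n ** 0.5)


rng = random.Random(2002)
# Each simulated student draws 25 rectangles with replacement and tests mu0 = 9.
t_values = [t_statistic(rng.choices(POPULATION, k=25), mu0=9) for _ in range(5000)]

power = {}
for alpha, crit in T_CRIT.items():
    # For Ha: mu < 9, rejecting when t <= -crit matches p-value <= alpha.
    power[alpha] = sum(t <= -crit for t in t_values) / len(t_values)
    print(f"alpha = {alpha:.2f}: simulated power = {power[alpha]:.3f}, "
          f"simulated beta = {1 - power[alpha]:.3f}")
```

Both cut-offs are applied to the same set of simulated samples, mirroring the way the class reuses one stem-and-leaf plot, so the simulated power at α = .20 can never be lower than at α = .05.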
In question 2, students use their sample data to perform two
hypothesis tests of H0: μ = 6.26 versus Ha: μ ≠ 6.26 with different
levels of significance. Since the true population mean is μ = 6.26,
the null hypothesis is true, and performing these tests provides an
opportunity to illustrate properties of p-values and type I error.
The first test of H0: μ = 6.26 versus Ha: μ ≠ 6.26 is performed using
α = .05. The second test is performed using α = .20. As before, a
stem-and-leaf plot of the class p-values is constructed.
Because H0: μ = 6.26 is in fact true, the p-values will be spread
roughly evenly between 0 and 1 rather than concentrated near 0, so
most of them will not be small. We discuss with students that large
p-values do not contradict H0. Some students will nonetheless obtain
small p-values. On the stem-and-leaf plot, a cut-off value is marked
at α. Each p-value falling at or below this cut-off represents a
rejection of H0 (a type I error). Each p-value falling above this
cut-off represents a failure to reject H0 (a correct decision).
Since 30 samples are taken, and 30 tests are performed, students
can see that some samples result in a correct decision and other
samples result in an incorrect decision (type I error). For α = .05
and α = .20, students are asked to calculate the fraction of
rejections of H0 out of the 30 tests to obtain a simulated value for
α. An explanation is then given of how to interpret a type I error
rate in terms of repeatedly selecting a sample, then using the data
to test a hypothesis about a population parameter, when the null
hypothesis is true.
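A similar sketch shows what the class should expect in question 2, where the null hypothesis is true. It assumes a hypothetical population whose mean is exactly 6.26 (not the activity's rectangle sheet) and uses two-sided critical values for 24 degrees of freedom from a standard t table; rejecting when |t| reaches the critical value is the same decision as rejecting when the two-sided p-value is at or below α.

```python
import random
import statistics

# Hypothetical right-skewed population of 100 areas with mean 6.26
# (an assumption for illustration, not the authors' rectangle sheet).
AREA_COUNTS = {1: 15, 2: 8, 3: 8, 4: 16, 5: 10, 6: 6,
               8: 8, 9: 5, 10: 7, 12: 9, 16: 5, 18: 3}
POPULATION = [area for area, count in AREA_COUNTS.items() for _ in range(count)]
MU0 = 6.26  # hypothesized mean, equal to the true mean of POPULATION

# Two-sided t critical values for 24 degrees of freedom (standard t table).
T_CRIT = {0.05: 2.064, 0.20: 1.318}


def t_statistic(sample, mu0):
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / n ** 0.5)


rng = random.Random(2002)
t_values = [t_statistic(rng.choices(POPULATION, k=25), MU0) for _ in range(6000)]

type1 = {}
for alpha, crit in T_CRIT.items():
    # Every rejection here is a type I error because H0 is true.
    type1[alpha] = sum(abs(t) >= crit for t in t_values) / len(t_values)
    first_class = sum(abs(t) >= crit for t in t_values[:30])
    print(f"alpha = {alpha:.2f}: long-run rejection rate = {type1[alpha]:.3f}, "
          f"rejections in the first 30 samples = {first_class}")
```

The long-run rejection rates should land near the nominal α values, though not exactly on them, since the t procedure is only approximate for a skewed population. A single class of 30 can differ noticeably from the long-run rate, which is worth pointing out when the class fraction is not close to α.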
Teacher notes
Students work with a population of
100 rectangles, drawing repeated simple random samples (with
replacement). Prior to completing Part I, students should be
familiar with descriptive statistics and probability
distributions. Prior to completing Part II, students should be
familiar with the basic mechanics of how to construct confidence
intervals. Prior to completing Part III, students should be
familiar with the basic mechanics of how to perform hypothesis
tests, including the calculation of test statistics and
p-values.
In this activity, we sample with
replacement to preserve the independence of the sample
observations. When sampling with replacement, it is possible for
the same rectangle to be sampled more than once. If sampled
rectangles are not replaced in the population, then each time a
rectangle is withdrawn the probability of selection for the
remaining rectangles will increase. In practice, we often either
sample with replacement or we sample from a population that is so
large that the withdrawal of successive items changes selection
probabilities negligibly.
In this activity, we used the same
data set to perform two different hypothesis tests at two
different levels of significance. The instructor should emphasize
that the level of significance, null hypothesis, and alternative
hypothesis should be determined prior to data collection. We use
the same data for multiple hypothesis tests to save time.
Technically, we should have collected four separate data sets, one
for each of the four tests conducted.
In addition, the
instructor should stress to students that in reality one would not
know the true value of the population mean μ. If the
parameter value were known, then there would be no point in
utilizing sample data to draw an inference about the parameter.
The instructor should stress that we assume that we know the
parameter so that we can investigate the properties of hypothesis
testing under different situations.
Assessment
For Part I: Students should write
about the effect of sampling variability on the center, spread,
and shape of the sampling distribution of the sample mean.
Students should write about the effect of sample size on the shape
and spread of the distribution of the sample mean.
The
following questions can be used to assess student understanding or
as challenge problems for students who complete the activity
early.
1. What happens to the shape of the sampling
distribution of the sample means for this non-normal population as
the sample size increases?
2. How do you think the shape,
mean, and standard deviation of the distribution of the sample
means for samples of size 100 would compare to the shape, mean,
and standard deviation for the samples of size 25 that the class
took?
3. Widgets produced by a machine are known to have a
mean diameter of 12 mm with a standard deviation of 0.31 mm.
Suppose that we take a random sample of 90 widgets and measure
each widget’s diameter. We calculate the mean diameter of the 90
widgets. We repeat this process every day for 365 days so that we
have 365 daily means.
- What would we expect the mean
of the 365 daily means to be?
- What would we expect the
standard deviation of the 365 daily means to be?
- What would we expect the shape
of the histogram of the 365 daily means to be? Why?
- Assuming that the machine
continues to perform as it has in the past, what is the
probability that for the next day the mean diameter of the 90
sampled widgets will be between 11.95 mm and 12.05
mm?
- Why is simply looking at the
mean diameter not enough to say that the machine is producing
widgets with diameters close to the desired 12 mm?
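Instructors checking answers to the widget question may find the following calculation useful. It is a sketch that treats the daily mean as approximately normal with mean 12 mm and standard deviation 0.31/sqrt(90), which is the central limit theorem approximation the question is driving at; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 12.0, 0.31, 90

# Standard deviation of the sample mean of 90 diameters.
standard_error = sigma / sqrt(n)

# Approximate sampling distribution of the daily mean diameter.
xbar_dist = NormalDist(mu, standard_error)
prob = xbar_dist.cdf(12.05) - xbar_dist.cdf(11.95)

print(f"expected mean of the daily means: {mu} mm")
print(f"expected sd of the daily means:   {standard_error:.4f} mm")
print(f"P(11.95 < daily mean < 12.05):    {prob:.3f}")
```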
For Part II: Students
should be able to explain how to interpret a confidence interval.
Additionally, students should be able to describe the relationship
between the confidence level and the width of a confidence
interval.
The following questions can be used to assess
student understanding or as challenge problems for students who
complete the activity early.
For all of these questions,
assume that the samples are large enough so that the sampling
distribution of the sample mean is approximately normal.
1. Suppose a simple random sample (SRS) of 20 rectangles has sample
mean x̄ = 7.3 and sample standard deviation s = 6.1. Based on the
sample, we wish to estimate the value of the population mean, μ.
a. What is the point estimate for μ?
b. What is the standard deviation of the point estimate?
c. The mean of the sample will not be exactly equal to the mean of
the population, thus there is error associated with the point
estimate. With 95% confidence, what is the maximum error associated
with the point estimate? (That is, what is the largest possible
difference between x̄ and μ?) This value is often called the margin
of error.
d. The margin of error in part (c) consists of how many estimated
standard deviations of x̄?
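The arithmetic behind question 1 can be sketched as follows. The choice of a t critical value with 19 degrees of freedom (2.093, from a standard t table) is an assumption made to match the t procedures used in the activity; an instructor who prefers the normal approximation would substitute 1.96.

```python
from math import sqrt

xbar, s, n = 7.3, 6.1, 20

# Estimated standard deviation (standard error) of the sample mean.
standard_error = s / sqrt(n)

# Two-sided 95% t critical value for n - 1 = 19 degrees of freedom.
t_star = 2.093
margin_of_error = t_star * standard_error

print(f"point estimate:  {xbar}")
print(f"standard error:  {standard_error:.3f}")
print(f"margin of error: {margin_of_error:.3f} "
      f"(= {t_star} estimated standard deviations of the sample mean)")
```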
2. Suppose the sample mean, x̄, from a SRS of 40 rectangles is used
to estimate μ.
- How would you expect the
standard deviation of the sampling distribution of the sample
mean of 40 rectangles to compare to the standard deviation of
the sampling distribution of the sample mean of 20 rectangles?
Explain.
- How would you expect the 95% margin of error for the estimate of μ
for the 40 rectangles to compare to the 95% margin of error for the
20 rectangles in the previous problem? Explain.
- Do you think using a sample mean from a sample of size 40 will give
a more precise estimate of μ than the sample mean from a sample of
size 20? Explain.
3. In the activity,
you selected a SRS of 25 rectangles and constructed an 80%
confidence interval. Suppose you had selected a SRS of 40
rectangles and constructed an 80% confidence interval. How would
you expect the confidence interval constructed from 40 rectangles
to compare to the confidence interval constructed from 25
rectangles? Explain.
4. For a large population, a 90% confidence interval for μ is found
to be 23.5 to 28.9. Why is the following statement incorrect? “There
is a 90% chance that μ is between 23.5 and 28.9.”
5. Suppose you select a SRS of size 30 from a large population and
find a 95% confidence interval for μ to be 17.30 to 23.47. Your
friend selects a separate SRS of size 30 from the population and
finds a 95% confidence interval for μ to be 18.64 to 24.81. Which
confidence interval is better? Explain.
For Part III: Students
should be able to explain type I error and type II error in a
specific problem. Additionally, students should be able to
describe the relationship between type I and type II error rates
and power.
The following questions can be used to assess
student understanding or as challenge problems for students who
complete the activity early.
1. A company is trying to
decide whether to buy a new Widget machine that costs $1 million.
It is decided it will be worth buying the machine if there is
overwhelming evidence that the mean number of defective Widgets
will decrease from the current rate of 200 per day.
- State the null and alternative
hypotheses needed to test if the machine should be purchased.
- Describe a type I error in the
context of this problem.
- Describe a type II error in
the context of this problem.
- Argue that a type I error is a
more serious error in this problem.
- For this situation, should the
company run the test at the 1%, 5%, or 10% significance level?
Explain.
2. Explain the fallacy
in reasoning in the following statement. “I wanted to reduce the
chance of committing an error, so I reduced the type I error rate
to .001.”
3. A doctor claims that his patients wait an
average of 10 minutes in his waiting room. A disgruntled patient
claims it is really higher. For a random sample of patients, the
sample mean is 10.8 with a standard deviation of
2.1.
- If the sample consisted of 25 patients, perform the appropriate
hypothesis test using a 1% level of significance (α = .01).
- If the sample consisted of 50 patients, perform the appropriate
hypothesis test using a 1% level of significance (α = .01).
- Give an intuitive
justification for why changing the sample size may result in
changing the conclusion about a null hypothesis.
- In general, what is the
relationship between the sample size and the absolute value of
the test statistic?
- In general, what is the
relationship between the sample size and the p-value? (To answer
this question, refer to the t-curve.)
- What do you think is the overall relationship between the sample
size, the type II error rate, and the power, when H0 is false?
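For the waiting-room question, the effect of sample size on the test statistic can be sketched as below. The upper-tail critical values at α = .01 come from a standard t table (2.492 for 24 degrees of freedom) or are approximate (about 2.405 for 49 degrees of freedom); the function and dictionary names are illustrative.

```python
from math import sqrt


def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / sqrt(n))


# Upper-tail critical values at alpha = .01, keyed by sample size n (df = n - 1).
# 2.492 is the table value for df = 24; 2.405 is approximate for df = 49.
T_CRIT_01 = {25: 2.492, 50: 2.405}

t_values = {}
decisions = {}
for n, crit in T_CRIT_01.items():
    # H0: mu = 10 versus Ha: mu > 10, so large positive t supports Ha.
    t_values[n] = t_statistic(xbar=10.8, mu0=10, s=2.1, n=n)
    decisions[n] = "reject H0" if t_values[n] >= crit else "fail to reject H0"
    print(f"n = {n}: t = {t_values[n]:.3f}, critical value = {crit}, {decisions[n]}")
```

The same sample mean and standard deviation give a larger test statistic when n doubles, because the standard error shrinks by a factor of sqrt(2).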
References
Aliaga, M. and Gunderson, B. (1999). Interactive Statistics. New
Jersey: Prentice Hall.
Scheaffer, R., Gnanadesikan, M., Watkins, A., and Witmer, J. (1996).
Activity-Based Statistics: Instructor Resources. New York: Key
Curriculum Press; Springer.