Trent D. Buskirk* and Linda J. Young**
*Department of Mathematics and Statistics, University of Nebraska-Lincoln, Lincoln, NE 68588-0323
**Department of Biometry, University of Nebraska-Lincoln, Lincoln, NE 68583-0712
Statistics Teaching and Resource Library, August 29, 2001
© 2001 by Trent D. Buskirk and Linda J. Young, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.
This activity is an advanced version
of the “Keep your eyes on the ball” activity by Bereska, et al.
(1999). Students should gain experience with differentiating
between independent and dependent variables, using linear
regression to describe the relationship between these variables,
and drawing inference about the parameters of the population
regression line. Each group of students collects data on the
rebound heights of a ball dropped multiple times from each of
several different heights. By plotting the data, students quickly
recognize the linear relationship. After obtaining the least
squares estimate of the population regression line, students can
construct confidence intervals for, or test hypotheses about, the parameters.
Predictions of rebound height can be made for new values of the
drop height as well. Data from different groups can be used to
test for equality of the intercepts and slopes. By focusing on a
particular drop height and multiple types of balls, one can also
introduce the concept of analysis of variance.
Key words: Linear regression, independent variable, dependent variable, analysis of variance
Materials
Each group of 3-5 students needs a
measuring device (preferably a tape measure), a ball that will
rebound when dropped, and graph paper. A pool of balls, such as
super balls, tennis balls, racquetballs, basketballs, and soccer
balls, should be available. It is better to have more balls
available than there are groups, since students always like to have a choice!
Optional materials are chalk, post-it notes, and a measuring
stick.
Time
A class period of 50 or 75 minutes
is sufficient for collecting the data, plotting the data, and
estimating the regression line. Additional class periods could be
used to complete further analyses, such as setting confidence
intervals on the parameters or testing equality of the regression
lines obtained by different groups.
Objective
The objective of this activity is to
estimate the population regression line relating the rebound
height of a ball to the height from which it is dropped and to
draw inferences using the fitted regression line. A variation of
this activity allows students to use the analysis of variance to
determine whether there is a difference in the mean rebound height
of different balls dropped from a common height.
Description of Activity
This advanced version of the “Keep your eyes on the ball” activity by Bereska et al. (1999) offers
students an opportunity to explore the relationship between a
ball’s rebound height and the height from which it is initially
dropped. By setting their own drop heights and by collecting their
own data, groups will gain experience with independent and
dependent variables. Students will also use linear regression to
draw inferences and to make predictions based on their fitted
lines. For this activity, rebound height is defined to be the
highest level of ascent that the ball makes after its impact with
the floor.
To collect the regression data, each group
should drop its selected ball five times from each of ten heights.
These numbers can be varied according to course time constraints.
Students should determine the (ten) drop heights for the ball that
their group has selected (one ball should be used per group).
During a 50-minute class period, for instance, students may drop a
basketball five times at each of ten heights. Actual student data
are included after the prototype activity in the Example
Student Output section.
To better understand the
nature of the relationship between the drop and rebound heights,
students should first plot their data. On this plot, students
should be able to see that a straight line describes this
relationship well. Students should also be able to identify outliers on
this plot. Once outliers are identified, the group should investigate
the nature of any outlying observations. Sometimes these outliers
turn out to be the first observations recorded for a particular drop
height and may simply reflect the inexperience of the
rebound-height recorder. Students are then asked to use their data
to fit a linear regression line and to use it to predict the
rebound height of a ball dropped from a height at which no data
were collected. Students are encouraged to select their own
prediction heights but should keep them within the range of
observed drop heights to avoid extrapolation. Students are also
asked to interpret the regression slope and intercept within the
context of this activity and to comment on the scope of
inference for their regression line.
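For instructors who want to check hand calculations, the least squares step can be sketched in Python using the usual formulas b1 = Sxy/Sxx and b0 = ȳ - b1·x̄. This is a minimal, standard-library-only sketch; fit_line and predict are hypothetical helper names, not part of the activity, and the three points in the usage lines are made up to lie exactly on a line, not measurements.

```python
from statistics import mean


def fit_line(x, y):
    """Least squares intercept b0 and slope b1 for the line y = b0 + b1 * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("at least two different drop heights are required")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx           # slope: estimated change in mean rebound per unit of drop height
    b0 = y_bar - b1 * x_bar  # intercept: fitted rebound at a drop height of 0 (an extrapolation)
    return b0, b1


def predict(b0, b1, x0):
    """Predicted rebound height for drop height x0 (keep x0 inside the observed range)."""
    return b0 + b1 * x0


# Made-up points lying exactly on y = 2 + 0.5x, used only to check the arithmetic
b0, b1 = fit_line([20, 40, 60], [12, 22, 32])
print(b0, b1, predict(b0, b1, 50))  # 2.0 0.5 27.0
```

A group would replace the made-up lists with its own drop heights and the matching rebound heights, one pair per drop.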
Assessment
Below is a sample exam question to
test an understanding of the basic concepts associated with linear
regression:
POSSIBLE EXAM QUESTION: OFFICE-TEMPS Inc. wants
to screen applicants for basic typing skills using a timed test.
Applicants are required to type as many words (in the order in
which they appear on a uniform list) as possible in the prescribed
time. The allowable times range from 10 to 90 seconds. Data
collected from all applicants interviewing last week are listed
below:
Time (sec) | Number of words
-----------+----------------
10         | 18.5
10         | 19
10         | 17.75
20         | 29
20         | 29.5
20         | 32
60         | 75
60         | 60.5
60         | 53.25
90         | 80.5
90         | 100
90         | 93.25
(a) Identify the independent and dependent variables in this study.
(b) Assuming that the regression assumptions hold, fit a regression line to the data. Interpret the estimated slope and intercept in the context of this study.
(c) Is the regression intercept significantly different from zero? Justify your answer.
(d) Compute a 95% prediction interval for the number of words typed in 40 seconds and interpret it in the context of this study.
(e) Compute a 90% confidence interval for the mean number of words that can be typed in 40 seconds and interpret it in the context of this study.
(f) Clearly explain, in the context of the problem, why the intervals in (d) and (e) are NOT the same.
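One way to check answers to (b) through (e) is to compute them directly from the table. The sketch below uses the standard simple-regression formulas and only Python's standard library. Because the standard library has no t quantile function, the critical values for n - 2 = 10 degrees of freedom are hard-coded from a t table (worth double-checking against the table your course uses). The name regression_summary is hypothetical.

```python
import math
from statistics import mean

# Typing-test data from the exam question: time allowed (s) and words typed
time_sec = [10, 10, 10, 20, 20, 20, 60, 60, 60, 90, 90, 90]
words = [18.5, 19, 17.75, 29, 29.5, 32, 75, 60.5, 53.25, 80.5, 100, 93.25]

# t critical values for n - 2 = 10 degrees of freedom, copied from a t table
T_975_DF10 = 2.228  # upper 2.5% point, for a two-sided 95% interval
T_95_DF10 = 1.812   # upper 5% point, for a two-sided 90% interval


def regression_summary(x, y):
    """Least squares fit plus the quantities needed for inference."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    return n, x_bar, sxx, b0, b1, s


n, x_bar, sxx, b0, b1, s = regression_summary(time_sec, words)

# (c) t statistic for H0: intercept = 0; compare |t_b0| with T_975_DF10
se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
t_b0 = b0 / se_b0

# (d) and (e): intervals at x0 = 40 seconds
x0 = 40
y_hat = b0 + b1 * x0
h0 = 1 / n + (x0 - x_bar) ** 2 / sxx          # shared term under the square root
pi_half = T_975_DF10 * s * math.sqrt(1 + h0)  # one new applicant
ci_half = T_95_DF10 * s * math.sqrt(h0)       # mean over all applicants

print(f"b0 = {b0:.3f}, b1 = {b1:.4f}, t (intercept) = {t_b0:.2f}")
print(f"95% prediction interval at 40 s: ({y_hat - pi_half:.1f}, {y_hat + pi_half:.1f})")
print(f"90% confidence interval at 40 s: ({y_hat - ci_half:.1f}, {y_hat + ci_half:.1f})")
```

For (f), note that the two intervals differ for two reasons here: the prediction interval adds the variability of a single new applicant (the extra 1 under the square root), and it also uses a higher confidence level.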
Teacher notes
Students often confuse dependent and
independent variables and have difficulty grasping the concept of
a population regression line that is being estimated by fitting a
linear regression line. In addition, it is often difficult to find
data that allow a careful consideration of the assumptions
underlying regression. This activity was designed to permit the
students to look at the underlying assumptions of regression and
to estimate the population regression line. Taking even a little
more data will change the estimated regression line, even though
the population line remains unchanged. In addition, the difference
between a confidence interval for the mean rebound height at a given
drop height and a prediction interval for a new observation at that
drop height becomes more concrete to the students.
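In the usual textbook notation (assumed here; the activity itself does not write out the formulas), with fitted value ŷ0 = b0 + b1·x0 at drop height x0, residual standard error s, n observations, and Sxx = Σ(xi - x̄)², the two intervals at a confidence level of 1 - α are:

```latex
\[
\text{CI for the mean rebound at } x_0:\quad
\hat{y}_0 \pm t_{\alpha/2,\,n-2}\, s \sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}
\]
\[
\text{PI for one new rebound at } x_0:\quad
\hat{y}_0 \pm t_{\alpha/2,\,n-2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}
\]
```

The extra 1 under the square root accounts for the scatter of a single new rebound around the population line, so at the same confidence level the prediction interval is always the wider of the two, and it does not shrink toward zero width as more data are collected.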
This activity will work best if students are
arranged into groups consisting of 2 to 4 members. It will be
difficult to complete the data collection if students work alone.
A group of size three is optimal in that it allows one student to
drop the ball, a second to observe the rebound height, and a third
to record the data. If the groups are larger than three,
additional observers of the rebound height can be helpful.
The most challenging part of the data collection is
accurately recording the rebound heights. The rebound-height
observer(s) must be at eye level with the rebound height to record it
accurately. Students should practice dropping the ball and
recording the rebound heights. Some students will force the ball
downward, resulting in anomalous rebound heights. Other students
will learn that they are better rebound recorders than they are
droppers. Practice time should be allocated so that groups can
assign duties, determine the range of drop heights to be used, and
practice dropping the ball and recording its rebound height.
An additional concept that may be further discussed within
the context of this experiment and its subsequent analysis is the
idea of outlying or influential observations. Sometimes outliers
are observed, which could cause students to question the
assumption of normality. Often students can identify a reason for
the outlier, for example, “It was the first drop.”
To evaluate the assumption of equal variances for the rebound
heights at the various drop heights, students can compute the
sample standard deviation of the five rebound heights at each drop
height and plot it against the drop height. Although five
observations per height give only a rough estimate of variability,
this plot may help identify patterns in measurement error or sets of
observations with potential outliers. In addition to checking the
homoscedasticity assumption, students can use normal probability
plots or residual plots to check for violations of the normality
assumption or to identify outliers.
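The standard-deviation-versus-drop-height check can be sketched as follows. This is a minimal sketch using Python's statistics.stdev (which uses the n - 1 divisor); sd_by_height is a hypothetical name. Because no rebound measurements appear in this section, the usage lines apply it to the replicated typing data from the exam question instead.

```python
from collections import defaultdict
from statistics import stdev


def sd_by_height(drop, rebound):
    """Map each drop height to the sample SD (n - 1 divisor) of its rebound heights."""
    groups = defaultdict(list)
    for d, r in zip(drop, rebound):
        groups[d].append(r)
    # stdev needs at least two values, so heights with a single drop are skipped
    return {d: stdev(rs) for d, rs in sorted(groups.items()) if len(rs) >= 2}


# Applied to the typing data from the exam question (time in seconds, words typed)
time_sec = [10, 10, 10, 20, 20, 20, 60, 60, 60, 90, 90, 90]
words = [18.5, 19, 17.75, 29, 29.5, 32, 75, 60.5, 53.25, 80.5, 100, 93.25]
for height, sd in sd_by_height(time_sec, words).items():
    print(height, round(sd, 2))
```

The printed pairs can be plotted on graph paper; a steady rise in SD with drop height would cast doubt on the equal-variance assumption.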
The drop height should be a good predictor of rebound height, so a
discussion of high R² values may be appropriate, as well as a
discussion of the patternless, cloud-like scatter one would expect
to see in a plot of residuals versus the independent variable.
Groups can compare their regression lines with those of other
groups for particular balls of interest.
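R² and the residuals for that plot can be computed together, as in this minimal sketch (r_squared_and_residuals is a hypothetical name; R² is computed as 1 - SSE/SST).

```python
from statistics import mean


def r_squared_and_residuals(x, y):
    """Fit y = b0 + b1 * x by least squares; return R^2 and the residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)      # variation left unexplained by the line
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total variation in the rebound heights
    return 1 - sse / sst, residuals
```

Plotting each residual against its drop height should show a patternless band around zero; a curved band suggests the straight line misses some curvature in the relationship.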
Students should generally conclude that inference can be drawn
only to the ball that was dropped, to the particular surface on
which it was dropped, and to the range of drop heights used to
construct the line. Because the true relationship between drop
height and rebound height is quadratic, the intercept is usually
significantly different from zero. Thus, the problems associated
with extrapolation become clear when interpreting the estimated
intercept. This also serves as a clear example of a model that is
useful over the range of observed data but is not the true
underlying model.
It could be instructive to ask students to predict the rebound
height for a ball that is dropped from well above any observed
drop heights, say 200 inches from the ground, based on their
fitted regression line. Students should realize that inference
ought to be restricted to the person doing the dropping or
observing the rebound height unless this responsibility was
rotated within the group.
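A fitted line will return a number for a 200-inch drop just as readily as for an observed height. The sketch below makes the extrapolation explicit by issuing a warning (via Python's standard warnings module) whenever the requested drop height falls outside the range used to fit the line. The function name and the coefficients in the usage note are hypothetical.

```python
import warnings


def predict_with_range_check(b0, b1, x0, x_min, x_max):
    """Fitted value at x0, warning when x0 lies outside the observed drop heights."""
    if not (x_min <= x0 <= x_max):
        warnings.warn(
            f"drop height {x0} is outside the observed range [{x_min}, {x_max}]; "
            "this is an extrapolation and the linear fit may not hold there",
            stacklevel=2,
        )
    return b0 + b1 * x0
```

For example, with a hypothetical fit b0 = 2.0, b1 = 0.5 from drops between 20 and 60 inches, predict_with_range_check(2.0, 0.5, 200, 20, 60) still returns 102.0 but warns that the value rests on an untested assumption.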
Depending on the level of the
course, subsequent class periods could be used to test for
equality of the regression lines from two balls of the same type,
or two balls of different types.
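One standard way to test equality of two regression lines (not necessarily the approach the authors intend) is an extra-sum-of-squares F test: compare the residual sum of squares from a separate line for each ball (4 parameters) with that from one common line (2 parameters). The sketch below assumes equal error variance for the two balls; the function names are hypothetical, and the resulting F statistic would be compared with an F table value on (2, n1 + n2 - 4) degrees of freedom.

```python
from statistics import mean


def sse_of_line(x, y):
    """Residual sum of squares from the least squares line through (x, y)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return syy - sxy ** 2 / sxx


def equal_lines_f(x1, y1, x2, y2):
    """F statistic for H0: both balls share one intercept and one slope.

    x1, y1, x2, y2 must be lists, because the common-line fit concatenates them.
    """
    sse_full = sse_of_line(x1, y1) + sse_of_line(x2, y2)  # separate lines
    sse_reduced = sse_of_line(x1 + x2, y1 + y2)           # one common line
    df_full = len(x1) + len(x2) - 4
    f = ((sse_reduced - sse_full) / 2) / (sse_full / df_full)
    return f, (2, df_full)
```

A large F relative to the table value suggests the two balls do not share the same population regression line; testing slopes alone would require a slightly different reduced model.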
An extension of this
activity is to use the data for a given drop height to test for
differences in the mean rebound heights of different kinds of
balls. If the data are kept from the regression activity and an
effort is made to have at least one common height for all groups,
it should not be necessary to collect more data.
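For this extension, the one-way analysis of variance F statistic can be computed as in the minimal sketch below. The function name is hypothetical, the input is assumed to be a dictionary mapping each ball type to its rebound heights at the common drop height, and the result would be compared with an F table value on (k - 1, n - k) degrees of freedom for k ball types and n total drops.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic and degrees of freedom for H0: all ball types share one mean rebound."""
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    group_means = {name: mean(vals) for name, vals in groups.items()}
    # Between-ball variation: how far each ball's mean sits from the grand mean
    ss_between = sum(len(vals) * (group_means[name] - grand_mean) ** 2
                     for name, vals in groups.items())
    # Within-ball variation: scatter of repeated drops around each ball's own mean
    ss_within = sum((v - group_means[name]) ** 2
                    for name, vals in groups.items() for v in vals)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)
```

Reusing the five drops per ball from the regression data at a common height supplies the lists directly, in line with the suggestion above that no new data collection is needed.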
Acknowledgements
This is Journal Paper No. 13310 of
the Nebraska Agricultural Research Division, University of
Nebraska at Lincoln. Research was supported in part by University
of Nebraska Agricultural Experiment Station Project
NEB-23-001.
References
Bereska, C., Bolster, C. H., Bolster, L. C., and Scheaffer, R. (1999). EQL Investigation 15: Keep your eyes on the ball. Exploring Statistics in the Elementary Grades. White Plains, NY: Dale Seymour Publications.
Editor's note: Before 11-6-01, the "student's version" of an activity was called the "prototype".