Bradley A. Hartlaub and Brian D. Jones
Department of Mathematics
Kenyon College
Gambier, OH 43022
Statistics Teaching and Resource Library, November 6, 2001
© 2001 by Bradley A. Hartlaub and Brian D. Jones, all rights
reserved. This text may be freely
shared among individuals, but it may not be republished in any
medium without express written consent from the author and advance
notification of the editor.
Students explore the definition and
interpretations of the probability of an event by investigating
the long run proportion of times a sum of 8 is obtained when two
balanced dice are rolled repeatedly. Making use of hand
calculations, computer simulations, and descriptive techniques,
students encounter the laws of large numbers in a familiar
setting. By working through the exercises, students will gain a
deeper understanding of the qualitative and quantitative
relationships between theoretical probability and long run
relative frequency. In particular, students investigate the
proximity of the relative frequency of an event to its probability
and conclude, from data, that the dispersion of the relative
frequency diminishes on the order of $1/\sqrt{n}$, where n is the
number of rolls.
Key words: probability, law of
large numbers, simulation, estimation
Objectives
After doing the in-class activities and the student investigations
and questions, students should understand the following:
- In repeated independent and identical random trials, the long run
relative frequency of an event approaches the true probability of
that event.
- The relative frequency of an event in repeated random trials is
itself a random quantity, having an inherent distribution.
- In n repeated independent and identical random trials, the center
of the relative frequency’s distribution is approximately the true
probability of that event, while the spread of this distribution
decreases as n grows.
- Even though all students simulate completely different sequences
(of dice rolls), each student’s sequence of relative frequencies
approaches the same theoretical limit.
Description of Activity
In-Class Activity.
Provide for each student a pair of standard
six-sided dice. At the beginning of class ask each student to roll
their pair of dice and record the sum of the two dice. Count the
number of students that got a sum of 8 and divide this number by
the number of students in class to calculate the proportion of
students in class that obtained a sum of 8. Discuss with students
the meaning of this proportion; i.e., is the value an observation
of a random variable or a fixed parameter or an estimate of a
fixed parameter? If the value is an estimate, how might this
estimate be improved?
For the next phase of this activity
each student should have access to a statistics software package
or calculator that can be used to simulate the roll of a pair of
dice. Each student should simulate the rolling of a pair of dice
1000 times. A Minitab-based activity sheet for students is
provided for the instructor to download and distribute (see
Appendix C). Ultimately, the variable of interest is Y, the
proportion of 8’s so far, formally defined as
$$Y = \frac{\text{number of the first } n \text{ rolls with a sum of 8}}{n}.$$
Students should record this variable
for n = 10, 25, 50, 100, 500, and 1000.
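For readers without Minitab, one student's simulation can be sketched in Python. This is a minimal sketch rather than the authors' activity sheet: the function name `running_proportion_of_eights`, the `CHECKPOINTS` tuple, and the seed value are illustrative assumptions.

```python
import random

# Values of n at which each student records Y (taken from the activity text).
CHECKPOINTS = (10, 25, 50, 100, 500, 1000)


def running_proportion_of_eights(n_rolls=1000, seed=None):
    """Roll two fair dice n_rolls times; return Y after every roll.

    Element n - 1 of the returned list is the proportion of the
    first n rolls whose sum was 8.
    """
    rng = random.Random(seed)  # seed is only for reproducibility
    eights = 0
    proportions = []
    for n in range(1, n_rolls + 1):
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total == 8:
            eights += 1
        proportions.append(eights / n)
    return proportions


y = running_proportion_of_eights(seed=2001)
for n in CHECKPOINTS:
    print(f"n = {n:4d}   Y = {y[n - 1]:.4f}")
```

Changing or omitting the seed plays the role of a different student: the early values of Y vary considerably, while the values at n = 1000 tend to sit close together.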
The
instructor should draw six identical and parallel axes on a
whiteboard, with the axes labeled n=10, n=25,
n=50, n=100, n=500, and n=1000 (see
Figure 3 for an example). As the students finish their simulations
of 1000 rolls, each should come to the board and place a large dot
at the appropriate place on each scale for their six values of
Y. After all students have graphed their data, the result
will be six side by side dotplots illustrating the distribution of
the variable Y as a function of the number of rolls
n. The dotplots may be easier to read and less prone to
error if students convert their proportions to percents before
placing dots on the board.
In addition to leading classroom
discussions about the appearance of the six dotplots, the
instructor should enter the six Y values (in decimal form)
for each student in a spreadsheet so that students may have access
to them for take-home investigations.
Example Results. The results of a class of 40 students were
simulated using Minitab. A Minitab macro that can be used to
perform these simulations is provided in Appendix A. Appendix B
contains a Minitab project with our simulated results for all 40
students. Figures 1 and 2 show sample graphs of Y versus
n. The value of Y in each case converges, as
n gets large, to the true probability of an 8,
p8 = 5/36, or about 13.89%.
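The value of about 13.89% can be checked by hand or by listing the 36 equally likely outcomes for two balanced dice. The short Python sketch below does the enumeration; the variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes whose sum is 8: (2, 6), (3, 5), (4, 4), (5, 3), (6, 2).
favorable = [pair for pair in outcomes if sum(pair) == 8]

p8 = Fraction(len(favorable), len(outcomes))
print(favorable)
print(p8, round(float(p8), 4))
```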

Figure 1. Plot of Y (Sim1) versus n
(Trial) for student 1.

Figure 2. Plot of Y (Sim2) versus n
(Trial) for student 2.
Figure 3 illustrates side by
side dotplots for the values of Y at n = 10, 25, 50,
100, 500, and 1000. Figure 4 provides parallel boxplots for the
values of Y at n = 10, 25, 50, 100, 500, and 1000.

Figure 3. Sample side by side dotplots for a class of
40 students.

Figure 4. Sample parallel boxplots for a class of 40
students.
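The class-level pattern in Figures 3 and 4 can also be explored numerically. The sketch below, written in Python rather than Minitab, simulates 40 hypothetical students and compares the observed standard deviation of Y at each n with sqrt(p8(1 - p8)/n). That comparison value is the standard binomial result for a sample proportion, not a formula quoted from the activity; the helper name `student_checkpoints` and the seed are assumptions.

```python
import math
import random
import statistics

CHECKPOINTS = (10, 25, 50, 100, 500, 1000)
P8 = 5 / 36  # theoretical probability that two fair dice sum to 8


def student_checkpoints(rng, checkpoints=CHECKPOINTS):
    """Simulate one student's rolls; return {n: Y at n} for each checkpoint."""
    eights = 0
    recorded = {}
    for n in range(1, max(checkpoints) + 1):
        if rng.randint(1, 6) + rng.randint(1, 6) == 8:
            eights += 1
        if n in checkpoints:
            recorded[n] = eights / n
    return recorded


rng = random.Random(40)  # fixed seed so the sketch is reproducible
class_results = [student_checkpoints(rng) for _ in range(40)]

summary = {}
for n in CHECKPOINTS:
    values = [result[n] for result in class_results]
    summary[n] = (
        statistics.mean(values),       # center of the 40 values of Y
        statistics.stdev(values),      # observed spread across students
        math.sqrt(P8 * (1 - P8) / n),  # binomial standard deviation of Y
    )
    mean, sd, theory = summary[n]
    print(f"n = {n:4d}  mean = {mean:.4f}  sd = {sd:.4f}  theory = {theory:.4f}")
```

With 40 students the observed spread will not match the comparison value exactly, but it should shrink in roughly the same way as n grows.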
Assessment
The natural criterion for assessing a student’s understanding of
the key concepts of this activity is to compare their level of
knowledge with the objectives put forth at the beginning of this
paper. Although four objectives were listed, two of them are
indispensable as a minimum of student understanding: first, in
repeated independent and identical random trials, the long run
relative frequency of an event approaches the true probability of
that event; and second, the relative frequency of an event in
repeated random trials is itself a random quantity, having an
inherent distribution.
Take-Home Activities. We try to
reinforce these concepts by having students, on their own and
outside of class, repeat the in-class activity but with a
different event of interest. For example, make the objective to
estimate the probability that the maximum of the two dice is 5, or
the probability that the sum of the two dice is greater than 8,
etc. Alternatively, we have asked students to estimate the
probability of getting a head when flipping, spinning, or tipping
coins using the convergence of relative frequencies. Students are
often surprised that the probabilities can differ for flipping,
spinning, and tipping a coin, depending on the coin’s minting
date.
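For the two dice events mentioned above, an instructor may want the exact answers on hand when checking student estimates. The Python sketch below is one way to get them by enumeration and to compare them with a simulated relative frequency; the `EVENTS` dictionary, the function names, and the seed are illustrative assumptions.

```python
import random
from fractions import Fraction
from itertools import product

# Take-home events from the text, written as tests on the two dice.
EVENTS = {
    "maximum is 5": lambda a, b: max(a, b) == 5,
    "sum greater than 8": lambda a, b: a + b > 8,
}


def exact_probability(event):
    """Exact probability of an event by listing all 36 outcomes."""
    outcomes = list(product(range(1, 7), repeat=2))
    hits = sum(1 for a, b in outcomes if event(a, b))
    return Fraction(hits, len(outcomes))


def relative_frequency(event, n_rolls, rng):
    """Proportion of n_rolls simulated rolls on which the event occurs."""
    hits = sum(
        1 for _ in range(n_rolls)
        if event(rng.randint(1, 6), rng.randint(1, 6))
    )
    return hits / n_rolls


rng = random.Random(8)
for name, event in EVENTS.items():
    exact = exact_probability(event)
    estimate = relative_frequency(event, 10_000, rng)
    print(f"{name}: exact = {exact} ({float(exact):.4f}), estimate = {estimate:.4f}")
```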
Finally, students might be asked to estimate the
probability of an event that does not lend itself to theoretical
computation or computer simulation. For example, we have asked
students to estimate the probability that a thumb tack will land
point down when tossed on a desk. The students obviously will not
be able to compute the theoretical probability of this event or
run a computer simulation in this setting. Therefore, students are
required to apply the long run relative frequency interpretation
of probability.
Quiz and Exam Questions. These concepts lend themselves to
essay-format questions. Below are a few examples taken from past
introductory statistics courses.
- Define the law of large numbers
and give a practical example of this law in action. (Use an
example that we have not discussed in class.)
- In a Bernoulli population let
pS and pF be the
probabilities of success and failure, respectively. Let X be the
number of successes among n observations from this population.
What random variable is a natural estimator of the difference
parameter pS - pF? Justify
your answer.
- This is a simulation activity to check your intuition and your
random number generator.
  - Generate two sets of 100 random numbers from the interval (0, 1)
  and store them in two separate columns, say C1 and C2.
  - Create a new column, say C3, for the differences between the
  random numbers.
  - What does the distribution of differences look like? Is it
  symmetric? Is the center of the distribution of differences
  located where you expected it to be?
  - Create a new column, C4, that contains 1 if C2>=C1 and 0 if
  C2<C1.
  - Find the sum of the values in C4. What does this statistic tell
  you? Explain how you would use this statistic to estimate the
  chance that a given number in the second column is greater than
  the corresponding number in the first column.
  - Create a running total of the number of times the second number
  is larger than the first number and store this total in C5.
  - Create a new column, C6, containing the trial number. That is,
  simply enter the numbers 1 to 100 in column C6.
  - Use columns C5 and C6 to create cumulative percentages and store
  them in C7.
  - Create a scatterplot of the cumulative percentage of times the
  second number was greater than the first number against trial
  number.
  - Describe the obvious pattern on your scatterplot. How is this
  pattern related to the law of large numbers?
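For instructors who want to preview the last question outside Minitab, the column steps can be sketched in Python. This is a rough sketch under stated assumptions: the lists `c1` through `c7` mirror the worksheet columns, the differences are taken as C2 minus C1 (the question does not specify a direction), and `random.random()` draws from [0, 1) rather than the open interval (0, 1).

```python
import random
from itertools import accumulate

N_TRIALS = 100
rng = random.Random(7)  # fixed seed so the sketch is reproducible

c1 = [rng.random() for _ in range(N_TRIALS)]        # first set of numbers
c2 = [rng.random() for _ in range(N_TRIALS)]        # second set of numbers
c3 = [b - a for a, b in zip(c1, c2)]                # differences (C2 - C1)
c4 = [1 if b >= a else 0 for a, b in zip(c1, c2)]   # 1 if C2 >= C1, else 0
c5 = list(accumulate(c4))                           # running total of C4
c6 = list(range(1, N_TRIALS + 1))                   # trial number
c7 = [100 * total / trial for total, trial in zip(c5, c6)]  # cumulative %

print(f"sum of C4 = {sum(c4)}, estimated chance = {sum(c4) / N_TRIALS:.2f}")
for trial in (10, 25, 50, 100):
    print(f"trial {trial:3d}: cumulative percentage = {c7[trial - 1]:.1f}")
```

Plotting `c7` against `c6` with any graphing tool gives the scatterplot the question asks students to describe.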
Teacher notes
Refer to the Investigations and Questions for Students section in
the student version of the activity.
Question 3.
You may want to have students show you their values of
p8 for affirmation before they work further.
Question 4. This problem is intended as a
springboard to the formal definition and discussion (if desired)
of the weak law of large numbers.
Question 6. This problem
is intended as a springboard to the definition and discussion (if
desired) of the strong law of large numbers.
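If the formal statements are wanted for Questions 4 and 6, one standard way to write the two laws for this activity is sketched below. The subscripted notation Y_n (the proportion of 8's in the first n rolls) and the symbol ε are assumptions of this sketch rather than notation taken from the student handout.

```latex
% Requires amsmath for \text, \lvert, and \rvert.
% Y_n: proportion of 8's in the first n rolls; p_8 = 5/36.

% Weak law of large numbers (convergence in probability):
\[
  \lim_{n \to \infty} P\bigl( \lvert Y_n - p_8 \rvert > \varepsilon \bigr) = 0
  \quad \text{for every } \varepsilon > 0 .
\]

% Strong law of large numbers (almost sure convergence):
\[
  P\Bigl( \lim_{n \to \infty} Y_n = p_8 \Bigr) = 1 .
\]
```

The weak law describes the distribution of Y at a single large n, which matches the narrowing dotplots; the strong law describes each student's entire sequence, which matches the individual plots of Y versus n.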
Question 7.
You may want to point out that formally

A possible extension to this problem
is to have students rework (a)-(c) when Y and S are
expressed in percents rather than
decimals.
Appendix A
The file, Rolling_Dice.MTB, contains a Minitab
macro which will simulate the rolling of 1000 pairs of dice and
calculate the proportion of 8’s after n rolls. The code in
this macro, a simple text file, is listed below. To execute this
macro, simply select File > Other Files > Run an Exec from
the Minitab menus. To conduct the simulation for more than one
student, just change the number of times to execute to your
desired value. However, the default limit for recent versions of
Minitab worksheets is set to a size of 100,000 cells. One strategy
for saving the simulation results for different students in
different columns is to execute the macro for 20 students at a
time and then unstack the simulations into separate columns.
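# Macro for 1000 Dice Rolls
# c1: trial numbers 1 to 1000
name c1 'Trial'
Set c1
1( 1 : 1000 / 1 )1
End.
# c2, c3: 1000 rolls of each die, integers 1 to 6
name c2 'Red Die'
Random 1000 c2;
Integer 1 6.
name c3 'Green Die'
Random 1000 c3;
Integer 1 6.
name c4 'Sum of Dice'
Let c4 = c2+c3
# c5: 1 if the sum is 8, otherwise 0
name c5 'Sum=8?'
Let c5 = (c4=8)
# c6: running count of 8s (partial sums of c5)
name c6 '8s so far'
Let c6 = PARS(c5)
name c7 'Proportion of 8s so far'
Let c7 = c6/c1
# Append this run's proportions to c8 so repeated runs accumulate
stack c7 c8 c8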
Appendix B
The Minitab project file, Rolling_Dice.MPJ,
contains our simulated results for all 40 students.
Appendix C
The student's version of the activity, Handout.doc, is a
Microsoft Word document that students can use as a step by step
in-class activity guide for simulating the rolling of 1000 pairs
of dice in Minitab and calculating the proportion of 8’s as a
function of the number of rolls n.