
Counting Eights: A First Activity in the Study and Interpretation of Probability

Department of Mathematics
Kenyon College
Gambier, OH 43022

Statistics Teaching and Resource Library, November 6, 2001

© 2001 by Bradley A. Hartlaub and Brian D. Jones, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.

Students explore the definition and interpretations of the probability of an event by investigating the long run proportion of times a sum of 8 is obtained when two balanced dice are rolled repeatedly. Making use of hand calculations, computer simulations, and descriptive techniques, students encounter the laws of large numbers in a familiar setting. By working through the exercises, students will gain a deeper understanding of the qualitative and quantitative relationships between theoretical probability and long run relative frequency. In particular, students investigate the proximity of the relative frequency of an event to its probability and conclude, from data, that the dispersion of the relative frequency diminishes on the order of $1/\sqrt{n}$.

Key words: probability, law of large numbers, simulation, estimation

## Objectives

After doing the in-class activities and the student investigations and questions, students should understand the following:

1. In repeated independent and identical random trials, the long run relative frequency of an event approaches the true probability of that event.
2. The relative frequency of an event in repeated random trials is itself a random quantity, having an inherent distribution.
3. In n repeated independent and identical random trials, the center of the relative frequency’s distribution is approximately the true probability of that event, while the spread of this distribution decreases as n grows.
4. Even though all students simulate completely different sequences (of dice rolls), each student’s sequence of relative frequencies approaches the same theoretical limit.
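The statement about center and spread can be made concrete with a short worked formula. This is a sketch under the activity's assumptions (independent, identical rolls); the symbol $p$ for the true probability of the event and the subscripted $Y_n$ for the relative frequency after $n$ trials are notation introduced here for illustration. Because the number of occurrences in $n$ trials is binomial,

$$
E(Y_n) = p, \qquad \mathrm{SD}(Y_n) = \sqrt{\frac{p(1-p)}{n}} .
$$

The center stays at $p$ for every $n$, while quadrupling $n$ halves the spread.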

## Description of Activity

In-Class Activity.  Provide each student with a pair of standard six-sided dice. At the beginning of class, ask each student to roll their pair of dice and record the sum of the two dice. Count the number of students who obtained a sum of 8 and divide this number by the number of students in class to calculate the proportion of the class that obtained a sum of 8. Discuss with students the meaning of this proportion: is the value an observation of a random variable, a fixed parameter, or an estimate of a fixed parameter? If the value is an estimate, how might this estimate be improved?

For the next phase of this activity each student should have access to a statistics software package or calculator that can be used to simulate the roll of a pair of dice. Each student should simulate the rolling of a pair of dice 1000 times. A Minitab based activity sheet for students is provided for the instructor to download and distribute (see Appendix C). Ultimately, the variable of interest is Y, the proportion of 8’s so far, formally defined as $Y = (\text{number of 8's in the first } n \text{ rolls})/n$. Students should record this variable for n = 10, 25, 50, 100, 500, and 1000.
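For classes without Minitab, the same simulation can be sketched in a few lines of Python. This is not the authors' macro; it is a minimal illustration using only the standard library, and the names `running_proportions` and `CHECKPOINTS` are hypothetical. It rolls two dice 1000 times, keeps a running count of 8's, and reports Y at the six values of n.

```python
import random
from itertools import accumulate

# Values of n at which students record Y (taken from the activity description).
CHECKPOINTS = (10, 25, 50, 100, 500, 1000)


def running_proportions(n_rolls=1000, seed=None):
    """Return a list whose entry n-1 is Y, the proportion of 8's in the first n rolls."""
    rng = random.Random(seed)
    # 1 if the sum of the two dice is 8, otherwise 0, for each roll.
    is_eight = [int(rng.randint(1, 6) + rng.randint(1, 6) == 8) for _ in range(n_rolls)]
    # Running count of 8's so far, then divide by the trial number.
    eights_so_far = accumulate(is_eight)
    return [count / n for n, count in enumerate(eights_so_far, start=1)]


# One student's simulation; the seed is arbitrary and only makes the run repeatable.
y = running_proportions(seed=2001)
for n in CHECKPOINTS:
    print(f"n = {n:4d}   Y = {y[n - 1]:.4f}")
```

Each student who runs this with a different seed (or no seed) obtains a different sequence, which is exactly the variation the dotplots are meant to display.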

The instructor should draw six identical and parallel axes on a whiteboard, with the axes labeled n=10, n=25, n=50, n=100, n=500, and n=1000 (see Figure 3 for an example). As the students finish their simulations of 1000 rolls, each should come to the board and place a large dot at the appropriate place on each scale for their six values of Y. After all students have graphed their data, the result will be six side by side dotplots illustrating the distribution of the variable Y as a function of the number of rolls n. The dotplots may be easier to read and less prone to error if students convert their proportions to percents before placing dots on the board.

In addition to leading classroom discussions about the appearance of the six dotplots, the instructor should enter the six Y values (in decimal form) for each student in a spreadsheet so that students may have access to them for take-home investigations.

Example Results. The results of a class of 40 students were simulated using Minitab. A Minitab macro that can be used to perform these simulations is provided in Appendix A. Appendix B contains a Minitab project with our simulated results for all 40 students. Figures 1 and 2 show sample graphs of Y versus n. The value of Y in each case converges, as n gets large, to the true probability of an 8, p8 = 5/36, or about 13.89%.

Figure 1. Plot of Y (Sim1) versus n (Trial) for student 1.

Figure 2. Plot of Y (Sim2) versus n (Trial) for student 2.
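The quoted value of p8 can be checked by counting. Assuming balanced dice, the 36 ordered outcomes (first die, second die) are equally likely, and five of them sum to 8:

$$
p_8 = P\{(2,6),\,(3,5),\,(4,4),\,(5,3),\,(6,2)\} = \frac{5}{36} \approx 0.1389 .
$$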

Figure 3 illustrates side by side dotplots for the values of Y at n = 10, 25, 50, 100, 500, and 1000. Figure 4 provides parallel boxplots for the values of Y at n = 10, 25, 50, 100, 500, and 1000.

Figure 3. Sample side by side dotplots for a class of 40 students.

Figure 4. Sample parallel boxplots for a class of 40 students.
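The numbers behind plots like these can be approximated with a short Python sketch. It is an illustration only, not the authors' Minitab project: the function names are hypothetical, the class size of 40 follows the example above, and the theoretical spread uses the binomial formula $\sqrt{p_8(1-p_8)/n}$ as a reference column.

```python
import math
import random
import statistics

CHECKPOINTS = (10, 25, 50, 100, 500, 1000)
P8 = 5 / 36  # true probability that two balanced dice sum to 8


def simulate_student(rng, n_rolls=1000):
    """Return {n: Y} for one student, where Y is the proportion of 8's after n rolls."""
    eights = 0
    y = {}
    for n in range(1, n_rolls + 1):
        if rng.randint(1, 6) + rng.randint(1, 6) == 8:
            eights += 1
        if n in CHECKPOINTS:
            y[n] = eights / n
    return y


def simulate_class(n_students=40, seed=None):
    """Simulate every student in the class with one shared random generator."""
    rng = random.Random(seed)
    return [simulate_student(rng) for _ in range(n_students)]


def summarize(class_results):
    """Return {n: (mean of Y, sample SD of Y, theoretical SD)} across students."""
    summary = {}
    for n in CHECKPOINTS:
        values = [student[n] for student in class_results]
        theoretical_sd = math.sqrt(P8 * (1 - P8) / n)
        summary[n] = (statistics.mean(values), statistics.stdev(values), theoretical_sd)
    return summary


# The seed is arbitrary; it only makes the illustration repeatable.
for n, (mean_y, sd_y, sd_theory) in summarize(simulate_class(seed=40)).items():
    print(f"n = {n:4d}  mean = {mean_y:.4f}  SD = {sd_y:.4f}  theoretical SD = {sd_theory:.4f}")
```

The mean column should hover near 0.1389 at every n, while the SD column shrinks as n grows, which is the pattern the dotplots and boxplots show graphically.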

## Assessment

The natural criterion for assessing a student’s understanding of the key concepts of this activity is to compare their level of knowledge with the objectives put forth at the beginning of this paper. Although four objectives were listed, two of them are indispensable as a minimum of student understanding: in repeated independent and identical random trials, the long run relative frequency of an event approaches the true probability of that event; and the relative frequency of an event in repeated random trials is itself a random quantity, having an inherent distribution.

Take-Home Activities.  We try to reinforce these concepts by having students, on their own and outside of class, repeat the in-class activity but with a different event of interest. For example, make the objective to estimate the probability that the maximum of the two dice is 5, or the probability that the sum of the two dice is greater than 8, etc. Alternatively, we have asked students to estimate the probability of getting a head when flipping, spinning, or tipping coins using the convergence of relative frequencies. Students are often surprised that the probabilities can differ for flipping, spinning, and tipping a coin, depending on the coin’s minting date.
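For the two dice events mentioned here, instructors may want reference values against which to check student work. The following Python sketch (not part of the original activity; the function names are hypothetical) computes each exact probability by enumerating the 36 equally likely outcomes and compares it with a relative frequency from 1000 simulated rolls.

```python
import random
from fractions import Fraction
from itertools import product


def exact_probability(event):
    """Exact probability of `event` over the 36 equally likely (die1, die2) outcomes."""
    outcomes = list(product(range(1, 7), repeat=2))
    favorable = sum(1 for a, b in outcomes if event(a, b))
    return Fraction(favorable, len(outcomes))


def estimated_probability(event, n_rolls=1000, seed=None):
    """Relative frequency of `event` in n_rolls simulated rolls of two dice."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rolls):
        if event(rng.randint(1, 6), rng.randint(1, 6)):
            hits += 1
    return hits / n_rolls


events = {
    "maximum of the two dice is 5": lambda a, b: max(a, b) == 5,
    "sum of the two dice is greater than 8": lambda a, b: a + b > 8,
}

for label, event in events.items():
    exact = exact_probability(event)
    estimate = estimated_probability(event, n_rolls=1000, seed=8)
    print(f"{label}: exact = {exact} ({float(exact):.4f}), estimate = {estimate:.4f}")
```

The exact values are 1/4 for the maximum being 5 and 5/18 for a sum greater than 8; the coin and thumb tack activities have no such enumeration, which is why they rely on relative frequency alone.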

Finally, students might be asked to estimate the probability of an event that does not lend itself to theoretical computation or computer simulation. For example, we have asked students to estimate the probability that a thumb tack will land point down when tossed on a desk. The students obviously will not be able to compute the theoretical probability of this event or run a computer simulation in this setting. Therefore, students are required to apply the long run relative frequency interpretation of probability.

Quiz and Exam Questions.  These concepts are conducive to essay format questions. Below are a few examples taken from past introductory statistics courses.

1. Define the law of large numbers and give a practical example of this law in action. (Use an example that we have not discussed in class.)
2. In a Bernoulli population let pS and pF be the probabilities of success and failure, respectively. Let X be the number of successes among n observations from this population. What random variable is a natural estimator of the difference parameter pS - pF? Justify your answer.
3. This is a simulation activity to check your intuition and your random number generator.
   1. Generate two sets of 100 numbers from the interval (0, 1) and store them in two separate columns, say C1 and C2.
   2. Create a new column, say C3, for the differences between the random numbers.
   3. What does the distribution of differences look like? Is it symmetric? Is the center of the distribution of differences located where you expected it to be?
   4. Create a new column, C4, that contains 1 if C2>=C1 and 0 if C2<C1.
   5. Find the sum of the values in C4. What does this statistic tell you? Explain how you would use this statistic to estimate the chance that a given number in the second column is greater than the corresponding number in the first column.
   6. Create a running total of the number of times the second number is larger than the first number and store this total in C5.
   7. Create a new column, C6, containing the trial number. That is, simply enter the numbers 1 to 100 in column C6.
   8. Use columns C5 and C6 to create cumulative percentages and store them in C7.
   9. Create a scatterplot of the cumulative percentage of times the second number was greater than the first number against trial number.
   10. Describe the obvious pattern on your scatterplot. How is this pattern related to the law of large numbers?
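Instructors who want to preview what students should see in the third question can reproduce its columns with the Python sketch below. It is an illustration rather than the intended Minitab solution: the function name is hypothetical, the differences are assumed to be C2 minus C1 (the question does not say which way to subtract), and `random.random()` draws from [0, 1) rather than the open interval.

```python
import random
from itertools import accumulate


def compare_uniform_columns(n=100, seed=None):
    """Build columns C1..C7 of the exercise as Python lists."""
    rng = random.Random(seed)
    c1 = [rng.random() for _ in range(n)]                 # first set of random numbers
    c2 = [rng.random() for _ in range(n)]                 # second set of random numbers
    c3 = [b - a for a, b in zip(c1, c2)]                  # differences (assumed C2 - C1)
    c4 = [1 if b >= a else 0 for a, b in zip(c1, c2)]     # 1 if C2 >= C1, else 0
    c5 = list(accumulate(c4))                             # running total of C4
    c6 = list(range(1, n + 1))                            # trial number
    c7 = [100 * total / trial for total, trial in zip(c5, c6)]  # cumulative percentage
    return c1, c2, c3, c4, c5, c6, c7


# The seed is arbitrary and only makes the illustration repeatable.
c1, c2, c3, c4, c5, c6, c7 = compare_uniform_columns(seed=100)
print("Sum of C4:", sum(c4))
for trial in (10, 25, 50, 100):
    print(f"trial {trial:3d}: cumulative percentage = {c7[trial - 1]:.1f}%")
```

By symmetry the two columns are equally likely to hold the larger number, so the cumulative percentage should wander early on and then settle near 50% as the trial number grows.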

## Teacher Notes

Refer to the Investigations and Questions for Students Section in the student version of the activity.

Question 3. You may want to have students show you their values of p8 for confirmation before they work further.

Question 4. This problem is intended as a springboard to the formal definition and discussion (if desired) of the weak law of large numbers.

Question 6. This problem is intended as a springboard to the definition and discussion (if desired) of the strong law of large numbers.
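For instructors who want the formal statements at hand, a compact version of both laws in this setting may help. The notation is an assumption of this sketch: $Y_n$ denotes the proportion of 8's after $n$ rolls and $p_8$ the true probability of an 8.

$$
\text{Weak law: } \lim_{n\to\infty} P\big(|Y_n - p_8| \ge \varepsilon\big) = 0 \ \text{ for every } \varepsilon > 0,
\qquad
\text{Strong law: } P\Big(\lim_{n\to\infty} Y_n = p_8\Big) = 1 .
$$

The weak law describes the distribution of $Y_n$ at a single large $n$ (the narrowing dotplots), while the strong law describes an individual student's entire sequence (the converging plots of Y versus n).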

Question 7. You may want to point out that, formally, the standard deviation of Y after n rolls is $\sqrt{p_8(1-p_8)/n}$. A possible extension to this problem is to have students rework (a)-(c) when Y and S are expressed in percents rather than decimals.

## Appendix A

The file Rolling_Dice.MTB contains a Minitab macro that simulates the rolling of 1000 pairs of dice and calculates the proportion of 8’s after n rolls. The code in this macro, a simple text file, is listed below. To execute the macro, select File > Other Files > Run an Exec from the Minitab menus. To conduct the simulation for more than one student, change the number of times to execute to your desired value. Note, however, that the default limit for recent versions of Minitab worksheets is 100,000 cells. One strategy for saving the simulation results for different students in different columns is to execute the macro for 20 students at a time and then unstack the simulations into separate columns.

#Macro for 1000 Dice Rolls
name c1 'Trial'
Set c1
1( 1 : 1000 / 1 )1
End.
name c2 'Red Die'
Random 1000 c2;
Integer 1 6.
name c3 'Green Die'
Random 1000 c3;
Integer 1 6.
name c4 'Sum of Dice'
Let c4 = c2+c3
name c5 'Sum=8?'
Let c5 = (c4=8)
name c6 '8s so far'
Let c6 = PARS(c5)
name c7 'Proportion of 8s so far'
Let c7 = c6/c1
stack c7 c8 c8

## Appendix B

The Minitab project file, Rolling_Dice.MPJ, contains our simulated results for all 40 students.

## Appendix C

The student's version of the activity, Handout.doc, is a Microsoft Word document that students can use as a step-by-step in-class activity guide for simulating the rolling of 1000 pairs of dice in Minitab and calculating the proportion of 8’s as a function of the number of rolls n.