Using Simulation-Based Inference in AP Statistics

Josh Tabor, Canyon del Oro High School

The AP Statistics course is designed to mimic a traditional college-level introductory statistics class. Students are expected to use z-tests for proportions, t-tests for means and slopes, and chi-square tests for distributions of categorical data. There are at least three good reasons to incorporate simulation-based inference methods in the AP course, however.

Using simulation-based methods allows you to introduce the logic of inference on the first day of class. Rather than going on about class procedures, I start my course with an activity that gives students a taste of what lies ahead. The context is about gender discrimination in promotion, with a twist that more females are promoted than expected (5 of 10 females are promoted but only 3 of 15 males). We talk about the “two explanations” for why more females were promoted (discrimination against men!; no discrimination against men—a larger proportion of females being promoted could have happened by chance alone). Once students understand the two explanations, we use bags of beads to represent the population and take random samples (with no peeking) to see what is likely to happen by chance alone. It turns out that promoting 5 females isn’t that surprising in this context.*

*I always make sure that the first example students see has a conclusion where we fail to reject the null hypothesis (even though we don’t use that term yet). I want to make sure students understand the difference between “some evidence” and “convincing evidence.”
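For readers who want to check the bead activity with software afterward, here is a minimal sketch. It assumes one particular setup that the description above does not spell out: a bag of 25 beads (10 "female", 15 "male") from which 8 beads are drawn without replacement, one for each promotion. The function names and the number of repetitions are illustrative choices, not part of the classroom activity.

```python
import random
from math import comb

FEMALES, MALES = 10, 15       # employees in the scenario
PROMOTED = 8                  # 5 females + 3 males were promoted
OBSERVED_FEMALES = 5

def simulate_promotions(reps=10_000, seed=1):
    """Draw PROMOTED beads without replacement; count 'female' beads each time."""
    rng = random.Random(seed)
    bag = ["F"] * FEMALES + ["M"] * MALES
    return [rng.sample(bag, PROMOTED).count("F") for _ in range(reps)]

def exact_tail():
    """Hypergeometric probability of 5 or more females among the 8 promoted."""
    total = comb(FEMALES + MALES, PROMOTED)
    favorable = sum(
        comb(FEMALES, k) * comb(MALES, PROMOTED - k)
        for k in range(OBSERVED_FEMALES, PROMOTED + 1)
    )
    return favorable / total

counts = simulate_promotions()
simulated = sum(c >= OBSERVED_FEMALES for c in counts) / len(counts)
print(f"simulated P(5 or more females) = {simulated:.3f}")
print(f"exact P(5 or more females)     = {exact_tail():.3f}")
```

Under this setup the exact probability is about 0.13, which matches the conclusion that promoting 5 females is not especially surprising if chance alone is at work.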

A few weeks later, we do an in-class experiment to see if caffeine affects pulse rates. The main goal is to discuss the principles of experimental design, including the purpose of random assignment. I tell my students that we randomly assign treatments to make sure the treatment groups are roughly equivalent at the beginning of the experiment, but that random assignment never creates perfect balance. After we get the results of our experiment, I ask students for the two explanations for why the average increase in pulse rates is higher for the students in the caffeine group (caffeine works!; caffeine doesn’t work and the difference is due to the slight imbalances in the groups created by the random assignment). To see if we can rule out random chance, we shuffle note cards to see what differences in means are likely to happen by chance alone. Depending on the amount of time left in class, we might also crank up the number of repetitions using an applet or Fathom software.**

**I think it is very important to use hands-on methods for simulation early and often. Otherwise, technology becomes a black box and doesn’t help students understand what is going on. Only after I am convinced that students understand how to perform a simulation by hand will I start with technology.
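Once students have shuffled the cards by hand, the same procedure can be repeated many times in software. The sketch below is one way to do that in Python rather than in an applet or Fathom. The pulse-rate increases are made-up numbers for illustration only, not data from any class, and `shuffle_test` is a hypothetical helper name.

```python
import random
from statistics import mean

# Hypothetical pulse-rate increases (beats per minute); NOT real class data.
caffeine = [8, 3, 5, 6, 2, 7, 4, 9, 1, 5]
no_caffeine = [2, 4, -1, 3, 5, 0, 1, 6, 2, 3]

def shuffle_test(group_a, group_b, reps=10_000, seed=1):
    """Return the observed difference in means and the shuffled differences."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    cards = list(group_a) + list(group_b)   # one "note card" per student
    n_a = len(group_a)
    diffs = []
    for _ in range(reps):
        rng.shuffle(cards)                  # re-deal the cards at random
        diffs.append(mean(cards[:n_a]) - mean(cards[n_a:]))
    return observed, diffs

observed, diffs = shuffle_test(caffeine, no_caffeine)
# Proportion of shuffles with a difference at least as large as the observed one.
p_value = sum(d >= observed for d in diffs) / len(diffs)
print(f"observed difference in means = {observed}")
print(f"simulated p-value            = {p_value:.3f}")
```

Each shuffle mimics the "caffeine doesn’t work" explanation: every student keeps the same pulse-rate increase, and only the group labels change.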

We continue to practice the logic of inference informally throughout the probability unit. Whenever possible, I try to ask probability questions in an inferential context. For example, instead of asking for the probability of getting 1 or fewer successes in 10 trials of a binomial process with p = 1/4, I tell students about my lack of success playing the Monopoly game at McDonald’s. If 1-in-4 game pieces win a prize, should I be concerned if I played 10 times and only won once?
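The game-piece question can be answered both ways. The sketch below simulates 10 plays many times and also computes the binomial probability directly. It assumes each game piece wins independently with probability 1/4; the function names and repetition count are illustrative.

```python
import random
from math import comb

N_PLAYS, P_WIN, OBSERVED_WINS = 10, 0.25, 1

def simulate_wins(reps=10_000, seed=1):
    """Simulate N_PLAYS game pieces per repetition; return the win counts."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < P_WIN for _ in range(N_PLAYS))
        for _ in range(reps)
    ]

def exact_at_most(k):
    """Binomial probability of k or fewer wins in N_PLAYS plays."""
    return sum(
        comb(N_PLAYS, i) * P_WIN**i * (1 - P_WIN) ** (N_PLAYS - i)
        for i in range(k + 1)
    )

wins = simulate_wins()
simulated = sum(w <= OBSERVED_WINS for w in wins) / len(wins)
print(f"simulated P(1 or fewer wins) = {simulated:.3f}")
print(f"exact P(1 or fewer wins)     = {exact_at_most(OBSERVED_WINS):.3f}")
```

The exact probability is about 0.24, so winning only once in 10 plays happens roughly a quarter of the time by chance alone and gives little reason for concern.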

Using simulation-based methods is the best way to teach students the logic of inference and the meaning of a p-value. When we finally make it to the “official” chapters about inference, I continue to make use of simulation-based methods. Before learning the formal details about our first type of test (one-sample z-test for a population proportion), we do a simulation to answer the question. Likewise, I use simulations to introduce many of the other traditional tests (difference in proportions, difference in means, chi-square goodness of fit, slope).

Over the years, I have become convinced that students have trouble understanding what is being displayed when I sketch a normal (or t or chi-square) curve as part of a significance test. But when we do a simulation, students can identify “their dot” on the dotplot of simulated results and know exactly what that dot represents. This makes it much easier for students to understand what p-values are all about. After completing the simulation, I transition to the traditional (theory-based) methods by telling students that there are approximations we can use so that we don’t need to do a simulation every time. Doing these simulations takes time up front, but helping students understand the logic of inference through simulation saves time in the long run.
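To illustrate how the simulation and the theory-based approximation relate, here is a rough sketch for a one-sample test of a proportion. The numbers are hypothetical (a null value of p = 0.25, a sample of 100, and 33 observed successes), chosen only so the two p-values can be compared side by side. The normal-tail calculation uses the standard one-sample z statistic.

```python
import random
from math import erfc, sqrt

# Hypothetical scenario, for illustration only.
P_NULL, N, OBSERVED = 0.25, 100, 33

def simulated_p_value(reps=10_000, seed=1):
    """Proportion of simulated samples with at least OBSERVED successes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < P_NULL for _ in range(N))
        if successes >= OBSERVED:
            hits += 1
    return hits / reps

def normal_p_value():
    """Upper-tail p-value from the one-sample z-test for a proportion."""
    p_hat = OBSERVED / N
    z = (p_hat - P_NULL) / sqrt(P_NULL * (1 - P_NULL) / N)
    return z, 0.5 * erfc(z / sqrt(2))   # area to the right of z

sim_p = simulated_p_value()
z, norm_p = normal_p_value()
print(f"simulated p-value           = {sim_p:.3f}")
print(f"z = {z:.2f}, normal p-value = {norm_p:.3f}")
```

The two p-values will usually be close but not identical, which is a useful reminder that the curve is an approximation to the dotplot rather than the other way around.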

Several recent AP exam items have included simulation-based methods. Starting in 2009, there have been free-response questions on the AP exam that illustrate simulation-based methods. Although “simulation-based inference methods” aren’t officially part of the course content, students are expected to use simulations to approximate sampling distributions and understand the logic of inference. Unfortunately, these items have highlighted a general lack of conceptual understanding. Although many students could perform a two-sample z-test for a difference in proportions, most couldn’t perform the “same” test when presented in the context of a simulation (see 2013 #5 on the website below). Other questions that use simulation-based methods include 2009 #6, 2009B #5, and 2010 #6.

Note: All AP Statistics free-response items, along with scoring guidelines and examples of student work can be found at: apcentral.collegeboard.com/apc/members/exam/exam_information/8357.html

Before closing, I should mention a fourth reason to incorporate simulation-based methods in your AP Statistics classes: they are fun! Having students draw beads, shuffle cards, and sort M&M’S® makes class much more engaging than hearing me talk all period long.

Simulation-based statistical inference

A blog about teaching introductory statistics with simulation-based inference
