How I teach SBI using R

Randy Pruim, Calvin College

I’m not writing to convince you that you should use R to teach simulation-based inference (SBI). My goal is to convince you that you can use R for SBI, even with students (and instructors) who have never used R before. Along the way I’ll mention some guiding principles and illustrate some tools that my colleagues Danny Kaplan and Nick Horton and I have assembled in the mosaic R package to make SBI (and EDA and traditional inference procedures) much easier.

If you are unfamiliar with R, this post will be too short to give you a good introduction, but I hope it will make you curious enough to find out more. See the references for places to learn more about R and teaching with R.

Less Volume, More Creativity

The biggest key to using R well is to provide a lot of creative opportunity with as little R as possible. There are many ways to skin a cat in R; what you need is a systematic approach that lets you achieve your goals economically. If technology is the most difficult thing in your course, then your technology is too hard or your questions are too simple (probably both). On the other hand, if your students are able to guess how to do new things in R before you show them, you will know you are on the right track.

An example: The Lady Tasting Tea

I typically introduce inference on the first day by doing a hypothesis test for a proportion “informally”. I don’t use terms like null hypothesis or p-value, but all the big ideas are introduced.

Often I use some variation on Fisher’s Lady Tasting Tea example because I find that students remember it well and I can refer back to it as a motivating example for the rest of the semester. The example works equally well in an Intro Stats course and in my upper-level courses, and I like that it connects students in a small way to the history of the discipline.

Let’s see how we can test whether the lady is just guessing given that she correctly identifies 9 of 10 randomly prepared cups of tea. Guessing is like flipping a coin, so mosaic provides a coin flipper:

require(mosaic)    # load the mosaic package
rflip()            # think: random flip
## 
## Flipping 1 coin [ Prob(Heads) = 0.5 ] ...
## 
## H
## 
## Number of Heads: 1 [Proportion Heads: 1]

Since our design calls for 10 cups of tea, we need to flip 10 coins. Fortunately, we don’t have to use rflip() ten times and record the results manually. We just ask for 10 flips.

rflip(n = 10)
## 
## Flipping 10 coins [ Prob(Heads) = 0.5 ] ...
## 
## H H H T T T H H H T
## 
## Number of Heads: 6 [Proportion Heads: 0.6]

Now we need to do this a lot of times. The do() function provides an easy syntax and is clever about how it stores the results, in this case storing the number of flips, number of heads, number of tails, and proportion that are heads.

do(1000) * rflip(n = 10)   # do 1000 times 
##    n heads tails prop
## 1 10     6     4  0.6
## 2 10     5     5  0.5
## 3 10     4     6  0.4
## <997 lines omitted>

Of course, we should look at this some other way than by having it all scroll past on our screen. Let’s save the results and look at numerical and graphical summaries.

GuessingLadies <-
  do(1000) * rflip(n = 10)  # simulate 1000 guessing ladies
tally(~heads, data = GuessingLadies)
## 
##   0   1   2   3   4   5   6   7   8   9  10 
##   2   5  50 111 202 241 216 105  57  10   1
histogram(~heads, data = GuessingLadies, width = 1)

Based on our simulation, it appears that someone who is just guessing gets 9 or 10 correct only about 1.1% of the time.
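
This tail proportion can be computed directly from the simulated null distribution; a minimal sketch using mosaic’s prop():

prop( ~ (heads >= 9), data = GuessingLadies)   # proportion of simulations with 9 or 10 heads

Given the tally above, this comes out to (10 + 1)/1000 = 0.011.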

If we want a more precise estimate of this proportion (the p-value), we can increase the number of times we do() things. We could also use 999 (or 9999) repetitions instead of 1000 and include the observed statistic in the null distribution, as recommended in the article Tim Hesterberg previews in this SBI blog post.
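
A sketch of that variation, with 999 simulated statistics plus the observed one (so the denominator is 999 + 1):

Null999 <- do(999) * rflip(n = 10)           # 999 simulated guessing ladies
(sum(Null999$heads >= 9) + 1) / (999 + 1)    # p-value including the observed 9 of 10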

Grand Tour of Randomization Distributions

If we want to test for a different proportion (e.g., 0.25), we simply tell rflip() the probability of obtaining heads (prob) and the size of our sample (n = 100 in the example below):

Proportion.null <- do(1000) * rflip(n = 100, prob = 0.25)
##     n heads tails prop
## 1 100    31    69 0.31
## 2 100    25    75 0.25
## <998 rows omitted>

Null distributions for permutation tests can be simulated by shuffling the appropriate labels. Here are examples for tests involving the difference between two proportions, the difference between two means, or a linear model:

Galton2 <- Galton %>% group_by(family) %>% sample_n(1)   # one person from each family
Null1 <- do(1000) * diffprop(homeless ~ shuffle(sex), data = HELPrct)
Null2 <- do(1000) * diffmean(height ~ shuffle(sex), data = Galton2)
Null3 <- do(1000) * lm(age ~ shuffle(substance), data = HELPrct)      # ANOVA
Null4 <- do(1000) * lm(father ~ shuffle(mother), data = Galton2)      # regression

Looking at the first few rows of Null4, we see that do() extracts and records several useful bits of information about a linear model:

##   Intercept  mother sigma r.squared     F
## 1      79.8 -0.1641  2.60   0.02173 4.332
## 2      74.1 -0.0736  2.62   0.00437 0.856
## 3      63.9  0.0849  2.62   0.00582 1.141

We can use the slope (labeled mother), r.squared, or F as our test statistic.
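
For example, here is a minimal sketch of a two-sided permutation p-value for the slope, comparing the observed slope to the shuffled slopes in Null4:

obs <- coef(lm(father ~ mother, data = Galton2))["mother"]   # observed slope
prop( ~ (abs(mother) >= abs(obs)), data = Null4)             # two-sided p-value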

One null distribution that is somewhat more challenging is the null distribution for a test of a single mean. The challenge is to determine how to simulate data with a mean equal to that of the null hypothesis. This can be done non-parametrically by resampling from a shifted version of the empirical distribution. I find it simplest to do this by changing H0: μ = 98.6 into H0: μ – 98.6 = 0 and sampling with replacement from the sample:

require(Lock5withR)
Null <- do(1000) * mean( ~ (BodyTemp - 98.6), data = resample(BodyTemp50))
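
To turn this into a p-value, we can compare the observed shifted mean to this null distribution. A minimal sketch, assuming (as is mosaic’s convention) that do() names the resulting column mean:

obs <- mean( ~ (BodyTemp - 98.6), data = BodyTemp50)   # observed shifted mean
prop( ~ (abs(mean) >= abs(obs)), data = Null)          # two-sided p-value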

Both the shifting and the introduction of sampling with replacement in the context of hypothesis testing may be more distracting than they are worth, since a confidence interval for a mean is generally more useful anyway, but this example rounds out our tour.
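
In fact, that confidence interval takes only the same tools. Here is a minimal sketch of a percentile bootstrap interval for the mean body temperature, previewing the next section:

Boot <- do(1000) * mean( ~ BodyTemp, data = resample(BodyTemp50))
cdata(0.95, ~ mean, data = Boot)   # central 95% of the bootstrap distribution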

Bootstrap, too

Generating bootstrap distributions to form confidence intervals is similar. Typically, instead of shuffling a variable we resample the entire data set (possibly within groups) to produce a bootstrap distribution. Here is a bootstrap distribution for the difference in mean heights of men and women using Galton’s data, resampling separately from the males and females:

HeightBySex.boot <- 
  do(1000) * diffmean( height ~ sex, data = resample(Galton, groups = sex) )
histogram(~diffmean, data = HeightBySex.boot)

Simple confidence intervals can be computed using percentiles

cdata(0.95, ~diffmean, data = HeightBySex.boot)  # central 95%
##       low        hi central.p 
##      4.78      5.46      0.95

or the bootstrap standard error, which is just the standard deviation of the bootstrap distribution.

sd(~diffmean, data=HeightBySex.boot)
## [1] 0.166
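
A rough 95% interval is then the observed statistic plus or minus twice the bootstrap standard error; a quick sketch:

obs <- diffmean(height ~ sex, data = Galton)              # observed difference in means
obs + c(-2, 2) * sd(~diffmean, data = HeightBySex.boot)   # estimate ± 2 SE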

Eventually, I may show my students how to automate generating confidence intervals from the bootstrap distribution.

confint(HeightBySex.boot, method=c("percentile", "se"))
##       name lower upper level   method estimate margin.of.error  df
## 1 diffmean  4.78  5.46  0.95 quantile       NA              NA  NA
## 2 diffmean  4.80  5.45  0.95   stderr     5.12           0.326 897

Striking a Balance

I find that using the mosaic package keeps the focus on what is important (thinking about where randomization enters and why, and what to do with the distribution once we have it) while hiding distracting details (looping structures, extracting the relevant information from R objects, etc.). There are other R packages (resample and boot, for example) that are faster and use more sophisticated methods, but hide some of the things I want students to see and think about when they are learning how simulation-based inference works.

The R-related cognitive load for students to do SBI using mosaic amounts to one formula template

goal( y ~ x , data = mydata )

plus a small set of functions:

  • rflip(), shuffle(), and resample() for generating random data
  • do() for repeating things

These can be combined to handle a wide variety of situations. That’s a lot of SBI creativity for little volume.
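
A few illustrative uses of the template, using data sets already loaded above:

mean( height ~ sex, data = Galton )          # goal: mean height for each sex
tally( ~ substance, data = HELPrct )         # goal: count the substance groups
histogram( ~ height | sex, data = Galton )   # goal: plot, conditioned on sex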

Additional Resources

A variety of resources are provided by Project MOSAIC (Danny Kaplan, Nick Horton, and Randy Pruim) for those interested in learning more about using R to teach statistics.
