Moving from learning statistics to discovering statistics

Scott Rifkin, UCSD

I have tried several different approaches to using technology to help students get a better intuitive understanding of statistical concepts. Although statistical software has been used in introductory statistics classes for quite some time, interfaces that facilitate discovery-based learning rather than calculation are much newer.

My initial attempt was to ask the students to write simple routines in R based on templates I provided. I prepared worksheets explaining what each variable in the code meant, with suggestions for how they could vary those variables and plot the results. I thought that simulating null or bootstrap distributions themselves would give them strong insight into these concepts.

Unfortunately, for most of my students, a few obstacles derailed my hopes. Thinking that a menu-driven interface would be more familiar to the students, I had them use an R GUI that proved to be unstable in subtle ways: problems that I had not experienced myself but that metastasized in the hands of 150 impatient new users. After these problems cropped up, I ditched the GUI and relied wholly on the command line. But the damage was done. Students didn’t trust the software, didn’t distinguish between the GUI and the underlying R program, and many had never used a command-line interface and didn’t understand how to use one. Instead of helping, the software became a barrier to understanding, simply because it competed for study time and emphasized “getting things to work” rather than “using it to understand.” It was clear that the students needed something easier, something more plug-and-play.
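To make the idea concrete for readers who have not seen such an exercise, here is a minimal sketch of what a simulation template of this kind computes. It is written in Python rather than R, uses made-up data, and the function names `permutation_null` and `bootstrap_means` are hypothetical, not the templates used in the course. It builds a null distribution for a difference in means by shuffling group labels, and a bootstrap distribution of a mean by resampling the observed data with replacement.

```python
import random
import statistics

def permutation_null(group_a, group_b, reps=5000, seed=1):
    """Null distribution of mean(a) - mean(b), built by shuffling group labels."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(reps):
        rng.shuffle(pooled)  # under H0 the group labels are interchangeable
        diffs.append(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
    return diffs

def bootstrap_means(sample, reps=5000, seed=1):
    """Bootstrap distribution of the mean: resample the observed data with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(reps)]

# Hypothetical data, made up for illustration
treatment = [12.1, 14.3, 13.8, 15.2, 12.9, 14.7]
control = [11.0, 12.2, 11.8, 12.5, 10.9, 12.0]

observed = statistics.mean(treatment) - statistics.mean(control)
null = permutation_null(treatment, control)
# Two-sided p-value: share of shuffles at least as extreme as the observed difference
p_value = sum(abs(d) >= abs(observed) for d in null) / len(null)

# 95% percentile interval from the sorted bootstrap means
boot = sorted(bootstrap_means(treatment))
ci_95 = (boot[int(0.025 * len(boot))], boot[int(0.975 * len(boot)) - 1])

print(f"observed difference = {observed:.2f}, two-sided p = {p_value:.4f}")
print(f"95% percentile bootstrap CI for the treatment mean: {ci_95[0]:.2f} to {ci_95[1]:.2f}")
```

Changing `reps`, the data, or the group sizes and re-plotting `null` or `boot` is the kind of variation such a worksheet might suggest.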

R was too complicated. Minitab would not (easily) do simulation. I shuddered when a colleague suggested Excel. I shied away from asking them to buy software specifically for this course. Then I discovered applets.

Applets are an excellent learning tool and, for motivated students, a wonderful discovery tool. Most applets have only a few buttons or boxes, so it is usually fairly obvious how to use them. In fact, they can be too easy to use. I’ve had students comment that the applets are nice and all, but they didn’t really believe me when I said they could use them for statistical tests out in the real world.

I’ve primarily relied on the Rossman-Chance/ISI collection for hypothesis testing and the StatKey set for bootstrap confidence intervals. There are many others around the web that are good for illustrating specific concepts. I’ve been tempted to pick and choose from these but have resisted for three reasons: (1) each new interface is another thing for students to struggle with; (2) many of these applets are written in Java, which lately has been too difficult to get running reliably on 150+ computers; and (3) sites disappear or go down unexpectedly, confounding lesson planning and wreaking havoc with assignments.

Moreover, a new package for R called shiny makes it quite easy to turn R code into nice-looking, easy-to-use applets. My students were struggling with Type II error in particular, and I found an applet that I thought would help them understand. Unfortunately, it was written in Java. I had trouble convincing my own computer to run it, and I knew I would have to field too many emails from students who couldn’t get it to work. I had been wanting to learn shiny anyway, and three hours later, after modifying one of the shiny examples, I had my own Type I/Type II/power applet that I could customize however I wanted. I could tweak it based on feedback from my students and add new features to address misconceptions. Shortly afterward I added another one, this time because my students were struggling to understand the similarities and differences between null, sampling, and bootstrap distributions. A student recently mentioned that she did not understand why the standard error of the mean should necessarily be less than the population standard deviation (except when n = 1). I could explain it to her in words (and have, in class). I could give her the formula SE = σ/sqrt(n) (and will, in a week) and point out the sqrt(n) in the denominator, but I think that leads to understanding that it is less, not why it is less. Or I could make an applet targeted specifically at this common question, one that lets her discover the answer for herself and that similarly puzzled students in my current and future courses can use too.
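The standard-error question lends itself to exactly this kind of simulation. Below is a minimal Python sketch of what such an applet might compute; it is not the shiny code described above, and the function name `sd_of_sample_means`, the normal population, and the population SD of 10 are assumptions made for illustration. It draws many samples of size n, records each sample mean, and compares the spread of those means with sigma/sqrt(n).

```python
import math
import random
import statistics

def sd_of_sample_means(population_sd, n, reps=10000, seed=2):
    """Simulate the sampling distribution of the mean of n draws and return its SD."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, population_sd) for _ in range(n)]
        # Averaging lets unusually high and low draws partly cancel out
        means.append(sum(sample) / n)
    return statistics.stdev(means)

sigma = 10.0  # hypothetical population standard deviation
results = {}
for n in (1, 4, 16, 64):
    simulated = sd_of_sample_means(sigma, n)
    theory = sigma / math.sqrt(n)
    results[n] = (simulated, theory)
    print(f"n = {n:2d}: simulated SE = {simulated:6.3f}, sigma/sqrt(n) = {theory:6.3f}")
```

At n = 1 the "mean" is a single draw, so its spread matches the population SD. For larger n, the simulated SE shrinks roughly in step with sigma/sqrt(n), because extreme draws in one direction tend to be offset by draws in the other.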

Targeted, interactive, accessible programs like applets are a key tool in transforming statistics learning from memorization to exploration and discovery. However, students still need to be encouraged and taught how to use them that way. Lately I’ve asked students to write down predictions about what will happen if they change the number in a box or move a slider. If they write it down, they are committed to an answer and have to face whether they were right or wrong. If they don’t, it’s too easy to look at the result and convince themselves that they expected it. Ten minutes playing with an applet will give them more intuition for how various parameters affect distributions than anything else I have tried. Once they have this intuition and have internalized the patterns, it is a much shorter step to understanding why the patterns come about.
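As one concrete instance of the predict-then-check exercise, a student could be asked what happens to the Type I error rate and to power as the sample size goes up. The Python sketch below (not the shiny applet from the post) estimates both by simulation for a one-sided z-test of H0: mu = 0 with known sigma. The helper names `rejection_rate` and `theoretical_power`, the effect size of 0.5 sigma, alpha = 0.05, and the list of sample sizes are all arbitrary choices made for illustration.

```python
import math
import random
from statistics import NormalDist

def rejection_rate(true_mu, n, sigma=1.0, alpha=0.05, reps=5000, seed=3):
    """Share of simulated samples in which a one-sided z-test rejects H0: mu = 0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(true_mu, sigma) for _ in range(n)) / n
        if xbar / se > z_crit:  # z statistic, since the null mean is 0
            rejections += 1
    return rejections / reps

def theoretical_power(true_mu, n, sigma=1.0, alpha=0.05):
    """Closed-form power of the same test, for comparison."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_crit - true_mu * math.sqrt(n) / sigma)

effect = 0.5  # hypothetical true mean under H1, in units of sigma
results = {}
for n in (5, 10, 20, 40):  # the "slider": predict first, then run
    type1 = rejection_rate(0.0, n)     # H0 true: every rejection is a Type I error
    power = rejection_rate(effect, n)  # H1 true: every non-rejection is a Type II error
    results[n] = (type1, power, theoretical_power(effect, n))
    print(f"n = {n:2d}: Type I rate = {type1:.3f}, power = {power:.3f} "
          f"(theory {results[n][2]:.3f}), Type II rate = {1 - power:.3f}")
```

The pattern worth predicting in advance: the Type I rate should hover near alpha for every n, while power climbs and the Type II rate falls as n grows.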

I’d love to be able to move my course away from its current lecture/lab/homework format to one where the students spend the bulk of their time discovering statistical concepts for themselves, guided by targeted applets and 5- to 15-minute videos that I would record, and then applying them to real-world problems. My role would be that of a facilitator: tweaking the applets and exercises, or making new ones, to expose specific misconceptions revealed through conversations with students. Many components of such a course are already out there (applets, exercises, explorations, and cases), and the technology exists to make this transition possible even for a large class.

One thought on “Moving from learning statistics to discovering statistics”

Homer White December 10, 2014 at 11:01 am

I agree about the usefulness of apps, especially Shiny apps, which are relatively easy to write and to customize if one knows some R. My elementary students tend to need simulation apps that start off “slowly,” rather than showing a simulated approximation of a distribution “all at once” (see, e.g., http://homer.shinyapps.io/SlowGoodness). I wonder if you might have had better luck with R in class if you had gone for the command line right away, rather than trying one of those GUIs. Package mosaic has some nice tools that let students design simulations with almost no programming (see, for example, the mosaic do() function).

Simulation-based statistical inference

A blog about teaching introductory statistics with simulation-based inference
