Matt Beckman, Penn State University
What this is & what this isn’t
This post is intended to share some pragmatic thoughts on teaching SBI in a large class, not to convert your curriculum to the SBI framework. A number of suggestions on the latter have been published in this blog and elsewhere. Besides, my colleague, Kari Lock Morgan, had already done a remarkable job of accomplishing that in the course described here before I arrived. What follows are simply remarks about rubber-meets-the-road strategies from teaching an SBI course with 225 students, either to capitalize on the large class size or at least to help navigate some of the logistical challenges that surface with increased enrollment.
Karsten Maurer, Assistant Professor of Statistics, The Miami University
As statisticians, we tend to think that if we just have enough data in front of us, then we can get at the heart of what is going on in any scenario. Many statistics educators want to know what is going on with student learning outcomes under different curricula. So the solution is simple, right? Just collect a bunch of data on our students’ learning outcomes under different curricula and identify the strongest pedagogy. We can even get fancy and toss in some experimental design to structure the application of treatments to our experimental units, supporting causal conclusions about impacts on learning outcomes. Alright, I am being facetious here. It is never that straightforward. I will admit that this was my first instinct when I set out to do educational research as a graduate student. A number of issues constrain plans for what would be a tidy and straightforward educational experiment: defining the curricular treatments, assigning students to curricula, applying the curricular treatments, and measuring learning outcomes.
In order to reinforce the analyses from small-scale educational experiments like ours, we need to find a way to either eliminate or account for the classroom-based dependence structures.
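To make the dependence concrete, one common way to quantify it (not necessarily the approach we took) is Kish’s design effect. As a sketch, assume classrooms of equal size m and an intraclass correlation ρ among students in the same classroom. The numbers below are hypothetical and chosen only for illustration:

```latex
\mathrm{DEFF} = 1 + (m - 1)\,\rho, \qquad n_{\mathrm{eff}} = \frac{n}{\mathrm{DEFF}}

% Hypothetical: 4 classrooms of m = 30 students (n = 120), rho = 0.05
\mathrm{DEFF} = 1 + (30 - 1)(0.05) = 2.45, \qquad n_{\mathrm{eff}} = \frac{120}{2.45} \approx 49
```

Even this modest within-classroom correlation cuts the effective sample size to less than half of the 120 students. That is why treating students as independent units tends to overstate the evidence from a small number of classrooms.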
David Diez, OpenIntro
The percentile bootstrap approach has made inroads into introductory statistics courses, sometimes with the incorrect declaration that it can be used without checking any conditions. Unfortunately, the percentile bootstrap performs worse than methods based on the t-distribution for small samples of numerical data. I would wager that the large majority of statisticians preach the opposite, and I think this misplaced faith has created a small epidemic.
The percentile bootstrap is nothing new, but its weaknesses remain largely unknown in the community. I find myself wrestling with several considerations whenever I think about this topic.
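To check the small-sample claim yourself, here is a minimal coverage simulation in Python. It rests on assumptions of our own, not from the post: a normal population, samples of size n = 10, 95% intervals, and the tabled t critical value 2.262 for 9 degrees of freedom. It counts how often each interval captures the true mean.

```python
import random
import statistics

# Assumptions for illustration: normal population, n = 10, 95% intervals.
T_CRIT_DF9 = 2.262  # 0.975 quantile of the t-distribution with 9 degrees of freedom


def t_interval(sample):
    """Classic one-sample t-interval for the mean (hard-coded for df = 9)."""
    n = len(sample)
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    return m - T_CRIT_DF9 * se, m + T_CRIT_DF9 * se


def percentile_bootstrap_interval(sample, B, rng):
    """Middle 95% of B bootstrap means (resampling with replacement)."""
    n = len(sample)
    boot_means = sorted(statistics.fmean(rng.choices(sample, k=n)) for _ in range(B))
    return boot_means[int(0.025 * B)], boot_means[int(0.975 * B) - 1]


def coverage(reps=500, n=10, B=500, mu=0.0, sigma=1.0, seed=1):
    """Fraction of simulated samples whose interval contains the true mean mu."""
    rng = random.Random(seed)
    t_hits = boot_hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = t_interval(sample)
        t_hits += lo <= mu <= hi
        lo, hi = percentile_bootstrap_interval(sample, B, rng)
        boot_hits += lo <= mu <= hi
    return t_hits / reps, boot_hits / reps


if __name__ == "__main__":
    t_cov, boot_cov = coverage()
    print(f"t-interval coverage:           {t_cov:.3f}")
    print(f"percentile bootstrap coverage: {boot_cov:.3f}")
```

With these settings the t-interval should land near its nominal 95%, while the percentile bootstrap typically falls short. One commonly cited reason is that the bootstrap distribution of the mean uses the plug-in spread (divisor n) and roughly normal tails rather than the heavier t tails. Changing n, the population shape, or B is an easy way to explore where the gap shrinks.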
Kari Lock Morgan, Assistant Professor of Statistics, Penn State University
Computers (or miniature versions such as smartphones) are necessary to do simulation-based inference. How, then, can we assess knowledge and understanding of these methods without computers? Never fear, this can be done! I personally choose to give exams without technology, despite teaching in a computer classroom once a week, largely to avoid the headache of proctoring a large class with internet access. Here are some general tips I’ve found helpful for assessing SBI without technology:
Much of the understanding to be assessed is NOT specific to SBI. In any given example, calculating a p-value or interval is but one small part of a larger context that often includes scope of inference, defining parameter(s), stating hypotheses, interpreting plot(s) of the data, calculating the statistic, interpreting the p-value or interval in context, and making relevant conclusions. The assessment of this content can be largely independent of whether SBI is used.
In lieu of technology, give pictures of randomization and bootstrap distributions. Eyeballing an interval or p-value from a picture of a bootstrap or randomization distribution can be difficult for students, difficult to grade, and an irrelevant skill to assess. Here are several alternative approaches to get from a picture and observed statistic to a p-value or interval without technology:
Choose examples with obviously small or not small p-values.
Robin Lock, Burry Professor of Statistics, St. Lawrence University
I have the luxury of teaching in a computer classroom with 28 workstations embedded in desks whose glass tops show the monitor below the work surface. This setup has several advantages (in addition to enforcing a maximum class size of 28). Computing is readily available at any point in class, yet I can easily see all of the students, they can see me (no peeking around monitors), and they still have a nice big flat surface to spread out notes, handouts, and, occasionally, a textbook (although many students now use an e-version of the text). I also have software on the instructor’s station (Smart Sync) that shows a thumbnail view of what’s on all student screens. Since the class is set up to use technology whenever needed and appropriate, it is natural to extend this to quizzes and exams, so my students routinely expect to use software as part of those activities.
Ideally I’d like to see what each student produces on the screen and how they interpret the output to make statistical conclusions, but it’s not practical to look over everyone’s shoulder as they work.
Jo Hardin – Pomona College, Claremont, CA
Many of us will agree that using tactile demonstrations is super fun and can also be an excellent way to teach a particular concept. Students engage with the material differently when they can touch, smell, or taste the objects, as opposed to only seeing or listening to a demonstration. The SBI blog has had many excellent articles describing in-class tactile simulations; see here, here, and here.
However, the logistics of setting up a demonstration sometimes take too much time away from an already packed 50-minute class session. And those details get even harder with large classes. One of the biggest challenges comes from collecting data or getting results back from the students. Although some classes have sophisticated clickers that make data collection easier, setting up and using clickers is also a logistical challenge (well worth it for use all semester, but not for a one-day class demonstration).
The conversation that ensues about the experimental design is incredibly valuable for understanding paired design (and the motivation for the pairing) or survival analysis (and the need for tools to analyze censored data).
When I want to test students’ understanding of carrying out a simulation test about a single proportion, I like to use the following problem, or some variation of it. I’m fond of animals and of studies showing that animals are clever, so this study, and ones like it, appeals to me.
A chimpanzee named Sarah was the subject in a study of whether chimpanzees can solve problems. Sarah was shown 30-second videos of a human actor struggling with one of several problems (for example, being unable to reach bananas hanging from the ceiling). Then Sarah was shown two photographs, one that depicted a solution to the problem (like stepping onto a box) and one that did not match that scenario. Researchers watched Sarah select one of the photos, and they kept track of whether Sarah chose the correct photo depicting a solution to the problem. Sarah chose the correct photo in 7 of the 8 scenarios she was presented. In order to judge whether Sarah understands how to solve problems, we will define π to be the probability that Sarah will pick the photo of the correct solution.
I don’t let them get away with just claiming that the p-value is some particular number; they have to explain how they know it is that number.
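Here is one way to show how that p-value comes about, sketched in Python. It assumes the hypotheses H0: π = 0.5 (Sarah is guessing) versus Ha: π > 0.5, which the problem leaves for students to state. The simulation mimics the coin-flipping approach students would use, and the exact binomial calculation is included only as a check.

```python
import random
from math import comb

# Data from the problem: Sarah chose correctly in 7 of 8 scenarios.
N_SCENARIOS = 8
OBSERVED_CORRECT = 7


def simulated_p_value(reps=10_000, seed=2024):
    """Simulate a guessing chimp (pi = 0.5) and count how often it does at least as well as Sarah."""
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(reps):
        correct = sum(rng.random() < 0.5 for _ in range(N_SCENARIOS))
        if correct >= OBSERVED_CORRECT:
            at_least_as_extreme += 1
    return at_least_as_extreme / reps


def exact_p_value():
    """P(X >= 7) for X ~ Binomial(8, 0.5), i.e. (C(8,7) + C(8,8)) / 2**8."""
    hits = sum(comb(N_SCENARIOS, k) for k in range(OBSERVED_CORRECT, N_SCENARIOS + 1))
    return hits / 2 ** N_SCENARIOS


print(f"simulated p-value: {simulated_p_value():.4f}")
print(f"exact p-value:     {exact_p_value():.4f}")  # 9/256, printed as 0.0352
```

The exact value is 9/256 ≈ 0.035, and the simulated proportion should be close to it, so a guessing chimp would rarely do as well as Sarah did. Students can make the same argument by hand by counting the dots at 7 or 8 in a picture of the simulated distribution.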
One of the biggest challenges we all face as teachers of statistics is testing students’ statistical knowledge. For example, how do we know whether the assessment questions we write are assessing students’ understanding of the statistical material and not some irrelevant construct? How do we know how many questions should be on an assessment to truly see whether students are “getting it?” But these questions are only the tip of the iceberg. I know we also grapple with finding interesting contexts and datasets for assessing a particular statistical method; it is hard and time-consuming! I have often found an exciting example and dataset, only to discover once I start digging into the data that it is beyond the level expected of my introductory statistics students (*sigh*… back to the drawing board). When I was asked to write this blog post, I thought it would be great to share an interesting question so that the task of assessment development isn’t so burdensome for others.
I think it is important to note that I am not just asking them to conduct a randomization test; I am also asking them for interpretations and to think about how study design affects our conclusions. Asking them a multitude of questions about one context also saves me the time of coming up with different contexts and reduces cognitive load for my students.
The following data sets have been submitted by members of the SBI listserv.
- Hope College students (in 2003) wondered whether there are any gender differences in how long people talk on their cell phones. They surveyed a sample of other students, recording each student’s gender (0=female, 1=male) and the length of their last cell phone call in seconds (they could find this recorded on the phone). Dealing with the outlier in this data set makes it interesting. cellphonedata
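A generic randomization test for the cell phone question might look like the Python sketch below. The function name and the option to compare either means or medians are our own illustration. The post does not say how the outlier should be handled, and the data file is not reproduced here, so no results are shown.

```python
import random
import statistics


def randomization_test(group_a, group_b, stat=statistics.fmean, reps=5000, seed=1):
    """Two-sided randomization test for a difference in `stat` between two groups.

    Under H0 (call length unrelated to gender) the group labels are exchangeable,
    so we shuffle the pooled values, re-split them into groups of the original
    sizes, and recompute the difference each time.
    """
    rng = random.Random(seed)
    observed = stat(group_a) - stat(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = stat(pooled[:n_a]) - stat(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add 1 to numerator and denominator so the estimated p-value is never exactly 0.
    return (extreme + 1) / (reps + 1)
```

Running it once with `stat=statistics.fmean` and once with `stat=statistics.median` on the call lengths (split by the 0/1 gender code) is one way to let students see how much a single very long call can drive the conclusion.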
Alison Gibbs, University of Toronto
In Canada, school curricula differ by province, but most Canadian mathematics curricula include glimpses of statistical thinking, typically in the middle grades. In the province of Ontario, tracing the statistics part of the curriculum through the grades reveals a progression in sophistication of tools for summarizing data, with some scattered mentions of the ideas of informal inference. Students are encouraged to make inferences from their observations, but typically without tools to support their generalizability. Teachers are aware that there are important statistical ideas their students need to understand to do this well. For example, they know that a larger sample size is usually better, but they don’t know how to show their students the effects of sample size on the inferences they can make. In addition, teachers often have the challenge of irregular access to technology and uneven expertise and support. In this context, I recently worked with a group of 15 middle school teachers on an activity that uses multiple random samples to better understand the effect of sample size, with only minimal need for technology.
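A teacher preparing such an activity might preview the effect of sample size with a short simulation like the one below. It is a sketch under assumptions of our own: a hypothetical population in which 30% of students give a particular answer, and 1000 random samples drawn at each sample size.

```python
import random
import statistics


def spread_of_sample_proportions(pop_prop, n, num_samples=1000, seed=7):
    """Draw many random samples of size n and return the SD of the sample proportions."""
    rng = random.Random(seed)
    props = []
    for _ in range(num_samples):
        # Each student in the sample gives the answer with probability pop_prop.
        successes = sum(rng.random() < pop_prop for _ in range(n))
        props.append(successes / n)
    return statistics.stdev(props)


for n in (10, 40, 160):
    print(n, round(spread_of_sample_proportions(0.3, n), 3))
```

The spread of the sample proportions should roughly halve each time the sample size quadruples, matching the familiar √(p(1 − p)/n) pattern. This is the same effect the teachers can show with physical samples and minimal technology.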
With the random sampler, students can draw random samples from the accumulated databases of questionnaire responses from students in participating countries.