Kie Van Ivanky Saputra, Kaprodi Matematika Terapan
We met Buzz and Doris when we wanted to learn statistics. They are dolphins who were trying to earn rewards by communicating, while we were learning how to test statistically whether they really were communicating. In 16 trials, Doris signaled to Buzz which button to press, and Buzz pushed the correct button in 15 of the 16 trials. Still not convinced that they were communicating, we assumed that it was just a lucky day for them and tried to simulate 15 successes in 16 trials by tossing 16 coins, to see whether we could get 15 heads out of 16 tosses. The first time we got only 9 heads out of 16, the second time we got 8, and we continued until we had done 100 repetitions. The most we got was 12 heads out of 16 tosses. We then continued to 1000 repetitions, and only 1 of the 1000 simulations gave us 15 heads out of 16 tosses. It now seems highly implausible that the dolphins had just had a lucky day. They were doing something more than guessing which button to press. Since that day, we have known more about p-values and the null hypothesis.
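The coin-tossing experiment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the seed, and the use of a fair coin (probability 0.5 of the correct button under pure guessing) are choices made here, not part of the original classroom activity. The exact binomial tail is included for comparison with the simulated proportion.

```python
import random
from math import comb


def simulate_heads(n_tosses, rng):
    """Toss a fair coin n_tosses times and return the number of heads."""
    return sum(rng.random() < 0.5 for _ in range(n_tosses))


def simulated_p_value(observed, n_tosses, n_reps, seed=0):
    """Proportion of simulated repetitions with at least `observed` heads.

    The seed is an arbitrary choice made for reproducibility.
    """
    rng = random.Random(seed)
    hits = sum(simulate_heads(n_tosses, rng) >= observed for _ in range(n_reps))
    return hits / n_reps


def exact_p_value(observed, n_tosses):
    """Exact probability of at least `observed` heads for a fair coin."""
    favourable = sum(comb(n_tosses, k) for k in range(observed, n_tosses + 1))
    return favourable / 2 ** n_tosses


if __name__ == "__main__":
    # Buzz and Doris: 15 correct in 16 trials, compared with 1000 repetitions.
    print("simulated:", simulated_p_value(15, 16, 1000))
    print("exact:    ", exact_p_value(15, 16))
```

With only 1000 repetitions the simulated proportion will often be exactly zero, which is itself a useful talking point: the exact tail probability is 17/65536, so seeing even one such repetition in 1000 is uncommon.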
The above was my first experience in teaching statistics with simulation-based inference.
Karsten Luebke, FOM, Germany
This post is based on joint work with Oliver Gansser, Matthias Gehrke, Bianca Krol, and Norman Markgraf.
The FOM is a private University of Applied Science in Germany for people who study while working. We offer several bachelor's and master's programs, mainly related to economics, in 29 study centers across Germany. The size of the courses with statistical content varies from 15 to 150 students, or even more.
We used a relaunch of our BA degree in Summer 2016 to rethink and rebuild our curriculum in the different introductory statistics courses.
Matt Beckman, Penn State University
What this is & what this isn’t
This post is intended to share some pragmatic thoughts on teaching SBI in a large class, not on converting your curriculum to the SBI framework. A number of suggestions on the latter have been published in this blog and elsewhere. Besides, my colleague Kari Lock Morgan had already done a remarkable job of accomplishing that feat, in the course to be described, before I arrived. What follows are simply remarks about rubber-meets-the-road strategies from teaching an SBI course with 225 students to either capitalize on large class size or at least help navigate some logistical challenges that surface with increased enrollment.
Karsten Maurer, Assistant Professor of Statistics, The Miami University
As statisticians, we tend to think that if we just have enough data in front of us then we can get at the heart of what is going on in any scenario, and many statistics educators want to know what is going on with student learning outcomes from different curricula. So the solution is simple, right? Just collect a bunch of data on our students’ learning outcomes under different curricula and identify the strongest pedagogy. We can even get fancy and toss in some experimental design to structure the application of treatments to our experimental units to support causal conclusions about impacts on learning outcomes. Alright, I am being facetious here. It is never that straightforward. I will admit that this was my first instinct when I set out to do educational research as a graduate student. There are a number of issues that constrain plans for what would be a tidy and straightforward educational experiment: defining the curricular treatments, assigning students to curricula, applying the curricular treatments, and measuring learning outcomes.
In order to reinforce the analyses from small-scale educational experiments like ours, we need to find a way to either eliminate or account for the classroom-based dependence structures.
David Diez, OpenIntro
The percentile bootstrap approach has made inroads into introductory statistics courses, sometimes with the incorrect declaration that it can be used without checking any conditions. Unfortunately, the percentile bootstrap performs worse than methods based on the t-distribution for small samples of numerical data. I would wager that the large majority of statisticians believe the opposite to be true, and I think this misplaced faith has created a small epidemic.
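One way to examine the small-sample claim is a coverage simulation. The sketch below is illustrative only and rests on several assumptions made here rather than taken from the post: a sample size of 10, an exponential population with mean 1, a nominal 95% level, a simple order-statistic rule for the bootstrap percentiles, and a hard-coded t critical value for 9 degrees of freedom. It reports how often each interval captures the true mean; results will vary with the population, the sample size, and the seed.

```python
import random
import statistics

N = 10               # assumed small sample size
T_CRIT_DF9 = 2.262   # 97.5th percentile of the t-distribution with N - 1 = 9 df
TRUE_MEAN = 1.0      # mean of the assumed exponential population (rate 1)


def t_interval(sample, t_crit):
    """Classical 95% t-interval for the mean."""
    mean = statistics.fmean(sample)
    half_width = t_crit * statistics.stdev(sample) / len(sample) ** 0.5
    return mean - half_width, mean + half_width


def percentile_bootstrap_interval(sample, n_boot, rng):
    """95% percentile bootstrap interval for the mean (order-statistic rule)."""
    n = len(sample)
    boot_means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    return boot_means[int(0.025 * n_boot)], boot_means[int(0.975 * n_boot) - 1]


def coverage(n_sims=500, n_boot=500, seed=1):
    """Return (t coverage, bootstrap coverage) over n_sims simulated samples."""
    rng = random.Random(seed)
    t_hits = boot_hits = 0
    for _ in range(n_sims):
        sample = [rng.expovariate(1.0) for _ in range(N)]
        lo, hi = t_interval(sample, T_CRIT_DF9)
        t_hits += lo <= TRUE_MEAN <= hi
        lo, hi = percentile_bootstrap_interval(sample, n_boot, rng)
        boot_hits += lo <= TRUE_MEAN <= hi
    return t_hits / n_sims, boot_hits / n_sims


if __name__ == "__main__":
    t_cov, boot_cov = coverage()
    print(f"t-interval coverage:           {t_cov:.3f}")
    print(f"percentile bootstrap coverage: {boot_cov:.3f}")
```

Both intervals are computed from the same simulated samples, so the comparison is paired; changing `N` would require a matching change to the critical value.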
The percentile bootstrap is nothing new, but its weaknesses remain largely unknown in the community. I find myself wrestling with several considerations whenever I think about this topic.
Kari Lock Morgan, Assistant Professor of Statistics, Penn State University
Computers (or miniature versions such as smartphones) are necessary to do simulation-based inference. How then can we assess knowledge and understanding of these methods without computers? Never fear, this can be done! I personally choose to give exams without technology, despite teaching in a computer classroom once a week, largely to avoid the headache of proctoring a large class with internet access. Here are some general tips I’ve found helpful for assessing SBI without technology:
Much of the understanding to be assessed is NOT specific to SBI. In any given example, calculating a p-value or interval is but one small part of a larger context that often includes scope of inference, defining parameter(s), stating hypotheses, interpreting plot(s) of the data, calculating the statistic, interpreting the p-value or interval in context, and making relevant conclusions. The assessment of this content can be largely independent of whether SBI is used.
In lieu of technology, give pictures of randomization and bootstrap distributions. Eyeballing an interval or p-value from a picture of a bootstrap or randomization distribution can be difficult for students, difficult to grade, and an irrelevant skill to assess. Here are several alternative approaches to get from a picture and observed statistic to a p-value or interval without technology:
Choose examples with obviously small or not small p-values.
Robin Lock, Burry Professor of Statistics, St. Lawrence University
I have the luxury of teaching in a computer classroom with 28 workstations that are embedded in desks with glass tops to show the monitor below the work surface. This setup has several advantages (in addition to enforcing a maximum class size of 28): computing is readily available at any point in class, yet I can easily see all of the students, they can see me (no peeking around monitors), and they still have a nice big flat surface to spread out notes, handouts, and occasionally a textbook (although many students now use an e-version of the text). I also have software on the instructor’s station (Smart Sync) that shows a thumbnail view of what’s on all student screens. Since the class is set up to use technology whenever needed and appropriate, it is natural to extend this to quizzes and exams, so my students routinely expect to use software as part of those activities.
Ideally I’d like to see what each student produces on the screen and how they interpret the output to make statistical conclusions, but it’s not practical to look over everyone’s shoulder as they work.
Jo Hardin – Pomona College, Claremont, CA
Many of us will agree that using tactile demonstrations is super fun and can also be an excellent way to teach a particular concept. Students engage with the material differently when they can touch, smell, or taste the objects as opposed to only seeing or listening to a demonstration. The SBI blog has had many excellent articles describing in-class tactile simulations, see here and here and here.
However, sometimes the logistical constraints of setting up the demonstration take away too much from an already packed 50-minute class session. And those details get even harder with large classes. One of the biggest challenges comes from collecting data or getting results back from the students. Although some classes have sophisticated clickers that make data collection easier, setting up and using clickers is also a logistical challenge (well worth it for using all semester, but not for a one-day class demonstration).
The conversation that ensues about the experimental design is incredibly valuable for understanding paired design (and the motivation for the pairing) or survival analysis (and the need for tools to analyze censored data).
When I am attempting to test understanding of carrying out a simulation test about a single proportion, I like to use the following problem, or some variation of it. I’m fond of animals and of studies that show that animals are clever, so this study, and ones like it, appeal to me.
A chimpanzee named Sarah was the subject in a study of whether chimpanzees can solve problems. Sarah was shown 30-second videos of a human actor struggling with one of several problems (for example, not able to reach bananas hanging from the ceiling). Then Sarah was shown two photographs, one that depicted a solution to the problem (like stepping onto a box) and one that did not match that scenario. Researchers watched Sarah select one of the photos, and they kept track of whether Sarah chose the correct photo depicting a solution to the problem. Sarah chose the correct photo in 7 of 8 scenarios that she was presented. In order to judge whether Sarah understands how to solve problems we will define π to be the probability Sarah will pick the photo of the correct solution.
I don’t let them get away with just claiming that the p-value is some particular number – they have to explain how they know it is that number.
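A student's explanation of where the p-value comes from can be mirrored by tallying a simulated null distribution for Sarah's 8 scenarios. The sketch below is hypothetical: it assumes a null hypothesis of π = 0.5 (Sarah guessing between two photos) and a one-sided "at least as many correct" comparison, neither of which is spelled out in the problem text above, and the function names and seed are chosen here for illustration.

```python
import random
from collections import Counter


def null_distribution(n_trials, n_reps, seed=0):
    """Tally the number of correct picks in each of n_reps simulated studies.

    Each simulated pick is correct with probability 0.5 (assumed null value).
    """
    rng = random.Random(seed)
    return Counter(
        sum(rng.random() < 0.5 for _ in range(n_trials)) for _ in range(n_reps)
    )


def p_value_from_tally(tally, observed):
    """Proportion of simulated studies with at least `observed` correct picks."""
    total = sum(tally.values())
    at_least = sum(count for k, count in tally.items() if k >= observed)
    return at_least / total


if __name__ == "__main__":
    tally = null_distribution(n_trials=8, n_reps=1000)
    # Print the tally as a small text table, one row per possible outcome.
    for k in range(9):
        print(f"{k} correct: {tally.get(k, 0):4d}")
    print("approximate p-value for 7 of 8:", p_value_from_tally(tally, 7))
```

Printing the whole tally, rather than only the final proportion, makes the counting step visible: the p-value is the share of simulated studies in the rows for 7 and 8 correct.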
One of the biggest challenges we all face as teachers of statistics is testing students’ statistical knowledge. For example, how do we know if the assessment questions we write are assessing students’ understanding of the statistical material and not some irrelevant construct? How do we know how many questions should be on an assessment to truly see if students are “getting it?” But these questions are only the tip of the iceberg. I know we also grapple with finding interesting contexts and datasets for assessing a particular statistical method; it is hard and time-consuming! I have often found an exciting example and dataset only to find that once I start digging into the data, it is beyond the level expected of my introductory statistics students (*Sigh…back to the drawing board). When I was asked to write this blog post, I thought it would be great to share an interesting question so that the task of assessment development isn’t so burdensome for others.
I think it is important to note that I am not just asking them to conduct a randomization test, but am also asking them for interpretations and to think about how study design affects our conclusions. Asking them a multitude of questions about one context also saves me the time of coming up with different contexts and reduces cognitive load for my students.