Kari Lock Morgan, Assistant Professor of Statistics, Penn State University
Computers (or miniature versions such as smartphones) are necessary to do simulation-based inference. How then can we assess knowledge and understanding of these methods without computers? Never fear, this can be done! I personally choose to give exams without technology, despite teaching in a computer classroom once a week, largely to avoid the headache of proctoring a large class with internet access. Here are some general tips I’ve found helpful for assessing SBI without technology:
Much of the understanding to be assessed is NOT specific to SBI. In any given example, calculating a p-value or interval is but one small part of a larger context that often includes scope of inference, defining parameter(s), stating hypotheses, interpreting plot(s) of the data, calculating the statistic, interpreting the p-value or interval in context, and making relevant conclusions. The assessment of this content can be largely independent of whether SBI is used.
In lieu of technology, give pictures of randomization and bootstrap distributions. Eyeballing an interval or p-value from a picture of a bootstrap or randomization distribution can be difficult for students, difficult to grade, and an irrelevant skill to assess. Here are several alternative approaches to get from a picture and observed statistic to a p-value or interval without technology:
Choose examples with obviously small or not small p-values.
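For instructors preparing such pictures, here is a minimal Python sketch of how one might check ahead of time whether an example gives an obviously small or obviously not small p-value. Everything in it is an assumption made for illustration: the function name `randomization_p_value`, the two made-up datasets, the difference in means as the statistic, and the one-sided "at least as large" rule are not taken from any course material.

```python
import random

def randomization_p_value(group_a, group_b, reps=5000, seed=1):
    """One-sided randomization p-value for a difference in means (A minus B).

    Hypothetical helper: reshuffles the pooled values into two groups of the
    original sizes and counts how often the reshuffled difference is at least
    as large as the observed one.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            extreme += 1
    return extreme / reps

# Made-up data: every value in the first group exceeds every value in the second.
p_clear = randomization_p_value([12, 14, 15, 16, 18, 19], [5, 6, 8, 9, 10, 11])

# Made-up data: the two groups overlap almost completely.
p_unclear = randomization_p_value([9, 11, 12, 14, 15, 17], [8, 10, 13, 13, 16, 16])

print(f"clearly separated groups: p-value = {p_clear:.4f}")
print(f"overlapping groups:       p-value = {p_unclear:.4f}")
```

With data like the first pair, the observed statistic sits far out in the tail of the picture, so students can conclude the p-value is small without estimating it precisely. With the second pair it sits near the middle.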
Robin Lock, Burry Professor of Statistics, St. Lawrence University
I have the luxury of teaching in a computer classroom with 28 workstations that are embedded in desks with glass tops, so the monitor sits below the work surface. This setup has several advantages (in addition to enforcing a maximum class size of 28): computing is readily available at any point in class, yet I can easily see all of the students, they can see me (no peeking around monitors), and they still have a nice big flat surface to spread out notes, handouts, and occasionally a textbook (although many students now use an e-version of the text). I also have software on the instructor’s station (Smart Sync) that shows a thumbnail view of what’s on all student screens. Since the class is set up to use technology whenever needed and appropriate, it is natural to extend this to quizzes and exams, so my students routinely expect to use software as part of those activities.
Ideally I’d like to see what each student produces on the screen and how they interpret the output to make statistical conclusions, but it’s not practical to look over everyone’s shoulder as they work.
When I want to test understanding of how to carry out a simulation test about a single proportion, I like to use the following problem, or some variation of it. I’m fond of animals and of studies showing that animals are clever, so this study, and ones like it, appeal to me.
A chimpanzee named Sarah was the subject in a study of whether chimpanzees can solve problems. Sarah was shown 30-second videos of a human actor struggling with one of several problems (for example, not able to reach bananas hanging from the ceiling). Then Sarah was shown two photographs, one that depicted a solution to the problem (like stepping onto a box) and one that did not match that scenario. Researchers watched Sarah select one of the photos, and they kept track of whether Sarah chose the correct photo depicting a solution to the problem. Sarah chose the correct photo in 7 of 8 scenarios that she was presented. In order to judge whether Sarah understands how to solve problems we will define π to be the probability Sarah will pick the photo of the correct solution.
I don’t let them get away with just claiming that the p-value is some particular number – they have to explain how they know it is that number.
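For readers who want to see where such a p-value comes from, here is a minimal Python sketch of a simulation test for the Sarah data. It rests on assumptions not stated in the problem above: the null value π = 0.5 (guessing between two photos), a one-sided alternative π > 0.5, and the function names `simulate_p_value` and `exact_p_value`, which are hypothetical.

```python
import random
from math import comb

def simulate_p_value(n_trials=8, observed=7, null_prob=0.5, reps=10_000, seed=1):
    """Estimate P(at least `observed` correct out of `n_trials`) under the null.

    Each repetition mimics 8 coin flips (assumed null: Sarah is guessing)
    and records whether the simulated count is at least the observed 7.
    """
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        correct = sum(rng.random() < null_prob for _ in range(n_trials))
        if correct >= observed:
            extreme += 1
    return extreme / reps

def exact_p_value(n_trials=8, observed=7, null_prob=0.5):
    """Exact binomial tail probability, for comparison with the simulation."""
    return sum(
        comb(n_trials, k) * null_prob**k * (1 - null_prob) ** (n_trials - k)
        for k in range(observed, n_trials + 1)
    )

simulated = simulate_p_value()
exact = exact_p_value()
print(f"simulated p-value: {simulated:.4f}")
print(f"exact p-value:     {exact:.4f}")
```

Under these assumptions the exact tail probability is 9/256, about 0.035, and the simulated proportion should land close to it. The explanation expected from students matches the counting step in the loop: the p-value is the proportion of simulated sets of 8 guesses with 7 or more correct.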
One of the biggest challenges we all face as teachers of statistics is testing students’ statistical knowledge. For example, how do we know if the assessment questions we write are assessing students’ understanding of the statistical material and not some irrelevant construct? How do we know how many questions should be on an assessment to truly see if students are “getting it”? But these questions are only the tip of the iceberg. I know we also grapple with finding interesting contexts and datasets for assessing a particular statistical method; it is hard and time-consuming! I have often found an exciting example and dataset, only to discover once I start digging into the data that it is beyond the level expected of my introductory statistics students (sigh… back to the drawing board). When I was asked to write this blog post, I thought it would be great to share an interesting question so that the task of assessment development isn’t so burdensome for others.
I think it is important to note that I am not just asking them to conduct a randomization test, but am also asking them for interpretations and to think about how study design affects our conclusions. Asking a multitude of questions about one context also saves me the time of coming up with different contexts and reduces cognitive load for my students.