Kari Lock Morgan, Penn State University
Simulation-based methods, with minimal background knowledge required, give you the option of introducing inference on the first day of class. This is fantastic, and enabling the course to start inference earlier is one of the big advantages of simulation methods, but it’s also completely fine to NOT jump into inference on the first day of class if you don’t want to!
The majority of my students in the introductory statistics course are there not because of an inherent interest in statistics, but to satisfy a requirement (sound familiar?). Because of this, my primary goal on the first day of class is to convince them that statistics is (a) interesting, (b) applicable to almost everything, and (c) important to learn. (As statisticians, we all know how incredibly true each of these points is; we just need to convey the obvious to our students!) Secondary goals include an overview of the syllabus (boring), and some data basics such as the structure of a dataset, cases and variables, and categorical vs quantitative variables. However, if my students learn nothing concrete on the first day, but leave actually somewhat excited to learn statistics, then I would consider the class a success.
I believe the key to getting students excited about statistics is to focus on REAL DATA: relevant datasets, diverse datasets, and interesting questions that can be answered by data.
I begin by telling my students “Data are everywhere! Regardless of your field, interests, lifestyle, etc., you will almost definitely have to make decisions based on data, or evaluate decisions someone else has made based on data.” I try to make it crystal clear on the first day that statistics is NOT death-by-plugging-numbers-into-formulas, but rather developing skills to answer fascinating questions based on data.
Focusing on data and interesting questions that statistics can help us answer is really the whole point of the first class (the way I teach it), so if you want to stop reading here, you’ve already gotten the main message. If you want some more specific ideas, keep reading!
To convince them that data are everywhere and diverse, I have a browser ready with 5-10 tabs open, each tab showing a different source of data online. I choose datasets from a variety of different areas (university rankings, finance, sports, public opinion polls, medical data, unemployment data, etc.). I also try to make it more personal and applicable to students by asking each student to think of a potential dataset (it doesn’t have to exist) that they would personally be interested in analyzing, and we spend some time having student volunteers share their ideas (if you’re lucky, you’ll get some fun answers, and it helps set a good tone for the class!). In both of these exercises, the online data and the hypothetical datasets, students identify cases and variables, which serves the dual purpose of exposing them to data and getting them comfortable with the structure of data.
In addition to raw datasets, we also focus on interesting questions that can be answered by data. For example, I show them the following set of questions:
- Can eating a yogurt a day cause you to lose weight?
- Do males find females more attractive if they wear red?
- Does louder music cause people to drink more beer?
- Are lions more likely to attack after a full moon?
I ask them to identify the variable(s), and classify them as categorical or quantitative (that’s maybe not so exciting), but then I tell them the answer to all these questions is yes, and it starts to give them a flavor for the power and breadth of statistics.
I also use clickers to collect data on the fly (if you haven’t tried clickers, I highly recommend them!). For example, I ask students to click in as to whether they are generally more romantically interested in someone who is obviously into them, or someone who plays hard to get. (College students are very interested in anything to do with dating!) They will then go on to ask for results by gender, and immediately they’ve learned the value of including more than one variable. They probably didn’t enter the class expecting statistics to help them with love!
Hopefully just realizing the wide reach of data will convince students that statistics is important to learn, but I also like to include an example that’s a little more thought-provoking and challenging, to convince students that statistics is not just common sense. There are many examples you could choose, but I use a map showing the counties with the highest rates of kidney cancer death, ask students to brainstorm explanations, and then show a map of the counties with the lowest rates (which looks very similar). Students can come up with all kinds of explanations, but the key is that both maps tend to highlight counties with very few people, because proportions are more variable for smaller sample sizes (which we talk about by imagining a very small county of size 1, where the rate can only be 0% or 100%). This is a deep and important idea, and I try to draw them in by saying that this is the kind of idea that they will be fluent with by the end of the course – hopefully this intrigues them to want to learn more! We also talk about cases and variables in this context, and discuss the fact that cases and variables depend on each other, and that identifying them isn’t always straightforward (if the variable is a categorical yes/no, then the cases are people, while if the variable is a rate of disease, then the cases are counties).
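If you would rather show the small-county effect than just describe it, a quick simulation can help. The following is only a minimal sketch in Python, not something I claim is the standard way to do it. It assumes every county shares the same underlying rate, and the rate, population sizes, and number of counties are made-up values for illustration, not real kidney cancer figures. Even with identical underlying rates, the most extreme observed rates in both directions should turn up among the small counties.

```python
import random
import statistics

# Hypothetical values chosen only for illustration; they are not real
# kidney cancer figures.
TRUE_RATE = 0.02   # same underlying rate in every simulated county
SMALL_POP = 50     # residents in each small county
LARGE_POP = 5000   # residents in each large county
N_COUNTIES = 100   # number of counties of each size


def simulate_rates(population, n_counties, true_rate, rng):
    """Return the observed rate (cases / population) for each county."""
    rates = []
    for _ in range(n_counties):
        # Each resident independently becomes a case with probability true_rate.
        cases = sum(rng.random() < true_rate for _ in range(population))
        rates.append(cases / population)
    return rates


rng = random.Random(1)  # fixed seed so the run is repeatable
small = simulate_rates(SMALL_POP, N_COUNTIES, TRUE_RATE, rng)
large = simulate_rates(LARGE_POP, N_COUNTIES, TRUE_RATE, rng)

# Compare the range and spread of observed rates for the two county sizes.
print("small counties: min", min(small), "max", max(small),
      "sd", round(statistics.stdev(small), 4))
print("large counties: min", min(large), "max", max(large),
      "sd", round(statistics.stdev(large), 4))
```

Try changing `SMALL_POP` to 1 to reproduce the "county of size 1" thought experiment: every observed rate is then either 0 or 1.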
Lastly, I close the class by showing this video, because I think it gets across what I want to convey much better than I could.
NOTE: We then go on to spend the next week or so talking about data collection and scope of inference, and then a week or two on descriptive statistics. Only then, several weeks into the course, do we begin talking about simulation-based methods and inference. So, if you want to use simulation methods and haven’t done anything about this yet, never fear, there is still time!
The kidney cancer example is from Gelman et al. (2014), Bayesian Data Analysis, 3rd edition.