Soma Roy – Cal Poly, San Luis Obispo
I firmly believe that the key to getting students to appreciate what the discipline of Statistics does is to show them examples – lots of examples from a variety of real studies that investigate real research questions – and to have students analyze the data from such studies. So, in all of my classes I use data from real research studies to help students understand that Statistics is about things that matter, and that it has applications to the real world, which they tend to think of as separate from their statistics class, especially when it is a General Education class. Below I list a few strategies I use to give students experience with real data from genuine studies, a few resources where you can find such data and studies, and a few examples of studies I use in class.
Strategy 1: Always being on the lookout
Many of the studies I use in my classes came about because I read something in the news or heard something on the radio, and then looked for related articles on the Internet.
Example: This morning (March 31, 2015) when I was browsing the Health section under www.yahoo.com I saw the headline “The Soda-Cancer Connection” – which led me to this article (Loftfield et al., 2015) from the Journal of the National Cancer Institute. The article describes a prospective cohort study of 447,357 non-Hispanic whites followed for a median of 10.5 years; during those follow-up years, 2904 cases of malignant melanoma were detected. Even though the researchers performed analyses that are much more sophisticated than we expect our STAT 101 students to do, we can still use the summary data provided in the article to have STAT 101 students do simple analyses. For example, using the information in Table 1 and Figure 1 of the article, I came up with the following two-way table that cross-classifies the participants by whether or not they developed melanoma and how much coffee they drank per day.
                                     Coffee Intake
                                     None     ≤ 1 cup/day   2-3 cups/day   4+ cups/day   Total
Developed malignant melanoma         310      942           1253           399           2904
Did not develop malignant melanoma   44264    139901        186767         73521         444453
Total                                44574    140843        188020         73920         447357
This example can be used to explore several different ideas:
 Conditional proportions and segmented bar charts
 Inference methods for comparing several groups on a categorical response (simulation-based and theory-based methods)
 Scope of conclusion – to whom are the study results applicable; can cause-and-effect conclusions be drawn?
 Using odds ratios/relative risk for dose-response modeling
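As a rough illustration of the first, second, and fourth ideas, here is a minimal Python sketch that computes the conditional proportions, the relative risks (with non-drinkers as the baseline), and the chi-square statistic from the counts in the table. The function names are illustrative assumptions rather than anything from the article, and the code deliberately uses only the standard library; in class, students would more likely use an applet or statistical software.

```python
# Counts taken from the two-way table above (Loftfield et al., 2015).
intake_levels = ["None", "<= 1 cup/day", "2-3 cups/day", "4+ cups/day"]
melanoma = [310, 942, 1253, 399]
no_melanoma = [44264, 139901, 186767, 73521]


def conditional_proportions(cases, non_cases):
    """Proportion who developed melanoma within each coffee-intake group."""
    return [c / (c + n) for c, n in zip(cases, non_cases)]


def relative_risks(cases, non_cases, baseline=0):
    """Risk in each group divided by the risk in the baseline group."""
    props = conditional_proportions(cases, non_cases)
    return [p / props[baseline] for p in props]


def chi_square_statistic(rows):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


props = conditional_proportions(melanoma, no_melanoma)
risks = relative_risks(melanoma, no_melanoma)
chi2 = chi_square_statistic([melanoma, no_melanoma])
df = (2 - 1) * (len(intake_levels) - 1)  # (rows - 1) * (columns - 1)

for level, p, rr in zip(intake_levels, props, risks):
    print(f"{level:>13}: proportion = {p:.5f}, relative risk = {rr:.3f}")
print(f"chi-square = {chi2:.2f} on {df} degrees of freedom")
```

The conditional proportions are exactly what a segmented bar chart displays, and the relative risks across increasing intake levels give a first, informal look at a possible dose-response pattern.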
Social media can be a useful source, too. I often find interesting posts on Facebook that turn into examples to be used in class.
Strategy 2: Patiently searching on the Internet
The Internet can be a very useful tool when looking for data – as long as you know the right words to type into the search engine. I am happy to say that with time and practice I have become quite good at finding exactly the kind of studies I want when teaching certain topics. For example, when I search for “effect of exercise on blood pressure” on Google Scholar many journal articles turn up, and one among them is “Effects of Regular Exercise on Blood Pressure and Left Ventricular Hypertrophy in African-American Men with Severe Hypertension” from The New England Journal of Medicine, 1995. The article provides means and SDs of a few variables (systolic BP, diastolic BP, etc.) for the two treatment groups, as well as the sample sizes, and these statistics can be used to carry out statistical inference procedures.
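To make the last point concrete, here is a small Python sketch of a two-sample (Welch) t statistic computed from summary statistics alone. The function name is an assumption made for illustration, and the numbers passed to it are hypothetical placeholders, not values from the NEJM article; substitute the means, SDs, and sample sizes reported in whichever article you use.

```python
from math import sqrt


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch two-sample t statistic and approximate degrees of freedom,
    computed from group means, standard deviations, and sample sizes."""
    v1 = sd1 ** 2 / n1  # squared standard error contributed by group 1
    v2 = sd2 ** 2 / n2  # squared standard error contributed by group 2
    t_stat = (mean1 - mean2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


# Hypothetical placeholder summary statistics (NOT from the article).
t_stat, df = welch_t_from_summary(mean1=10, sd1=2, n1=25,
                                  mean2=8, sd2=2, n2=25)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```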
If, like me, you want to run simulation-based inference on the data, then the summary statistics are not as helpful as the raw data would be. I have found that most authors of recent research studies are happy to share the raw data with you if you email them.
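For readers who have not seen why raw data matter here, the following is a minimal sketch of one simulation-based method, a permutation (randomization) test for a difference in two group means. It needs the individual observations so that they can be reshuffled between groups. The function name, the number of repetitions, and the example values are all assumptions made for illustration; the values are made up and do not come from any study.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, reps=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference (mean of A minus mean of B) and the
    proportion of reshuffles whose difference is at least as extreme.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # randomly reassign observations to groups
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed) - 1e-12:  # tolerance for rounding
            extreme += 1
    return observed, extreme / reps


# Made-up raw data for two treatment groups (hypothetical values).
exercise = [-12, -8, -15, -5, -9, -11, -7, -10]
control = [-2, 1, -4, 3, 0, -3, 2, -1]
observed_diff, p_value = permutation_test(exercise, control)
print(f"observed difference = {observed_diff}, approximate p-value = {p_value}")
```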
Strategy 2 does require lots of patience, and I often have to go through many articles before I find something that fits the objective(s) I have in mind. On the plus side, I often find articles that, though not suitable for the topic I have in mind at the time, do have other things to offer. I archive these articles for future use, saving them in folders such as “Multiple groups, categorical response” or “Two groups, quantitative response.”
Other online sources of data:
 Gallup
 Pew Research Center
 Bureau of Labor Statistics
 National Center for Health Statistics
 Centers for Disease Control and Prevention
 National Climatic Data Center
 Journal of Statistics Education, Datasets and Stories
 Science Daily
Strategy 3: Asking colleagues
I am very fortunate to have many great colleagues in my department with whom I get to talk about teaching statistics all the time, and I find them to be great resources when it comes to finding real data and studies to use in class. I understand that not all statistics instructors are as lucky as I am, but that’s why we have online communities and websites that serve statistics educators: the isostat listserv, ASA’s Section on Statistical Education listserv, and, of course, the sbi listserv, to name a few.
Strategy 4: Collecting data on students
In all my classes students frequently investigate research questions by using data that they have collected on themselves. For example, they collect and analyze data to investigate whether people can remember more information when the information is presented in smaller recognizable chunks rather than larger unrecognizable chunks. My hope is that this study’s results will give students helpful hints about study habits. Students also collect data on whether heart rates differ after sitting versus after jumping for 30 seconds. The benefit of using data collected on the students themselves is that it makes the investigation that much more relevant to them.
I often conduct online surveys in my classes where students are asked questions such as: How many hours of sleep did you get last night? Do you eat breakfast? How far from home did you have to travel to go to school here? Do you consider yourself an early bird or a night owl? How many hours per week outside of class do you plan on spending studying for your statistics course? Do you live on campus or off campus? Then as we cover various data types and inference methods, students analyze these data. For example, at my school the expectation is that students will spend at least 8 hours per week outside of class on a 4-unit course. Students use their class data to test whether on average the students in their class are planning on spending less than the recommended amount. Students also investigate whether students who live on campus tend to get more sleep than those who live off campus.
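As a sketch of the study-hours example, the Python snippet below computes a one-sample t statistic for testing whether the average planned study time is less than 8 hours per week. The survey responses shown are hypothetical, invented only to make the code runnable, and the function name is an assumption for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, null_mean):
    """One-sample t statistic and degrees of freedom for H0: mu = null_mean."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)  # stdev uses the n - 1 divisor
    return (mean(values) - null_mean) / standard_error, n - 1


# Hypothetical class survey responses: planned study hours per week.
planned_hours = [5, 6, 8, 4, 7, 10, 6, 5, 9, 6]
t_stat, df = one_sample_t(planned_hours, null_mean=8)
print(f"sample mean = {mean(planned_hours)}, t = {t_stat:.3f}, df = {df}")
```

A negative t statistic points in the direction of the alternative (an average below 8 hours); students would then compare it with a t distribution on the stated degrees of freedom, or use a simulation-based approach instead.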
At the end of the quarter this gives me the opportunity to ask students to identify something they learned in the class not related to statistics or some example/study they found memorable as a measure of whether I was successful in engaging them in some of these contexts.
In one of my introductory statistics classes, students collect data daily on the amount of time they spend on different activities, such as being in class, preparing/studying for class, working out, hanging out with friends, etc. At the end of the quarter they analyze the data to see whether and how the time spent on various activities changed over the duration of the term. My hope is that this will help students realize that time management is a skill to be worked on, and that their data will guide them towards honing this skill.
Strategy 5: Student projects
To provide students with additional opportunities to holistically practice methods of data collection, analysis, and reporting, I include a project component in all my classes. Students work in teams to first come up with research questions and data collection plans. Then they collect and analyze the data, and write a report summarizing the research and the findings. This gives students a chance to apply statistics to real-world data that matter to them, and to practice their written communication skills, especially technical writing. As part of their presentation, students have to convince me why their research question matters, and also do a brief literature review of other similar studies. One of my favorite student projects (Bacon, Boggan, Burton, and Stamer, 2012) is one that investigated whether men with children tend to live longer than men without children, and I use that dataset in class now.
In some of my upper-level statistics courses, I have projects that are solely about reviewing articles from peer-reviewed journals; for these projects students have to find studies that answer scientific questions using specific statistical methods, and then write about the methods and materials, as well as the findings.
Note: If you want to read more about how to incorporate student projects in your courses, we have several posts on this page.
If you have strategies that you use to find interesting datasets and real research studies to use in class, I would love to hear about them! Please share your ideas by posting comments!
I found a nice example of a paper utilizing Fisher’s Exact Test that I like for a few reasons:
Wolkenstein et al. (1998). Randomised comparison of thalidomide versus placebo in toxic epidermal necrolysis. The Lancet 352, 1586-1589.
1) It’s a good illustration of a case where it’s difficult to collect data and thus a small sample size occurs. Specifically, it’s a rare disease, and even once you find patients with the condition, you still have to get their consent.
2) It’s a case of using Fisher’s Exact Test in a real scientific study.
3) The study had to be stopped because more people were dying from the active treatment than placebo (10/12 vs. 3/10). Thus, it’s a nice opportunity to talk about ethics when studying human subjects.
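For anyone who wants to reproduce the calculation in class, here is a minimal Python sketch of Fisher’s Exact Test applied to the counts quoted above (10 deaths out of 12 on thalidomide versus 3 out of 10 on placebo). The function name and the way the table is laid out are assumptions for illustration. The two-sided p-value uses one common convention: it adds up the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact(a, b, c, d):
    """Fisher's Exact Test for the 2x2 table [[a, b], [c, d]].

    Returns (one-sided p-value for a count of at least `a` in the top-left
    cell, two-sided p-value summing tables no more likely than the observed).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def ways(k):
        # Number of arrangements with k "successes" in row 1 (hypergeometric)
        return comb(row1, k) * comb(row2, col1 - k)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    observed = ways(a)
    upper_tail = sum(ways(k) for k in range(a, k_max + 1))
    two_sided = sum(ways(k) for k in range(k_min, k_max + 1)
                    if ways(k) <= observed)
    return upper_tail / total_tables, two_sided / total_tables


# Rows: thalidomide, placebo. Columns: died, survived.
one_sided_p, two_sided_p = fisher_exact(10, 2, 3, 7)
print(f"one-sided p = {one_sided_p:.4f}, two-sided p = {two_sided_p:.4f}")
```

With these counts the one-sided p-value comes out to about 0.017 and the two-sided p-value to about 0.027. Because the counts are kept as integers until the final division, there are no rounding issues when comparing table probabilities.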
Enjoy!
Megan