P2-18: Using Real Data in the Classroom: Public Use Data Files from the National Center for Health Statistics

By Michael R. Jiroutek (Campbell University); Matthew J. Hayat (Georgia State University); MyoungJin Kim (Illinois State University); Todd A. Schwartz (The University of North Carolina at Chapel Hill)


The National Center for Health Statistics (NCHS), in conjunction with the Centers for Disease Control and Prevention (CDC), conducts nationally representative general population surveys across a variety of health topics. These data provide a multidimensional platform for statistics educators to introduce real-world studies that use innovative and complex sampling techniques to obtain a generalizable probability sample, one that requires advanced statistical methods for analysis. These cleaned and de-identified datasets are freely available on the NCHS and CDC websites, often with accompanying syntax/code available for download (in SAS, SPSS, and Stata formats). Consistent with the Guidelines for Assessment and Instruction in Statistics Education (GAISE), incorporating such real-world data into classroom instruction can foster student enthusiasm and interest and allow for demonstration of advanced statistical concepts and methods such as weighting, clustering, and stratification. We include examples of innovative uses of these data to emphasize the importance of carefully accounting for the sampling technique in the statistical analysis.
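As a rough classroom illustration of why the sampling design matters, the sketch below compares an unweighted mean with a weighted mean on a tiny invented sample in which one stratum is deliberately oversampled. The data, stratum labels, weights, and function names are all hypothetical and are not drawn from any NCHS file; the point is only that ignoring the weights can noticeably shift an estimate.

```python
# Toy illustration only: every number here is invented, not NCHS data.
# Each record is (stratum, outcome, sampling_weight), where the weight is
# the number of population members that the respondent represents.
sample = [
    ("A", 10.0, 50.0),   # stratum A is oversampled, so its weights are small
    ("A", 12.0, 50.0),
    ("A", 14.0, 50.0),
    ("A", 12.0, 50.0),
    ("B", 20.0, 400.0),  # stratum B is undersampled, so its weights are large
    ("B", 24.0, 400.0),
]


def unweighted_mean(records):
    """Naive mean that treats the sample as a simple random sample."""
    values = [outcome for _, outcome, _ in records]
    return sum(values) / len(values)


def weighted_mean(records):
    """Weighted mean: sum(w * y) / sum(w)."""
    total_weight = sum(weight for _, _, weight in records)
    weighted_sum = sum(outcome * weight for _, outcome, weight in records)
    return weighted_sum / total_weight


naive = unweighted_mean(sample)
design_based = weighted_mean(sample)
print(f"Unweighted mean: {naive:.2f}")
print(f"Weighted mean:   {design_based:.2f}")
```

This sketch covers only the point estimate. Standard errors that properly reflect clustering and stratification require design-based variance estimation, which is typically done with dedicated survey procedures in packages such as SAS, SPSS, or Stata rather than by hand.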

