Overcoming Fears (my own) Teaching Reproducible Research, Big Data and Data Mining

Melinda Higgins (Emory)


Big data computing platforms and data mining are critical for nursing and public health scientists who want to capitalize on the explosion of health data. To address these needs, we implemented our first course on "Big Data Analytics for Healthcare" in spring 2017, followed by a second cohort in spring 2018. This presentation will cover lessons learned from both instructor and student perspectives. Statistical modeling and data mining were taught with R and RStudio, with Git version control and GitHub. The tidyverse R packages and programming workflow were taught and emphasized. Reproducible research principles and workflow (using the rmarkdown and knitr R packages) were stressed. I had expected more technical issues and student fears, but these concerns proved unfounded, and the results exceeded both my and the students' expectations. Final student projects were challenging and well executed. Student exemplars included microbiome data analysis; integration of microclimate sensor data with macroclimate regional weather data; and analysis of datasets obtained using web scraping and textual data mining. Each of these student projects addressed one or more of the social, behavioral, economic, or environmental determinants of health.