What should we teach in data science courses?

Tuesday, February 13th, 2018, 2:00 pm – 3:00 pm

Presented by: Dennis Sun (Cal Poly and Google)


Over the last few years, a consensus has emerged that data science students should be involved in all stages of the data analysis process, from data preparation and wrangling to presentation and visualization. But data science courses have varied widely in how they implement this. Some courses go into great depth about statistical models and machine learning, while others focus on tools like XML, SQL, and web scraping. There is no question that a budding data scientist must eventually acquire all of these skills, but which of them should a course on data science cover? I suggest that data science courses be organized around three core concepts: paradigms for representing data, paradigms for manipulating data, and paradigms for visualization. These are topics of genuine intellectual merit that are underrepresented elsewhere in the statistics and computer science curriculum. The tools are secondary, and I suggest how such a course could be taught with examples in either R (using the tidyverse) or Python.