Teaching Data Science, Statistical Thinking, and Collaboration using Team-Based Learning

Eric Vance (LISA-University of Colorado Boulder)


Statisticians and data scientists rarely "own" the data they analyze, nor do they typically originate the problems to be solved and decisions to be made with data. Therefore, to have real-world impact, they must collaborate with domain experts to understand and refine the questions to be answered and to access and analyze the appropriate data. Because collaboration is so vital to the work of most statisticians and data scientists, effective collaboration skills should be taught in statistics and data science curricula. However, fitting such instruction into an already overflowing curriculum at the tertiary level can be challenging.

In a new introductory data science course for 17 undergraduates at the University of Colorado Boulder, students initially learn from a textbook and its exercises how to import, tidy, explore, visualize, summarize, and model data in R. These skills are reinforced in class through individual and team application exercises, in which students program in R to answer questions about data sets that highlight fundamental concepts in statistical thinking, such as confounding and regression to the mean. Students further develop their statistical thinking, programming, and professional skills by collaborating in permanent, semester-long teams of 4-5 students to analyze data using reproducible research methods for weekly lab assignments. The students then communicate the results of these analyses to answer relevant questions, make decisions, and provide recommendations for action.

In this session, we describe the many learning goals for this class and how structuring the course around Team-Based Learning pedagogical strategies helps us achieve them. We believe that this approach will scale to larger classes because students learn and apply the course material through extensive within-team discussion and collaboration. Such group learning can create a "small class" atmosphere even in large classes. We present results from a pre- and post-course REALI assessment of statistical literacy and statistical reasoning, along with qualitative comments from students, to gauge the effectiveness of this approach and its benefits. We discuss how to overcome potential barriers to implementing this approach and suggest best practices for teaching data science, statistical thinking, and collaboration in the same course.