By Matt Slifko (The Pennsylvania State University)
As more schools offer introductory data science courses, demand is growing for real datasets that can be used to develop data wrangling, data exploration, and statistical thinking skills. We present a collection of datasets related to the popular Call of Duty® video game series and share our experiences incorporating these datasets into small university-level courses (7 to 40 students) ranging from introductory R to introductory statistical learning. We provide a detailed data dictionary and sample instructions to help students and faculty who are unfamiliar with the subject matter. We also present a variety of examples for building data and statistical thinking skills. For example, students are asked to use string processing to determine whether the player’s team won a match. The importance of multivariate thinking is illustrated by asking students to create data visualizations that show how the experience points earned depend on factors beyond performance metrics. Insights gained from data visualizations are extended to modeling concepts such as the inclusion of categorical data in regression models. For a course on statistical learning, students are asked to apply cross-validation techniques to compare the performance of various machine learning algorithms for predicting a player’s performance.
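As a rough illustration of the string-processing exercise mentioned above, the sketch below derives a match outcome from a score string. The abstract does not describe the data layout, so the format assumed here (a string such as "100-97" with the player's team score listed first) and the function name `team_won` are hypothetical. The courses described use R; Python is used here only to keep the sketch short and self-contained.

```python
from typing import Optional


def team_won(result: str) -> Optional[bool]:
    """Return True for a win, False for a loss, and None for a draw.

    Assumption (not from the source): `result` looks like "100-97",
    with the player's team score before the hyphen.
    """
    # Split on the hyphen and convert both halves to integers.
    own, opponent = (int(part.strip()) for part in result.split("-"))
    if own == opponent:
        return None
    return own > opponent


# Hypothetical score strings, invented purely for illustration.
results = ["100-97", "64-100", "75-75"]
outcomes = [team_won(r) for r in results]
print(outcomes)
```

An exercise like this lets students practice splitting strings, converting types, and handling edge cases such as draws before moving on to visualization or modeling.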