Ron Yurko (Carnegie Mellon University), Zach Branson (Carnegie Mellon University)
Abstract
Background. Data visualization is a core part of statistical practice and is ubiquitous in many fields. One of the ten goals in the GAISE College Report is that students “should be able to produce graphical displays and numerical summaries and interpret what graphs do and do not reveal.” Although there are numerous books on data visualization, instructors in statistics and data science may be unsure how to teach it, because it is such a broad discipline. To give guidance on teaching data visualization from a statistical perspective, we make two contributions. First, we conduct a survey of data visualization courses at the top 150 colleges and universities in the United States, in order to understand the landscape of such courses. Second, we outline three teaching principles for incorporating statistical inference in data visualization courses, and provide several examples that demonstrate how instructors can follow these principles. The dataset from our survey allows others to explore the diversity of data visualization courses, and our teaching principles give guidance to instructors and departments who want to encourage statistical thinking via data visualization. In this way, statistics-related departments can provide a valuable perspective on data visualization that current course offerings largely lack.
Methods. We conduct a survey of data visualization courses at the top 150 universities and colleges in the United States. We search online course catalogs (in Fall 2022 or Spring 2023) for keywords and then identify courses that are primarily focused on data visualization based on available course descriptions and materials. For each course, we gather several pieces of information, including academic level, the department in which the course is taught, any listed software, and whether the course description or materials mention each of ten statistical topics, which we use to gauge the level of statistical content in data visualization courses.
Findings. We find that data visualization is taught across many colleges and universities, but most of these courses are not taught by statistics and data science departments. Furthermore, most data visualization courses do not emphasize key aspects of statistical thinking (testing, uncertainty quantification, and modeling), and instead focus on topics unique to particular disciplines (e.g., storytelling, visual design, or specialized software). We make our survey dataset publicly available (https://cmustatistics.github.io/data-repository/social/data-viz-survey.html) for instructors who want to glean further insights about data visualization courses or use it as an example dataset in their own courses.
Implications For Teaching and For Research. Our survey suggests that key aspects of statistical thinking—inference and modeling—are not emphasized in many data visualization courses. Thus, we clarify how to teach data visualization in a way that encourages statistical thinking and highlights how inferential tools can enhance visualizations. In particular, we outline three statistical principles for teaching data visualization: (1) graphs should be complemented with inference, (2) graphs are estimates and statistics, and (3) graphs can be used to motivate, teach, and interpret statistical analyses. We illustrate these principles using examples from our undergraduate data visualization course at Carnegie Mellon University.