Analyzing Statistics Students' Writing Before and After the Emergence of Large Language Models


Sara Colando (Carnegie Mellon University), Erin Franke (Carnegie Mellon University)


Location: Memorial Union Great Hall

Abstract

 

Background. Large language models (LLMs) such as ChatGPT have become ubiquitous in academic settings. A survey of 1,001 undergraduates across institutions found that a majority use ChatGPT for general purposes and that 33.1% use it for writing purposes monthly (Baek et al., 2024). It is therefore important for statistics and data science education to understand how the style of student statistics writing has changed since the introduction of ChatGPT. Prior research has found significant stylistic differences between LLM and human writing, including in present participial clauses, passive voice, and vocabulary (Reinhart et al., 2024). The ability to effectively communicate statistical methodology and results to non-experts is key for statisticians, and for undergraduates still developing this skill, relying on LLMs to write papers could seriously inhibit that development. To better understand this phenomenon, we plan to compare the writing style of a corpus of student writing with that of ChatGPT, as well as with a corpus of expert statistics writing (collected from Significance and Chance). Our student writing corpus consists of data analysis reports from both an introductory and an advanced undergraduate statistics class at Carnegie Mellon University, written before and after ChatGPT's mainstream adoption.

 

Methods. Following the work of Reinhart et al. (2024), we will use Biber feature rates, which characterize various lexical, grammatical, and rhetorical features of a text, to assess the (dis)similarity between statistics reports written by Carnegie Mellon students before ChatGPT's widespread use and more recent student reports. Additionally, we will compare the Biber feature rates in the more recent student writing to ChatGPT writing (via the Human-AI Parallel English corpus) and to the expert statistician writing corpus, using ANOVA tests along with ridgeline plots to visualize differences in relevant Biber feature rates between groups.


Implications For Teaching and For Research. As LLMs have become increasingly common in academic settings, it is crucial to understand how they have changed the way students learn to write about statistics and reason with real-world data. Our proposed research would offer a quantitative comparison between students' statistics writing before ChatGPT's introduction and more recent student statistics writing. By presenting our work to statistics and data science educators, we hope to elucidate whether and how statistics writing has changed since the introduction of ChatGPT and, specifically, how these changes relate to writing trends that appear in the ChatGPT and expert statistics writing corpora.
