Tuesday, September 9th, 2025, 4:00 pm – 4:45 pm ET
Presented by: Laura DeLuca (Carnegie Mellon University), Alex Reinhart (Carnegie Mellon University), Gordon Weinberg (Carnegie Mellon University), Michael Laudenbach (New Jersey Institute of Technology), and David Brown (Carnegie Mellon University)
Abstract
In this September edition of the JSDSE/CAUSE webinar series, we highlight the recent article Developing Students’ Statistical Expertise Through Writing in the Age of AI. As large language models (LLMs) such as GPT have become more accessible, concerns about their potential effects on students’ learning have grown. In data science education, the prospect of students turning to LLMs raises multiple issues, because writing is a means not just of conveying information but of developing statistical reasoning. In their work, the authors engage with questions surrounding LLMs and their pedagogical impact by: (a) quantitatively and qualitatively describing how select LLMs write report introductions and complete data analysis reports; and (b) comparing patterns in texts authored by LLMs to those authored by students and by published researchers. Their results show distinct differences between machine-generated and human-generated writing, as well as between novice and expert writing. Those differences are evident in how writers manage information, modulate confidence, signal importance, and report statistics. The findings can help inform classroom instruction, whether that instruction is aimed at dissuading the use of LLMs or at guiding their use as a productivity tool. They also carry implications for students’ development as statistical thinkers and writers: what happens when students offload the work of data science to a model that doesn’t write quite like a data scientist?