Programming in DS 101: How does language impact student struggle in intro data science?


By Aimee Schwab-McCoy (zyBooks)


Information

Intro data science courses have seen significant growth at the university level during the past decade. These courses typically introduce concepts like data wrangling, data visualization, and modeling using a programming language such as Python or R. Although the concepts can be taught effectively in either language, syntax and style conventions present different programming challenges in Python and R. We identified sets of matched coding activities from an interactive courseware platform in which students performed similar tasks using either Python or R. Each coding task was small and well-defined, such as fitting a predictive model, creating a contingency table, or calculating cross-validation metrics. For every student assigned a task, we recorded the time spent on the activity and the number of attempts needed to reach a correct answer. The number of students per task ranged from 50 to over 1,000. After aggregating student results from more than 100 institutions, we found significant differences in student struggle on several coding tasks. Neither language was "better" for teaching intro data science: each had its own set of difficult coding tasks. This poster will share which coding tasks showed greater student struggle in Python or in R, and help instructors anticipate potential pain points in their own courses.

