By Leiyue Li (St. Lawrence University)
Information
I have successfully used this project in my Applied Regression Analysis courses for two semesters now, guiding undergraduate students, ranging from sophomores to seniors, as they investigate a 1,000 × 50 simulated dataset. Most students in this class are new to R. They are required to create a storyline for the data and carry out a full analysis, including data cleaning, graphical and numerical summaries, model building, and inference. Students may work alone or in pairs: each pair submits one joint five-page report, and each student working alone submits an individual five-page report. Students also give an eight-minute presentation as part of their final exam.
I first implemented this project in a class of 14 students. They noted that they had never worked with such a large dataset before and that the experience significantly improved their data analysis skills. Encouraged by these outcomes, I adapted the project for my current group of 26 students, who have similarly shown improved competence in applied modeling. Guided by different selection criteria, such as AIC, BIC, and p-values, students learn that there is rarely a single "true" predictor set. The key takeaway is that the goal is not to identify such a set, but to encourage students to think critically about how their research questions guide their modeling decisions.
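One reason selection criteria such as AIC and BIC can point to different predictor sets is that they penalize model size differently. As a brief sketch, assuming the standard definitions (the symbols are notation introduced here, not taken from the project materials: \hat{L} is the maximized likelihood of a fitted model, p is its number of estimated parameters, and n is the number of observations):

```latex
\begin{align*}
\mathrm{AIC} &= -2\ln\hat{L} + 2p, \\
\mathrm{BIC} &= -2\ln\hat{L} + p\ln n .
\end{align*}
```

If the 1,000 in the dataset's dimensions is the number of observations, then ln n ≈ 6.91, so BIC charges roughly 6.9 per additional parameter where AIC charges 2. BIC therefore tends to favor smaller models than AIC on the same data, which is one way two defensible analyses can arrive at different predictor sets.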