Data scraping, ingestation, and modeling: bringing data from into the intro stats class

Tuesday, November 21st, 2017, 2:00 pm – 3:00 pm

Presented by: Nicholas Horton, Amherst College


In this webinar, I will describe a classroom activity where pairs of students hand-scrape data from, ingest these data into R, then carry out analyses of the relationships between price, mileage, and model year for a selected type of car. This early-in-the-semester activity can help illustrate the statistical problem-solving process. The "Less Volume, More Creativity" approach used by the mosaic package facilitates the analysis with a minimal amount of syntax. Key concepts that are introduced and reinforced include data ingestion, multivariate thinking through graphical visualizations, and regression modeling. Extensions and additional uses of the dataset will be discussed, along with potential pitfalls.
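
As a rough illustration of the workflow described above (not the webinar's actual materials), the following minimal R sketch builds a small data frame by hand, plots price against mileage colored by model year, and fits a regression using the formula interface that mosaic encourages. The data values, the data frame name `cars_scraped`, and the column names are all invented for illustration; in the classroom activity the students would enter or read in their own hand-scraped values instead.

```r
# Minimal sketch, assuming the mosaic package is installed.
# All values below are made up for illustration; they are NOT real listings.
library(mosaic)

# Hypothetical hand-entered data for one type of car
cars_scraped <- data.frame(
  year    = c(2012, 2013, 2014, 2015, 2016, 2017),
  mileage = c(68000, 52000, 61000, 30000, 24000, 9000),
  price   = c(9500, 11200, 11000, 14800, 16500, 19900)
)

# Numerical summary of price using mosaic's formula interface
favstats(~ price, data = cars_scraped)

# Multivariate display: price vs. mileage, colored by model year
gf_point(price ~ mileage, color = ~ factor(year), data = cars_scraped)

# Regression of price on mileage and model year
mod <- lm(price ~ mileage + year, data = cars_scraped)
msummary(mod)
```

The same `response ~ predictor` formula shape is used for the summary, the plot, and the model, which is one way to keep the amount of new syntax small for students.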

 Project Files: