• What should we teach in data science courses?

    Dennis Sun (Cal Poly and Google)
    Tuesday, February 13, 2018 - 2:00pm ET
    Over the last few years, there has been a consensus that data science students should be involved in all stages of the data analysis process, from data preparation and wrangling to presentation and visualization. But data science courses have varied widely in their implementation. Some courses go into great depth about statistical models and machine learning, while others focus on tools like XML, SQL, and web scraping. While there is no question that a budding data scientist must acquire these skills eventually, what should be covered in a course on data science? I suggest that data science courses be organized around three core concepts: paradigms for representing data, paradigms for manipulating data, and paradigms for visualization. These are topics of genuine intellectual merit that are underrepresented elsewhere in the statistics and computer science curriculum. The tools are secondary, and I suggest how such a course could be taught with R examples based on the tidyverse or with Python examples.
  • Symbulate: Simulation in the Language of Probability

    Kevin Ross, Cal Poly
    Thursday, January 11, 2018 - 2:00pm ET
    Simulation is an effective tool for analyzing probability models as well as for facilitating understanding of concepts in probability and statistics. Unfortunately, implementing a simulation from scratch often requires users to think about programming issues that are not relevant to the simulation itself. We have developed a Python package called Symbulate (https://github.com/dlsun/symbulate) which provides a user-friendly framework for conducting simulations involving probability models. The syntax of Symbulate reflects the "language of probability" and makes it intuitive to specify, run, analyze, and visualize the results of a simulation. Moreover, Symbulate's consistency with the mathematics of probability reinforces understanding of probabilistic concepts. This webinar will demonstrate Symbulate's use with a variety of probability concepts and problems, including: probability spaces; events; discrete and continuous random variables; joint, conditional, and marginal distributions; stochastic processes; and more.
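To make the "from scratch" burden concrete, here is a minimal sketch in plain Python using only the standard library. It deliberately does not use Symbulate, whose API is not shown in this abstract. It estimates the probability that two fair dice sum to 7; the function name `estimate_prob_sum`, the seed, and the number of repetitions are illustrative assumptions.

```python
import random


def estimate_prob_sum(target, n_reps=100_000, seed=0):
    """Estimate P(sum of two fair six-sided dice == target) by simulation."""
    rng = random.Random(seed)  # local generator so the run is reproducible
    hits = 0
    for _ in range(n_reps):
        # One repetition of the experiment: roll two dice and add them.
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        if roll == target:
            hits += 1
    # Relative frequency of the event across all repetitions.
    return hits / n_reps


estimate = estimate_prob_sum(7)
print(f"Estimated P(sum = 7): {estimate:.3f} (exact value is 1/6)")
```

The loop, the counter, and the seeding are bookkeeping unrelated to the probability model itself, which is the kind of overhead the abstract describes.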
  • The Hows and Whys of Reasoning with the ASA Ethical Guidelines

    Rochelle Tractenberg (Georgetown University)
    Tuesday, January 9, 2018 - 2:00pm ET
    Since data analysis is becoming important across disciplines, the ASA Ethical Guidelines for Statistical Practice, which were updated in 2016, can serve to introduce all students in quantitative disciplines to critical concepts of responsible data analysis, interpretation, and reporting. The Guidelines contain elements that are suitable, and important, components of training for undergraduates, whether or not they are statistics majors, to prepare them for ethical quantitative work. The Guideline principles interact, and sometimes must be prioritized. Therefore, neither simply distributing the Guidelines nor encouraging students to memorize them can promote the necessary level of awareness. This presentation will introduce ethical reasoning as a learnable, improvable skill set that can provide an entry point to working with the 2016 revised ASA Ethical Guidelines.
  • Data scraping, ingestion, and modeling: bringing data from cars.com into the intro stats class

    Nicholas Horton, Amherst College
    Tuesday, November 21, 2017 - 2:00pm ET
    In this webinar, I will describe a classroom activity where pairs of students hand scrape data from cars.com, ingest these data into R, then carry out analyses of the relationships between price, mileage, and model year for a selected type of car. This early-in-the-semester activity can help illustrate the statistical problem solving process. The "Less Volume, More Creativity" approach utilized by the mosaic package facilitates the analysis with a minimal amount of syntax. Key concepts that are introduced and reinforced include data ingestion, multivariate thinking through graphical visualizations, and regression modeling. Extensions and additional uses of the dataset will be discussed along with potential pitfalls. Project Files: https://github.com/Amherst-Statistics/Cars-Scraping-Webinar
  • Regression to the Mean/The regression effect

    Jeff Witmer (Oberlin College)
    Tuesday, October 17, 2017 - 2:00pm ET
    Regression to the mean, also known as "the regression effect," is an important but sometimes overlooked topic in introductory statistics. We will discuss the regression effect and how to teach it. We will also consider a number of examples of the "regression fallacy," in which people who are ignorant of the regression effect make up ad hoc (and sometimes very misleading) explanations for what they see in data.
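The regression effect can be illustrated with a small simulation. The sketch below is not taken from the webinar; it assumes a simple hypothetical model in which each observed score is a stable "ability" plus independent noise of equal variance, and all names and parameter values are illustrative. Under this model the two scores have correlation 0.5, so the group that scored highest the first time is expected to average roughly half as far above the overall mean the second time, with no change in ability.

```python
import random
import statistics


def regression_effect_demo(n=20_000, top_fraction=0.10, seed=1):
    """Return (mean first score, mean second score) for the top first-test scorers.

    Assumed model (hypothetical): score = ability + noise, with ability and
    noise both drawn from a standard normal distribution.
    """
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    test1 = [a + rng.gauss(0, 1) for a in ability]  # first measurement
    test2 = [a + rng.gauss(0, 1) for a in ability]  # independent retest

    # Select the individuals with the highest scores on the first test.
    k = int(n * top_fraction)
    top = sorted(range(n), key=lambda i: test1[i], reverse=True)[:k]

    mean1 = statistics.fmean(test1[i] for i in top)
    mean2 = statistics.fmean(test2[i] for i in top)
    return mean1, mean2


first, second = regression_effect_demo()
print(f"Top group, first test mean:  {first:.2f}")
print(f"Top group, second test mean: {second:.2f}")
```

The drop in the selected group's mean comes purely from selecting on a noisy measurement, which is why ad hoc explanations for such a drop are an instance of the regression fallacy.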
  • Maximizing Linked Data's Network Effect

    Carolee Mitchell, Academic Relationships Manager, data.world
    Tuesday, July 25, 2017 - 2:00pm ET
    18M+ open datasets exist today, and growth is accelerating. But these datasets live in data portals without common taxonomies or architectures, and must first be cleaned and prepared by data users. Humans and computers normalize the data, extract meaning, and identify correlations, but this work is siloed: used for one project, then lost forever, only to be repeated from scratch by the next person to touch the data. Open data can help us rise to humanity’s toughest challenges, but only if we maximize its network effect. To build the web of Linked Data, we have to start by connecting the people who are working with data. Visit: https://data.world/
  • Teaching the Past, Present, and Future of Statistics

    Nicholas J. Horton (Amherst College)
    Tuesday, June 20, 2017 - 2:00pm ET
    In 2014, the Committee of Presidents of Statistical Societies (COPSS) published a book entitled "Past, Present, and Future of Statistical Science" that contains 52 short chapters contributed by past winners of one of the COPSS Awards. The goal of the book (which is freely downloadable from the COPSS website or http://tinyurl.com/copss-ppf) was to "showcase the breadth and vibrancy of statistics, to describe current challenges and new opportunities, to highlight the exciting future of statistical science, and to provide guidance for future generations of statisticians" (page xvii). In this webinar, I will describe how these chapters were integrated into a theoretical statistics course to help students see the big picture and potential for statistics.
  • Initial Findings about Graduate Teaching Assistants’ Training Needs to Foster Active Learning in Statistics

    Kristen Roland and Jennifer Kaplan (University of Georgia)
    Tuesday, April 18, 2017 - 2:00pm ET
    As enrollment in introductory statistics courses across the country rises, more instructors for these courses are needed. Many statistics courses are now taught by Graduate Teaching Assistants (GTAs). Little is known, however, about the training GTAs need in order to foster active learning and promote conceptual understanding, both of which are critical recommendations of the GAISE guidelines for improving undergraduate learning in statistics. This talk will discuss changes to our lab activities that incorporate the GAISE recommendations of teaching for conceptual understanding, fostering active learning, and integrating real data. We will also discuss initial findings concerning the struggles GTAs have with connecting their theoretical knowledge to conceptual ideas concerning confidence intervals for one population proportion. The material is based on work supported by NSF DUE 1504587.
  • A Fully Customizable Textbook for Introductory Statistics/Data Science Courses

    Chester Ismay and Albert Y. Kim
    Tuesday, March 14, 2017 - 2:00pm ET
    This webinar will provide a guide to creating a user-adaptable electronic textbook that incorporates data visualization, data science, and other relevant pedagogical concepts into your introductory statistics course. We present our own introductory statistics and data science textbook, available at http://moderndive.com, that:
      • Focuses on the entirety of the data/science pipeline, from importing data to visualizing and summarizing data to inferential techniques, and on developing students as effective data storytellers
      • Blurs the line between lecture and lab
      • Uses freely available modern, rich, and complex data sources
      • Leverages resampling and simulation to build statistical inference concepts
      • Most importantly, provides complete customizability to the instructor and reproducibility to the student
    We’ll discuss how collaboration and crowd-sourcing have played and will play a role in our textbook going forward, and other open-source materials we are creating to better support introductory statistics/data science students learning the skills and tools that statisticians/data scientists are using today. The complete PowerPoint presentation from the webinar is available at http://bit.ly/moderndive-causeweb
  • A Real Data Set for Business Forecasting & Data Mining Applications

    Concetta DePaolo, David Robinson, and Aimee Jacobs
    Tuesday, February 21, 2017 - 2:30pm ET
    We present actual data gathered from a café run by business students. We give examples of time series forecasting and data mining applications, and frame problems as managerial questions to emphasize data-driven decision making.