Albert Y. Kim (Amherst/Smith College)
Tuesday, June 12, 2018 - 2:00pm
FiveThirtyEight.com is a data journalism website founded by Nate Silver that makes many of the datasets used in its articles openly available on GitHub.com. The fivethirtyeight R package acts as an intermediary to make all this data, its documentation, and links to the original articles easily accessible to R users. Furthermore, the package "tames" the data: the data is pre-processed enough so that the biggest barriers to data exploration faced by novice R users are eliminated, but not so much that the true nature of the data as it exists "in the wild" is completely betrayed. In this webinar, I will present the corresponding set of "tame" data principles, discuss the pedagogical thinking behind them, and present illustrative examples involving datasets from articles on FiveThirtyEight.com.
Tuesday, May 8, 2018 - 2:00pm
The learnr R package provides a new multimedia approach for teaching statistics and programming with R. Building on R Markdown, this package allows teachers to create interactive tutorials containing narrative, figures, illustrations, and equations; code exercises (R code chunks that users can edit and execute directly); multiple-choice quiz questions; videos; and interactive Shiny components. Tutorials built with this tool can be used for checking and reinforcing students' understanding, and they have the benefit of being self-paced and providing instant feedback. In this webinar we will demonstrate how to use the learnr package to build interactive R tutorials and discuss best practices for using them.
Todd Schwartz and Jane Monaco (University of North Carolina)
Tuesday, April 10, 2018 - 2:00pm
Online courses and 'flipped' classrooms are becoming more common in statistics and biostatistics. A gap exists in the literature regarding a systematic study of the instructors of these types of (bio)statistics courses. We conducted a survey to elicit these instructors' responses in terms of implementation, ratings, recommendations, and opinions, and we report on n=46 such instructors. In this webinar, we describe characteristics of these respondents' courses and summarize their responses on various aspects. Results are given both overall and for different subgroups of interest. Our findings should be useful for informing statistics educators who might be considering adopting these formats.
Matt Hayat, Michael Jiroutek, MyoungJin Kim, and Todd Schwartz
Tuesday, March 27, 2018 - 2:00pm
Healthcare professionals and faculty depend on the health and medical literature to keep current with clinical information and best evidence-based practices. Yet, little is known about their knowledge of, and comfort level with, statistics. We conducted a research study on health sciences faculty to assess their knowledge about statistics. A probability sample of schools of dentistry, nursing, medicine, pharmacy, and public health was selected, and faculty were invited to complete a brief online survey that included 9 demographic-related questions and a 10-question statistics knowledge instrument. In this webinar we will present study results, including aggregated findings for the 708 respondents, as well as interesting discipline-specific findings. Implications for statistics educators will be discussed, and time will be allotted for questions from the audience.
Dennis Sun (Cal Poly and Google)
Tuesday, February 13, 2018 - 2:00pm
Over the last few years, a consensus has emerged that data science students should be involved in all stages of the data analysis process, from data preparation and wrangling to presentation and visualization. But data science courses have varied widely in their implementation. Some courses go into great depth about statistical models and machine learning, while others focus on tools like XML, SQL, and web scraping. While there is no question that a budding data scientist must acquire these skills eventually, what should be covered in a course on data science? I suggest that data science courses be organized around three core concepts: paradigms for representing data, paradigms for manipulating data, and paradigms for visualizing data. These are topics of genuine intellectual merit that are underrepresented elsewhere in the statistics and computer science curriculum. The tools are secondary, and I suggest how such a course could be taught using examples in R with the tidyverse or in Python.
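To make the "paradigms for representing and manipulating data" idea concrete, here is a minimal Python sketch (with hypothetical values) contrasting a row-oriented representation of a small dataset with a column-oriented one, plus a simple filter-then-summarize manipulation. The data and names are illustrative, not drawn from the webinar.

```python
# Two paradigms for representing the same small dataset (hypothetical values):
# row-oriented (a list of records) vs. column-oriented (a dict of columns).

rows = [
    {"model": "Civic", "year": 2015, "price": 14500},
    {"model": "Civic", "year": 2017, "price": 17900},
    {"model": "Accord", "year": 2016, "price": 18200},
]

# Converting to a columnar layout makes whole-column operations natural.
cols = {key: [r[key] for r in rows] for key in rows[0]}

# A simple manipulation paradigm: filter, then summarize, over the rows.
civic_prices = [r["price"] for r in rows if r["model"] == "Civic"]
mean_civic_price = sum(civic_prices) / len(civic_prices)

print(cols["year"])       # all years, pulled as a single column
print(mean_civic_price)   # mean price of the filtered subset
```

The same two layouts underlie, respectively, database-style records and data-frame libraries such as the tidyverse or pandas, which is one way the "representation" and "manipulation" paradigms can be taught independently of any particular tool.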
Thursday, January 11, 2018 - 2:00pm
Simulation is an effective tool for analyzing probability models as well as for facilitating understanding of concepts in probability and statistics. Unfortunately, implementing a simulation from scratch often requires users to think about programming issues that are not relevant to the simulation itself. We have developed a Python package called Symbulate (https://github.com/dlsun/symbulate) which provides a user-friendly framework for conducting simulations involving probability models. The syntax of Symbulate reflects the "language of probability" and makes it intuitive to specify, run, analyze, and visualize the results of a simulation. Moreover, Symbulate's consistency with the mathematics of probability reinforces understanding of probabilistic concepts. This webinar will demonstrate Symbulate's use with a variety of probability concepts and problems, including: probability spaces; events; discrete and continuous random variables; joint, conditional, and marginal distributions; stochastic processes; and more.
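To see the kind of from-scratch bookkeeping the abstract alludes to, here is a plain-Python simulation of the sum of two fair dice. The looping, tallying, and normalizing below are exactly the "programming issues not relevant to the simulation itself" that a declarative framework like Symbulate is designed to hide; this sketch deliberately does not use Symbulate's own API.

```python
import random
from collections import Counter

# From-scratch simulation of the sum of two fair dice. All of the
# bookkeeping here (looping, tallying, normalizing) is incidental to
# the probability question being asked.
random.seed(0)  # fixed seed for reproducibility

n_sims = 100_000
sums = [random.randint(1, 6) + random.randint(1, 6) for _ in range(n_sims)]

# Tally outcomes and estimate P(sum == 7); the exact value is 6/36.
counts = Counter(sums)
p_seven = counts[7] / n_sims
print(round(p_seven, 3))
```

A probability-language framework lets the user state the model (a box model for a die, a random variable for the sum) and ask for its simulated distribution directly, so the tally-and-normalize machinery above never appears in student-facing code.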
Rochelle Tractenberg (Georgetown University)
Tuesday, January 9, 2018 - 2:00pm
Since data analysis is becoming important across disciplines, the ASA Ethical Guidelines for Statistical Practice, which were updated in 2016, can serve to introduce all students in quantitative disciplines to critical concepts of responsible data analysis, interpretation, and reporting. The Guidelines contain elements that are suitable, and important, components of training for undergraduates, whether or not they are statistics majors, to prepare them for ethical quantitative work. The Guideline principles interact, and sometimes must be prioritized. Therefore, neither simply distributing the Guidelines nor encouraging students to memorize them can promote the necessary level of awareness. This presentation will introduce ethical reasoning as a learnable, improvable skill set that can provide an entry point to working with the 2016 revised ASA Ethical Guidelines.
Nicholas Horton, Amherst College
Tuesday, November 21, 2017 - 2:00pm
In this webinar, I will describe a classroom activity in which pairs of students hand-scrape data from cars.com, ingest these data into R, and then carry out analyses of the relationships between price, mileage, and model year for a selected type of car. This activity, conducted early in the semester, can help illustrate the statistical problem-solving process. The "Less Volume, More Creativity" approach utilized by the mosaic package facilitates the analysis with a minimal amount of syntax. Key concepts that are introduced and reinforced include data ingestion, multivariate thinking through graphical visualizations, and regression modeling. Extensions and additional uses of the dataset will be discussed, along with potential pitfalls.
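The regression-modeling step of the activity can be sketched with the closed-form least-squares formulas. This Python version uses hypothetical hand-entered values standing in for scraped cars.com rows (the webinar itself uses R and the mosaic package); all numbers below are illustrative.

```python
# Least-squares fit of price on mileage, on hypothetical values standing
# in for hand-scraped cars.com listings of one car model.
mileage = [12_000, 35_000, 48_000, 62_000, 80_000]   # assumed data
price   = [21_500, 18_900, 16_400, 14_800, 12_100]   # assumed data

n = len(mileage)
mean_x = sum(mileage) / n
mean_y = sum(price) / n

# Slope and intercept from the usual closed-form least-squares formulas:
# slope = S_xy / S_xx, intercept = ybar - slope * xbar.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(mileage, price))
sxx = sum((x - mean_x) ** 2 for x in mileage)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(slope)       # dollars per mile; negative, since price falls with mileage
print(intercept)   # predicted price at zero miles
```

In class the same fit is a one-liner in R (e.g. a linear model of price on mileage), but seeing the formulas once helps students connect the fitted line back to the data they scraped.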
Jeff Witmer (Oberlin College)
Tuesday, October 17, 2017 - 2:00pm
Regression to the mean, also known as "the regression effect," is an important but sometimes overlooked topic in introductory statistics. We will discuss the regression effect and how to teach it. We will also consider a number of examples of the "regression fallacy," in which people who are ignorant of the regression effect make up ad hoc (and sometimes very misleading) explanations for what they see in data.
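The regression effect is easy to demonstrate by simulation: give each "student" a stable true ability plus independent noise on two tests, select the top scorers on test 1, and watch their test-2 average fall back toward the overall mean. All numbers below (group size, score scale, noise level) are illustrative, not from the webinar.

```python
import random
import statistics

# Simulating the regression effect: observed score = true ability + noise,
# with independent noise on each test. Extreme test-1 scores are partly
# luck, so the same students score closer to the mean on test 2.
random.seed(1)

n = 10_000
true_ability = [random.gauss(70, 8) for _ in range(n)]
test1 = [a + random.gauss(0, 8) for a in true_ability]
test2 = [a + random.gauss(0, 8) for a in true_ability]

# Select roughly the top decile on test 1 ...
cutoff = sorted(test1)[int(0.9 * n)]
top = [i for i in range(n) if test1[i] >= cutoff]

mean_top_t1 = statistics.mean(test1[i] for i in top)
mean_top_t2 = statistics.mean(test2[i] for i in top)

# ... and compare their averages on the two tests. The drop requires no
# ad hoc explanation (complacency, pressure, etc.); it is pure regression.
print(round(mean_top_t1, 1), round(mean_top_t2, 1))
```

The "regression fallacy" is precisely the act of inventing a causal story for the gap between those two averages when the selection-plus-noise mechanism already accounts for it.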
Carolee Mitchell, Academic Relationships Manager, data.world
Tuesday, July 25, 2017 - 2:00pm
More than 18 million open datasets exist today, and growth is accelerating. But these datasets live in data portals without common taxonomies or architectures, and must first be cleaned and prepared by data users. Humans and computers normalize, extract meaning, and identify correlations, but this work is siloed: used for one project, then lost forever, only to be repeated from scratch by the next person to touch the data.
Open data can help us rise to humanity’s toughest challenges, but only if we maximize its network effect. To build the web of Linked Data, we have to start by connecting the people who are working with data.