The Lineup Protocol: Using Simulation to Improve Model Diagnostics in Binary Logistic Regression

What does a good diagnostic plot look like? When we look at a residual plot or QQ-plot, we implicitly compare it to our mental image of a good plot. Humans are exceptionally good at picking patterns out of randomness and hence tend to over-interpret visual diagnostics. We attempt to remedy this problem by simulating good diagnostic plots for a viewer to use as references when they evaluate a graphical display.
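The idea of hiding the observed diagnostic among simulated references can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the choice of Pearson residuals, the 20-panel default, and the toy data are all assumptions, and a full lineup would usually also refit the model to each simulated response before computing residuals.

```python
import math
import random


def pearson_residuals(y, p):
    """Pearson residuals for binary outcomes y given fitted probabilities p."""
    return [(yi - pi) / math.sqrt(pi * (1 - pi)) for yi, pi in zip(y, p)]


def build_lineup(y_obs, p_fitted, n_panels=20, seed=0):
    """Return residual sets for n_panels plots and the index of the real one.

    Every panel except one holds residuals from responses simulated under
    the fitted model, so those panels show what a "good" plot looks like.
    (Assumption: the model is not refit to each simulated response.)
    """
    rng = random.Random(seed)
    true_pos = rng.randrange(n_panels)
    panels = []
    for i in range(n_panels):
        if i == true_pos:
            y = y_obs
        else:
            # Simulate a binary response from each fitted probability.
            y = [1 if rng.random() < pi else 0 for pi in p_fitted]
        panels.append(pearson_residuals(y, p_fitted))
    return panels, true_pos


# Hypothetical fitted probabilities and observed outcomes.
p_fitted = [0.1, 0.3, 0.5, 0.7, 0.9]
y_obs = [0, 0, 1, 1, 1]
panels, true_pos = build_lineup(y_obs, p_fitted)
print(f"{len(panels)} panels; the observed data sits in panel {true_pos}")
```

Each entry of `panels` would then be drawn as one small plot. If a viewer cannot pick out the observed panel, the diagnostic gives no visual evidence against the model.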

Exploring Missingness and its Implications in Traffic Stop Data

As traffic stop data has become increasingly available, so has scholarship analyzing the data for evidence of discriminatory policing. However, relatively few studies address missingness (NA values) in the data, even though all of the data is conditional on being recorded. This project develops a framework for studying missingness through the stop missingness rate (SMR) and presents an exploratory data analysis of SMR using data from the Stanford Open Policing Project.

Resilient Adaptation in Maltreated Children: Identifying Opportunities for Recourse Following Maltreatment

Each year within the United States, nearly 700,000 children experience maltreatment. In physical, emotional, social, academic, and economic terms, the cost of child maltreatment is debilitating to its survivors. The study of resilience, or the ability to achieve positive outcomes despite severe adversity, serves as the key to mitigating child maltreatment’s devastation of individuals and its reproduction within communities.

On the Generative Process of Solar Flares: Non-Poisson Behavior

The number of solar flares occurring in the corona is strongly correlated with the phase of the solar cycle. It is common practice to describe the yearly flare count distributions with a Poisson distribution. We find that the observed distributions are overdispersed relative to what a Poisson model predicts, and thus conclude that a Poisson generative model is not appropriate for flare data aggregated in that manner.
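Overdispersion can be illustrated with the variance-to-mean ratio, which is close to 1 for Poisson counts. The sketch below is not the study's analysis: the rates, sample sizes, and function names are hypothetical, and the counts are simulated rather than real flare data. It shows how a rate that changes between phases inflates the ratio well above 1.

```python
import math
import random
import statistics


def poisson_draw(rng, rate):
    """Draw one Poisson variate using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson data."""
    return statistics.variance(counts) / statistics.fmean(counts)


rng = random.Random(42)

# Hypothetical counts with one fixed rate: a true Poisson process.
fixed_rate_counts = [poisson_draw(rng, 20.0) for _ in range(5000)]

# Hypothetical counts whose rate alternates between quiet and active phases.
varying_rate_counts = [
    poisson_draw(rng, 5.0 if i % 2 == 0 else 35.0) for i in range(5000)
]

fixed_index = dispersion_index(fixed_rate_counts)
varying_index = dispersion_index(varying_rate_counts)
print(f"fixed rate:   {fixed_index:.2f}")
print(f"varying rate: {varying_index:.2f}")
```

Pooling counts drawn at different rates adds the variance of the rate itself to the Poisson variance, which is one way aggregated counts can end up overdispersed.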

Earthquake Analysis

Earthquakes pose serious risks to infrastructure, economic viability, and human lives across the world. In significant occurrences, buildings can be knocked over, homes destroyed, families displaced with their valuables ruined, and hundreds to thousands of lives lost. Because earthquakes are unpredictable, forecasting their impacts has proven difficult, especially the death toll in each case.

Bayesian Analysis of Quality of Life and PPE (Personal Protective Equipment) Use During the Coronavirus (COVID-19) Pandemic

Concerned with the documented psychological effects of the COVID-19 pandemic, this study examines factors associated with quality of life and PPE use during the pandemic. After controlling for demographics, we created a Bayesian multiple regression model to examine the factors associated with quality of life; we also used Bayesian LASSO techniques to create a model predicting PPE use.

Statistical Analysis of Glacier Change

As climate change continues to ravage the planet's ecosystem, being able to predict glacier loss while understanding its uncertainty has become a necessity. However, the standard practices for determining this uncertainty are not computationally feasible for large datasets. Our research focused on improving the accuracy of a method proposed by Rolstad et al. (2009) that more efficiently approximates the uncertainty of glacier melting. We artificially created hundreds of glaciers of varying shapes, sizes, and terrains and assessed the accuracy of this approximation.

Who is the NBA GOAT (Greatest of All Time)? Using Culturally Relevant Data to Teach ANOVA

This paper gives introductory statistics instructors a way to use culturally relevant data within a web application to either introduce or strengthen students' understanding of variance and one-way ANOVA. Using culturally relevant data in the classroom gives context to data that students deem important to their lives. This paper provides not only a lesson plan for teaching these concepts but also a web application and the culturally relevant data set, in case the instructor decides to use the app or the data in another context.
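For readers who want to see the calculation behind one-way ANOVA, a minimal sketch of the F statistic follows. It is not taken from the paper or its web application: the function name is an assumption, and the three groups are made-up numbers standing in for, say, per-game scores of three players.

```python
import statistics


def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = statistics.fmean([x for g in groups for x in g])
    group_means = [statistics.fmean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up scores for three hypothetical players.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(f"F = {f_stat:.2f}")
```

A large F means the group means differ by more than the spread within each group would suggest, which is the comparison of variances that the lesson is meant to convey.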

Detecting Acute Ischemic Stroke from ECG Data - A Topological Approach

Acute ischemic stroke is one of the major causes of adult disability and mortality. Various studies have shown that patients with such strokes often experience autonomic imbalance, which is reflected in decreased heart rate variability. Hence, early detection of stroke is made viable through the analysis of electrocardiogram (ECG) data. However, standard heart rate variability parameters are prone to human error, and they must be analyzed together with other physiological metrics such as respiratory rate.