USPROC | The Undergraduate Statistics Project Competition

Statistical Analysis of Factors Impacting Hotel Review Scores in the Las Vegas Strip

Because Las Vegas is a well-known hub for gambling and travel, a study was conducted to identify the factors that best predict TripAdvisor ratings for hotels on the Las Vegas Strip. A high rating would likely translate to higher foot traffic at the corresponding hotel and thus higher income. The data describe hotel stays in 2015.

Analysis of Age, Race, and Income as Factors of Work Time

The purpose of our analysis is to examine how age and race, split by income group, relate to the number of hours per day spent on work and related activities. Using annual American Time Use Survey data from the IPUMS database, spanning the years 2015 to 2020, we fit two multiple regression models with work hours per day as the response. Our models suggest that, at every income level, work hours do not decrease as the worker’s age increases. Additionally, the worker’s race does not change the relationship between age and work hours at any income level.

Electronic Health Records - The New Future of Healthcare?

Electronic health resources are playing an increasingly important role in health care. However, the generational gap in technology usage presents a barrier for older populations to access online health records, which could affect their health outcomes. Using data on electronic personal health record (PHR) usage from the Cleveland Health Clinic, we fit a linear regression model with interactions to assess whether there is an association between PHR usage and diabetic outcome (quantified by HbA1c%), and whether this relationship depends on the user’s age.

Uncovering the Relationship between Online News Characteristics and Popularity

Given online news’ tremendous popularity and its potential societal impacts, this study seeks to understand which characteristics within the text of a news article correspond to a higher number of shares. Using a data set obtained from the UC Irvine Machine Learning Repository that contains information on 39,644 articles published on Mashable, we used 58 predictive attributes to determine which had the greatest impact on the variable of interest.

Spatial Modeling of Bird Population Using Citizen Science Data

Observation count data from eBird can be used to model the relative abundance of bird species. We found that such data is generally overdispersed compared to a Poisson distribution and that a quasi-Poisson generalized additive model is appropriate for the data. Expanding on previous research for eBird data, we incorporated spatial dependence into the modeling task by performing hierarchical generalized additive modeling with a spatial conditional autoregressive structure for random effects.
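As a rough illustration of what "overdispersed compared to a Poisson distribution" means, the sketch below computes the variance-to-mean ratio of a set of counts. For Poisson data this ratio is close to 1, and for an intercept-only model it coincides with the dispersion estimate a quasi-Poisson fit would report. The counts and the function name `dispersion_ratio` are hypothetical and are not taken from the project or from eBird.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return sample variance divided by sample mean.

    A value near 1 is consistent with a Poisson distribution;
    a value well above 1 indicates overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; ratio is undefined")
    return variance(counts) / m


# Hypothetical per-checklist counts for one species (illustrative only).
counts = [0, 0, 1, 0, 3, 0, 12, 0, 2, 0, 25, 1]
ratio = dispersion_ratio(counts)
print(f"variance-to-mean ratio: {ratio:.2f}")
```

A ratio far above 1, as these made-up counts give, is the kind of pattern that would motivate a quasi-Poisson model over a plain Poisson one.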

Addressing Inequality Through Modeling: Updating Public Defense Funding Models in Washington State

The Washington State Office of Public Defense (OPD) has identified multiple shortcomings in its county funding distribution methodology, including missing variables, inequitable funding allocation, difficulties in interpretability, and arbitrary or unfounded model coefficients. We consider and evaluate a least absolute shrinkage and selection operator (LASSO) model to address the concerns raised by the OPD. We further use Principal Component Analysis (PCA), an unsupervised method, to reduce dimensionality and support our model creation.

Estimated effects of Short-term Controlled Ozone Exposure on DNA Methylation of Genes Related to Lung Functions

As energy use and production rise globally, air quality has become a growing concern: about 545 million people worldwide suffer from chronic respiratory disease, a 39.8% increase since 1990. Estimating the effects of air pollutants on lung function is therefore crucial for improving public health.

Predicting Key Factors in NFL Contract Extensions

While modern NFL players are often paid more than most college students make in their lives, over 16% of NFL rookies end up filing for bankruptcy by the age of 36. Rookie NFL players receive an initial contract that typically covers their first four to five years of playing professionally in the NFL. To continue playing after this, fourth-year players are eligible to negotiate a contract extension or to enter into free agency.

Investigation of the Ability of Normality Tests to Prevent Issues in Downstream Tests

The validity of many parametric statistical procedures depends on the normality assumption, which is often checked using tests of normality. Researchers have studied the type I error rate and power of the standard normality tests against different alternative hypotheses in order to suggest the most powerful test in each situation.

Assessing Efficacy of Different Probabilistic Softwares with a Bayesian Hierarchical Model

Information on vegetation distributions is a key factor in establishing a baseline for ecological health as well as influencing environmental regulations and policy, but collecting data on these distributions can be costly and difficult. We have developed a Bayesian hierarchical model for three deciduous tree species in order to predict and classify sites according to the dominant vegetation cover with relation to wildfire-driven forest conversion in the Jemez Mountains of New Mexico.
