2009 Undergraduate Statistics Project Competition

First Place

Gerald Haun
Adrienne Gallo

Psychology, University of the Sciences in Philadelphia
Instructor Sponsor:
Ralph M. Turner

Title: Hierarchical Linear Modeling of the Effects of Self-Reflection Strategies on Mood

A 3-day experimental study (N = 78) examined the initial trajectory of mood increases resulting from completion of 1 of 2 positive-psychology-based self-regulation strategies: imagining one's best possible self (BPS) and expressing gratitude (EG). Undergraduate participants were randomly assigned to the BPS group, the EG group, or a control group that detailed their morning routine. Participants completed their respective exercises for 3 consecutive days. Mood was measured at baseline and following completion of the exercise each day using the Authentic Happiness Index (AHI). The BPS and EG strategies improved participants' moods along similar response trajectories, suggesting that they begin to work immediately and progress in a quadratic trajectory in the short term. Results parallel previous evaluations, which have supported the use of self-regulation techniques as a mechanism for increasing positive affect.


Second Place
Chee Lee
Statistics, St. Olaf College

Instructor Sponsor:
Paul Roback

Title: Random forest to predict a complete operon map of the Mycobacterium tuberculosis genome

Characterization of genes in the Mycobacterium tuberculosis (MTB) genome is an essential starting point for understanding the biological pathways and processes of the bacterium, especially those processes that contribute to its virulence. Here I use the statistical method random forest to build models that predict operon pairs (OPs) and non-operon pairs (NOPs) on the MTB genome using intergenic distance, coexpression correlations from microarray experiments, and promoter and terminator data. OPs are genes in operons, which are sets of regulated genes that are turned on under certain conditions and are involved in essential biological processes of the bacterium. MTB has 3,999 gene pairs, 1,439 of which have been confirmed as OP or NOP. The remaining 2,560 unclassified gene pairs are potential operon pairs (POPs). Three random forest models were built, two of which incorporated values imputed for missing coexpression correlations, to classify POPs and construct a complete operon map of the MTB genome. The three forests agreed on over 90% of POP predictions. Sensitivity rates of the models ranged from 86.5% to 87.2%, and specificity rates were around 90%, which compares well with previously published classification models.
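For readers unfamiliar with the two rates quoted in this abstract, the following is a minimal sketch of how sensitivity and specificity are computed from true and predicted labels. It assumes OP is treated as the positive class; the function name and the toy labels are hypothetical and are not taken from the project.

```python
def sensitivity_specificity(actual, predicted, positive="OP"):
    """Return (sensitivity, specificity) for two equal-length label lists.

    Sensitivity = share of true positives that were predicted positive.
    Specificity = share of true negatives that were predicted negative.
    """
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for illustration only (not data from the study).
actual = ["OP", "OP", "OP", "OP", "NOP", "NOP", "NOP", "NOP", "NOP"]
predicted = ["OP", "OP", "OP", "NOP", "NOP", "NOP", "NOP", "NOP", "OP"]
sens, spec = sensitivity_specificity(actual, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In practice these rates would be computed on confirmed gene pairs held out from model fitting, so that they are not inflated by the forest having seen the labels.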


Third Place
Kinjal Basu
Sujay Saha
Statistics, Indian Statistical Institute, Kolkata, India
Instructor Sponsor:
Saurabh Ghosh

Title: Estimation of Allele Frequencies from Quantitative Trait Data

Analyses of quantitative traits, as opposed to binary clinical end-points, are becoming increasingly popular in genetic studies of human disorders. This stems from the fact that quantitative traits carry more information on variability within different genetic profiles than binary end-points. However, estimation of allele frequencies from quantitative trait data is statistically more challenging, as the genotype information is unavailable and hence needs to be inferred probabilistically. Moreover, the choice of the probability distribution for the underlying quantitative trait poses robustness issues for the estimates of the allele frequencies. In this article, we discuss three estimation procedures: the first is based on cluster analysis, while the other two are based on the EM and CEM algorithms, respectively, using a three-component mixture of Gaussian distributions corresponding to the three genotypes at a bi-allelic locus controlling the quantitative trait. Some modifications are also suggested for handling deviations from Gaussian model assumptions, especially for asymmetric and heavy-tailed distributions. Some simulated data sets and one real-life data set are analyzed to show the utility of the proposed methods.
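The EM idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Hardy-Weinberg genotype proportions (q², 2pq, p²) and a common standard deviation for the three genotype components, neither of which is stated in the abstract, and the function names and simulation settings are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_allele_frequency(traits, n_iter=100):
    """Estimate the frequency p of allele A from trait values alone.

    Assumptions (illustrative, not from the abstract): Hardy-Weinberg
    mixing weights and one shared standard deviation for all genotypes.
    """
    n = len(traits)
    lo, hi = min(traits), max(traits)
    # Crude starting values: component means spread over the observed range.
    means = [lo + (hi - lo) * f for f in (0.25, 0.50, 0.75)]
    sd = max((hi - lo) / 6.0, 1e-6)
    p = 0.5
    for _ in range(n_iter):
        q = 1.0 - p
        weights = [q * q, 2.0 * p * q, p * p]  # genotypes aa, Aa, AA
        # E-step: posterior probability of each genotype for each individual.
        resp = []
        for x in traits:
            dens = [w * normal_pdf(x, m, sd) for w, m in zip(weights, means)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: expected allele count gives p; weighted moments give the rest.
        counts = [sum(r[k] for r in resp) for k in range(3)]
        p = (counts[1] + 2.0 * counts[2]) / (2.0 * n)
        p = min(max(p, 1e-6), 1.0 - 1e-6)
        means = [
            sum(r[k] * x for r, x in zip(resp, traits)) / max(counts[k], 1e-12)
            for k in range(3)
        ]
        var = sum(
            r[k] * (x - means[k]) ** 2
            for r, x in zip(resp, traits)
            for k in range(3)
        ) / n
        sd = max(math.sqrt(var), 1e-6)
    return p, means, sd


def simulate_traits(n, p, means, sd, seed=0):
    """Simulate trait values under the same assumed model (for checking only)."""
    rng = random.Random(seed)
    traits = []
    for _ in range(n):
        copies = (rng.random() < p) + (rng.random() < p)  # copies of allele A
        traits.append(rng.gauss(means[copies], sd))
    return traits


traits = simulate_traits(1500, 0.3, [0.0, 3.0, 6.0], 1.0, seed=1)
p_hat, mean_hat, sd_hat = em_allele_frequency(traits)
print(f"estimated allele frequency: {p_hat:.3f}")
```

The genotypes are never observed here; the update for p uses the expected number of A alleles implied by the posterior genotype probabilities. A CEM variant would instead assign each individual to its most probable genotype before the M-step.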


Fourth Place
Victor Louie
Maykel Vosoughiazad
Statistics, UCLA
Instructor Sponsor:
Nicholas Christou, Dave Zes

Title: Recent Trends in Methane: A Spatio-Temporal Analysis

Methane, one of the more significant greenhouse gases, does not get as much attention as other greenhouse gases such as carbon dioxide, especially since the rate at which methane is being released into the atmosphere has leveled off in recent years. However, it has been found that, due to the melting of the permafrost layers, methane is bubbling up from undersea chimneys, which could greatly increase its concentration in the atmosphere. Should the concentration of methane spiral out of control, it would greatly exacerbate global warming, causing a positive feedback loop in which the melting of the permafrost layer releases more methane into the atmosphere, which causes the Earth to warm and more of the permafrost layer to melt, thereby releasing still more methane into the atmosphere. The rate at which methane is being released into the atmosphere rose again in 2007. Is this a cause for concern? In this paper, we intend to answer this question by providing descriptive statistics that examine the trend of methane through time, spatial prediction through kriging, and inference using hypothesis testing, to investigate the significance of the recent increase in the rate at which methane is being released into the atmosphere. For these statistical analyses, our data included methane recordings from 23 locations across the globe for the years 1983 to 2007, ice core data, and more recent data that include measurements from 2008. Our findings show that the increases in the rate at which methane is being released into the atmosphere for the years 2006 to 2007 and 2007 to 2008 are significantly higher than the average increase for the years 1770 to 1990, the years in which the methane increase was most dramatic.


Honorable Mention
Authors: Ashley Peterson, Kate Forsythe
Institution: Mathematics and Statistics, St. Olaf College
Instructor Sponsor: James Scott

Title: The Association between Sleep and College GPA

College completion is known to be strongly associated with GPA. Furthermore, psychological maladjustment occurs most in students sleeping less than six hours per night and has been known to affect academic performance. This study examined the association between the amount of sleep a college student gets per night and the student's cumulative GPA. Other explanatory variables besides sleep, such as gender, current credit load, cumulative credits earned, and time spent talking on a cell phone, on the internet, communicating with parents, studying, and exercising, were examined as potential confounders affecting GPA. The dataset was obtained from the StoVault database and contained twenty-nine variables, eighteen of which were used. The data were self-reported and collected by St. Olaf College professor Sharon Lane-Getaz via surveys given to her statistics students during her years of teaching. Participants were from California Polytechnic State University, the University of Minnesota, Macalester College, and St. Olaf College. The statistical computer program STATA 9 was used to produce visual representations of the data, as well as to determine correlation coefficients and perform t-tests, chi-square tests, multiple linear regressions, and logistic regressions. The t-tests, multiple linear regression, and logistic regression identified log cell phone use and studying as significantly associated (at the 0.05 significance level) with GPA. However, the results of all statistical tests used in this study indicate that a statistically significant association between sleep and GPA does not exist. Still, it would be ideal to perform the investigation once more, employing the same variables but using a randomized sample of college students that may be more representative of the college student population.


Honorable Mention
Authors: John Eastling, Kyle Johnsrud
Institution: Mathematics and Computer Science, Gustavus Adolphus College
Instructor Sponsor: Carolyn Pillers Dobler

Title: Was your vote purchased? Statistical relationships between votes received and campaign contributions

While your vote may not have been literally purchased, as the title suggests, campaign contributions can play a very significant role in election outcomes.  In this project we examined the votes received and campaign contributions of the 2008 U.S. Congressional Elections of the Senate and the House of Representatives.  Focusing primarily on the House, we analyzed the whole population of elections, excluding non-majority party candidates and unopposed candidates.  One of our most significant findings was that close elections have a strong tendency to be more expensive.  We also confirmed that in the 2008 elections, incumbents raised more money and received more votes than did non-incumbents, and both incumbent and non-incumbent Democrats received more votes than their Republican counterparts.  We developed strong models for predicting the proportion of vote received based on party, incumbency and campaign contributions.  While these models could not necessarily be generalized, as Democrats have not received more money and votes over time, the principles could be applied to other election cycles. However, any models that incorporated campaign contributions and incumbency could likely be generalized.


Honorable Mention
Authors: Scott Powers
Institution: Statistics and Operations Research, University of North Carolina at Chapel Hill
Instructor Sponsor: Peter Mucha

Title: Stealing wins: A study of clutch basestealing

An argument has arisen between baseball statisticians and baseball traditionalists. The stolen base has long been a favorite of baseball enthusiasts because it is exciting and seems to provide huge momentum swings in the game. Statisticians contend that stolen bases actually have little impact on the game because the risk of getting caught all but outweighs the benefit of successfully stealing. The statisticians have a point: before 2001, the league as a whole was costing itself runs on the basepaths. It turns out that baserunners need to be successful a high percentage of the time in order to contribute runs.

But even when stolen bases cost teams runs, they can still give the team wins. In the 1980s, when teams were running wild, teams lost runs but gained victories with speed. While run expectancy added by stolen bases can be predicted very accurately based on stolen bases and caught stealing, win expectancy is not so easily explained. The question is: Is the part of win expectancy added which is not explained by stolen base success rate better explained by random fluctuations or by clutch basestealing tendencies?

The residuals show that players do not put up consistently higher win expectancies from year to year and that the fluctuations in win expectancy added are better explained by random fluctuations in how often the players have the opportunity to steal bases in high-impact situations. The conclusion is that clutch basestealing does not exist and that the statisticians are justified in downplaying the importance of stolen bases. Maybe a stolen base will win the battle, but it won't win the war.


Honorable Mention
Authors: Erin Milne, Nelson Winkler
Institution: Lyman Briggs College, Michigan State University
Instructor Sponsor: Aklilu Zeleke

Title: Seeing Better: Optimizing Surgically Induced Astigmatism Correction Factors for Cataract Surgery

Cataract surgery is one of the most common operations performed in the United States each year[1]. Cataract surgery is seen as a "routine" procedure; however, it is not without complication. The surgery changes the natural astigmatism of the eye, which can cause blurry vision. It is possible to correct for this surgically induced astigmatism (SIA) during the operation. Currently, a standard correction factor of 0.5 diopters (D) is used. While this correction factor produces respectable results, we endeavored to improve upon the model and deliver individualized SIA predictions. The goal of our project was to predict the SIA for use in surgery on a second eye, based upon prior surgical results from the same patient. Pre-operative and post-operative cataract surgery data were gathered from a private ophthalmology practice. SIA values were then calculated, and a two-sample t-test was done to compare the mean values for left and right eyes. When no significant difference was found, we performed regression analysis to determine a model for SIA prediction. We analyzed our models using residual plots, a chi-squared test for goodness of fit, and a graphical comparison between our model and the standard correction factor of 0.5 D. We found that our model is on par with the currently accepted standard correction factor and is, in fact, more accurate in cases of significant difference between the models. We believe we have generated a useful tool that can be easily and successfully utilized in patient care.



USPROC Competition Committee and Judges (*)
Cooray, Kahadawala Central Michigan University
Curtiss, Phyllis Grand Valley State University
Daniels, John Central Michigan University
Famoye, Felix* Central Michigan University
Hooks, Tisha Winona State University
Holcomb, John* Cleveland State University
Hong, Soon Grand Valley State University
Mentele, James Central Michigan University Research Corporation
Kaplan, Jennifer Michigan State University
Lee, Carl* Central Michigan University
Malone, Christopher* Winona State University
Rey, Tim Dow Chemical Company
Shoultz, Gerald Grand Valley State University
Tintle, Nathan* Hope College
Witmer, Jeff* Oberlin College