Cultivating Creativity in Data Work

Traditionally, statistical training has focused on mathematical derivations, proofs of statistical tests, and choosing the correct method for a given application. However, this is only one dimension of the practice of analysis. Other dimensions include technical mastery of a language and its tooling and, most importantly, the construction of a convincing narrative tailored to a specific audience, with the ultimate goal of that audience accepting the analysis.

Helping the Red Cross Predict Flooding in Togo

This project aims to help the Red Cross predict flooding in Togo caused by overflow at the Nangbeto Dam. Flooding depends on both the flow rate and the water level in the dam at any point in time. This project focuses specifically on predicting the flow rate at the dam using precipitation data from eight locations around the country. A Lasso model with cross-validation was used to evaluate the significance of the predictors and to capture the variance in flow rate.
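
As a rough illustration of the Lasso-plus-cross-validation idea, here is a minimal sketch on synthetic data. It is not the project's code: the data are simulated stand-ins for eight precipitation series, the function names (`fit_lasso`, `cv_mse`), the penalty grid, and the omission of an intercept are all assumptions made for the example.

```python
import random

def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the Lasso update for one coefficient)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def fit_lasso(X, y, alpha, n_iter=50):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1, no intercept."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = cols[j]
            norm_sq = sum(c * c for c in col)
            if norm_sq == 0.0:
                continue
            # Inner product of column j with the residual that excludes feature j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual))
            new_wj = soft_threshold(rho / n, alpha) / (norm_sq / n)
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[j] = new_wj
    return w

def cv_mse(X, y, alpha, k=5, seed=0):
    """Mean squared prediction error over k held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w = fit_lasso([X[i] for i in train], [y[i] for i in train], alpha)
        for i in held_out:
            pred = sum(wj * xj for wj, xj in zip(w, X[i]))
            errors.append((y[i] - pred) ** 2)
    return sum(errors) / len(errors)

# Synthetic stand-in: 8 "stations", of which only stations 0 and 3 drive the response.
rng = random.Random(0)
n_obs, n_stations = 120, 8
X = [[rng.gauss(0, 1) for _ in range(n_stations)] for _ in range(n_obs)]
y = [3.0 * row[0] - 2.0 * row[3] + rng.gauss(0, 0.5) for row in X]

alphas = [0.01, 0.1, 0.5, 5.0]  # hypothetical penalty grid
scores = {a: cv_mse(X, y, a) for a in alphas}
best_alpha = min(scores, key=scores.get)
weights = fit_lasso(X, y, best_alpha)
print("best alpha:", best_alpha)
print("coefficients:", [round(w, 3) for w in weights])
```

In a setup like this, coefficients that the penalty shrinks to exactly zero correspond to predictors the model drops, which is one way a Lasso fit can indicate which stations carry signal.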

Understanding the 2016 Presidential Election: An analysis of how economic and race/immigration politics influenced swing voters

The 2016 US Presidential Election was unprecedented, as traditional prediction methods failed to forecast the outcome. Using a demographic and political opinion survey of confirmed voters, we characterized President Trump’s voters, particularly “non-Republicans,” by examining two hypotheses: 1) Trump voters were economically downtrodden and 2) Trump voters aligned with his immigration and race rhetoric.

Bag of Little Random Forests (BLRF)

Random Forests are an ensemble method that uses many decision trees to make robust predictions in both regression and classification settings. However, bootstrap aggregation, the mechanism underlying the Random Forests algorithm, requires each decision tree to store and perform computations on a data set of the same size as the training data set. This is often impractical given the size of modern data sets.
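
To make the storage cost concrete, the sketch below draws ordinary bootstrap resamples, one per tree. It only illustrates standard bagging, not the BLRF method itself; `bootstrap_indices`, `n_train`, and `n_trees` are names and values assumed for the example.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n row indices with replacement: one bootstrap resample."""
    return [rng.randrange(n) for _ in range(n)]

rng = random.Random(0)
n_train = 10_000  # hypothetical training-set size
n_trees = 5       # hypothetical number of trees

samples = []
for tree in range(n_trees):
    sample = bootstrap_indices(n_train, rng)
    samples.append(sample)
    unique_share = len(set(sample)) / n_train
    # Every tree receives n_train rows, although many of them are repeats.
    print(f"tree {tree}: rows={len(sample)}, share of distinct rows={unique_share:.3f}")
```

Each resample has as many rows as the training set, so the per-tree cost grows with the full data size even though only roughly 63% of the distinct rows appear in any one resample.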

Spatial Cross-Validation

Cross-validation is a popular computational method for model assessment and selection. With spatial data, however, many of the independence assumptions behind cross-validation break down. This talk will motivate and introduce some spatial cross-validation methods proposed in the literature to address these issues. We will then explore the results of a simple simulation study comparing the performances of nonspatial and spatial cross-validation methods on simulated spatial data.
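
One family of approaches in the literature is blocking: nearby points are held out together so that test points are not surrounded by near-duplicate training neighbors. The sketch below shows that idea only; the talk's specific methods are not stated here, and `spatial_block_folds`, the grid size, and the simulated coordinates are assumptions for illustration.

```python
import random

def spatial_block_folds(coords, block_size, k, seed=0):
    """Assign points to k folds by grid block, so each block stays in one fold."""
    blocks = {}
    for i, (x, y) in enumerate(coords):
        key = (int(x // block_size), int(y // block_size))
        blocks.setdefault(key, []).append(i)
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)  # randomize which blocks share a fold
    folds = [[] for _ in range(k)]
    for b, key in enumerate(keys):
        folds[b % k].extend(blocks[key])
    return folds, blocks

# Simulated point locations on a 10 x 10 region (hypothetical data).
rng = random.Random(1)
coords = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
folds, blocks = spatial_block_folds(coords, block_size=2.5, k=4)

for f, members in enumerate(folds):
    print(f"fold {f}: {len(members)} points")
```

A nonspatial split would instead shuffle the individual points, which tends to place each test point next to training points carrying nearly the same information.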

The County Development Index: Quantifying Human Development in a United States Context

In 1990, the United Nations created the Human Development Index (HDI) as a metric to holistically measure country development based on more than economic status. Education, health, and monetary variables are used to provide a multidimensional understanding of a nation’s development. While the HDI describes differences in development between countries, understanding the variation in development within a country requires more detail. The County Development Index (CDI) quantifies human development by county in the United States by adapting the HDI framework to that context.
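
For readers unfamiliar with how an HDI-style composite is built, here is a minimal sketch: each dimension is rescaled to a 0 to 1 index, and the three indices are combined with the geometric mean that the UNDP has used for the HDI since 2010. How the CDI itself normalizes and aggregates is not described here; the function names, bounds, and county values below are hypothetical.

```python
import math

def dimension_index(value, minimum, maximum):
    """Rescale a raw value to the 0-1 range given lower and upper bounds."""
    return (value - minimum) / (maximum - minimum)

def composite_index(health, education, income):
    """Geometric mean of three dimension indices."""
    return (health * education * income) ** (1 / 3)

# Hypothetical county values and bounds, for illustration only.
health = dimension_index(78.0, 20.0, 85.0)    # life expectancy in years
education = dimension_index(13.0, 0.0, 15.0)  # mean years of schooling
income = dimension_index(math.log(30_000), math.log(100), math.log(75_000))  # log income

score = composite_index(health, education, income)
print(f"health={health:.3f} education={education:.3f} income={income:.3f} composite={score:.3f}")
```

With a geometric mean, a low score on one dimension pulls the composite down more than it would under a simple average, so strong income cannot fully offset weak health or education.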