
R Shiny Program for Statistics Education

Our research group used the Shiny package in R to develop the Book of Apps for Statistics Teaching (BOAST), which includes concept explorations, concept demonstration challenges, and game-based quizzing modules for both lower- and upper-division courses in the statistics department. BOAST provides an active learning environment and dynamically illustrates concepts to help users build understanding. In this video, we discuss this programming and research experience and demonstrate the apps we developed.

Forestry Data Science: Reclassifying LANDFIRE's Existing Vegetation Type Variable

This project is centered on exploratory data analysis and the reclassification of data products collected by the USGS LANDFIRE program. Our research involved collapsing the EVT (Existing Vegetation Type) variable used by LANDFIRE into categories that maximized homogeneity with respect to four response variables: biomass, volume, basal area, and tree count. The k-means clustering algorithm was the primary method used to determine how to best collapse EVT with regard to both statistical patterning and ecological factors.
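As a rough illustration of the clustering step, the sketch below runs a minimal k-means on hypothetical per-type mean response profiles. The EVT profiles, their values, and the standardization choice are all assumptions for illustration, not the project's actual LANDFIRE pipeline:

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Minimal k-means with deterministic farthest-point initialization."""
    centers = np.empty((k, X.shape[1]))
    centers[0] = X[0]
    for j in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d = np.min(np.linalg.norm(X[:, None] - centers[None, :j], axis=2), axis=1)
        centers[j] = X[d.argmax()]
    for _ in range(n_iter):
        # Assign each point to its nearest center, then recompute cluster means.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if (labels == j).any() else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical per-EVT mean responses: [biomass, volume, basal area, tree count].
evt_profiles = np.array([
    [120.0, 95.0, 30.0, 400.0],   # forest-like types
    [115.0, 90.0, 28.0, 380.0],
    [40.0,  25.0, 10.0, 150.0],   # shrub-like types
    [38.0,  22.0,  9.0, 140.0],
])
# Standardize so all four responses contribute comparably to distances.
Z = (evt_profiles - evt_profiles.mean(axis=0)) / evt_profiles.std(axis=0)
labels, _ = kmeans(Z, k=2)
```

Collapsing EVT then amounts to merging all vegetation types that land in the same cluster, so each merged category is as homogeneous as possible in the four responses.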

Quantitative Analysis of Polygenic Risk Score Prediction in the Genes for Good Cohort

A promising tool in genetic prognostics is the use of polygenic risk scores (PRS). A PRS is the sum of an individual's disease-associated alleles, weighted by effect sizes estimated in a genome-wide association study (GWAS) for the disease or trait of interest. For some diseases, phenotypes can be predicted with a high degree of accuracy from the PRS together with additional risk factors (e.g., age and sex). In future precision-medicine approaches, clinicians may treat an extreme PRS for certain diseases as an indication for medical intervention.
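The weighted sum described above can be written as PRS_i = Σ_j β_j x_ij, where x_ij counts individual i's copies of risk allele j and β_j is the GWAS effect size. A minimal sketch with entirely hypothetical numbers (the variants, effect sizes, and genotypes are made up for illustration):

```python
import numpy as np

# Hypothetical effect sizes (e.g., log odds ratios) for five risk variants,
# as would be estimated by a GWAS.
beta = np.array([0.12, -0.05, 0.30, 0.08, 0.21])

# Genotype matrix: rows are individuals, columns are variants, and each
# entry counts that individual's copies of the risk allele (0, 1, or 2).
genotypes = np.array([
    [0, 1, 2, 0, 1],
    [2, 0, 1, 1, 0],
    [1, 2, 0, 2, 2],
])

# PRS_i = sum_j beta_j * x_ij: one weighted sum per individual.
prs = genotypes @ beta
```

In practice the sum runs over thousands to millions of variants, but the computation is the same matrix-vector product.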

From Reinforcement Learning to Portfolio Management

Reinforcement learning (RL) is a widely applied branch of machine learning. It includes fundamental algorithms such as Q-learning and SARSA, which handle Markov decision process (MDP) problems small enough to be represented in a table, for example, a car navigating a maze. RL can also incorporate neural network architectures (e.g., the Deep Q-Network) to tackle more complicated problems, such as portfolio management in a given market.
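To make the tabular case concrete, here is a minimal Q-learning sketch on a toy corridor "maze" (the environment, rewards, and hyperparameters are invented for illustration, not taken from the project):

```python
import random

# Toy tabular MDP: a corridor of 5 cells. The agent starts in cell 0 and
# receives reward +1 for reaching cell 4; actions are 0 = left, 1 = right.
N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

# Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
alpha, gamma, epsilon = 0.5, 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(N_STATES)]
random.seed(0)
for episode in range(2000):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection (random on exact ties).
        if random.random() < epsilon or Q[s][0] == Q[s][1]:
            a = random.choice(ACTIONS)
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        s2, r, done = step(s, a)
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# Greedy policy after training: move right in every non-goal state.
policy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(N_STATES - 1)]
```

The deep variants replace the Q table with a neural network over the state, which is what makes high-dimensional problems like portfolio allocation feasible.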

Cultivating Creativity in Data Work

Traditionally, statistical training has focused primarily on mathematical derivations, proofs of statistical tests, and the correct choice of methods for particular applications. However, this is only one dimension of the practice of analysis. Other dimensions include technical mastery of a language and its tooling, and, most importantly, the construction of a convincing narrative tailored to a specific audience, with the ultimate goal of that audience accepting the analysis.

Helping the Red Cross Predict Flooding in Togo

This project aims to help the Red Cross predict flooding occurrences in Togo due to overflow at the Nangbeto Dam. Flooding results from both a high flow rate and the water level in the dam at a given time. This project focuses specifically on predicting the flow rate into the dam using precipitation data from eight locations around the country. A lasso model and cross-validation were employed to evaluate the importance of the predictors and capture the variance in flow rate.
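A minimal sketch of the lasso-plus-cross-validation approach, using a self-contained coordinate-descent solver and synthetic data in place of the project's precipitation records (the eight "stations", the two truly predictive ones, and the penalty grid are all assumptions):

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent.
    Minimizes (1/2n) * ||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]          # partial residual
            rho = X[:, j] @ r / n
            z = (X[:, j] @ X[:, j]) / n
            # Soft-thresholding shrinks small coefficients exactly to zero.
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return b

# Hypothetical setup: 8 precipitation predictors, only 2 actually drive flow.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = 3.0 * X[:, 0] + 1.5 * X[:, 3] + rng.normal(scale=0.5, size=200)

def cv_mse(lam, K=5):
    """K-fold cross-validated mean squared error for a given penalty."""
    folds = np.array_split(np.arange(len(y)), K)
    errs = []
    for k in range(K):
        test = folds[k]
        train = np.setdiff1d(np.arange(len(y)), test)
        b = lasso_cd(X[train], y[train], lam)
        errs.append(np.mean((y[test] - X[test] @ b) ** 2))
    return np.mean(errs)

best_lam = min([0.01, 0.1, 0.5, 1.0], key=cv_mse)
coef = lasso_cd(X, y, best_lam)
```

The lasso's zeroed-out coefficients identify which stations add little predictive value, while cross-validation guards against tuning the penalty to noise.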

Understanding the 2016 Presidential Election: An analysis of how economic and race/immigration politics influenced swing voters

The 2016 US Presidential Election was unprecedented, as traditional prediction methods failed to forecast the outcome. Using a demographic and political opinion survey of confirmed voters, we characterized President Trump's voters, particularly "non-Republicans," under two hypotheses: (1) Trump voters were economically downtrodden, and (2) voters aligned with Trump's immigration and race rhetoric.

Bag of Little Random Forests (BLRF)

Random Forests are an ensemble method that combines many decision trees to make robust predictions in both regression and classification settings. However, bootstrap aggregation, the mechanism underlying the Random Forests algorithm, requires each decision tree to store and compute on a data set the same size as the training set. This is often impractical given the size of modern data sets.
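The memory issue above motivates replacing full-size bootstrap samples with small subsamples ("little" bags). The sketch below illustrates that idea with depth-1 regression stumps standing in for full trees; the base learner, data, and bag size are simplifying assumptions for illustration, not the BLRF algorithm itself:

```python
import random

def fit_stump(data):
    """Depth-1 regression tree on 1-D (x, y) pairs: best squared-error split."""
    data = sorted(data)
    best = None
    for i in range(1, len(data)):
        if data[i][0] == data[i - 1][0]:
            continue                       # no split between duplicate x values
        thr = (data[i - 1][0] + data[i][0]) / 2
        left = [y for x, y in data if x <= thr]
        right = [y for x, y in data if x > thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:                       # all x identical: predict the mean
        m = sum(y for _, y in data) / len(data)
        return lambda x: m
    _, thr, ml, mr = best
    return lambda x: ml if x <= thr else mr

def bagged_predict(train, x, n_trees=50, bag_size=20, seed=0):
    """Bootstrap aggregation with small bags: each learner sees only
    bag_size points, so per-learner memory is O(bag_size), not O(len(train))."""
    rng = random.Random(seed)
    stumps = [fit_stump([rng.choice(train) for _ in range(bag_size)])
              for _ in range(n_trees)]
    return sum(s(x) for s in stumps) / n_trees   # average the ensemble

# Step-function data: y jumps from 0 to 10 at x = 5.
train = [(x / 10, 0.0 if x < 50 else 10.0) for x in range(100)]
lo = bagged_predict(train, 2.0)
hi = bagged_predict(train, 8.0)
```

Even though each stump sees only 20 of the 100 training points, averaging the ensemble recovers the step function, which is the intuition behind trading bag size for ensemble size.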