The dataset for the ASA 2017 Datafest competition was provided by Expedia Inc., a travel company that primarily runs travel fare aggregator websites. The dataset includes over 10 million user records of searches and purchases through various Expedia websites. This paper conducts a machine learning analysis via a classification decision tree to identify potential customers who do not purchase a travel package but are similar to those who do. The paper then narrows its focus to the countries a group of potential customers is most likely to travel to, as well as the types of hotels they are most likely to book.
In the environmental sciences, portions of collected data are often reported as non-detect, meaning that the actual data point is known only to be below the detection limit of a measuring device. In mainstream statistics, this type of data is known as left-censored data. Oftentimes, data sets include two detection limits for various reasons.
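To make the idea of left-censoring concrete, the following is a minimal sketch of a log-likelihood that handles non-detects, including the case of two detection limits. It is not the method of the abstract above. It assumes the measurements (or their logarithms) are normally distributed, which is an assumption made here for illustration, and the function names and example values are hypothetical. A detected value contributes its density, while a non-detect contributes the probability of falling below its own detection limit.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of a Normal(mu, sigma) distribution at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def normal_cdf(x, mu, sigma):
    """Cumulative probability of a Normal(mu, sigma) distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def censored_loglik(observations, mu, sigma):
    """Log-likelihood for left-censored data (normality is an assumption).

    observations: list of (value, is_nondetect) pairs. For a non-detect,
    value is the detection limit that applied to that sample, so several
    different detection limits can appear in the same data set.
    """
    total = 0.0
    for value, is_nondetect in observations:
        if is_nondetect:
            # Only known to lie below the limit: use P(X < limit).
            total += math.log(normal_cdf(value, mu, sigma))
        else:
            # Fully observed: use the density at the measured value.
            total += normal_logpdf(value, mu, sigma)
    return total

# Hypothetical sample with two detection limits (0.5 and 1.0).
sample = [(1.8, False), (0.5, True), (2.3, False), (1.0, True)]
loglik = censored_loglik(sample, mu=1.5, sigma=0.8)
```

Maximizing this function over `mu` and `sigma` with any numerical optimizer gives estimates that use the non-detects rather than discarding them or substituting a fixed value.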
Graph representations are used across disciplines for the analysis and visualization of relational data. Exponential random graph models (ERGMs) provide a general method for modelling the underlying stochastic process that generated the observed data, conditional on observed attributes of the vertices, or nodes. Recent developments in ERGMs have introduced the notions of local dependence and the exponential random network model, or ERNM.
Risk assessment algorithms are increasingly common in the criminal justice process to predict the chance that an individual convicted of a crime will commit another crime, or recidivate. Recent studies have sparked interest in verifying that such assessment tools predict the risk of recidivism with equal accuracy across races.
Learning to Rank (LTR) is the application of machine learning to rank search results according to their degree of relevance to the query. Salesforce Enterprise Search employs a hand-crafted ranking function to score search results and order them accordingly for users. Data about this ranking process are stored in JSON format, which is a nested tree with arbitrary depth. We present our first effort to parse these data and extract the inputs of the ranking function into a tabular format.
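As an illustration of turning arbitrarily nested JSON into tabular form, the sketch below recursively flattens a record into a single row keyed by path. It is not the parser described in the abstract above. The field names in the sample record (`query`, `result`, `score`, `features`) are hypothetical, and the dotted-path naming scheme is an assumption chosen for readability.

```python
import json

def flatten(node, prefix=""):
    """Recursively flatten nested dicts and lists into {path: leaf value}."""
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(value, path))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}[{index}]"))
    else:
        # Leaf value (string, number, boolean, or null): record it under its path.
        flat[prefix] = node
    return flat

# Hypothetical record; real ranking logs will have different fields.
raw = (
    '{"query": "acme", "result": {"score": 1.5, '
    '"features": [{"name": "recency", "value": 0.2}]}}'
)
row = flatten(json.loads(raw))
```

Each flattened record becomes one row, and the union of paths across records gives the column set. Note that empty objects and empty arrays produce no columns in this sketch.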
Women in tech often don’t talk about their own achievements due to fear of being labeled “bossy” or “vain.” In fact, they are far more likely than men to credit mentors rather than themselves with their success. But how do women talk about themselves when asked by other influential women? We can start to answer this question using tweets from a hashtag that encouraged women in tech to “brag,” which trended the week after the release of the “anti-diversity” Google memo. Using topic modeling and word associations, we can see that women are actually great at bragging about themselves.
Safe drinking water is a right that should be guaranteed to all populations. In the United States, we know that many urban areas are able to obtain safe drinking water, but can rural communities do the same? If access is limited, can technologies such as point-of-use devices temporarily improve water quality? With this in mind, we designed an experiment to test water quality at locations in close vicinity to a Midwestern liberal arts institution. The two variables of interest were the location of the water supply and filtration, and we examined how each affects drinking water quality.
Community-driven online question and answer forums (CQA) are becoming increasingly valuable sources of information. These platforms house an expansive amount of crowdsourced knowledge in the form of thousands of questions and answers posted every day. Some forums cover a broad range of topics, like Yahoo! Answers, while others focus on specific topics, like the programming-focused Stack Overflow. An example of the latter is iFixit’s Answers forum.
In cancer studies, hormesis is a phenomenon where low doses of a carcinogen reduce the risk of cancer while high doses increase the risk. There are several models to test for hormesis; however, some are not flexible enough to detect it. Our research objective was to compare five individual models as well as the method of model averaging (MA).
Each hole on a golf course is assigned a handicap that has both a technical meaning and a perception of difficulty. Handicap is based in large part on average scoring difference between strong and poor golfers, but it is not necessarily the same as “difficulty”. Handicapping is important to the game of golf, as it affects certain formats of tournament play. Hole difficulty is also important to players as they plan their shots. As a summer research project supported by the Northern Kentucky University UR-STEM program, we investigated factors that play into golf courses’ handicap rating for e