The Procrastinator’s Guide to a Data Science Career

The best approach to a data science career involves discipline, organization, and patience. So what do you do if you have none of those traits? In this talk, I'll share strategies for entering a career in data science or statistics, based on my own experience working as a Data Scientist at Stack Overflow and my history as an inveterate procrastinator. With the right philosophy, procrastination can be a surprisingly productive strategy that is especially well-suited to the modern field of data science.

A Business Opportunity/Targeting Expedia’s Niche Market in Travel Packages Via Analytical and Predictive Modeling

The dataset for the ASA 2017 Datafest competition was provided by Expedia Inc., a travel company that primarily runs travel fare aggregator websites. The dataset includes over 10 million user records of searches and purchases through various Expedia websites. This paper conducts a machine learning analysis via a classification decision tree to identify potential customers who do not purchase a travel package but are similar to those who do. The paper then narrows down the countries that a group of potential customers is most likely to travel to, as well as the types of hotels.

Local Dependence in Exponential Random Network Models

Graph representations are used across disciplines for the analysis and visualization of relational data. Exponential random graph models (ERGMs) provide a general method of modelling the underlying stochastic process that generated the observed data, conditional on observed attributes of the vertices, or nodes. Recent developments in ERGMs have introduced the notions of local dependence and the exponential random network model, or ERNM.
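
For readers unfamiliar with the model class, the standard textbook form of an ERGM is sketched below. The notation is an assumption for illustration and is not taken from the abstract: $y$ is the observed graph, $x$ the nodal attributes, $g$ a vector of sufficient statistics (for example edge or triangle counts), $\theta$ the parameter vector, and $\mathcal{Y}$ the set of possible graphs.

```latex
P_\theta(Y = y \mid X = x) = \frac{\exp\!\big(\theta^\top g(y, x)\big)}{c(\theta, x)},
\qquad
c(\theta, x) = \sum_{y' \in \mathcal{Y}} \exp\!\big(\theta^\top g(y', x)\big)
```

The normalizing constant $c(\theta, x)$ sums over every possible graph on the node set, which is why exact likelihood evaluation is generally intractable for all but very small networks.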

Extracting structured data for Relevance Ranking

Learning to Rank (LTR) is the application of machine learning to rank search results according to their degree of relevance to the query. Salesforce Enterprise Search employs a hand-crafted ranking function to score search results and order them accordingly for users. Data about this ranking process are stored in JSON format, which is a nested tree with arbitrary depth. We present our first effort to parse this data, and extract the inputs of the ranking function into a tabular format.
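
As a minimal sketch of the kind of parsing described above, the following flattens a nested JSON tree of arbitrary depth into a single tabular row keyed by path. The `flatten` helper, the dotted-path naming scheme, and the sample record are all hypothetical and chosen for illustration; they do not reflect the actual Salesforce data format or the authors' implementation.

```python
import json


def flatten(node, prefix=""):
    """Recursively flatten nested dicts/lists into {path: leaf_value}.

    Dict keys are joined with '.', list positions are written as '[i]'.
    """
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(value, path))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}[{index}]"))
    else:
        # Leaf value (string, number, bool, or None): record it under its path.
        flat[prefix] = node
    return flat


# Hypothetical record; the field names are invented for this example.
raw = """
{"doc": "a1",
 "score": {"total": 3.5,
           "parts": [{"name": "title_match", "value": 2.0},
                     {"name": "recency", "value": 1.5}]}}
"""
row = flatten(json.loads(raw))
for column, value in row.items():
    print(column, "=", value)
```

Each flattened record becomes one row, and the union of paths across records gives the table's columns. Encoding list positions in the path keeps the mapping lossless, at the cost of wide tables when lists are long.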

How Women in Tech Talk About Themselves on Twitter

Women in tech often don’t talk about their own achievements due to fear of being labeled “bossy” or “vain.” In fact, they are far more likely than men to credit mentors rather than themselves with their success. But how do women talk about themselves when asked by other influential women? We can start to answer this question using tweets from a hashtag that encouraged women in tech to “brag,” which trended the week after the release of the “anti-diversity” Google memo. Using topic modeling and word associations, we can see that women are actually great at bragging about themselves.

pHew! That's a Relief: Assessing Drinking Water Quality and Treatment of Water Supply

Safe drinking water is a right that should be guaranteed to all populations. In the United States, we know that many urban areas are able to obtain safe drinking water, but can rural communities do the same? If access is limited, can technologies such as point-of-use devices temporarily improve water quality? With this in mind, we designed an experiment to test water quality at locations in close vicinity to a Midwestern liberal arts institution. The two variables of interest were the location of the water supply and filtration, and we examined how each affects drinking water quality.