Azka Javaid - Predicting Flight Delays Using the H2O Machine Learning Platform in R

Abstract

This study aimed to predict flight departure delays from 2008 to 2016 using the H2O machine learning and deep learning platform in R. Three models were set up. First, a simple logistic regression predicted whether a departure was delayed by more than 30 minutes from year, carrier, air time, distance, week and season. Second, a logistic regression model examined how weather affects departure delays of more than 30 minutes. Weather was measured by temperature, dew point, humidity, wind direction, wind speed, wind gust, precipitation, pressure and visibility, and the model also included season, month, week status, day, hour, distance and air time. Third, a deep learning neural network examined the influence of variables such as year, month, carrier, distance, hour, week, weekend and season on departure delays of more than 90 minutes. Grid search, random search and checkpoint variants of the original models were developed to aid hyperparameter selection.

Overall, carrier, season, distance and air time were the most important predictors of departure delays of more than 30 minutes in the logistic regression model. Weather variables were not important predictors of departure delays of more than 30 minutes. In the deep learning model, departure hours (5, 6 and 9 am) and carriers (Hawaiian Airlines, Northwest Airlines, SkyWest and US Airways) were the most important predictors of departure delays of more than 90 minutes.