Optimizing Genetic Algorithm Parameters for Atmospheric Carbon Monoxide Modeling

Presented by:
Meera Duggal & William Daniels (Colorado School of Mines)

The primary source of atmospheric carbon monoxide (CO) in the Southern Hemisphere is large burn events. This makes CO a useful proxy for fires, since it is continuously measured by satellites. Fires, in turn, are influenced by the state of the atmosphere and oceans, which is captured in so-called climate indices. Therefore, predictive CO models can help countries prepare for unusually extreme fire seasons. We have developed a customized multiple linear regression model using climate indices and created the R package regClimateChem to perform variable selection for this atmospheric CO application. The package offers three variable selection techniques: stepwise regression, a genetic algorithm, and an exhaustive search. The exhaustive search always finds the best possible model but is computationally expensive. Stepwise selection runs quickly and scales well but often fails to find the best model. We implemented the genetic algorithm as a potential compromise between computational expense and model accuracy. The genetic algorithm is a stochastic variable selection technique with many parameters that control its execution and stopping condition. Here we present an optimization study for these parameters, with the goal of reducing genetic algorithm runtime while preserving model accuracy. Beginning with four-covariate models, we identified the optimal genetic algorithm configuration by varying five parameter values individually. Relative to the default values, these optimal parameter values yield an 11.8% decrease in runtime with only a 0.3% decrease in model accuracy. We perform a similar study on five-covariate models to see whether our optimization results scale with the number of covariates. For the five-covariate models, we vary each parameter individually and all pairwise combinations concurrently. This study is currently being conducted on the high-performance computing system at the National Center for Atmospheric Research.
We present results from both studies and discuss our overall findings.
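
To make the trade-off between the exhaustive search and the genetic algorithm concrete, the following is a minimal sketch in Python. It is not the regClimateChem implementation (which is an R package), and every name in it is hypothetical: the parameters pop_size, mutation_rate, elite, max_generations, and patience stand in for the kind of execution and stopping-condition parameters described above, and toy_fitness uses made-up per-covariate gains in place of a real regression score.

```python
import itertools
import random

# Made-up per-covariate gains; a real study would score a fitted regression instead.
GAINS = [0.30, 0.05, 0.25, 0.02, 0.20, 0.01, 0.15, 0.03]
PENALTY = 0.04  # hypothetical cost per included covariate


def toy_fitness(mask):
    """Score a 0/1 inclusion mask: total gain minus a size penalty."""
    k = sum(mask)
    if k == 0:
        return float("-inf")  # an empty model is never acceptable
    return sum(g for g, bit in zip(GAINS, mask) if bit) - PENALTY * k


def exhaustive_search(n_vars, fitness):
    """Score every non-empty subset: optimal, but 2**n_vars - 1 evaluations."""
    masks = (m for m in itertools.product((0, 1), repeat=n_vars) if any(m))
    return max(masks, key=fitness)


def genetic_search(n_vars, fitness, pop_size=20, mutation_rate=0.1,
                   elite=2, max_generations=100, patience=10, seed=0):
    """Stochastic subset search; assumes n_vars >= 2 and pop_size >= 4."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_vars))
           for _ in range(pop_size)]
    best, stale = max(pop, key=fitness), 0
    for _ in range(max_generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = list(ranked[:elite])        # elitism keeps the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_vars)     # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        pop = children
        champion = max(pop, key=fitness)
        if fitness(champion) > fitness(best):
            best, stale = champion, 0
        else:
            stale += 1
            if stale >= patience:              # stop after no recent improvement
                break
    return best
```

Calling exhaustive_search(8, toy_fitness) and genetic_search(8, toy_fitness) and comparing their scores mirrors, in miniature, the accuracy side of the study; timing genetic_search while changing one parameter at a time mirrors the runtime side. Because the genetic algorithm is stochastic, its result depends on the seed and is not guaranteed to match the exhaustive optimum.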