An operational machine learning approach to predict mosquito abundance based on socioeconomic and landscape patterns
Context: Socioeconomic and landscape factors influence mosquito abundance, especially in urban areas. Few studies have addressed how these factors, especially at the micro-scale relevant to mosquito life history, determine mosquito abundance.
Objectives: We aim to predict mosquito abundance from socioeconomic and/or landscape factors using a machine learning framework. Additionally, we characterize how mosquito abundance responds to each of these factors.
Methods: We identified 3985 adult mosquitoes (the majority of which were Aedes mosquitoes) at 90 sampling sites in Charlotte, NC, USA in 2017. Seven socioeconomic and seven landscape factors were used to predict mosquito abundance. Three supervised learning models, k-nearest neighbor (kNN), artificial neural network (ANN), and support vector machine (SVM), were constructed, tuned, and evaluated using both continuous and binary inputs. Random forest (RF) was used to assess each input's relative importance and its relationship to mosquito abundance.
Results: Landscape factors alone yielded equal or better predictability than socioeconomic factors. Including both types of factors further improved model accuracy with binary inputs. kNN also performed robustly regardless of inputs (accuracy > 95% for binary and > 99% for continuous input data). The landscape factor group had higher importance than the socioeconomic group (54.4% vs. 45.6%). Landscape heterogeneity (measured by the Shannon index) was the single most important input factor for mosquito abundance.
Conclusions: Landscape factors were key to mosquito abundance. Machine learning models were powerful tools for handling complex datasets with multiple socioeconomic and landscape factors and accurately predicting mosquito abundance.
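Because the Shannon index emerges as the most important input, a small worked example may help readers unfamiliar with it. The sketch below computes the standard Shannon diversity, H = -Σ p_i ln p_i, over land-cover classes. The abstract does not state how the index was derived for each site, so the function name `shannon_index`, the land-cover labels, and the idea of counting cells around a sampling site are illustrative assumptions, not the authors' procedure.

```python
import math
from collections import Counter


def shannon_index(land_cover_cells):
    """Shannon diversity H = sum(p_i * ln(1 / p_i)) over land-cover classes.

    `land_cover_cells` is a hypothetical list of class labels, one per
    raster cell in the area around a sampling site (an assumption here).
    """
    counts = Counter(land_cover_cells)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * ln(1/p) is algebraically the same as -p * ln(p).
    return sum((n / total) * math.log(total / n) for n in counts.values())


# Hypothetical sites: one homogeneous, one with four equally common classes.
uniform_site = ["lawn"] * 8
mixed_site = ["lawn", "lawn", "forest", "forest",
              "impervious", "impervious", "water", "water"]

h_uniform = shannon_index(uniform_site)  # a single class gives zero heterogeneity
h_mixed = shannon_index(mixed_site)      # four equal classes give ln(4)
```

A site covered by a single class scores zero, while evenly mixed classes score the maximum for that number of classes (ln 4 for four classes), which is why the index serves as a measure of landscape heterogeneity.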