Machine Learning-Based Nicotine Addiction Prediction Models for Youth E-Cigarette and Waterpipe (Hookah) Users.
Jeeyae Choi, Hee-Tae Jung, Anastasiya Ferrell, Seoyoon Woo, Linda Haddad. Published in: Journal of Clinical Medicine (2021)
Despite their harmful health effects, e-cigarette and hookah use among youth in the U.S. has increased. Developing tailored e-cigarette and hookah cessation programs for youth is imperative. The aim of this study was to identify predictor variables, such as social, mental, and environmental determinants, that cause nicotine addiction in youth e-cigarette or hookah users, and to build nicotine addiction prediction models using machine learning algorithms. From the National Youth Tobacco Survey (2019) datasets, a total of 6511 participants were identified as ever having used e-cigarettes or hookah. Prediction models were built by Random Forest with ReliefF and by Least Absolute Shrinkage and Selection Operator (LASSO). ReliefF identified important predictor variables, and the Davies-Bouldin clustering evaluation index selected the optimal number of predictors for Random Forest. A total of 193 predictor variables were included in the final analysis. Performance of the prediction models was measured by Root Mean Square Error (RMSE) and a confusion matrix. The results suggested high prediction performance. The identified predictor variables were aligned with previous research. The novel predictors found, such as 'witnessed e-cigarette use in their household' and 'perception of their tobacco use', could inform public awareness efforts, targeted e-cigarette and hookah education for youth, and policymaking.
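The two evaluation measures named in the abstract, RMSE and the confusion matrix, can be illustrated with a minimal sketch. The abstract does not state how the outcome was coded or how the metrics were computed, so the following assumes an ordinal-coded addiction level (0, 1, 2). The helper names `rmse` and `confusion_matrix` and the toy values are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def rmse(y_true, y_pred):
    """Root Mean Square Error between two equal-length numeric sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical toy data (NOT from the study): assumed ordinal addiction levels.
actual = [0, 1, 2, 1, 0, 2]
predicted = [0, 1, 1, 1, 0, 2]

score = rmse(actual, predicted)
matrix = confusion_matrix(actual, predicted, labels=[0, 1, 2])

print(f"RMSE: {score:.3f}")
for label, row in zip([0, 1, 2], matrix):
    print(f"actual={label}: {row}")
```

A lower RMSE indicates predictions closer to the observed levels, while the off-diagonal cells of the confusion matrix show which levels the model confuses with one another.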