Multidimensional Machine Learning Model to Calculate a COVID-19 Vulnerability Index.
Paula Andrea Rosero Perez, Juan Sebastián Realpe Gonzalez, Ricardo Salazar-Cabrera, David Restrepo, Diego M Lopez, Bernd Blobel
Published in: Journal of Personalized Medicine (2023)
In Colombia, the first case of COVID-19 was confirmed on 6 March 2020. By 13 March 2023, Colombia had registered 6,360,780 confirmed COVID-19 cases, representing 12.18% of the total population. In 2020, the National Administrative Department of Statistics (DANE) of Colombia published a COVID-19 vulnerability index that estimates, per city block, the vulnerability to COVID-19 infection. However, beyond demographic and health variables, DANE did not consider other factors that the related literature associates with increased COVID-19 risk, such as environmental and mobility data. The proposed multidimensional index considers variables of different types (unemployment rate, gross domestic product, citizens' mobility, vaccination data, and climatological and spatial information), from which the incidence of COVID-19 is calculated and compared with the incidence obtained from the DANE COVID-19 vulnerability index. The index was constructed following the data collection, data preparation, modeling, and evaluation phases of the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. The multidimensional index was evaluated using multiple machine learning models to calculate the incidence of COVID-19 cases in the main cities of Colombia. The results showed that the best-performing model for predicting the incidence of COVID-19 in Colombia is the Extra Trees Regressor, which obtained an R-squared of 0.829. This work is a first step toward a multidimensional analysis of COVID-19 risk factors and has the potential to support decision making in public health programs. The results are also relevant for calculating vulnerability indexes for other viral diseases, such as dengue.
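The abstract reports model quality as an R-squared score on predicted COVID-19 incidence. The following is a minimal sketch, under stated assumptions, of how a per-city incidence target and that score could be computed. It assumes incidence means cumulative confirmed cases per 100,000 inhabitants (the abstract does not give the study's exact definition or time window), and every city figure and "prediction" in it is a hypothetical toy value, not data from the study. The helper names `incidence_per_100k` and `r_squared` are illustrative only. In practice the predictions would come from a fitted regressor; scikit-learn, for example, provides `ExtraTreesRegressor` and `r2_score`, although the abstract does not state which software the authors used.

```python
"""Sketch: per-city COVID-19 incidence and the R-squared score used to compare models.

Assumptions (not from the paper): incidence is cumulative confirmed cases per
100,000 inhabitants, and all city figures below are hypothetical toy values.
"""
from typing import Sequence


def incidence_per_100k(cases: int, population: int) -> float:
    """Return confirmed cases per 100,000 inhabitants (assumed incidence definition)."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so round numbers stay exact in floating point.
    return cases * 100_000 / population


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical cities: (confirmed cases, population); NOT data from the study.
    cities = {"city_a": (500, 5_000), "city_b": (1_000, 5_000), "city_c": (1_500, 5_000)}
    observed = [incidence_per_100k(c, p) for c, p in cities.values()]
    # Hypothetical model output, standing in for a tree-ensemble regressor's predictions.
    predicted = [12_000.0, 18_000.0, 30_000.0]
    print(observed)  # [10000.0, 20000.0, 30000.0]
    print(round(r_squared(observed, predicted), 3))  # 0.96
```

An R-squared of 1 means the predictions explain all variation in incidence across cities, while 0 means they do no better than predicting the mean; the reported 0.829 sits toward the upper end of that range.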