Potential and limitations of machine meta-learning (ensemble) methods for predicting COVID-19 mortality in a large inhospital Brazilian dataset.
Bruno Barbosa Miranda de Paiva, Polianna Delfino Pereira, Claudio Moisés Valiense de Andrade, Virginia Mara Reis Gomes, Maíra Viana Rego Souza E Silva, Karina Paula Medeiros Prado Martins, Thaís Lorenna Souza Sales, Rafael Lima Rodrigues de Carvalho, Magda Carvalho Pires, Lucas Emanuel Ferreira Ramos, Rafael Tavares Silva, Alessandra de Freitas Martins Vieira, Aline Gabrielle Sousa Nunes, Alzira de Oliveira Jorge, Amanda de Oliveira Maurílio, Ana Luiza Bahia Alves Scotton, Carla Thais Cândida Alves da Silva, Christiane Correa Rodrigues Cimini, Daniela Ponce, Elayne Crestani Pereira, Euler Roberto Fernandes Manenti, Fernanda D Athayde Rodrigues, Fernando Anschau, Fernando Antonio Botoni, Frederico Bartolazzi, Genna Maira Santos Grizende, Helena Carolina Noal, Helena Duani, Isabela Moraes Gomes, Jamille Hemétrio Salles Martins Costa, Júlia di Sabatino Santos Guimarães, Julia Teixeira Tupinambás, Juliana Rodrigues Machado Rúgolo, Joanna d'Arc Lyra Batista, Joice Coutinho de Alvarenga, José Miguel Chatkin, Karen Brasil Ruschel, Liege Barella Zandoná, Lílian Santos Pinheiro, Luanna da Silva Monteiro Menezes, Lucas Moyses Carvalho de Oliveira, Luciane Kopittke, Luisa Argolo Assis, Luiza Margoto Marques, Magda César Raposo, Maiara Anschau Floriani, Maria Aparecida Camargos Bicalho, Matheus Carvalho Alves Nogueira, Neimy Ramos de Oliveira, Patricia Klarmann Ziegelmann, Pedro Gibson Paraiso, Petrônio José de Lima Martelli, Roberta Senger, Rochele Mosmann Menezes, Saionara Cristina Francisco, Silvia Ferreira Araújo, Tatiana Kurtz, Tatiani Oliveira Fereguetti, Thainara Conceição de Oliveira, Yara Cristina Neves Marques Barbosa Ribeiro, Yuri Carlotto Ramires, Maria Clara Pontello Barbosa Lima, Marcelo Carneiro, Adriana Falangola Benjamin Bezerra, Alexandre Vargas Schwarzbold, André Soares de Moura Costa, Bárbara Lopes Farace, Daniel Vitório Silveira, Evelin Paola de Almeida Cenci, Fernanda Barbosa Lucas, Fernando Graça Aranha, Gisele Alsina Nader Bastos, Giovanna Grunewald Vietta, Guilherme Fagundes Nascimento, Heloisa Reniers Vianna, Henrique Cerqueira Guimarães, Júlia Drumond Parreiras de Morais, Leila Beltrami Moreira, Leonardo Seixas de Oliveira, Lucas de Deus Sousa, Luciano de Souza Viana, Máderson Alvares de Souza Cabral, Maria Angélica Pires Ferreira, Mariana Frizzo de Godoy, Meire Pereira de Figueiredo, Milton Henriques Guimarães-Júnior, Mônica Aparecida de Paula de Sordi, Natália da Cunha Severino Sampaio, Pedro Ledic Assaf, Raquel Lutkmeier, Reginaldo Aparecido Valacio, Renan Goulart Finger, Rufino de Freitas Silva, Silvana Mangeon Mereilles Guimarães, Talita Fischer Oliveira, Thulio Henrique Oliveira Diniz, Marcos André Gonçalves, Milena Soriano Marcolino
Published in: Scientific Reports (2023)
The majority of early prediction scores and methods for predicting COVID-19 mortality are hampered by methodological flaws and technological limitations (e.g., the use of a single prediction model). Our aim is to provide a thorough comparative study that addresses these methodological issues by considering multiple techniques for building mortality prediction models, including modern machine learning (neural) algorithms, traditional statistical techniques, and meta-learning (ensemble) approaches. This study used a dataset from a multicenter cohort of 10,897 adult Brazilian COVID-19 patients admitted from March 2020 to November 2021 (median age 60 years, interquartile range 48-71; 46% women). We also propose new, original population-based meta-features that have not previously been described in the literature. Stacking achieved the best results reported in the literature to date for the death prediction task, improving Recall for predicting death by more than 46% over the previous state of the art, with an AUROC of 0.826 and a MacroF1 of 65.4%. The newly proposed meta-features were highly discriminative of death but did not produce large improvements in final prediction performance, suggesting that we may be close to the limits of the predictive performance achievable with the current set of ML techniques and (meta-)features. Finally, we investigated how the trained models perform in different hospitals and found large differences in classifier performance between hospitals, which further supports the case that errors are produced by factors that cannot be modeled with the current predictors.
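Stacking, the meta-learning (ensemble) approach that performed best here, trains a second-level model on the predictions of several base models. As a rough, general illustration only (not the authors' pipeline, features, or hyperparameters), the sketch below collects out-of-fold predicted death probabilities from two base learners and fits a logistic-regression meta-learner on them. The choice of base learners (a gradient-descent logistic regression and a k-nearest-neighbour vote), the unshuffled, unstratified fold assignment, the toy grid data, and all function names (`fit_logistic`, `fit_knn`, `fit_stacking`) are hypothetical assumptions made for this example. It uses only the Python 3.8+ standard library.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]  # maps a feature vector to an estimated P(death)


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: List[Vector], y: List[int], lr: float = 0.5, epochs: int = 300) -> Model:
    """Hypothetical base/meta learner: logistic regression via full-batch gradient descent."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda x: _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_knn(X: List[Vector], y: List[int], k: int = 3) -> Model:
    """Hypothetical base learner: share of deaths among the k closest training points."""
    data = list(zip(X, y))

    def predict(x: Vector) -> float:
        nearest = sorted(data, key=lambda pair: math.dist(pair[0], x))[:k]
        return sum(label for _, label in nearest) / len(nearest)

    return predict


def fit_stacking(X: List[Vector], y: List[int],
                 base_fitters: List[Callable[[List[Vector], List[int]], Model]],
                 n_folds: int = 5) -> Model:
    """Stacking sketch: a meta-learner trained on out-of-fold base-model predictions."""
    n = len(X)
    fold_of = [i % n_folds for i in range(n)]  # simple deterministic fold assignment
    meta_X = [[0.0] * len(base_fitters) for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if fold_of[i] != fold]
        held_out = [i for i in range(n) if fold_of[i] == fold]
        for m, fit in enumerate(base_fitters):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                # Each row gets predictions only from models that never trained on it.
                meta_X[i][m] = model(X[i])
    meta_model = fit_logistic(meta_X, y)
    # Refit every base learner on all data; their outputs feed the meta-learner at inference.
    base_models = [fit(X, y) for fit in base_fitters]
    return lambda x: meta_model([model(x) for model in base_models])


if __name__ == "__main__":
    # Toy data (not from the study): a 6x6 grid where label 1 stands for "death".
    X = [(i / 5, j / 5) for i in range(6) for j in range(6)]
    y = [1 if i + j > 5 else 0 for i in range(6) for j in range(6)]
    stacked = fit_stacking(X, y, [fit_logistic, fit_knn])
    print(round(stacked((0.1, 0.2)), 3), round(stacked((0.9, 0.8)), 3))
```

Training the meta-learner on out-of-fold rather than in-sample base predictions is the usual design choice in stacking: it keeps the meta-learner from over-trusting whichever base model fits its own training data most closely.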