Noninvasive Diagnostic for COVID-19 from Saliva Biofluid via FTIR Spectroscopy and Multivariate Analysis.
Márcia H C Nascimento, Wena D Marcarini, Gabriely S Folli, Walter G da Silva Filho, Leonardo L Barbosa, Ellisson Henrique de Paulo, Paula F Vassallo, José G Mill, Valério G Barauna, Francis Luke Martin, Eustáquio V R de Castro, Wanderson Romão, Paulo Roberto Filgueiras
Published in: Analytical Chemistry (2022)
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused the worst global health crisis in living memory. Reverse transcription quantitative polymerase chain reaction (RT-qPCR) is considered the gold standard diagnostic method, but it exhibits limitations in the face of enormous demand. We evaluated a mid-infrared (MIR) data set of 237 saliva samples obtained from symptomatic patients (138 COVID-19 infections diagnosed via RT-qPCR). MIR spectra were evaluated via unsupervised random forest (URF) and classification models. Linear discriminant analysis (LDA) was applied following variable selection by the genetic algorithm (GA-LDA) and the successive projections algorithm (SPA-LDA), alongside partial least squares discriminant analysis (PLS-DA) and a combination of dimension reduction and variable selection by particle swarm optimization (PSO-PLS-DA). Additionally, a consensus class was used. URF models can identify structures even in highly complex data. Individual models performed well, but the consensus class improved the validation performance to 85% accuracy, 93% sensitivity, 83% specificity, and a Matthews correlation coefficient of 0.69, with information at different spectral regions. Therefore, through this combined unsupervised and supervised methodology, it is possible to better highlight the spectral regions associated with positive samples, including lipid (∼1700 cm⁻¹), protein (∼1400 cm⁻¹), and nucleic acid (∼1200-950 cm⁻¹) regions. This methodology presents an important tool for a fast, noninvasive diagnostic technique, reducing costs and allowing for risk reduction strategies.
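The abstract reports a consensus class scored by accuracy, sensitivity, specificity, and the Matthews correlation coefficient, but it does not state how the consensus was formed. The following is a minimal sketch that assumes a simple majority vote over the individual classifiers, with ties going to the positive class. The function names, the tie rule, and the toy labels are illustrative assumptions, not the authors' implementation or data.

```python
from math import sqrt


def consensus_vote(predictions):
    """Majority vote over per-model 0/1 predictions.

    Assumption: ties (possible with an even number of models) are
    resolved in favor of the positive class.
    """
    n_models = len(predictions)
    n_samples = len(predictions[0])
    consensus = []
    for i in range(n_samples):
        votes = sum(model[i] for model in predictions)
        consensus.append(1 if 2 * votes >= n_models else 0)
    return consensus


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and MCC from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # MCC is undefined when any marginal is zero; report 0.0 then.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical toy labels (1 = RT-qPCR positive), not data from the study.
y_true = [1, 1, 1, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 1]
model_b = [1, 0, 1, 0, 1, 0]
model_c = [0, 1, 1, 1, 0, 0]

consensus = consensus_vote([model_a, model_b, model_c])
print(binary_metrics(y_true, model_a))
print(binary_metrics(y_true, consensus))
```

In this toy case each model errs on different samples, so the vote corrects errors that no single model avoids. This is one plausible reason a consensus can outperform its members, though the abstract does not attribute its improvement to any specific mechanism.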
Keyphrases
- sars cov
- respiratory syndrome coronavirus
- machine learning
- coronavirus disease
- global health
- deep learning
- cell proliferation
- big data
- long non coding rna
- public health
- nucleic acid
- end stage renal disease
- chronic kidney disease
- optical coherence tomography
- artificial intelligence
- ejection fraction
- high resolution
- newly diagnosed
- prognostic factors
- data analysis
- electronic health record
- peritoneal dialysis
- working memory
- transcription factor
- genome wide
- gene expression
- neural network
- dna methylation
- molecular dynamics
- structural basis