Reducing Quantitative Uncertainty Caused by Data Processing in Untargeted Metabolomics.
Zixuan Zhang, Huaxu Yu, Ethan Wong-Ma, Pouneh Dokouhaki, Ahmed A Mostafa, Jay S Shavadia, Fang Wu, Tao Huan
Published in: Analytical Chemistry (2024)
Processing liquid chromatography-mass spectrometry-based metabolomics data using computational programs often introduces additional quantitative uncertainty, termed computational variation in a previous work. This work develops a computational solution to automatically recognize metabolic features with computational variation in a metabolomics data set. This tool, AVIR (short for "Accurate eValuation of alIgnment and integRation"), is a support vector machine-based machine learning strategy (https://github.com/HuanLab/AVIR). The rationale is that metabolic features with computational variation have a poor correlation between chromatographic peak area and peak height-based quantifications across the samples in a study. AVIR was trained on a set of 696 manually curated metabolic features and achieved an accuracy of 94% in a 10-fold cross-validation. When tested on various external data sets from public metabolomics repositories, AVIR demonstrated an accuracy range of 84%-97%. Finally, tested on a large-scale metabolomics study, AVIR clearly indicated features with computational variation and thus guided us to manually correct them. Our results show that 75.3% of the samples with computational variation had a relative intensity difference of over 20% after correction. This demonstrates the critical role of AVIR in reducing computational variation to improve quantitative certainty in untargeted metabolomics analysis.
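The rationale stated in the abstract, that features affected by computational variation show poor agreement between peak area and peak height across samples, can be illustrated with a minimal sketch. This is not AVIR itself: AVIR uses a support vector machine, whereas the sketch below applies a plain Pearson correlation with a fixed cutoff. The function names, the 0.9 threshold, and the example intensities are all hypothetical and chosen only for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when one quantification is constant.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def flag_features(features, threshold=0.9):
    """Return names of features whose area/height correlation is below threshold.

    `features` maps a feature name to (peak_areas, peak_heights), each a list
    with one value per sample. The threshold is an illustrative assumption,
    not a value taken from the paper.
    """
    flagged = []
    for name, (areas, heights) in features.items():
        r = pearson(areas, heights)
        if math.isnan(r) or r < threshold:
            flagged.append(name)
    return flagged


# Hypothetical intensities for two features measured in four samples.
features = {
    # Height scales with area: consistent integration across samples.
    "feature_A": ([100.0, 200.0, 300.0, 400.0], [10.0, 20.0, 30.0, 40.0]),
    # Height does not track area: a candidate for manual inspection.
    "feature_B": ([100.0, 200.0, 300.0, 400.0], [10.0, 35.0, 12.0, 20.0]),
}

flagged = flag_features(features)
print(flagged)
```

A fixed cutoff like this is only a first-pass screen. The abstract describes a classifier trained on 696 manually curated features, which presumably captures patterns a single threshold would miss.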
Keyphrases
- mass spectrometry
- liquid chromatography
- high resolution
- high resolution mass spectrometry
- gas chromatography
- capillary electrophoresis
- electronic health record
- high performance liquid chromatography
- machine learning
- big data
- tandem mass spectrometry
- simultaneous determination
- clinical trial
- public health
- body mass index
- mental health
- emergency department
- data analysis
- physical activity
- deep learning
- solid phase extraction
- artificial intelligence
- high intensity