IDSL.UFA Assigns High-Confidence Molecular Formula Annotations for Untargeted LC/HRMS Data Sets in Metabolomics and Exposomics.
Sadjad Fakouri Baygi, Sanjay K Banerjee, Praloy Chakraborty, Yashwant Kumar, Dinesh Kumar Barupal
Published in: Analytical Chemistry (2022)
Untargeted liquid chromatography/high-resolution mass spectrometry (LC/HRMS) assays in metabolomics and exposomics aim to characterize the small-molecule chemical space in a biospecimen. To gain maximum biological insight from these data sets, LC/HRMS peaks should be annotated with chemical and functional information, including molecular formula, structure, chemical class, and metabolic pathways. Among these, molecular formulas may be assigned to LC/HRMS peaks by matching the theoretical and observed isotopic profiles (MS1) of the underlying ionized compound. For this purpose, we have developed the Integrated Data Science Laboratory for Metabolomics and Exposomics-United Formula Annotation (IDSL.UFA) R package. In untargeted metabolomics validation tests, IDSL.UFA ranked the true positive molecular formula as the top hit for 54.31-85.51% of annotations and within the top five hits for 90.58-100%. Molecular formula annotations were also supported by tandem mass spectrometry data. We have implemented new strategies to (1) generate formula sources and their theoretical isotopic profiles, (2) optimize the ranking of formula hits for individual and aligned peak lists, and (3) scale IDSL.UFA-based workflows to studies with larger sample sizes. Annotating the raw data of a publicly available pregnancy metabolome study with IDSL.UFA highlighted hundreds of new pregnancy-related compounds and also suggested the presence of chlorinated perfluorotriether alcohols (Cl-PFTrEAs) in human specimens. IDSL.UFA is useful for human metabolomics and exposomics studies in which the loss of biological insight from untargeted LC/HRMS data sets must be minimized. The IDSL.UFA package is available from the R CRAN repository at https://cran.r-project.org/package=IDSL.UFA. Detailed documentation and tutorials are also provided at www.ufa.idsl.me.
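The idea of matching theoretical and observed isotopic profiles can be illustrated with a minimal Python sketch. It is not the IDSL.UFA implementation, which is an R package with its own scoring and ranking. The function names (`isotopic_profile`, `profile_error`, `rank_candidates`), the rounded natural abundances, the absolute-difference score, and the example intensities are all assumptions made for illustration. The sketch also ignores accurate mass and works only with nominal mass offsets (M, M+1, M+2, ...).

```python
# Illustrative only: rounded natural isotope abundances, keyed by nominal
# mass offset from the lightest isotope (0 = monoisotopic).
ISOTOPES = {
    "C": {0: 0.9893, 1: 0.0107},
    "H": {0: 0.999885, 1: 0.000115},
    "N": {0: 0.99636, 1: 0.00364},
    "O": {0: 0.99757, 1: 0.00038, 2: 0.00205},
    "Cl": {0: 0.7576, 2: 0.2424},
}


def isotopic_profile(formula: dict[str, int], n_peaks: int = 4) -> list[float]:
    """Relative intensities of M, M+1, ... scaled to the tallest peak."""
    dist = [1.0]  # dist[k] = probability of k extra nominal mass units
    for element, count in formula.items():
        for _ in range(count):
            # Convolve the running distribution with one more atom.
            new = [0.0] * n_peaks
            for i, p in enumerate(dist):
                for offset, abundance in ISOTOPES[element].items():
                    if i + offset < n_peaks:
                        new[i + offset] += p * abundance
            dist = new
    top = max(dist)
    return [p / top for p in dist]


def profile_error(theoretical: list[float], observed: list[float]) -> float:
    """Hypothetical score: sum of absolute intensity differences."""
    return sum(abs(t - o) for t, o in zip(theoretical, observed))


def rank_candidates(candidates: dict[str, dict[str, int]],
                    observed: list[float]) -> list[str]:
    """Order candidate formulas from best to worst profile match."""
    return sorted(
        candidates,
        key=lambda name: profile_error(
            isotopic_profile(candidates[name], len(observed)), observed
        ),
    )


# Made-up observed profile with a strong M+2 peak, as expected for one Cl atom.
observed = [1.0, 0.066, 0.32, 0.02]
candidates = {
    "C6H5Cl": {"C": 6, "H": 5, "Cl": 1},
    "C7H12O": {"C": 7, "H": 12, "O": 1},
}
ranking = rank_candidates(candidates, observed)
```

Here the chlorinated candidate should rank first because only it reproduces the strong M+2 peak. In practice a candidate list would first be restricted by accurate mass within a ppm tolerance, which this sketch leaves out.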
Keyphrases
- high resolution mass spectrometry
- liquid chromatography
- mass spectrometry
- tandem mass spectrometry
- ultra high performance liquid chromatography
- gas chromatography
- simultaneous determination
- electronic health record
- high performance liquid chromatography
- solid phase extraction
- human milk
- small molecule
- high resolution
- big data
- endothelial cells
- public health
- single molecule
- data analysis
- preterm birth
- healthcare
- machine learning
- multiple sclerosis
- low birth weight