Most tandem mass spectrometry (MS/MS) spectra in untargeted metabolomics and exposomics studies remain unannotated. Our deep learning framework, Integrated Data Science Laboratory for Metabolomics and Exposomics-Mass INTerpreter (IDSL_MINT), translates MS/MS spectra into molecular fingerprint descriptors. IDSL_MINT lets users apply the transformer model, the architecture behind large language models, to mass spectrometry data. Models are trained on user-provided reference MS/MS libraries using any customizable molecular fingerprint descriptor. We benchmarked IDSL_MINT using the LipidMaps database, and in a test study it improved the annotation rate for MS/MS spectra that had not been annotated using existing mass spectral libraries. IDSL_MINT may therefore improve overall annotation rates in untargeted metabolomics and exposomics studies. The IDSL_MINT framework and tutorials are available in the GitHub repository at https://github.com/idslme/IDSL_MINT.

Scientific contribution statement

Structural annotation of MS/MS spectra from untargeted metabolomics and exposomics datasets is a major bottleneck in gaining new biological insights. Machine learning models that convert spectra into molecular fingerprints can help in the annotation process. Here, we present IDSL_MINT, a new, easy-to-use, and customizable deep learning framework for training and applying models that predict molecular fingerprints from spectra in compound annotation workflows.
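To illustrate how a predicted molecular fingerprint might feed a compound annotation workflow, the following minimal sketch ranks candidate structures by Tanimoto similarity to a predicted fingerprint. It is not the IDSL_MINT implementation: the choice of Tanimoto similarity, the function names `tanimoto` and `rank_candidates`, the bit positions, and the candidate names are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0  # convention chosen here: two empty fingerprints score zero
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_candidates(
    predicted: Set[int], library: Dict[str, Set[int]]
) -> List[Tuple[str, float]]:
    """Score every candidate against the predicted bits, best match first."""
    scored = [(name, tanimoto(predicted, bits)) for name, bits in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical data: bit positions a trained model might predict as 'on'
# for one MS/MS spectrum, and fingerprints of three made-up candidates.
predicted_bits = {1, 4, 7, 9}
candidate_library = {
    "candidate_A": {1, 4, 7, 9, 12},
    "candidate_B": {2, 3, 5},
    "candidate_C": {1, 4, 8},
}

ranking = rank_candidates(predicted_bits, candidate_library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this sketch the fingerprint is stored as a set of active bit positions, which keeps the similarity calculation independent of the fingerprint type and length.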
Keyphrases
- mass spectrometry
- ms ms
- liquid chromatography
- high performance liquid chromatography
- deep learning
- tandem mass spectrometry
- gas chromatography
- ultra high performance liquid chromatography
- density functional theory
- machine learning
- rna seq
- high resolution mass spectrometry
- liquid chromatography tandem mass spectrometry
- high resolution
- simultaneous determination
- capillary electrophoresis
- artificial intelligence
- convolutional neural network
- single molecule
- autism spectrum disorder
- single cell
- solid phase extraction
- public health
- gas chromatography mass spectrometry
- molecular dynamics
- body composition
- case control
- optical coherence tomography
- quality control