Multi-Reference Spectral Library Yields Almost Complete Coverage of Heterogeneous LC-MS/MS Data Sets.
Constantin Ammar, Evi Berchtold, Gergely Csaba, Andreas Schmidt, Axel Imhof, Ralf Zimmer
Published in: Journal of Proteome Research (2019)
Spectral libraries play a central role in the analysis of data-independent-acquisition (DIA) proteomics experiments. A main assumption in current spectral library tools is that a single characteristic intensity pattern (CIP) suffices to describe the fragmentation of a peptide in a particular charge state (a peptide charge pair). However, we find that this is often not the case. We carry out a systematic evaluation of spectral variability over public repositories and in-house data sets. We show that spectral variability is widespread and occurs in part even under fixed experimental conditions. Using clustering of preprocessed spectra, we derive a limited number of multiple characteristic intensity patterns (MCIPs) for each peptide charge pair, which allow almost complete coverage of our heterogeneous data set without affecting the false discovery rate. We show that an MCIP library derived from public repositories performs in most cases similarly to a "custom-made" spectral library, which has been acquired under the same experimental conditions as the query spectra. We apply the MCIP approach to a DIA data set and observe a significant increase in peptide recognition. We propose the MCIP approach as an easy-to-implement addition to current spectral library search engines and as a new way to utilize the data stored in spectral repositories.
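The idea of keeping several intensity patterns per peptide charge pair can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state which clustering algorithm, similarity measure, or preprocessing the authors use, so the greedy cosine-similarity clustering, the function names (`derive_patterns`, `best_match`), the `min_similarity` threshold, and the toy intensity vectors are all hypothetical and not the published method.

```python
import math

def normalize(vec):
    """Scale an intensity vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("spectrum has no signal")
    return [x / norm for x in vec]

def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))

def derive_patterns(spectra, min_similarity=0.9):
    """Greedy clustering (an assumed stand-in for the paper's clustering).

    Each spectrum joins the most similar existing cluster if its cosine
    similarity to that cluster's centroid reaches min_similarity;
    otherwise it opens a new cluster. The returned centroids play the
    role of multiple characteristic intensity patterns.
    """
    clusters = []   # members of each cluster (normalized spectra)
    centroids = []  # one unit-length centroid per cluster
    for raw in spectra:
        spec = normalize(raw)
        best_idx, best_sim = None, min_similarity
        for i, centroid in enumerate(centroids):
            sim = cosine(spec, centroid)
            if sim >= best_sim:
                best_idx, best_sim = i, sim
        if best_idx is None:
            clusters.append([spec])
            centroids.append(spec)
        else:
            members = clusters[best_idx]
            members.append(spec)
            mean = [sum(col) / len(members) for col in zip(*members)]
            centroids[best_idx] = normalize(mean)
    return centroids

def best_match(query, patterns):
    """Highest cosine similarity between a query and any stored pattern."""
    q = normalize(query)
    return max(cosine(q, p) for p in patterns)

# Toy fragment intensities for one peptide charge pair (invented values):
# two replicates follow one pattern, two follow a clearly different one.
spectra = [
    [100, 50, 10],
    [98, 52, 11],
    [10, 50, 100],
    [12, 48, 95],
]
patterns = derive_patterns(spectra)
query = [11, 49, 97]
multi_score = best_match(query, patterns)
single_score = best_match(query, patterns[:1])
print(len(patterns), round(multi_score, 3), round(single_score, 3))
```

In this toy case a library holding only the first pattern scores the query poorly, while the library holding both patterns matches it closely, which is the kind of coverage gain the abstract describes.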
Keyphrases
- optical coherence tomography
- electronic health record
- big data
- dual energy
- healthcare
- mental health
- magnetic resonance imaging
- emergency department
- small molecule
- magnetic resonance
- mass spectrometry
- computed tomography
- data analysis
- machine learning
- high intensity
- high throughput
- molecular dynamics
- room temperature