Login / Signup

Data-Driven Tool for Cross-Run Ion Selection and Peak-Picking in Quantitative Proteomics with Data-Independent Acquisition LC-MS/MS.

Binjun YanMengtian ShiSiyu CaiYuan SuRenhui ChenChiyuan HuangDavid Da Yong Chen
Published in: Analytical chemistry (2023)
Proteomics provides molecular bases of biology and disease, and liquid chromatography-tandem mass spectrometry (LC-MS/MS) is a platform widely used for bottom-up proteomics. Data-independent acquisition (DIA) improves the run-to-run reproducibility of LC-MS/MS in proteomics research. However, the existing DIA data processing tools sometimes produce large deviations from true values for the peptides and proteins in quantification. Peak-picking error and incorrect ion selection are the two main causes of the deviations. We present a cross-run ion selection and peak-picking (CRISP) tool that utilizes the important advantage of run-to-run consistency of DIA and simultaneously examines the DIA data from the whole set of runs to filter out the interfering signals, instead of only looking at a single run at a time. Eight datasets acquired by mass spectrometers from different vendors with different types of mass analyzers were used to benchmark our CRISP-DIA against other currently available DIA tools. In the benchmark datasets, for analytes with large content variation among samples, CRISP-DIA generally resulted in 20 to 50% relative decrease in error rates compared to other DIA tools, at both the peptide precursor level and the protein level. CRISP-DIA detected differentially expressed proteins more efficiently, with 3.3 to 90.3% increases in the numbers of true positives and 12.3 to 35.3% decreases in the false positive rates, in some cases. In the real biological datasets, CRISP-DIA showed better consistencies of the quantification results. The advantages of assimilating DIA data in multiple runs for quantitative proteomics were demonstrated, which can significantly improve the quantification accuracy.
Keyphrases
  • electronic health record
  • mass spectrometry
  • liquid chromatography tandem mass spectrometry
  • big data
  • label free
  • high resolution
  • ms ms
  • data analysis
  • deep learning
  • high throughput
  • protein protein