GlycoSLASH: Concurrent Glycopeptide Identification from Multiple Related LC-MS/MS Data Sets by Using Spectral Clustering and Library Searching.
Sujun Li, Jianhui Zhu, David M Lubman, He Zhou, Haixu Tang
Published in: Journal of Proteome Research (2023)
Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is commonly adopted in large-scale glycoproteomic studies involving hundreds of disease and control samples. Existing software for glycopeptide identification in such data (e.g., the commercial software Byonic) analyzes each data set individually and does not exploit the redundant glycopeptide spectra present across related data sets. Herein, we present a novel concurrent approach for glycopeptide identification in multiple related glycoproteomic data sets that uses spectral clustering and spectral library searching. An evaluation on two large-scale glycoproteomic data sets showed that the concurrent approach identifies 105% to 224% more spectra as glycopeptides than identification on the individual data sets using Byonic alone. The improved glycopeptide identification also enabled the discovery of several potential protein glycosylation biomarkers in hepatocellular carcinoma patients.
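The general idea of reusing redundant spectra across related data sets can be illustrated with a minimal sketch. This is not the GlycoSLASH implementation: the abstract does not specify the clustering algorithm, similarity measure, or thresholds, so the greedy cosine-similarity clustering, the function names (`bin_peaks`, `cosine`, `cluster_spectra`, `propagate_ids`), the parameters (`min_cosine`, `precursor_tol`), and the toy spectra below are all assumptions made for illustration. The sketch groups similar spectra from different runs and then copies an identification to unidentified members of a cluster when that cluster carries exactly one label.

```python
import math
from collections import defaultdict

def bin_peaks(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (sparse vector)."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz / bin_width)] += intensity
    return dict(vec)

def cosine(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_spectra(spectra, min_cosine=0.7, precursor_tol=0.05):
    """Greedy clustering: a spectrum joins the first cluster whose seed
    (first member) has a close precursor m/z and a similar peak pattern."""
    clusters = []
    vectors = {sid: bin_peaks(s["peaks"]) for sid, s in spectra.items()}
    for sid, s in spectra.items():
        for members in clusters:
            seed = members[0]
            close = abs(spectra[seed]["precursor_mz"] - s["precursor_mz"]) <= precursor_tol
            if close and cosine(vectors[seed], vectors[sid]) >= min_cosine:
                members.append(sid)
                break
        else:
            clusters.append([sid])  # no matching cluster: start a new one
    return clusters

def propagate_ids(clusters, identified):
    """Copy a label to unidentified members only when the cluster's
    identified members all agree on a single label."""
    result = dict(identified)
    for members in clusters:
        labels = {identified[m] for m in members if m in identified}
        if len(labels) == 1:
            label = next(iter(labels))
            for m in members:
                result.setdefault(m, label)
    return result

# Toy data (invented): two similar spectra from different runs, one unrelated.
spectra = {
    "runA_1": {"precursor_mz": 1000.50,
               "peaks": [(204.09, 100.0), (366.14, 80.0), (657.24, 30.0)]},
    "runB_7": {"precursor_mz": 1000.51,
               "peaks": [(204.08, 90.0), (366.15, 85.0), (657.23, 25.0)]},
    "runB_9": {"precursor_mz": 850.40,
               "peaks": [(138.05, 100.0), (500.20, 40.0)]},
}
# Hypothetical label, standing in for an identification from a search engine.
identified = {"runA_1": "hypothetical_glycopeptide_X"}

clusters = cluster_spectra(spectra)
result = propagate_ids(clusters, identified)
```

In this toy case the unidentified spectrum `runB_7` clusters with `runA_1` and inherits its label, while `runB_9` stays unidentified. A real pipeline would also need consensus spectra, a scoring scheme for library matches, and error-rate control, none of which is shown here.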
Keyphrases
- electronic health record
- big data
- tandem mass spectrometry
- liquid chromatography
- data analysis
- end stage renal disease
- chronic kidney disease
- small molecule
- magnetic resonance imaging
- squamous cell carcinoma
- bioinformatics analysis
- simultaneous determination
- computed tomography
- artificial intelligence
- prognostic factors
- high performance liquid chromatography
- machine learning
- newly diagnosed
- single cell
- magnetic resonance
- ultra high performance liquid chromatography
- high throughput
- gas chromatography
- molecular dynamics
- protein protein
- solid phase extraction
- rectal cancer