An Integrated Mass Spectroscopy Data Processing Strategy for Fast Identification, In-Depth, and Reproducible Quantification of Protein O-Glycosylation in a Large Cohort of Human Urine Samples.
Xinyuan Zhao, Shanshan Zheng, Yuanyuan Li, Junjie Huang, Wanjun Zhang, Yuping Xie, Weijie Qin, Xiaohong Qian
Published in: Analytical Chemistry (2019)
Protein O-glycosylation has long been recognized as closely associated with many diseases, particularly with tumor proliferation, invasion, and metastasis. The ability to efficiently profile variation in O-glycosylation across large-scale clinical samples provides an important approach for developing biomarkers for cancer diagnosis and for evaluating therapeutic response. Therefore, mass spectrometry (MS)-based techniques for high-throughput, in-depth, and reliable elucidation of protein O-glycosylation in large clinical cohorts are in high demand. However, the wide occurrence of serine and threonine residues in the proteome and the tens of mammalian O-glycan types lead to an extremely large search space, composed of millions of theoretical peptide and O-glycan combinations, for intact O-glycopeptide database searching. As a result, database searching takes an exceptionally long time, which is a major obstacle in O-glycoproteome studies of large clinical cohorts. More importantly, because of the low abundance and poor ionization of intact O-glycopeptides and the stochastic nature of data-dependent MS2 acquisition, the level of missing data inevitably rises substantially as the number of samples increases, which undermines quantitative comparison across samples. To address these problems, we report a new MS data processing strategy that integrates glycoform-specific database searching, reference library-based MS1 feature matching, and MS2 identification propagation for fast identification and in-depth, reproducible label-free quantification of O-glycosylation of human urinary proteins. This strategy increases database searching speed by up to 20-fold and improves intact O-glycopeptide quantification in individual samples by 30%-40%, with clearly improved reproducibility. In total, we identified 1300 intact O-glycopeptides in 36 healthy human urine samples with a 30%-40% reduction in the amount of missing data.
This is currently the largest dataset of urinary O-glycoproteome and demonstrates the application potential of this new strategy in large-scale clinical investigations.
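The idea of reference library-based MS1 feature matching with MS2 identification propagation can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the function `propagate_identifications`, the 10 ppm and 1 minute tolerances, and the assumption that retention times are already aligned across runs are all hypothetical choices made for illustration. The sketch assigns a library identification to an MS1 feature in a run where no MS2 spectrum was acquired, provided the feature agrees in m/z and retention time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LibraryEntry:
    """A glycopeptide identified by MS2 in at least one run (hypothetical structure)."""
    glycopeptide: str
    mz: float       # precursor m/z
    rt_min: float   # retention time in minutes, assumed aligned across runs


@dataclass(frozen=True)
class Ms1Feature:
    """An MS1 feature detected in the run being quantified (hypothetical structure)."""
    mz: float
    rt_min: float
    intensity: float


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Signed mass error of the observed m/z relative to the reference, in ppm."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def propagate_identifications(
    library: List[LibraryEntry],
    features: List[Ms1Feature],
    ppm_tol: float = 10.0,     # assumed tolerance, not taken from the paper
    rt_tol_min: float = 1.0,   # assumed tolerance, not taken from the paper
) -> Dict[str, float]:
    """Map each library glycopeptide to the intensity of its best-matching MS1 feature.

    A feature matches when both its m/z error and its retention time difference
    fall within the tolerances; among matches, the most intense feature is kept.
    Library entries without a match are left out (they remain missing values).
    """
    quantified: Dict[str, float] = {}
    for entry in library:
        best: Optional[Ms1Feature] = None
        for feature in features:
            if abs(ppm_error(feature.mz, entry.mz)) > ppm_tol:
                continue
            if abs(feature.rt_min - entry.rt_min) > rt_tol_min:
                continue
            if best is None or feature.intensity > best.intensity:
                best = feature
        if best is not None:
            quantified[entry.glycopeptide] = best.intensity
    return quantified
```

In this sketch, a glycopeptide that was not selected for MS2 in a given run can still receive a quantitative value from its MS1 feature, which is the general mechanism by which such matching reduces missing data across samples. A real pipeline would also need retention time alignment, charge state and isotope pattern checks, and false match control, none of which are shown here.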
Keyphrases
- mass spectrometry
- MS/MS
- high throughput
- big data
- high resolution
- liquid chromatography
- label free
- high performance liquid chromatography
- data analysis