Evaluating the Possibility of Detecting Variants in Shotgun Proteomics via LeTE-Fusion Analysis Pipeline.

Tung-Shing Mamie Lih, Wai-Kok Choong, Yi-Ju Chen, Ting-Yi Sung
Published in: Journal of Proteome Research (2018)
In proteogenomic studies, many genome-annotated events, for example, single amino acid variations (SAAVs) and short INDELs, are often unobserved in shotgun proteomics. Therefore, we propose an analysis pipeline called LeTE-fusion (Le, peptide length; T, theoretical values; E, experimental data) to first investigate whether peptides of certain lengths are observed more often in mass spectrometry (MS)-based proteomics, a bias that may hinder peptide identification and make genome-annotated events difficult to detect. By applying LeTE-fusion to different MS-based proteome data sets, we found that peptides of 7-20 amino acids are more frequently identified, possibly attributable to MS-related factors rather than to proteases. We then extended the use of LeTE-fusion to four variant-containing-sequence data sets (SAAV-only) of varying sample complexity, up to the whole human proteome scale, and found that theoretically ∼70% of variants are observable in an ideal shotgun proteomics experiment. However, only ∼40% of variants might be detectable in real shotgun proteomic experiments when LeTE-fusion uses the experimentally observed variant-site-containing wild-type peptides in PeptideAtlas to estimate the expected observable coverage of variants. Finally, we conducted a case study on the HEK293 cell line, using variants reported at the genomic level that were also identified in shotgun proteomics, to demonstrate the efficacy of LeTE-fusion in estimating the expected observable coverage of variants. To the best of our knowledge, this is the first study to systematically investigate the detection limits of genome-annotated events via shotgun proteomics using such an analysis pipeline.
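The idea of a theoretical observable coverage can be illustrated with a minimal sketch. This is not the LeTE-fusion implementation, which the abstract does not describe in detail. It assumes a trypsin-like rule (cleave after K or R unless followed by P), no missed cleavages, and treats a SAAV as theoretically observable when the peptide carrying it falls within the 7-20 amino acid window reported above. All function names and the toy sequence are hypothetical.

```python
import re


def tryptic_spans(sequence):
    """Return half-open (start, end) spans of peptides from an in-silico digest.

    Assumption: trypsin-like cleavage after K or R unless the next residue
    is P, with no missed cleavages.
    """
    cut_sites = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    bounds = [0] + cut_sites
    if bounds[-1] != len(sequence):
        bounds.append(len(sequence))
    return list(zip(bounds, bounds[1:]))


def variant_peptide(sequence, position, alt):
    """Apply a SAAV at a 0-based position and return the peptide carrying it.

    The digest is run on the mutated sequence, because a substitution can
    create or destroy a cleavage site.
    """
    mutated = sequence[:position] + alt + sequence[position + 1:]
    for start, end in tryptic_spans(mutated):
        if start <= position < end:
            return mutated[start:end]
    raise ValueError("position lies outside the sequence")


def theoretical_coverage(sequence, variants, min_len=7, max_len=20):
    """Fraction of (position, alt) variants whose peptide length is in range.

    The 7-20 default is the frequently identified length range from the
    abstract, used here as a simple proxy for observability.
    """
    if not variants:
        return 0.0
    observable = sum(
        min_len <= len(variant_peptide(sequence, pos, alt)) <= max_len
        for pos, alt in variants
    )
    return observable / len(variants)


if __name__ == "__main__":
    toy_protein = "MKAAAAAAAKGGR"  # hypothetical sequence, not real data
    toy_variants = [(4, "V"), (10, "S")]
    print(theoretical_coverage(toy_protein, toy_variants))
```

In this toy case the first variant sits in an 8-residue peptide and counts as observable, while the second sits in a 3-residue peptide and does not. An estimate of the experimental coverage would additionally require evidence, such as PeptideAtlas observations of the corresponding wild-type peptides, that this sketch does not model.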
Keyphrases