Integrative Proteogenomics Using ProteomeGenerator2.
Nathaniel Kwok, Zita E. H. Aretz, Sumiko Takao, Zheng Ser, Paolo Cifani, Alex Kentsis
Published in: Journal of Proteome Research (2023)
Recent advances in nucleic acid sequencing now permit rapid, genome-scale analysis of genetic variation and transcription, enabling population-scale studies of human biology, disease, and diverse organisms. Likewise, advances in mass spectrometry proteomics now permit highly sensitive and accurate studies of protein expression at the whole-proteome scale. However, most proteomic studies rely on consensus databases to match spectra to peptide and protein sequences, and thus remain limited to the analysis of canonical protein sequences. Here, we develop ProteomeGenerator2 (PG2), based on the scalable and modular ProteomeGenerator framework. PG2 integrates genome and transcriptome sequencing to incorporate protein variants containing amino acid substitutions, insertions, and deletions, as well as noncanonical reading frames, exons, and other variants caused by genomic and transcriptomic variation. We benchmarked PG2 using synthetic data and genomic, transcriptomic, and proteomic analysis of human leukemia cells. PG2 can be integrated with current and emerging sequencing technologies, assemblers, variant callers, and mass spectral analysis algorithms, and is available open-source from https://github.com/kentsisresearchgroup/ProteomeGenerator2.
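To illustrate the general idea of a sample-specific protein database, the following minimal Python sketch applies a single-nucleotide variant to a toy coding sequence, translates both versions, and lists the tryptic peptides that occur only in the variant protein. It is not PG2's implementation: the function names (`translate`, `apply_snv`, `tryptic_digest`), the toy sequence, and the simplified cleavage rule (after K or R, not before P) are all assumptions made for illustration.

```python
# Illustrative sketch only; not taken from ProteomeGenerator2.
# All names and the toy sequence below are hypothetical.

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def translate(cds: str) -> str:
    """Translate a coding sequence in frame 0 up to the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_snv(cds: str, pos: int, ref: str, alt: str) -> str:
    """Substitute one base at a 0-based position, checking the reference allele."""
    if cds[pos] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {cds[pos]}")
    return cds[:pos] + alt + cds[pos + 1:]


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (simplified rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


# Toy reference coding sequence (hypothetical): encodes M-A-K-G-E-L-R, then stop.
REF_CDS = "ATGGCTAAAGGTGAACTGCGTTAA"
# Hypothetical variant: A>T at 0-based position 13 turns GAA (E) into GTA (V).
VAR_CDS = apply_snv(REF_CDS, 13, "A", "T")

ref_protein = translate(REF_CDS)
var_protein = translate(VAR_CDS)

# Peptides present only in the variant proteome.
variant_only = set(tryptic_digest(var_protein)) - set(tryptic_digest(ref_protein))
print(ref_protein, var_protein, sorted(variant_only))
```

In this toy case the substitution changes one tryptic peptide, which is the kind of sequence a consensus database would not contain. A real workflow would also need to handle insertions, deletions, alternative reading frames, and transcript assembly, which this sketch does not attempt.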
Keyphrases
- single cell
- amino acid
- copy number
- mass spectrometry
- rna seq
- endothelial cells
- nucleic acid
- genome wide
- protein protein
- case control
- label free
- induced pluripotent stem cells
- binding protein
- machine learning
- induced apoptosis
- pluripotent stem cells
- high resolution
- acute myeloid leukemia
- big data
- gene expression
- liquid chromatography
- oxidative stress
- dna methylation
- transcription factor
- small molecule
- cell cycle arrest
- computed tomography
- electronic health record
- cell death
- cell proliferation
- high performance liquid chromatography
- signaling pathway
- gas chromatography
- loop mediated isothermal amplification