PROSE: phenotype-specific network signatures from individual proteomic samples.
Bertrand Jern Han Wong, Weijia Kong, Hui Peng, Wilson Wen Bin Goh
Published in: Briefings in bioinformatics (2023)
Proteomic studies characterize the protein composition of complex biological samples. Despite recent advancements in mass spectrometry instrumentation and computational tools, low proteome coverage and limited interpretability remain challenges. To address this, we developed Proteome Support Vector Enrichment (PROSE), a fast, scalable and lightweight pipeline for scoring proteins based on orthogonal gene co-expression network matrices. PROSE takes simple protein lists as input and generates a standard enrichment score for all proteins, including undetected ones. In our benchmark against seven other candidate prioritization techniques, PROSE shows high accuracy in missing protein prediction, with scores correlating strongly with the corresponding gene expression data. As a further proof-of-concept, we applied PROSE to a reanalysis of the Cancer Cell Line Encyclopedia proteomics dataset, where it captures key phenotypic features, including gene dependency. Finally, we demonstrated its applicability on a breast cancer clinical dataset, showing clustering by annotated molecular subtype and identification of putative drivers of triple-negative breast cancer. PROSE is available as a user-friendly Python module from https://github.com/bwbio/PROSE.
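The general idea of scoring every protein from a short observed/unobserved list can be illustrated with a small, self-contained sketch. This is not the PROSE implementation or its API. It assumes that each protein has a feature vector derived from a co-expression matrix (here replaced by synthetic random data), trains a plain linear support vector classifier by subgradient descent, and z-scores the resulting decision values so that proteins absent from both input lists also receive a score. All names (`train_linear_svm`, `standardised_scores`, the `P00`-style identifiers) are hypothetical.

```python
import random
import statistics


def train_linear_svm(X, y, epochs=200, lr=0.05, lam=0.01, seed=0):
    """Fit weights w and bias b by subgradient descent on the
    L2-regularised hinge loss (a generic linear SVM, not PROSE's own code)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # The L2 penalty shrinks the weights on every step.
            w = [wj * (1 - lr * lam) for wj in w]
            # The hinge term only contributes when the margin is violated.
            if margin < 1:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def standardised_scores(features, observed, unobserved):
    """Train on the two input lists, then z-score the decision value of
    every protein in `features`, including those in neither list."""
    labelled = [(p, 1) for p in observed] + [(p, -1) for p in unobserved]
    X = [features[p] for p, _ in labelled]
    y = [label for _, label in labelled]
    w, b = train_linear_svm(X, y)
    raw = {p: sum(wj * xj for wj, xj in zip(w, v)) + b
           for p, v in features.items()}
    values = list(raw.values())
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    return {p: (s - mu) / sd for p, s in raw.items()}


# Synthetic stand-in for co-expression-derived features (an assumption):
# proteins P00-P14 behave like "expressed" proteins, P15-P29 do not.
rng = random.Random(42)
features = {}
for i in range(30):
    centre = 1.0 if i < 15 else -1.0
    features[f"P{i:02d}"] = [centre + rng.gauss(0, 0.5) for _ in range(4)]

observed = [f"P{i:02d}" for i in range(0, 10)]      # detected proteins
unobserved = [f"P{i:02d}" for i in range(15, 25)]   # presumed absent
scores = standardised_scores(features, observed, unobserved)

# Proteins in neither list still receive a standardised score.
held_out_high = [scores[f"P{i:02d}"] for i in range(10, 15)]
held_out_low = [scores[f"P{i:02d}"] for i in range(25, 30)]
```

In this toy setting, the held-out proteins that resemble the observed set should score higher than those that resemble the unobserved set, which mirrors the missing-protein prediction task described in the abstract. For real data, the repository linked above should be consulted for the actual inputs and interface.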
Keyphrases
- mass spectrometry
- gene expression
- genome wide
- binding protein
- protein protein
- amino acid
- copy number
- dna methylation
- poor prognosis
- label free
- papillary thyroid
- healthcare
- squamous cell carcinoma
- electronic health record
- big data
- single cell
- high resolution
- liquid chromatography
- machine learning
- young adults
- artificial intelligence
- genome wide identification
- long non coding rna
- squamous cell
- transcription factor
- low cost