Choice of High-Throughput Proteomics Method Affects Data Integration with Transcriptomics and the Potential Use in Biomarker Discovery.
Sergio Mosquim JuniorValentina SiinoLisa RydénJohan Vallon-ChristerssonFredrik LevanderPublished in: Cancers (2022)
In recent years, several advances have been achieved in breast cancer (BC) classification and treatment. However, overdiagnosis, overtreatment, and recurrent disease are still significant causes of complication and death. Here, we present the development of a protocol aimed at parallel transcriptome and proteome analysis of BC tissue samples using mass spectrometry, via Data Dependent and Independent Acquisitions (DDA and DIA). Protein digestion was semi-automated and performed on flowthroughs after RNA extraction. Data for 116 samples were acquired in DDA and DIA modes and processed using MaxQuant, EncyclopeDIA, or DIA-NN. DIA-NN showed an increased number of identified proteins, reproducibility, and correlation with matching RNA-seq data, therefore representing the best alternative for this setup. Gene Set Enrichment Analysis pointed towards complementary information being found between transcriptomic and proteomic data. A decision tree model, designed to predict the intrinsic subtypes based on differentially abundant proteins across different conditions, selected protein groups that recapitulate important clinical features, such as estrogen receptor status, HER2 status, proliferation, and aggressiveness. Taken together, our results indicate that the proposed protocol performed well for the application. Additionally, the relevance of the selected proteins points to the possibility of using such data as a biomarker discovery tool for personalized medicine.