Integrated meta-analysis of colorectal cancer public proteomic datasets for biomarker discovery and validation.
Javier Robles, Ananth Prakash, Juan Antonio Vizcaino, J Ignacio Casal
Published in: PLoS computational biology (2024)
Cancer biomarkers have been investigated thoroughly over recent decades. Despite this, colorectal cancer (CRC) heterogeneity makes it challenging to identify and validate effective prognostic biomarkers for patient classification according to outcome and treatment response. Although a massive amount of proteomics data has been deposited in public repositories, this rich source of information is vastly underused. Here, we reused public proteomics datasets with two main objectives: (i) to generate hypotheses (detection of biomarkers) for subsequent validation, and (ii) to validate, using an orthogonal approach, a previously described biomarker panel. Twelve public CRC proteomics datasets (mostly from the PRIDE database) were re-analysed and integrated to create a landscape of protein expression. Samples from both solid and liquid biopsies were included in the reanalysis. By integrating these data with survival annotation data, we validated in silico a six-gene signature for CRC classification at the protein level and identified five new blood-detectable biomarkers (CD14, PPIA, MRC2, PRDX1, and TXNDC5) associated with CRC prognosis. The prognostic value of these blood-derived proteins was confirmed using additional public datasets, supporting their potential clinical value. In conclusion, this proof-of-concept study demonstrates the value of reusing public proteomics datasets as the basis for a useful resource for biomarker discovery and validation. The protein expression data have been made available in the public resource Expression Atlas.
Keyphrases
- healthcare
- mental health
- electronic health record
- label free
- big data
- rna seq
- mass spectrometry
- systematic review
- single cell
- machine learning
- adverse drug
- small molecule
- poor prognosis
- deep learning
- high throughput
- risk assessment
- wastewater treatment
- ionic liquid
- working memory
- dna methylation
- transcription factor
- quantum dots
- artificial intelligence
- young adults
- protein protein
- nk cells