A data-driven, knowledge-based approach to biomarker discovery: application to circulating microRNA markers of colorectal cancer prognosis.
Fatemeh Vafaee, Connie Diakos, Michaela B Kirschner, Glen Reid, Michael Z Michael, Lisa G Horvath, Hamid Alinejad-Rokny, Zhangkai Jason Cheng, Zdenka Kuncic, Stephen Clarke
Published in: NPJ systems biology and applications (2018)
Recent advances in high-throughput technologies have provided an unprecedented opportunity to identify molecular markers of disease processes. This plethora of complex omics data has simultaneously complicated the problem of extracting meaningful molecular signatures and opened up new opportunities for more sophisticated integrative and holistic approaches. In this era, effective integration of data-driven and knowledge-based approaches for biomarker identification has been recognised as key to improving the identification of high-performance biomarkers, and necessary for translational applications. Here, we have evaluated the role of circulating microRNA as a means of predicting the prognosis of patients with colorectal cancer, which is the second leading cause of cancer-related death worldwide. We have developed a multi-objective optimisation method that effectively integrates a data-driven approach with the knowledge obtained from the microRNA-mediated regulatory network to identify robust plasma microRNA signatures that are reliable in terms of both predictive power and functional relevance. The proposed multi-objective framework has the capacity to adjust for conflicting biomarker objectives and to incorporate heterogeneous information, facilitating systems approaches to biomarker discovery. We have found a prognostic signature of colorectal cancer comprising 11 circulating microRNAs. The identified signature predicts the patients' survival outcome and targets pathways underlying colorectal cancer progression. The altered expression of the identified microRNAs was confirmed in an independent public data set of plasma samples from patients with early-stage versus advanced colorectal cancer. Furthermore, the generality of the proposed method was demonstrated across three publicly available miRNA data sets associated with biomarker studies in other diseases.
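The abstract does not describe how the multi-objective optimisation is carried out, so the following is only a minimal sketch of the general idea of trading off two conflicting objectives, not the authors' method. It assumes each candidate signature has already been given two scores, one for predictive power and one for functional relevance, and keeps the candidates that no other candidate beats on both (the Pareto front). The names `Candidate`, `dominates`, `pareto_front`, the labels `A` to `E`, and all score values are hypothetical and chosen purely for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical candidate signature with two scores (higher is better)."""
    name: str
    predictive_power: float      # assumed: e.g. a cross-validated accuracy
    functional_relevance: float  # assumed: e.g. a network-derived score


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both objectives and better on one."""
    no_worse = (a.predictive_power >= b.predictive_power
                and a.functional_relevance >= b.functional_relevance)
    strictly_better = (a.predictive_power > b.predictive_power
                       or a.functional_relevance > b.functional_relevance)
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that is not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up scores, for illustration only.
candidates = [
    Candidate("A", 0.80, 0.40),
    Candidate("B", 0.70, 0.60),
    Candidate("C", 0.65, 0.50),  # worse than B on both objectives
    Candidate("D", 0.60, 0.90),
    Candidate("E", 0.80, 0.30),  # ties A on power, worse on relevance
]

front_names = [c.name for c in pareto_front(candidates)]
print(front_names)
```

In this toy setting no single candidate is best on both scores, which is why a set of non-dominated trade-offs is returned rather than one winner; choosing a final signature from that set would need additional criteria that the abstract does not specify.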
Keyphrases
- high throughput
- end stage renal disease
- healthcare
- early stage
- ejection fraction
- chronic kidney disease
- peritoneal dialysis
- poor prognosis
- big data
- small molecule
- prognostic factors
- mental health
- squamous cell carcinoma
- machine learning
- single cell
- artificial intelligence
- transcription factor
- emergency department
- health information
- deep learning
- network analysis