Association between expression of random gene sets and survival is evident in multiple cancer types and may be explained by sub-classification.
Yishai Shimoni
Published in: PLoS Computational Biology (2018)
One of the goals of cancer research is to identify a set of genes that cause or control disease progression. However, although multiple such gene sets have been published, they are usually in very poor agreement with each other, and very few of the genes have proved to be functional therapeutic targets. Furthermore, recent findings from a breast cancer gene-expression cohort showed that randomly selected sets of genes are associated with survival far more often than expected by chance. These results imply that many of the genes identified in breast cancer gene-expression analysis may not be causal drivers of cancer progression, even though they can still be highly predictive of prognosis. We performed a similar analysis on all the cancer types available in The Cancer Genome Atlas (TCGA), estimating the predictive power of random gene sets for survival. Our work shows that in most cancer types, random selections of genes are more predictive of survival than expected. In contrast to previous work, this property is not removed by adjusting for a proliferation signature, which implies that proliferation may not always be the confounder that drives it. We suggest one possible solution, data-driven sub-classification, which reduces this property significantly. Our results suggest that the predictive power of random gene sets may be used to identify the existence of sub-classes in the data, and thus may allow a better understanding of patient stratification. Furthermore, reducing the observed bias may allow more direct identification of biologically relevant, and potentially causal, genes.
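The random-gene-set analysis described above can be illustrated with a minimal sketch. It is not the paper's pipeline: scoring each set by its mean expression, splitting patients at the median score, the set size of 50, and the function names `logrank_pvalue` and `random_set_hit_rate` are all assumptions made for illustration. The idea is to draw many random gene sets and count how often each one separates patients into groups with significantly different survival. If gene sets carried no survival information, that fraction should be close to the significance threshold `alpha`.

```python
import math
import numpy as np


def logrank_pvalue(time, event, group):
    """Two-group log-rank test; returns the p-value (chi-square, 1 df)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)   # True = death observed, False = censored
    group = np.asarray(group, dtype=bool)   # True = patient is in group 1
    obs_minus_exp = 0.0
    variance = 0.0
    for t in np.unique(time[event]):        # loop over distinct event times
        at_risk = time >= t
        n = at_risk.sum()                   # patients at risk just before t
        n1 = (at_risk & group).sum()        # of those, how many are in group 1
        died = event & (time == t)
        d = died.sum()                      # deaths at t
        d1 = (died & group).sum()           # deaths at t in group 1
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def random_set_hit_rate(expr, time, event, set_size=50, n_sets=200,
                        alpha=0.05, seed=0):
    """Fraction of random gene sets significantly associated with survival.

    expr is a (genes x patients) expression matrix. Each random set is scored
    by its mean expression per patient (an illustrative assumption), and
    patients are split at the median score before the log-rank test.
    """
    rng = np.random.default_rng(seed)
    n_genes = expr.shape[0]
    hits = 0
    for _ in range(n_sets):
        genes = rng.choice(n_genes, size=set_size, replace=False)
        score = expr[genes].mean(axis=0)
        high = score > np.median(score)
        if logrank_pvalue(time, event, high) < alpha:
            hits += 1
    return hits / n_sets
```

A hit rate well above `alpha` on real data would correspond to the property the abstract reports. Under the abstract's suggestion, repeating the same calculation within data-driven sub-classes would be expected to bring the rate down.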
Keyphrases
- genome wide
- genome wide identification
- papillary thyroid
- squamous cell
- gene expression
- dna methylation
- copy number
- machine learning
- childhood cancer
- signaling pathway
- magnetic resonance imaging
- randomized controlled trial
- deep learning
- transcription factor
- lymph node metastasis
- computed tomography
- squamous cell carcinoma
- long non coding rna
- public health
- case report
- free survival
- solid state