Parsimonious Gene Correlation Network Analysis (PGCNA): a tool to define modular gene co-expression for refined molecular stratification in cancer.
Matthew A Care, David R Westhead, Reuben M Tooze
Published in: NPJ systems biology and applications (2019)
Cancers converge onto shared patterns that arise from constraints placed by the biology of the originating cell lineage and microenvironment on programs driven by oncogenic events. Here we define consistent expression modules reflecting this structure in colon and breast cancer by exploiting expression data resources and a new computationally efficient approach that we validate against other comparable methods. This approach, Parsimonious Gene Correlation Network Analysis (PGCNA), allows comparison of network structures between these cancer types, identifying shared modules of gene co-expression that reflect cancer hallmarks, functional and structural gene batteries, copy number variation, and the biology of the originating lineage. These networks, along with the mapping of outcome data at gene and module level, provide an interactive resource that generates context for relationships between genes within and between such modules. Assigning module expression values (MEVs) provides a tool to summarize network-level gene expression in individual cases, illustrating potential utility in classification and allowing analysis of linkage between module expression and mutational state. Exploiting TCGA data thus defines recurrent patterns of association between module expression and mutation at dataset level, and exemplifies the polarization of mutation patterns with the leading edge of module expression at individual case level. We illustrate the scalable nature of the approach within immune-response-related modules, which in the context of breast cancer demonstrate the selective association of immune subsets, in particular mast cells, with the underlying mutational pattern. Together, our analyses provide evidence for a generalizable framework to enhance molecular stratification in cancer.
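The abstract does not say how a module expression value is computed, so the following is only an illustrative sketch of the general idea of collapsing a module's genes into one score per case. It assumes, purely for illustration, that an MEV is the mean of per-gene z-scores across the genes of a module; the function names, gene labels and toy numbers are hypothetical and are not taken from the paper or from the PGCNA software.

```python
from statistics import mean, pstdev


def zscore_rows(expr):
    """Z-score each gene across samples.

    expr maps a gene name to a list of expression values, one per sample.
    Genes with zero variance are set to 0.0 to avoid division by zero.
    """
    z = {}
    for gene, values in expr.items():
        mu = mean(values)
        sd = pstdev(values)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z


def module_expression_values(expr, modules):
    """Return one summary value per module per sample.

    ASSUMPTION: the summary is the mean z-score over the module's genes.
    This is a stand-in definition, not the one used in the paper.
    """
    z = zscore_rows(expr)
    n_samples = len(next(iter(expr.values())))
    mevs = {}
    for name, genes in modules.items():
        present = [g for g in genes if g in z]  # ignore genes with no data
        if not present:
            continue
        mevs[name] = [
            mean([z[g][s] for g in present]) for s in range(n_samples)
        ]
    return mevs


# Toy data: three samples, three hypothetical genes, two hypothetical modules.
expr = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [2.0, 4.0, 6.0],
    "GENE_C": [5.0, 5.0, 5.0],
}
modules = {"M1": ["GENE_A", "GENE_B"], "M2": ["GENE_C"]}

mevs = module_expression_values(expr, modules)
for name, values in mevs.items():
    print(name, [round(v, 3) for v in values])
```

Under this assumed definition, each case gets one number per module, which can then be compared with mutation status or used as a feature for classification, in the spirit of the analyses the abstract describes.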
Keyphrases
- copy number
- poor prognosis
- network analysis
- genome wide
- gene expression
- immune response
- binding protein
- mitochondrial dna
- big data
- machine learning
- stem cells
- genome wide identification
- long non coding rna
- public health
- single cell
- high resolution
- mesenchymal stem cells
- inflammatory response
- childhood cancer
- data analysis
- hiv infected
- cell therapy
- drug induced