Molecular phenotyping using networks, diffusion, and topology: soft tissue sarcoma.
James C Mathews, Maryam Pouryahya, Caroline Moosmüller, Yannis G Kevrekidis, Joseph O Deasy, Allen Tannenbaum
Published in: Scientific Reports (2019)
Many biological datasets are high-dimensional yet manifest an underlying order. In this paper, we describe an unsupervised data analysis methodology that operates in the setting of a multivariate dataset and a network which expresses influence between the variables of the given set. The technique involves network geometry employing the Wasserstein distance, global spectral analysis in the form of diffusion maps, and topological data analysis using the Mapper algorithm. The prototypical application is to gene expression profiles obtained from RNA-Seq experiments on a collection of tissue samples, considering only genes whose protein products participate in a known pathway or network of interest. Employing the technique, we discern several coherent states or signatures displayed by the gene expression profiles of the sarcomas in the Cancer Genome Atlas along the TP53 (p53) signaling network. The signatures substantially recover the leiomyosarcoma, dedifferentiated liposarcoma (DDLPS), and synovial sarcoma histological subtype diagnoses, and they also include a new signature defined by activation and inactivation of about a dozen genes, including activation of serine endopeptidase inhibitor SERPINE1 and inactivation of TP53-family tumor suppressor gene TP73.
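As a rough illustration of the diffusion-map step named in the abstract, the following minimal sketch embeds samples from a pairwise distance matrix. It is not the authors' implementation. The function name `diffusion_map`, the parameters `epsilon`, `n_components`, and `t`, and the toy data are all assumptions made for this example. Plain Euclidean distances stand in for the Wasserstein-based network distances described in the paper; the function accepts any symmetric distance matrix, so a different distance could be supplied in its place.

```python
import numpy as np


def diffusion_map(distances, epsilon, n_components=2, t=1):
    """Hypothetical helper: diffusion-map coordinates from a distance matrix.

    distances    : symmetric (n, n) array of pairwise distances
    epsilon      : Gaussian kernel bandwidth (assumed parameter)
    n_components : number of non-trivial coordinates to return
    t            : diffusion time (power applied to the eigenvalues)
    """
    distances = np.asarray(distances, dtype=float)

    # Gaussian affinity between every pair of samples.
    kernel = np.exp(-(distances ** 2) / epsilon)

    # Row sums act as degrees; P = D^-1 K is a Markov transition matrix.
    degree = kernel.sum(axis=1)

    # Work with the symmetric conjugate S = D^-1/2 K D^-1/2, which has the
    # same eigenvalues as P and can be handled by a symmetric eigensolver.
    symmetric = kernel / np.sqrt(np.outer(degree, degree))
    eigenvalues, eigenvectors = np.linalg.eigh(symmetric)

    # eigh returns ascending eigenvalues; reorder to descending.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    # Convert to right eigenvectors of P.
    right = eigenvectors / np.sqrt(degree)[:, None]

    # Skip the trivial constant eigenvector (eigenvalue 1).
    keep = slice(1, n_components + 1)
    coords = right[:, keep] * eigenvalues[keep] ** t
    return coords, eigenvalues


# Toy data (made up for illustration): two small groups of 2-D points.
points = np.array([
    [0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
    [2.0, 2.0], [2.2, 2.0], [2.0, 2.2],
])
diffs = points[:, None, :] - points[None, :, :]
distances = np.sqrt((diffs ** 2).sum(axis=-1))  # Euclidean stand-in

embedding, eigenvalues = diffusion_map(distances, epsilon=1.0)
```

In this toy case the first diffusion coordinate takes one sign on the first group and the opposite sign on the second, which is the kind of coarse structure a downstream clustering or Mapper-style analysis could then pick up.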
Keyphrases
- genome wide
- data analysis
- rna seq
- genome wide identification
- single cell
- copy number
- dna methylation
- machine learning
- genome wide analysis
- high throughput
- gene expression
- squamous cell carcinoma
- magnetic resonance imaging
- deep learning
- electronic health record
- young adults
- computed tomography
- optical coherence tomography
- single molecule
- binding protein
- big data
- squamous cell
- small molecule
- high grade
- protein kinase
- bioinformatics analysis