Deep learning-based cell composition analysis from tissue expression profiles.
Kevin Menden, Mohamed Marouf, Sergio Oller Moreno, Anupriya Dalmia, Daniel Sumner Magruder, Karin Kloiber, Peter Heutink, Stefan Bonn
Published in: Science Advances (2020)
We present Scaden, a deep neural network for cell deconvolution that uses gene expression information to infer the cellular composition of tissues. Scaden is trained on single-cell RNA sequencing (RNA-seq) data to engineer discriminative features that confer robustness to bias and noise, making complex data preprocessing and feature selection unnecessary. We demonstrate that Scaden outperforms existing deconvolution algorithms in both precision and robustness. A single trained network reliably deconvolves bulk RNA-seq and microarray, human and mouse tissue expression data and leverages the combined information of multiple datasets. Because of this stability and flexibility, we surmise that deep learning will become an algorithmic mainstay for cell deconvolution of various data types. Scaden's software package and web application are easy to use on new as well as diverse existing expression datasets available in public resources, deepening the molecular and cellular understanding of developmental and disease processes.
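To make the idea of training a deconvolution network on single-cell data concrete, the following is a minimal sketch of one plausible way to turn labelled scRNA-seq cells into supervised training pairs: cells are mixed in known proportions to form artificial bulk samples, and the known proportions serve as the regression targets. The abstract does not describe this procedure, so the function name `simulate_pseudo_bulk`, its parameters, and the Dirichlet and multinomial sampling scheme are all illustrative assumptions, not Scaden's actual implementation or API.

```python
import numpy as np


def simulate_pseudo_bulk(cells, labels, n_samples, cells_per_sample, rng):
    """Build (expression, fraction) pairs from labelled single cells.

    Hypothetical helper for illustration only.
    cells:  (n_cells, n_genes) count matrix
    labels: (n_cells,) integer cell-type label for each cell
    """
    cell_types = np.unique(labels)
    X = np.zeros((n_samples, cells.shape[1]))
    y = np.zeros((n_samples, len(cell_types)))
    for i in range(n_samples):
        # Draw random target fractions that sum to 1 (assumed scheme).
        target = rng.dirichlet(np.ones(len(cell_types)))
        # Turn the fractions into whole-cell counts for this sample.
        counts = rng.multinomial(cells_per_sample, target)
        for j, cell_type in enumerate(cell_types):
            pool = np.flatnonzero(labels == cell_type)
            picked = rng.choice(pool, size=counts[j], replace=True)
            # Summing the picked cells mimics a bulk measurement.
            X[i] += cells[picked].sum(axis=0)
        # The training label is the realised composition of the mixture.
        y[i] = counts / cells_per_sample
    return X, y


# Toy data: 3 cell types, 50 cells each, 20 genes (entirely synthetic).
rng = np.random.default_rng(0)
toy_labels = np.repeat(np.arange(3), 50)
toy_cells = rng.poisson(lam=5.0, size=(150, 20))
X, y = simulate_pseudo_bulk(toy_cells, toy_labels, n_samples=8,
                            cells_per_sample=100, rng=rng)
```

Pairs such as `(X, y)` could then be fed to any regression model whose outputs are constrained to be non-negative and sum to one, for example a neural network with a softmax output layer.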
Keyphrases
- single cell
- RNA-seq
- deep learning
- gene expression
- high throughput
- electronic health record
- machine learning
- neural network
- big data
- poor prognosis
- healthcare
- artificial intelligence
- endothelial cells
- data analysis
- emergency department
- DNA methylation
- convolutional neural network
- mental health
- air pollution
- health information
- mesenchymal stem cells
- long non-coding RNA
- bone marrow
- resistance training
- social media
- network analysis
- pluripotent stem cells