Imputation for transcription factor binding predictions based on deep learning.
Qian Qin, Jianxing Feng
Published in: PLoS Computational Biology (2017)
Understanding the cell-specific binding patterns of transcription factors (TFs) is fundamental to studying gene regulatory networks in biological systems. ChIP-seq provides valuable data for this purpose and is considered the gold standard. Despite tremendous efforts from the scientific community to conduct TF ChIP-seq experiments, the available data cover only a small percentage of all possible combinations of TFs and cell lines. In this study, we demonstrate a method for accurately predicting cell-specific TF binding across TF-cell line combinations, using available ChIP-seq data for only a small fraction (4%) of those combinations. The proposed model, termed TFImpute, is based on a deep neural network with a multi-task learning setting that borrows information across transcription factors and cell lines. Compared with existing methods, TFImpute achieves comparable accuracy on TF-cell line combinations with ChIP-seq data and better accuracy on TF-cell line combinations without ChIP-seq data. The approach can also predict cell-line-specific enhancer activities in K562 and HepG2 cell lines, as measured by massively parallel reporter assays, and the impact of SNPs on TF binding.
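The idea of borrowing information across TFs and cell lines can be illustrated with a minimal sketch. The abstract gives no architectural details, so everything below is an assumption made for illustration, not the TFImpute implementation: the layer sizes, the names `params` and `predict`, and the choice to represent each TF and each cell line by a learned embedding vector that is concatenated with convolutional sequence features. The weights are random and untrained; the sketch only shows how one shared model can score any TF-cell line pair, including pairs with no ChIP-seq data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N_TFS, N_CELLS, EMB_DIM = 5, 4, 8
N_FILTERS, FILTER_W = 6, 10


def one_hot(seq):
    """Encode a DNA string as a (len, 4) array over A, C, G, T."""
    idx = np.array(["ACGT".index(base) for base in seq])
    return np.eye(4)[idx]


# Randomly initialised (untrained) parameters shared by all TF-cell line pairs.
params = {
    "filters": rng.normal(0.0, 0.1, (N_FILTERS, FILTER_W, 4)),
    "tf_emb": rng.normal(0.0, 0.1, (N_TFS, EMB_DIM)),
    "cell_emb": rng.normal(0.0, 0.1, (N_CELLS, EMB_DIM)),
    "w_out": rng.normal(0.0, 0.1, N_FILTERS + 2 * EMB_DIM),
    "b_out": 0.0,
}


def predict(seq, tf_id, cell_id, p=params):
    """Return a binding probability for one sequence and one TF-cell line pair."""
    x = one_hot(seq)                                   # (L, 4)
    n_pos = x.shape[0] - FILTER_W + 1
    # Stack every window of width FILTER_W: (n_pos, FILTER_W, 4).
    windows = np.stack([x[i:i + FILTER_W] for i in range(n_pos)])
    # 1D convolution over the sequence, then ReLU and global max pooling.
    conv = np.einsum("lwc,fwc->lf", windows, p["filters"])
    seq_feat = np.maximum(conv, 0.0).max(axis=0)       # (N_FILTERS,)
    # Concatenate sequence features with the TF and cell-line embeddings.
    feat = np.concatenate([seq_feat, p["tf_emb"][tf_id], p["cell_emb"][cell_id]])
    logit = feat @ p["w_out"] + p["b_out"]
    return 1.0 / (1.0 + np.exp(-logit))                # sigmoid


# Any (tf_id, cell_id) pair can be scored, whether or not it was seen in training.
example_prob = predict("ACGT" * 13, tf_id=2, cell_id=3)
```

Under this assumption, training on the pairs that do have ChIP-seq data would update the shared filters and the per-TF and per-cell-line embeddings, and a pair without data would reuse the embeddings learned from other pairs involving the same TF or the same cell line.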
Keyphrases
- single cell
- transcription factor
- high throughput
- genome wide
- RNA-seq
- electronic health record
- DNA binding
- circulating tumor cells
- big data
- deep learning
- healthcare
- neural network
- machine learning
- binding protein
- DNA methylation
- mental health
- gene expression
- mesenchymal stem cells
- social media
- quality improvement
- convolutional neural network
- bone marrow