Fast decoding cell type-specific transcription factor binding landscape at single-nucleotide resolution.
Yuanfang Guan
Published in: Genome Research (2021)
Decoding the cell type-specific transcription factor (TF) binding landscape at single-nucleotide resolution is crucial for understanding the regulatory mechanisms underlying many fundamental biological processes and human diseases. However, limited time and resources make it impractical to measure high-resolution TF binding profiles experimentally for every possible TF-cell type combination. Previous computational approaches either cannot distinguish cell context-dependent TF binding profiles across diverse cell types or provide only relatively low-resolution predictions. Here we present Leopard, a novel deep learning approach for predicting TF binding sites at single-nucleotide resolution. Leopard achieves an average area under the receiver operating characteristic curve (AUROC) of 0.982 and an average area under the precision-recall curve (AUPRC) of 0.208. When evaluated at 200-bp resolution, our method substantially outperformed the state-of-the-art methods Anchor and FactorNet, improving the predictive AUPRC by 19% and 27%, respectively. In addition, Leopard leverages a many-to-many neural network architecture and achieves a hundredfold to thousandfold speedup over current many-to-one machine learning methods.
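The speed claim rests on the difference between many-to-one and many-to-many prediction. A many-to-one model is called once per base, on a window centred on that base. A many-to-many model is called once per segment and returns a score for every base in it.

The toy sketch below contrasts the two inference schemes under stated assumptions. It is not Leopard's architecture. The "model" is a hypothetical linear filter (`WEIGHTS`, receptive field `W`) standing in for a neural network, and the segment length of 100 is arbitrary. The toy does the same arithmetic either way. It only counts model calls, as a rough proxy for the cost of forward passes in a real network, where convolutional features computed once per segment are shared by all output positions.

```python
# Toy contrast between many-to-one and many-to-many inference.
# Assumption: a hypothetical linear filter stands in for a neural network;
# this is NOT Leopard's architecture. The call counter is the point of interest.

W = 5                                  # hypothetical receptive field (odd)
HALF = W // 2                          # context bases on each side of a position
WEIGHTS = [0.1, -0.2, 0.5, -0.2, 0.1]  # hypothetical filter weights

calls = {"many_to_one": 0, "many_to_many": 0}


def pad(signal, k):
    """Zero-pad k positions on both ends so edge bases get a full window."""
    return [0.0] * k + list(signal) + [0.0] * k


def score_window(window):
    """Shared scoring rule applied to one length-W window."""
    return sum(w * x for w, x in zip(WEIGHTS, window))


def many_to_one_model(window):
    """One call -> one score (for the base at the window centre)."""
    calls["many_to_one"] += 1
    return score_window(window)


def many_to_many_model(segment_with_context):
    """One call -> one score per base of the segment (context flanks excluded)."""
    calls["many_to_many"] += 1
    n = len(segment_with_context) - (W - 1)
    return [score_window(segment_with_context[i:i + W]) for i in range(n)]


def predict_many_to_one(signal):
    padded = pad(signal, HALF)
    # The window centred on base p is padded[p : p + W].
    return [many_to_one_model(padded[p:p + W]) for p in range(len(signal))]


def predict_many_to_many(signal, segment_len):
    padded = pad(signal, HALF)
    scores = []
    for start in range(0, len(signal), segment_len):
        end = min(start + segment_len, len(signal))
        # Pass bases [start, end) plus HALF bases of context on each side.
        scores.extend(many_to_many_model(padded[start:end + W - 1]))
    return scores


if __name__ == "__main__":
    signal = [float((i * 7) % 11) for i in range(1000)]  # arbitrary toy input
    a = predict_many_to_one(signal)
    b = predict_many_to_many(signal, segment_len=100)
    print(a == b, calls)  # True {'many_to_one': 1000, 'many_to_many': 10}
```

Both schemes return identical per-base scores. The many-to-one path needs 1,000 model calls for a 1,000-base input, while the many-to-many path needs 10. How large the real-world gain is depends on the network and segment length, which this sketch does not model.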