A Unified Deep Learning Framework for Single-Cell ATAC-Seq Analysis Based on ProdDep Transformer Encoder.
Zixuan WangYongqing ZhangYun YuJunming ZhangYuhang LiuGuishen WangPublished in: International journal of molecular sciences (2023)
Recent advances in single-cell sequencing assays for the transposase-accessibility chromatin (scATAC-seq) technique have provided cell-specific chromatin accessibility landscapes of cis-regulatory elements, providing deeper insights into cellular states and dynamics. However, few research efforts have been dedicated to modeling the relationship between regulatory grammars and single-cell chromatin accessibility and incorporating different analysis scenarios of scATAC-seq data into the general framework. To this end, we propose a unified deep learning framework based on the ProdDep Transformer Encoder, dubbed PROTRAIT, for scATAC-seq data analysis. Specifically motivated by the deep language model, PROTRAIT leverages the ProdDep Transformer Encoder to capture the syntax of transcription factor (TF)-DNA binding motifs from scATAC-seq peaks for predicting single-cell chromatin accessibility and learning single-cell embedding. Based on cell embedding, PROTRAIT annotates cell types using the Louvain algorithm. Furthermore, according to the identified likely noises of raw scATAC-seq data, PROTRAIT denoises these values based on predated chromatin accessibility. In addition, PROTRAIT employs differential accessibility analysis to infer TF activity at single-cell and single-nucleotide resolution. Extensive experiments based on the Buenrostro2018 dataset validate the effeteness of PROTRAIT for chromatin accessibility prediction, cell type annotation, and scATAC-seq data denoising, therein outperforming current approaches in terms of different evaluation metrics. Besides, we confirm the consistency between the inferred TF activity and the literature review. We also demonstrate the scalability of PROTRAIT to analyze datasets containing over one million cells.
Keyphrases
- single cell
- rna seq
- transcription factor
- dna binding
- high throughput
- deep learning
- data analysis
- genome wide
- gene expression
- dna damage
- machine learning
- electronic health record
- big data
- induced apoptosis
- dna methylation
- mesenchymal stem cells
- artificial intelligence
- climate change
- stem cells
- case report
- endoplasmic reticulum stress