scATAC-seq preprocessing and imputation evaluation system for visualization, clustering and digital footprinting.
Pavel Akhtyamov, Layal Shaheen, Mikhail Raevskiy, Alexey Stupnikov, Yulia A Medvedeva
Published in: Briefings in Bioinformatics (2023)
Single-cell ATAC-seq (scATAC-seq) is a recently developed approach for investigating open chromatin at the single-cell level and for assessing epigenetic regulation and transcription factor binding landscapes. The sparsity of scATAC-seq data calls for imputation. Similarly, preprocessing (filtering) may be required to reduce the computational load caused by the large number of open regions. However, optimal strategies for imputation and preprocessing have not yet been evaluated together. We present SAPIEnS (scATAC-seq Preprocessing and Imputation Evaluation System), a benchmark for scATAC-seq imputation frameworks that combines state-of-the-art imputation methods with commonly used preprocessing techniques. We assess different types of scATAC-seq analysis, namely clustering, visualization and digital genomic footprinting, and identify optimal preprocessing-imputation strategies. We discuss the benefits of the imputation framework depending on the task and on the number of dataset features (peaks). We conclude that preprocessing with the Boruta method is beneficial for the majority of tasks, while imputation is helpful mostly for small datasets. We also implement a SAPIEnS database of pre-computed transcription factor footprints, based on imputed data, with their activity scores in a specific cell type. SAPIEnS is published at: https://github.com/lab-medvedeva/SAPIEnS. The SAPIEnS database is available at: https://sapiensdb.com.
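To make the notions of sparsity and peak filtering concrete, here is a minimal sketch on a toy binary cell-by-peak matrix. It is not the SAPIEnS code and it is not the Boruta method. It uses a simple "keep peaks open in at least a given fraction of cells" rule purely as an illustrative stand-in for a preprocessing step. The function names (`sparsity`, `filter_peaks`, `subset_peaks`), the threshold and the toy data are all hypothetical.

```python
from typing import List

# Rows are cells, columns are peaks; 1 = region open in that cell, 0 = not observed.
Matrix = List[List[int]]


def sparsity(matrix: Matrix) -> float:
    """Fraction of zero entries in a binary cell-by-peak matrix."""
    total = sum(len(row) for row in matrix)
    zeros = sum(row.count(0) for row in matrix)
    return zeros / total if total else 0.0


def filter_peaks(matrix: Matrix, min_cell_fraction: float) -> List[int]:
    """Indices of peaks open in at least `min_cell_fraction` of cells.

    A simple frequency filter used only for illustration (assumption);
    it is not the Boruta feature selection discussed in the abstract.
    """
    n_cells = len(matrix)
    if n_cells == 0:
        return []
    n_peaks = len(matrix[0])
    kept = []
    for j in range(n_peaks):
        open_cells = sum(row[j] for row in matrix)
        if open_cells / n_cells >= min_cell_fraction:
            kept.append(j)
    return kept


def subset_peaks(matrix: Matrix, kept: List[int]) -> Matrix:
    """Keep only the selected peak columns."""
    return [[row[j] for j in kept] for row in matrix]


# Toy data: 4 cells x 4 peaks (made up for this example).
toy = [
    [1, 0, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
]

kept = filter_peaks(toy, min_cell_fraction=0.5)
filtered = subset_peaks(toy, kept)

print("sparsity before filtering:", sparsity(toy))
print("kept peak indices:", kept)
print("sparsity after filtering:", sparsity(filtered))
```

In this toy case, filtering removes the peaks that are rarely open, which both shrinks the feature set and lowers the fraction of zeros. Whether such a reduction helps a downstream task such as clustering or footprinting is the kind of question the benchmark addresses.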
Keyphrases
- single cell
- RNA-seq
- transcription factor
- genome wide
- high throughput
- minimally invasive
- magnetic resonance imaging
- randomized controlled trial
- systematic review
- electronic health record
- big data
- DNA binding
- magnetic resonance
- machine learning
- working memory
- data analysis
- adverse drug
- emergency department
- deep learning