Interpretable deep residual network uncovers nucleosome positioning and associated features.
Yosef Masoudi-Sobhanzadeh, Shuxiang Li, Yunhui Peng, Anna R Panchenko
Published in: Nucleic Acids Research (2024)
Nucleosomes represent elementary building units of eukaryotic chromosomes and consist of DNA wrapped around a histone octamer flanked by linker DNA segments. Nucleosomes are central to epigenetic pathways, and their genomic positioning is associated with regulation of gene expression, DNA replication, DNA methylation and DNA repair, among other functions. Building on prior discoveries that DNA sequences noticeably affect nucleosome positioning, we aim to identify nucleosome positions and related features across the entire genome. Here, we introduce NuPoSe, an interpretable framework based on the concepts of deep residual networks. Trained on high-coverage human experimental MNase-seq data, NuPoSe is able to learn sequence and structural patterns associated with nucleosome organization in the human genome. NuPoSe can also be applied to unseen data from different organisms and cell types. Our findings point to 43 informative features, most of which are tri-nucleotides, di-nucleotides and one tetra-nucleotide. Most features are significantly associated with nucleosomal structural characteristics, namely, the periodicity of nucleosomal DNA and its location with respect to the histone octamer. Importantly, we show that features derived from the 27 bp linker DNA flanking nucleosomes contribute up to 10% to the quality of the prediction model. This, along with the comprehensive training sets, deep-learning architecture, and feature selection method, may contribute to NuPoSe's 80-89% classification accuracy on different independent datasets.
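As a rough illustration of what di- and tri-nucleotide sequence features of the kind mentioned above might look like, the following Python sketch computes k-mer frequencies for a DNA window. The function names `kmer_frequencies` and `sequence_features` are hypothetical, and this is not the NuPoSe feature extraction or feature selection procedure, which the abstract does not describe; it only shows one common way such features can be computed.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Return the relative frequency of every DNA k-mer in seq (hypothetical helper)."""
    seq = seq.upper()
    # Count overlapping k-mers, skipping windows with ambiguous bases such as N.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    # Emit all 4**k k-mers in a fixed order so feature vectors are comparable.
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product("ACGT", repeat=k)
    }


def sequence_features(seq: str) -> dict[str, float]:
    """Combine di- and tri-nucleotide frequencies (16 + 64 values)."""
    features: dict[str, float] = {}
    for k in (2, 3):
        features.update(kmer_frequencies(seq, k))
    return features
```

Calling `sequence_features` on a nucleosomal or linker DNA window returns a fixed-length dictionary of 80 frequencies that could, under these assumptions, serve as input to a classifier.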
Keyphrases
- DNA methylation
- circulating tumor
- gene expression
- deep learning
- genome wide
- cell free
- single molecule
- DNA repair
- endothelial cells
- machine learning
- single cell
- big data
- electronic health record
- copy number
- induced pluripotent stem cells
- RNA-seq
- healthcare
- circulating tumor cells
- mesenchymal stem cells
- DNA damage response
- body composition
- Pseudomonas aeruginosa
- Escherichia coli
- stem cells
- biofilm formation
- affordable care act