RoboCOP: jointly computing chromatin occupancy profiles for numerous factors from chromatin accessibility data.
Sneha Mitra, Jianling Zhong, Trung Q Tran, David M MacAlpine, Alexander J Hartemink
Published in: Nucleic Acids Research (2021)
Chromatin is a tightly packaged structure of DNA and protein within the nucleus of a cell. The arrangement of different protein complexes along the DNA modulates and is modulated by gene expression. Measuring the binding locations and occupancy levels of different transcription factors (TFs) and nucleosomes is therefore crucial to understanding gene regulation. Antibody-based methods for assaying chromatin occupancy are capable of identifying the binding sites of specific DNA binding factors, but only one factor at a time. In contrast, epigenomic accessibility data like MNase-seq, DNase-seq, and ATAC-seq provide insight into the chromatin landscape of all factors bound along the genome, but with little insight into the identities of those factors. Here, we present RoboCOP, a multivariate state space model that integrates chromatin accessibility data with nucleotide sequence to jointly compute genome-wide probabilistic scores of nucleosome and TF occupancy, for hundreds of different factors. We apply RoboCOP to MNase-seq and ATAC-seq data to elucidate the protein-binding landscape of nucleosomes and 150 TFs across the yeast genome, and show that our model makes better predictions than existing methods. We also compute a chromatin occupancy profile of the yeast genome under cadmium stress, revealing chromatin dynamics associated with transcriptional regulation.
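To make the idea of "probabilistic scores of occupancy" concrete, here is a minimal sketch of posterior decoding in a toy discrete state space model, using the standard forward-backward algorithm. It is not RoboCOP's implementation: the three states, the transition matrix, the likelihood values, and the choice to multiply a sequence likelihood by an accessibility likelihood (assuming the two are conditionally independent given the state) are all hypothetical, chosen only for illustration.

```python
# Toy sketch only: states, probabilities, and data are hypothetical,
# not taken from RoboCOP or from any real dataset.
STATES = ["background", "nucleosome", "tf"]


def combine(seq_lik, acc_lik):
    """Multiply per-position sequence and accessibility likelihoods.

    Assumes the two data types are conditionally independent given the state.
    """
    return [[s * a for s, a in zip(s_row, a_row)]
            for s_row, a_row in zip(seq_lik, acc_lik)]


def forward_backward(init, trans, emit):
    """Return per-position posterior state probabilities.

    init[s]     : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[t][s]  : likelihood of the observations at position t under state s
    """
    n, k = len(emit), len(init)

    # Forward pass, rescaled at every position to avoid numerical underflow.
    cur = [init[s] * emit[0][s] for s in range(k)]
    total = sum(cur)
    fwd = [[c / total for c in cur]]
    for t in range(1, n):
        prev = fwd[-1]
        cur = [emit[t][s] * sum(prev[r] * trans[r][s] for r in range(k))
               for s in range(k)]
        total = sum(cur)
        fwd.append([c / total for c in cur])

    # Backward pass, also rescaled at every position.
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(trans[r][s] * emit[t + 1][s] * bwd[t + 1][s]
                   for s in range(k))
               for r in range(k)]
        total = sum(cur)
        bwd[t] = [c / total for c in cur]

    # Posterior at each position is proportional to forward * backward.
    post = []
    for t in range(n):
        w = [fwd[t][s] * bwd[t][s] for s in range(k)]
        total = sum(w)
        post.append([x / total for x in w])
    return post


# Hypothetical inputs for five genomic positions.
init = [1 / 3, 1 / 3, 1 / 3]
trans = [[0.8, 0.1, 0.1],
         [0.1, 0.8, 0.1],
         [0.1, 0.1, 0.8]]
# Sequence likelihood: positions 2 and 3 resemble a TF motif.
seq_lik = [[1.0, 1.0, 0.1], [1.0, 1.0, 0.1],
           [0.2, 0.2, 1.0], [0.2, 0.2, 1.0],
           [1.0, 1.0, 0.1]]
# Accessibility likelihood: positions 2 and 3 show a TF-like signal.
acc_lik = [[0.9, 0.1, 0.5], [0.9, 0.1, 0.5],
           [0.1, 0.05, 0.9], [0.1, 0.05, 0.9],
           [0.9, 0.1, 0.5]]

posteriors = forward_backward(init, trans, combine(seq_lik, acc_lik))
for t, row in enumerate(posteriors):
    best = max(range(len(STATES)), key=lambda s: row[s])
    print(t, STATES[best], [round(p, 3) for p in row])
```

Each row of `posteriors` is a probability distribution over the states at one position, so the per-state columns can be read as occupancy profiles along the toy sequence. A genome-scale model with nucleosomes and many TFs would need far richer state structure and emission models than this sketch.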
Keyphrases
- genome wide
- DNA methylation
- gene expression
- transcription factor
- DNA binding
- copy number
- single cell
- DNA damage
- electronic health record
- big data
- binding protein
- magnetic resonance imaging
- cell free
- small molecule
- heavy metals
- risk assessment
- computed tomography
- mesenchymal stem cells
- machine learning
- stress induced
- heat stress
- contrast enhanced
- cell wall