
Predictive model of transcriptional elongation control identifies trans regulatory factors from chromatin signatures.

Toray S Akcan, Sergey Vilov, Matthias Heinig
Published in: Nucleic acids research (2023)
Promoter-proximal Polymerase II (Pol II) pausing is a key rate-limiting step for gene expression. DNA- and RNA-binding trans-acting factors regulating the extent of pausing have been identified. However, we lack a quantitative model of how interactions of these factors determine pausing; therefore, the relative importance of the implicated factors is unknown. Moreover, previously unknown regulators might exist. Here we address this gap with a machine learning model that accurately predicts the extent of promoter-proximal Pol II pausing from large-scale genome and transcriptome binding maps, together with gene annotation and sequence composition features. We demonstrate high accuracy and generalizability of the model by validating it on an independent cell line, which reveals the model's cell-line-agnostic character. Model interpretation in light of prior knowledge about molecular functions of regulatory factors confirms the interconnection of pausing with other RNA processing steps. Harnessing underlying feature contributions, we assess the relative importance of each factor, quantify the factors' predictive effects and systematically identify previously unknown regulators of pausing. We additionally identify 16 previously unknown 7SK ncRNA-interacting RNA-binding proteins predictive of pausing. Our work provides a framework to further our understanding of the regulation of the critical early steps in transcriptional elongation.
Keyphrases
  • gene expression
  • transcription factor
  • machine learning
  • genome wide
  • dna methylation
  • healthcare
  • rna seq
  • high resolution
  • deep learning
  • dna damage
  • artificial intelligence
  • nucleic acid
  • heat stress
  • amino acid