Dissection of core promoter syntax through single nucleotide resolution modeling of transcription initiation.

Adam Y He, Charles G Danko
Published in: bioRxiv: the preprint server for biology (2024)
Our understanding of how the DNA sequences of cis-regulatory elements encode transcription initiation patterns remains limited. Here we introduce CLIPNET, a deep learning model trained on population-scale PRO-cap data that accurately predicts the position and quantity of transcription initiation with single nucleotide resolution from DNA sequence. Interpretation of CLIPNET revealed a complex regulatory syntax consisting of DNA-protein interactions in five major positions between -200 and +50 bp relative to the transcription start site, as well as more subtle positional preferences among different transcriptional activators. Transcriptional activator and core promoter motifs occupy different positions and play distinct roles in regulating initiation, with the former driving initiation quantity and the latter initiation position. We identified core promoter motifs that explain initiation patterns in the majority of promoters and enhancers, including DPR motifs and AT-rich TBP binding sequences in TATA-less promoters. Our results provide insights into the sequence architecture governing transcription initiation.
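The abstract separates initiation into two readouts, position and quantity, predicted from DNA sequence. The sketch below is purely illustrative and is not CLIPNET's code. It assumes a one-hot (length x 4) encoding, which is a common input format for convolutional sequence models. It also assumes that a per-nucleotide initiation track, such as PRO-cap counts, can be factored into a total signal (quantity) and a normalized profile (position). The helper names `one_hot` and `quantity_and_position` are hypothetical.

```python
import numpy as np

# Hypothetical helper (assumption, not CLIPNET's code): one-hot encode DNA
# as a (length, 4) array with columns A, C, G, T.
def one_hot(seq: str) -> np.ndarray:
    lookup = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = lookup.get(base)
        if j is not None:  # N and other ambiguous bases stay all-zero
            out[i, j] = 1.0
    return out

# Hypothetical helper (assumption): split a per-nucleotide initiation track
# into quantity (total signal) and position (a profile that sums to 1).
def quantity_and_position(track: np.ndarray) -> tuple[float, np.ndarray]:
    total = float(track.sum())
    if total == 0:
        # No signal: fall back to a uniform positional profile.
        profile = np.full(track.shape, 1.0 / track.size, dtype=np.float64)
    else:
        profile = track.astype(np.float64) / total
    return total, profile

if __name__ == "__main__":
    x = one_hot("ACGTN")
    print(x.shape)  # (5, 4)
    total, profile = quantity_and_position(np.array([0, 2, 6, 2, 0], dtype=float))
    print(total, int(np.argmax(profile)))  # 10.0 2
```

Under this factorization, a change that scales the whole track alters only the quantity. A change that shifts signal between nucleotides alters only the profile. This mirrors the abstract's split between activator motifs, which drive quantity, and core promoter motifs, which drive position.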