Physicochemical models of protein-DNA binding with standard and modified base pairs.
Tsu-Pei Chiu, Satyanarayan Rao, Remo Rohs. Published in: Proceedings of the National Academy of Sciences of the United States of America (2023)
DNA-binding proteins play important roles in various cellular processes, but the mechanisms by which proteins recognize genomic target sites remain incompletely understood. Functional groups at the edges of the base pairs (bp) exposed in the DNA grooves represent physicochemical signatures. As these signatures enable proteins to form specific contacts between protein residues and bp, their study can provide mechanistic insights into protein-DNA binding. Existing experimental methods, such as X-ray crystallography, can reveal such mechanisms based on physicochemical interactions between proteins and their DNA target sites. However, the low throughput of structural biology methods limits mechanistic insights into the selection of many genomic sites. High-throughput binding assays enable prediction of potential target sites by determining relative binding affinities of a protein to massive numbers of DNA sequences. Many currently available computational methods are based on the sequence of standard Watson-Crick bp. They assume that each base pair contributes independently to the overall binding affinity, or alternatively include dinucleotides or short k-mers. These methods cannot be directly extended to physicochemical contacts, and they are not suitable for DNA modifications or non-Watson-Crick bp, such as DNA methylation and synthetic or mismatched bp. The proposed method, DeepRec, can predict relative binding affinities as a function of physicochemical signatures, as well as the effect of DNA methylation or other chemical modifications on binding. Sequence-based modeling methods are, in comparison, a coarse-grained description and cannot achieve such insights. Our chemistry-based modeling framework provides a path towards understanding genome function at a mechanistic level.
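To make the idea of a physicochemical signature concrete, the following is a minimal sketch of how base pairs could be encoded by the functional groups they expose in the major and minor grooves, rather than by letter. It is an illustration under stated assumptions, not the DeepRec implementation: the group patterns follow the standard textbook assignment (acceptor, donor, methyl, nonpolar hydrogen), while the symbol `m` for a 5-methylcytosine:G pair and the function names `encode` and `differing_positions` are hypothetical choices made here.

```python
# Illustrative encoding only; this is NOT the DeepRec implementation.
# Functional groups: A = hydrogen-bond acceptor, D = hydrogen-bond donor,
# M = methyl group, H = nonpolar hydrogen.
GROUPS = "ADMH"

# Major-groove patterns (4 positions), keyed by the base on the read strand.
MAJOR_GROOVE = {
    "A": "ADAM",  # A:T pair
    "T": "MADA",  # T:A pair
    "C": "HDAA",  # C:G pair
    "G": "AADH",  # G:C pair
    "m": "MDAA",  # 5-methylcytosine:G pair ("m" is a hypothetical symbol)
}

# Minor-groove patterns (3 positions); cytosine methylation does not change them.
MINOR_GROOVE = {"A": "AHA", "T": "AHA", "C": "ADA", "G": "ADA", "m": "ADA"}


def signature(base):
    """Return the 7-character groove signature of one base pair."""
    return MAJOR_GROOVE[base] + MINOR_GROOVE[base]


def encode(seq):
    """One row per base pair: a one-hot vector over GROUPS for each of the
    7 groove positions, giving 28 binary features per base pair."""
    rows = []
    for base in seq:
        row = []
        for group in signature(base):
            row.extend(1 if group == g else 0 for g in GROUPS)
        rows.append(row)
    return rows


def differing_positions(a, b):
    """Indices (0-6) at which two base pairs expose different groups."""
    return [i for i, (x, y) in enumerate(zip(signature(a), signature(b))) if x != y]


if __name__ == "__main__":
    features = encode("ACmG")
    print(len(features), len(features[0]))
    print(differing_positions("C", "m"))
```

In this representation a methylated cytosine differs from an unmethylated one at a single major-groove position, where a methyl group replaces a nonpolar hydrogen. A letter-based one-hot encoding would instead treat the modified base as an unrelated fifth symbol, which is the limitation of sequence-based models that the abstract describes.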
Keyphrases
- dna binding
- transcription factor
- genome wide
- dna methylation
- circulating tumor
- high throughput
- cell free
- single molecule
- amino acid
- protein protein
- copy number
- binding protein
- gene expression
- nucleic acid
- single cell
- small molecule