Predicting the DNA binding specificity of mutated transcription factors using family-level biophysically interpretable machine learning.
Shaoxun Liu, Pilar Gomez-Alcala, Christ Leemans, William J Glassford, Richard S Mann, Harmen J Bussemaker
Published in: bioRxiv : the preprint server for biology (2024)
Transcription factors (TFs) are DNA-binding proteins that play a key role in controlling gene expression. Mutations in the protein sequence of TFs are increasingly found to be associated with disease. Being able to predict the functional impact of such mutations, in terms of the quantitative changes in DNA sequence preference they cause, is therefore highly useful. TFs come in families whose members are structurally similar but vary in sequence and function. In this study, we show that by jointly analyzing high-throughput DNA binding data for the basic helix-loop-helix (bHLH) family of transcription factors, we can successfully build a model that predicts the impact of mutations in the TF protein sequence.