Escherichia coli transcription factors of unknown function: sequence features and possible evolutionary relationships.
Isabel Duarte-Velázquez, Javier de la Mora, Jorge Humberto Ramírez-Prado, Alondra Aguillón-Bárcenas, Fátima Tornero-Gutiérrez, Eugenia Cordero-Loreto, Fernando Anaya-Velázquez, Itzel Páramo-Pérez, Ángeles Rangel-Serrano, Sergio Rodrigo Muñoz-Carranza, Oscar Eduardo Romero-González, Luis Rafael Cardoso-Reyes, Ricardo Alberto Rodríguez-Ojeda, Héctor Manuel Mora-Montes, Naurú Idalia Vargas-Maya, Felipe Padilla-Vaca, Bernardo Franco
Published in: PeerJ (2022)
Organisms need mechanisms to perceive their environment and to respond to environmental changes or hazards. Transcription factors (TFs) enable this response by controlling the expression of the genes required. Escherichia coli has been the model bacterium for many decades, yet features of its genome remain unstudied. To date, 58 TFs remain poorly characterized, although their binding sites have been experimentally determined. This study showed that these TFs vary in G+C content at the third codon position but follow the same Codon Adaptation Index (CAI) trend as functionally annotated transcription factors. Most of these TFs are located in regions of the genome rich in repetitive and mobile elements. Sequence divergence points to groups with distinctive sequence signatures that nevertheless retain the same type of DNA-binding domain. Finally, analysis of the promoter sequences of the 58 TFs showed A+T-rich regions, consistent with the features of horizontally transferred genes. These findings pave the way for future research on these TFs, which may uncover their role as spare factors in case of loss-of-function mutations in core TFs and help trace their evolutionary history.
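As an illustration of two of the sequence measures mentioned above, the following minimal Python sketch computes the G+C fraction at third codon positions of a coding sequence and the A+T fraction of a promoter-like region. It is not the authors' pipeline: the function names `gc3_content` and `at_content` and the toy sequences are assumptions made for this example, and the input is assumed to be an in-frame DNA string containing only A, C, G, and T.

```python
def gc3_content(cds: str) -> float:
    """Fraction of G or C bases at third codon positions.

    Assumes `cds` is an in-frame coding sequence; trailing bases that
    do not complete a codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # drop any incomplete final codon
    third = seq[2:usable:3]            # every third base, starting at index 2
    if not third:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third) / len(third)


def at_content(seq: str) -> float:
    """Fraction of A or T bases in a sequence (e.g. a promoter region)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "AT" for base in seq) / len(seq)


if __name__ == "__main__":
    toy_cds = "ATGGCCAAATAA"        # hypothetical toy sequence, not real data
    toy_promoter = "TATAATGC"       # hypothetical toy sequence, not real data
    print(gc3_content(toy_cds))     # third positions are G, C, A, A
    print(at_content(toy_promoter))
```

Comparing such values for a gene against the genome-wide distribution is one common way to flag candidates for horizontal transfer; the thresholds and statistics used in the study itself are not given in this abstract.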
Keyphrases
- transcription factor
- dna binding
- genome wide
- genome wide identification
- escherichia coli
- dna methylation
- poor prognosis
- induced apoptosis
- amino acid
- gene expression
- high frequency
- pseudomonas aeruginosa
- current status
- klebsiella pneumoniae
- heavy metals
- cystic fibrosis
- risk assessment
- oxidative stress
- binding protein
- staphylococcus aureus
- climate change
- candida albicans