The structure of the genetic code as an optimal graph clustering problem.

Paweł Błażej, Dariusz R Kowalski, Dorota Mackiewicz, Małgorzata Wnetrzak, Daniyah A Aloqalaa, Paweł Mackiewicz
Published in: Journal of Mathematical Biology (2022)
The standard genetic code (SGC) is the set of rules by which genetic information is translated into proteins, mapping codons, i.e. triplets of nucleotides, to amino acids. The origin of the code and the main factor responsible for its present structure are still hotly debated. Various methodologies have been used to study the features of the code and to assess its potential optimality. Here, we introduced a new general approach to evaluating the quality of the genetic code structure. This methodology comes from graph theory and allows us to describe new properties of the genetic code in terms of conductance. This parameter measures the robustness of codon groups against changes in the translation of protein-coding sequences caused by single nucleotide substitutions. We described the genetic code as a partition of an undirected and unweighted graph, which makes the model general and universal. Using this approach, we showed that the structure of the genetic code is a solution to the graph clustering problem. We presented and discussed the structure of the codes that are optimal according to the conductance. Although the standard genetic code is far from optimal according to the conductance, its structure is characterised by many codon groups that reach the minimum conductance for their size. The SGC most likely represents a local minimum in terms of errors occurring in protein-coding sequences and their translation.
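To make the notion of conductance concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the graph has the 64 codons as vertices, with an edge between two codons exactly when they differ at one position, and it assumes the common graph-theoretic definition of set conductance: the number of edges leaving a group divided by the smaller of the group's volume and its complement's volume. The names `neighbors` and `conductance` are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction
from itertools import product

NUCLEOTIDES = "ACGU"
# All 64 codons, the vertices of the assumed graph.
CODONS = ["".join(p) for p in product(NUCLEOTIDES, repeat=3)]


def neighbors(codon):
    """Return the 9 codons that differ from `codon` at exactly one position."""
    result = []
    for i, base in enumerate(codon):
        for other in NUCLEOTIDES:
            if other != base:
                result.append(codon[:i] + other + codon[i + 1:])
    return result


def conductance(group):
    """Set conductance of a codon group (assumed definition, see lead-in).

    cut    = number of edges with exactly one endpoint inside the group
    volume = sum of vertex degrees; every codon has degree 9 in this graph
    """
    group = set(group)
    if not group or len(group) == len(CODONS):
        raise ValueError("group must be a proper, non-empty subset of codons")
    cut = sum(1 for c in group for n in neighbors(c) if n not in group)
    volume = 9 * len(group)
    complement_volume = 9 * (len(CODONS) - len(group))
    return Fraction(cut, min(volume, complement_volume))


# A four-codon box (the alanine codons GCN in the SGC): each codon has
# 3 neighbours inside the group and 6 outside, so 24 / 36 = 2/3.
print(conductance(["GCA", "GCC", "GCG", "GCU"]))  # 2/3

# A two-codon group (the aspartate codons GAU and GAC): 16 / 18 = 8/9.
print(conductance(["GAU", "GAC"]))  # 8/9
```

Lower values mean that a smaller fraction of single nucleotide substitutions lead out of the group, that is, the group is more robust in the sense described above. In this sketch a single codon always has conductance 1, since all nine of its substitutions leave the group.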