Identifying widespread and recurrent variants of genetic parts to improve annotation of engineered DNA sequences.

Matthew J McGuffie, Jeffrey E Barrick
Published in: bioRxiv : the preprint server for biology (2023)
Plasmids are used in molecular biology and biotechnology for a wide variety of tasks such as cloning DNA, expressing recombinant proteins, and creating vaccines. One challenge in working with plasmids is that there has been a long and often lost history of pieces of plasmids being copied and remixed by researchers to create new plasmids. Current databases used for annotating key genetic parts in plasmids are incomplete, especially with respect to cataloguing closely related versions of parts that can have very different characteristics. Some genetic part variants have arisen due to purposeful editing, while others are the result of unplanned mutations and evolution. When a researcher finds differences between a database sequence and a genetic part in their newly constructed plasmid, it is often unclear how and when they arose and whether they will affect the researcher's experiments. We identified 217 genetic part variants that are either widespread or have likely arisen independently more than once on plasmids due to convergent evolution or engineering. We propose that these variants should be prioritized for inclusion in curated databases of engineered DNA sequences and for functional characterization to improve the reliability and reproducibility of science.
Keyphrases
  • copy number
  • Escherichia coli
  • Klebsiella pneumoniae
  • genome wide
  • cell free
  • circulating tumor
  • single molecule
  • CRISPR Cas
  • DNA methylation
  • gene expression
  • big data
  • multidrug resistant
  • machine learning