Interpreting and de-noising genetically engineered barcodes in a DNA virus.
Sylvain Blois, Benjamin M Goetz, James J Bull, Christopher S Sullivan
Published in: PLoS Computational Biology (2022)
The concept of a nucleic acid barcode applied to pathogen genomes is easy to grasp, and the many possible uses are straightforward. But implementation may not be easy, especially when growing the pathogen through multiple generations or assaying it long-term. The potential problems include: the barcode might alter fitness, the barcode may accumulate mutations, and construction of the marked pathogens may result in unintended barcodes that are not as designed. Here, we generate approximately 5,000 randomized barcodes in the genome of the prototypic small DNA virus murine polyomavirus. We describe the challenges faced in interpreting the barcode sequences obtained from the library. Our Illumina NextSeq sequencing revealed much greater variation in barcode sequencing reads than the expected 5,000 barcodes, variation that necessarily stems from Illumina library processing and sequencing error. Using data from defined control virus genomes cloned into plasmid backbones, we develop a vetted post-sequencing method to cluster the erroneous reads around the true virus genome barcodes. These findings may foreshadow problems with randomized barcodes in other microbial systems and provide a useful approach for future work utilizing nucleic acid barcoded pathogens.
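The general idea of clustering erroneous reads around true barcodes can be illustrated with a minimal sketch. This is not the authors' method, which the abstract does not detail; it is a generic greedy, abundance-ordered approach that assumes equal-length barcodes, substitution errors only, and that true barcodes are much more abundant than their error-derived variants. The function names and the `max_dist` threshold are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_barcodes(reads, max_dist=1):
    """Greedily merge rare barcode reads into more abundant neighbours.

    Assumption (illustrative only): a read within `max_dist` substitutions
    of a more abundant barcode is treated as an error copy of that barcode.
    Returns a dict mapping each cluster centre to its total read count.
    """
    counts = Counter(reads)
    # Process the most abundant sequences first; break ties alphabetically
    # so the result is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    clusters = {}
    for seq, n in ordered:
        for centre in clusters:
            if len(centre) == len(seq) and hamming(centre, seq) <= max_dist:
                clusters[centre] += n  # absorb into the existing cluster
                break
        else:
            clusters[seq] = n  # no close centre found: start a new cluster
    return clusters


if __name__ == "__main__":
    # Toy data: two "true" barcodes plus a few single-substitution variants.
    toy_reads = (
        ["ACGTAC"] * 50 + ["ACGTAA"] * 2
        + ["TTGCAA"] * 40 + ["TTGCAT"] + ["TTGGAA"]
    )
    print(cluster_barcodes(toy_reads))
```

A pairwise scan like this is quadratic in the number of distinct sequences, so a real library with thousands of barcodes and many error variants would likely need indexing or a dedicated tool, and a threshold calibrated against known controls such as the defined plasmid-cloned genomes the abstract mentions.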
Keyphrases
- nucleic acid
- single cell
- mental health
- double blind
- open label
- phase iii
- healthcare
- primary care
- placebo controlled
- escherichia coli
- phase ii
- crispr cas
- genome wide
- microbial community
- clinical trial
- physical activity
- randomized controlled trial
- gene expression
- gram negative
- dna methylation
- machine learning
- circulating tumor
- antimicrobial resistance
- big data
- multidrug resistant