Combined direct/indirect detection allows identification of DNA termini in diverse sequencing datasets and supports a multiple-initiation-site model for HIV plus-strand synthesis.

William Wang, Karen L Artiles, Shinichi Machida, Monsef Benkirane, Nimit Jain, Andrew Z Fire
Published in: bioRxiv : the preprint server for biology (2023)
Replication of genetic material involves the creation of characteristic termini. Determining these termini is important to refine our understanding of the mechanisms involved in maintaining the genomes of cellular organisms and viruses. Here we describe a computational approach combining direct and indirect readouts to detect termini from next-generation short-read sequencing. While a direct inference of termini can come from mapping the most prominent start positions of captured DNA fragments, this approach is insufficient in cases where the DNA termini are not captured, whether for biological or technical reasons. Thus, a complementary (indirect) approach to terminus detection can be applied, taking advantage of the imbalance in coverage between forward and reverse sequence reads near termini. A resulting metric ("strand bias") can be used to detect termini even where termini are naturally blocked from capture or ends are not captured during library preparation (e.g., in tagmentation-based protocols). Applying this analysis to datasets where known DNA termini are present, such as from linear double-stranded viral genomes, yielded distinct strand bias signals corresponding to these termini. To evaluate the potential to analyze a more complex situation, we applied the analysis to examine DNA termini present early after HIV infection in a cell culture model. We observed both the known termini expected based on standard models of HIV reverse transcription (the U5-right-end and U3-left-end termini) as well as a signal corresponding to a previously described additional initiation site for plus-strand synthesis (cPPT [central polypurine tract]). Interestingly, we also detected putative terminus signals at additional sites. 
The strongest of these are a set that share several characteristics with the previously characterized plus-strand initiation sites (the cPPT and 3' PPT [polypurine tract] sites): (i) an observed spike in directly captured cDNA ends, (ii) an indirect terminus signal evident in localized strand bias, (iii) a preference for location on the plus strand, (iv) an upstream purine-rich motif, and (v) a decrease in terminus signal at late time points after infection. These characteristics are consistent in duplicate samples in two different genotypes (wild type and integrase-lacking HIV). The observation of distinct internal termini associated with multiple purine-rich regions raises a possibility that multiple internal initiations of plus-strand synthesis might contribute to HIV replication.
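
As a rough illustration of the two readouts described in the abstract, the Python sketch below tallies read 5' ends (the direct signal) and computes a windowed strand bias (the indirect signal) from a list of aligned reads. The abstract does not give the exact metric, so the normalized difference (F - R) / (F + R), the window size, the read tuple layout, and the function name `terminus_signals` are all assumptions made for illustration and should not be read as the authors' implementation.

```python
from collections import Counter

def terminus_signals(reads, genome_length, window=50):
    """Toy direct/indirect terminus readouts (illustrative assumptions only).

    reads: iterable of (start, end, strand) with 0-based, half-open
           coordinates and strand given as '+' or '-'.
    Returns (five_prime_ends, bias) where
      five_prime_ends counts read 5' ends keyed by (position, strand), and
      bias is a list of (window_start, value) with value in [-1, 1],
      or None for windows with no coverage.
    """
    fwd = [0] * genome_length  # per-base coverage from forward-strand reads
    rev = [0] * genome_length  # per-base coverage from reverse-strand reads
    five_prime_ends = Counter()

    for start, end, strand in reads:
        cov = fwd if strand == "+" else rev
        for pos in range(start, end):
            cov[pos] += 1
        # Direct signal: the 5' end is the leftmost base of a forward read
        # and the rightmost base of a reverse read.
        five_prime = start if strand == "+" else end - 1
        five_prime_ends[(five_prime, strand)] += 1

    # Indirect signal: assumed normalized forward/reverse coverage imbalance.
    bias = []
    for lo in range(0, genome_length, window):
        hi = min(lo + window, genome_length)
        f = sum(fwd[lo:hi])
        r = sum(rev[lo:hi])
        total = f + r
        bias.append((lo, (f - r) / total if total else None))
    return five_prime_ends, bias

# Toy input: forward reads pile up at the left edge, as might be
# expected near the left terminus of a linear molecule.
toy_reads = [(0, 10, "+"), (0, 10, "+"), (0, 10, "+"),
             (20, 30, "-"), (20, 30, "+")]
ends, bias = terminus_signals(toy_reads, genome_length=30, window=10)
```

In this toy input, the window nearest the left edge is covered only by forward reads, while the window further in has balanced coverage; a localized imbalance of this kind is what the indirect approach looks for. A real analysis would read alignments from a BAM file and would need to account for mapping quality, paired-end orientation, and library-specific artifacts.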
Keyphrases