Identification of RNA virus-derived RdRp sequences in publicly available transcriptomic datasets.

Ingrida Olendraite, Katherine Brown, Andrew E Firth
Published in: Molecular biology and evolution (2023)
RNA viruses are abundant, highly diverse, and infect all or most eukaryotic organisms. However, only a tiny fraction of the number and diversity of RNA virus species has been catalogued. To cost-effectively expand the diversity of known RNA virus sequences, we mined publicly available transcriptomic datasets. We developed 77 family-level profile Hidden Markov Models (pHMMs) for the viral RNA-dependent RNA polymerase (RdRp), the only universal "hallmark" gene of RNA viruses. By using these to search the NCBI Transcriptome Shotgun Assembly database, we identified 5,867 contigs encoding RNA virus RdRps or fragments thereof and analysed their diversity, taxonomic classification, phylogeny and host associations. Our study expands the known diversity of RNA viruses, and the 77 curated RdRp pHMMs provide a useful resource for the virus discovery community.
Keyphrases
  • rna seq
  • nucleic acid
  • healthcare
  • gene expression
  • machine learning
  • genome wide
  • small molecule
  • emergency department
  • high throughput
  • genetic diversity
  • sars cov
  • multidrug resistant
  • deep learning