Ribovirus classification by a polymerase barcode sequence.
Artem Babaian, Robert C Edgar
Published in: PeerJ (2022)
RNA viruses encoding a polymerase gene (riboviruses) dominate the known eukaryotic virome. High-throughput sequencing is revealing a wealth of new riboviruses known only from sequence, precluding classification by traditional taxonomic methods. Sequence classification is often based on polymerase sequences, but standardised methods to support this approach are currently lacking. To address this need, we describe the polymerase palmprint, a segment of the palm sub-domain robustly delineated by well-conserved catalytic motifs. We present an algorithm, Palmscan, which identifies palmprints in nucleotide and amino acid sequences; PALMdb, a collection of palmprints derived from public sequence databases; and palmID, a public website implementing palmprint identification, search, and annotation. Together, these methods demonstrate a proof-of-concept workflow for high-throughput characterisation of RNA viruses, paving the path for the continued rapid growth in RNA virus discovery anticipated in the coming decade.
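To make concrete the idea of a segment delineated by conserved motifs, the following is a minimal toy sketch in Python. It is not Palmscan and does not reproduce its algorithm. The two regular expressions, the length bounds, and the function name `toy_palmprint` are all assumptions introduced here for illustration; they are crude stand-ins for the catalytic motifs mentioned above, and a practical method would need a far more sensitive motif model.

```python
import re
from typing import Optional

# Hypothetical, highly simplified motif patterns (illustrative assumptions,
# not the motif models used by Palmscan).
START_MOTIF = re.compile(r"D.{4,5}D")  # crude stand-in for an upstream catalytic motif
END_MOTIF = re.compile(r"[GS]DD")      # crude stand-in for a downstream catalytic motif


def toy_palmprint(protein: str, min_len: int = 60, max_len: int = 200) -> Optional[str]:
    """Return the first segment spanning a start-motif match through an
    end-motif match whose length lies within [min_len, max_len].

    The length bounds are hypothetical defaults chosen for this sketch.
    Returns None if no such segment is found.
    """
    protein = protein.upper()
    for start in START_MOTIF.finditer(protein):
        # Only consider end motifs that begin after the start motif ends.
        for end in END_MOTIF.finditer(protein, start.end()):
            segment = protein[start.start():end.end()]
            if min_len <= len(segment) <= max_len:
                return segment
    return None


if __name__ == "__main__":
    # Synthetic sequence built for the demo; it is not a real polymerase.
    demo = "MK" + "DAAAAD" + "K" * 70 + "GDD" + "LL"
    print(toy_palmprint(demo))
```

The sketch only shows the delineation step: once both flanking motifs are located, the segment between them (inclusive) is the candidate barcode. Handling nucleotide input, as the abstract describes for Palmscan, would additionally require translating the sequence in each reading frame before searching.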
Keyphrases
- amino acid
- deep learning
- machine learning
- high throughput
- structural basis
- healthcare
- high throughput sequencing
- genome wide
- mental health
- genetic diversity
- big data
- nucleic acid
- small molecule
- transcription factor
- artificial intelligence
- copy number
- gene expression
- emergency department
- rna seq
- loop mediated isothermal amplification