
An advanced sequence clustering and designation workflow reveals the enzootic maintenance of a dominant West Nile virus subclade in Germany.

Pauline Dianne Santos, Anne Günther, Markus Keller, Timo Homeier-Bachmann, Martin H Groschup, Martin Beer, Dirk Höper, Ute Ziegler
Published in: Virus Evolution (2023)
West Nile virus (WNV) is the most widespread arthropod-borne (arbo) virus and the primary cause of arboviral encephalitis globally. Members of the WNV species have diverged genetically and are classified into different hierarchical groups below species rank. However, the demarcation criteria for allocating WNV sequences to these groups remain individual and inconsistent, and the naming of the different hierarchical levels is unstructured. To achieve an objective and comprehensible grouping of WNV sequences, we developed an advanced grouping workflow that uses the 'affinity propagation clustering' algorithm and newly includes the 'agglomerative hierarchical clustering' algorithm for allocating WNV sequences to different groups below species rank. In addition, we propose a fixed set of terms for the hierarchical naming of WNV below species level and a clear decimal numbering system to label the determined groups. For validation, we applied the refined workflow to WNV sequences that had previously been grouped into various lineages, clades, and clusters in other studies. Although our workflow regrouped some WNV sequences, its results generally correspond with previous groupings. We applied our novel approach to sequences from the WNV circulation in Germany in 2020, primarily from WNV-infected birds and horses. Besides two newly defined minor (sub)clusters comprising only three sequences each, Subcluster 2.5.3.4.3c was the predominant WNV sequence group detected in Germany from 2018 to 2020. This predominant subcluster was also associated with at least five human WNV infections in 2019-20. In summary, our analyses imply that the genetic diversity of the WNV population in Germany is shaped by enzootic maintenance of the dominant WNV subcluster accompanied by sporadic incursions of other rare clusters and subclusters. Moreover, we show that our refined approach for sequence grouping yields meaningful results.
Although we primarily aimed at a more detailed WNV classification, the presented workflow can also be applied to the objective genotyping of other virus species.
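
To illustrate the general idea of agglomerative hierarchical clustering mentioned above, the following is a minimal sketch and not the authors' workflow. It assumes pre-aligned sequences of equal length, uses the proportion of differing sites (p-distance) as the distance measure, merges clusters by average linkage, and stops at a fixed distance threshold. The function names, the toy sequences, and the threshold of 0.3 are all hypothetical choices made for this example; the affinity propagation step and the decimal labeling scheme are not shown.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def agglomerative_clusters(seqs, threshold):
    """Average-linkage agglomerative clustering of named sequences.

    Repeatedly merges the two closest clusters and stops once the
    closest pair is farther apart than `threshold` (an assumed cutoff).
    """
    names = list(seqs)
    # Precompute all pairwise distances between individual sequences.
    dist = {
        frozenset((i, j)): p_distance(seqs[i], seqs[j])
        for i, j in combinations(names, 2)
    }
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(dist[frozenset((i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), d = min(
            (
                ((i, j), linkage(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy alignment: two clearly separated groups.
toy = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAT",
    "s3": "ACGTTCGTAC",
    "s4": "TTGAACCTGG",
    "s5": "TTGAACCTGA",
}
groups = agglomerative_clusters(toy, threshold=0.3)
```

With this toy input, `groups` contains one cluster for `s1` to `s3` and one for `s4` and `s5`. Applying the same procedure recursively within each cluster, with progressively smaller thresholds, is one way nested levels could be derived, though the cutoffs used in the published workflow are not stated here.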
Keyphrases
  • genetic diversity
  • machine learning
  • deep learning
  • electronic health record
  • high throughput
  • rna seq
  • pluripotent stem cells