Unprecedented genetic variability of PFam54 paralogs among Eurasian Lyme borreliosis-causing spirochetes.
Janna Wülbern, Laura Windorfer, Kozue Sato, Minoru Nakao, Sabrina Hepner, Gabriele Margos, Volker Fingerle, Hiroki Kawabata, Noémie S Becker, Peter Kraiczy, Robert Ethan Rollins
Published in: Ecology and Evolution (2024)
Lyme borreliosis (LB) is the most common vector-borne disease in the Northern Hemisphere and is caused by spirochetes belonging to the Borrelia burgdorferi sensu lato (Bbsl) complex. Borrelia spirochetes circulate in obligatory transmission cycles between tick vectors and different vertebrate hosts. To successfully complete this complex transmission cycle, Bbsl encodes an arsenal of proteins, including the PFam54 protein family, whose members have known or proposed influences on reservoir host and/or vector adaptation. Even so, only fragmentary information is available regarding the naturally occurring level of variation in the PFam54 gene array, especially in relation to Eurasian-distributed species. Utilizing whole genome data from isolates (n = 141) originating from three major LB-causing Borrelia species across Eurasia (B. afzelii, B. bavariensis, and B. garinii), we aimed to characterize the diversity of the PFam54 gene array in these isolates to facilitate understanding of the evolution of PFam54 paralogs at the intra- and interspecies level. We found an extraordinarily high level of variation in the PFam54 gene array, with 39 PFam54 paralogs belonging to 23 orthologous groups, including five novel paralogs. Even so, the gene array appears to have remained fairly stable over the evolutionary history of the studied Borrelia species. Interestingly, genes outside Clade IV, which contains genes encoding proteins associated with Borrelia pathogenesis, more frequently displayed signatures of diversifying selection between clades that differ in hypothesized vector or host species. This could suggest that non-Clade IV paralogs play a more important role in host and/or vector adaptation than previously expected, which would require future lab-based studies to validate.
Keyphrases
- genome wide
- genome wide identification
- copy number
- dna methylation
- genetic diversity
- high resolution
- genome wide analysis
- transcription factor
- high density
- gene expression
- machine learning
- electronic health record
- amino acid
- deep learning
- artificial intelligence
- mass spectrometry
- single cell
- neural network
- health information