Inference of Infectious Disease Transmission through a Relaxed Bottleneck Using Multiple Genomes Per Host.
Jake CarsonMatt KeelingDavid WylliePaolo RibecaXavier DidelotPublished in: Molecular biology and evolution (2024)
In recent times, pathogen genome sequencing has become increasingly used to investigate infectious disease outbreaks. When genomic data is sampled densely enough amongst infected individuals, it can help resolve who infected whom. However, transmission analysis cannot rely solely on a phylogeny of the genomes but must account for the within-host evolution of the pathogen, which blurs the relationship between phylogenetic and transmission trees. When only a single genome is sampled for each host, the uncertainty about who infected whom can be quite high. Consequently, transmission analysis based on multiple genomes of the same pathogen per host has a clear potential for delivering more precise results, even though it is more laborious to achieve. Here, we present a new methodology that can use any number of genomes sampled from a set of individuals to reconstruct their transmission network. Furthermore, we remove the need for the assumption of a complete transmission bottleneck. We use simulated data to show that our method becomes more accurate as more genomes per host are provided, and that it can infer key infectious disease parameters such as the size of the transmission bottleneck, within-host growth rate, basic reproduction number, and sampling fraction. We demonstrate the usefulness of our method in applications to real datasets from an outbreak of Pseudomonas aeruginosa amongst cystic fibrosis patients and a nosocomial outbreak of Klebsiella pneumoniae.