zol & fai: large-scale targeted detection and evolutionary investigation of gene clusters.
Rauf SalamzadePatricia Q TranCody MartinAbigail L MansonMichael S GilmoreAshlee M EarlKarthik AnantharamanLindsay R KalanPublished in: bioRxiv : the preprint server for biology (2023)
Many universally and conditionally important genes are genomically aggregated within clusters. Here, we introduce fai and zol, which together enable large-scale comparative analysis of different types of gene clusters and mobile-genetic elements (MGEs), such as biosynthetic gene clusters (BGCs) or viruses. Fundamentally, they overcome a current bottleneck to reliably perform comprehensive orthology inference at large scale across broad taxonomic contexts and thousands of genomes. First, fai allows the identification of orthologous or homologous instances of a query gene cluster of interest amongst a database of target genomes. Subsequently, zol enables reliable, context-specific inference of protein-encoding ortholog groups for individual genes across gene cluster instances. In addition, zol performs functional annotation and computes a variety of statistics for each inferred ortholog group. These programs are showcased through application to: (i) longitudinal tracking of a virus in metagenomes, (ii) discovering novel population-genetic insights of two common BGCs in a fungal species, and (iii) uncovering large-scale evolutionary trends of a virulence-associated gene cluster across thousands of genomes from a diverse bacterial genus.