Phylogenomics reveals extensive misidentification of fungal strains from the genus Aspergillus.
Jacob Lucas Steenwyk, Charu Balamurugan, Huzefa A Raja, Carla Gonçalves, Ningxiao Li, Frank Martin, Judith Berman, Nicholas H Oberliels, John G Gibbons, Gustavo Henrique Goldman, David M Geiser, Jos Houbraken, David S Hibbett, Antonis Rokas
Published in: Microbiology Spectrum (2024)
Modern taxonomic classification is often based on phylogenetic analyses of a few molecular markers, although single-gene studies are still common. Here, we leverage genome-scale molecular phylogenetics (phylogenomics) of species and populations to reconstruct evolutionary relationships in a dense data set of 710 fungal genomes from the biomedically and technologically important genus Aspergillus. To do so, we generated a novel set of 1,362 high-quality molecular markers specific for Aspergillus and provided profile Hidden Markov Models for each, facilitating their use by others. Examining the resulting phylogeny helped resolve ongoing taxonomic controversies, identified new ones, and revealed extensive strain misidentification (7.59% of strains were previously misidentified), underscoring the importance of population-level sampling in species classification. These findings were corroborated using the current standard, taxonomically informative loci. These findings suggest that phylogenomics of species and populations can facilitate accurate taxonomic classifications and reconstructions of the Tree of Life.

IMPORTANCE
Identification of fungal species relies on the use of molecular markers. Advances in genomic technologies have made it possible to sequence the genome of any fungal strain, making it possible to use genomic data for the accurate assignment of strains to fungal species (and for the discovery of new ones). We examined the usefulness and current limitations of genomic data using a large data set of 710 publicly available genomes from multiple strains and species of the biomedically, agriculturally, and industrially important genus Aspergillus. Our evolutionary genomic analyses revealed that nearly 8% of publicly available Aspergillus genomes are misidentified.
Our work highlights the usefulness of genomic data for fungal systematic biology and suggests that systematic genome sequencing of multiple strains, including reference strains (e.g., type strains), of fungal species will be required to reduce misidentification errors in public databases.