Login / Signup

Filtration of Gene Trees from 9000 Exons, Introns and UCEs Disentangle Conflicting Phylogenomic Relationships in Tree Frogs (Hylidae).

Carl R HutterWilliam Duellman
Published in: Genome biology and evolution (2023)
An emerging challenge in interpreting phylogenomic datasets is that concatenation and multi-species coalescent summary species tree approaches may produce conflicting results. Concatenation is problematic because it can strongly support an incorrect topology when Incomplete Lineage Sorting (ILS) results in elevated gene-tree discordance. Conversely, summary species tree methods account for ILS to recover the correct topology, but these methods do not account for erroneous gene trees ("EGTs") resulting from gene tree estimation error (GTEE). Third, site-based and full-likelihood methods promise to alleviate GTEE as these methods use the sequence data from alignments. To understand the impact of GTEE on species tree estimation in Hylidae tree frogs, we use an expansive dataset of ∼9000 exons, introns, and UCEs and initially found conflict between all three types of analytical methods. We filtered EGTs using alignment metrics that could lead to GTEE (length, parsimony-informative sites, missing data) and found that removing shorter, less informative alignments reconciled the conflict between concatenation and summary species tree methods with increased gene concordance, with the filtered topologies matching expected results from past studies. Contrarily, site-based and full-likelihood methods were mixed where one method was consistent with past studies and the other varied markedly. Critical to other studies, these results suggest a widespread conflation of ILS and GTEE, where EGTs rather than ILS are driving discordance. Finally, we apply these recommendations to an R package named PhyloConfigR, which facilitates phylogenetic software setup, summarizes alignments, and provides tools for filtering alignments and gene trees.
Keyphrases