Parsimony analysis of phylogenomic datasets (II): evaluation of PAUP*, MEGA and MPBoot.

Pablo A Goloboff, Santiago Andrés Catalano, Ambrosio Torres
Published in: Cladistics : the international journal of the Willi Hennig Society (2021)
This paper examines the implementation of parsimony methods in the programs PAUP*, MEGA and MPBoot, and compares them with TNT. PAUP* implements standard, well-tested algorithms, and flexible search strategies and options for handling trees; its main drawback is the lack of advanced search algorithms, which makes it difficult to find most parsimonious trees for large and complex datasets. In addition, branch-swapping can be much slower than in TNT for datasets with large numbers of taxa, although this is only occasionally a problem for phylogenomic datasets given that they typically have small numbers of taxa. The parsimony implementation of MEGA has major drawbacks. MEGA often fails to find parsimonious trees because it does not perform all possible branch-swapping rearrangements, whether subtree pruning regrafting (SPR) or tree bisection-reconnection (TBR). It furthermore fails to properly handle ambiguity or multiple equally parsimonious trees, and it uses the same addition sequence for all bootstrap replicates. The latter yields values of group support that depend on the order in which taxa are listed in the dataset. In addition, tree searches are very slow and do not facilitate the exploration of different starting points (as the random seed is fixed). MPBoot searches for optimal trees using the ratchet, but it is based on SPR instead of TBR (and by default evaluates only a subset of the SPR rearrangements). MPBoot approximates bootstrap frequencies by first finding a sample of trees and then selecting from those trees for every replicate, without performing a tree search. The approximation is too rough in many cases, producing serious under- or overestimations of the correct support values and, for most kinds of datasets, slower estimations than can be obtained with TNT.
In addition, bootstrapping with PAUP*, MEGA or MPBoot can attribute strong supports to groups that have no support at all under any meaningful concept of support, such as likelihood ratios or Bremer supports. In TNT, this problem is decreased by using the strict consensus tree to represent each replicate, or eliminated entirely by using different approximations of the Bremer support.
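The effect of summarizing each replicate by its strict consensus can be illustrated with a toy calculation. The sketch below is not code from PAUP*, MEGA, MPBoot or TNT; the function names, the representation of a tree as a set of groups, and the data are all assumptions made for illustration. It assumes a dataset in which every bootstrap replicate yields the same three equally parsimonious resolutions of taxa A, B and C, so no group is actually supported, and it assumes a program that always reports the first tree found.

```python
from collections import Counter


def clade(*taxa):
    """A group is represented as a frozenset of taxon names (illustrative choice)."""
    return frozenset(taxa)


def strict_consensus(trees):
    """Keep only the groups present in every tree of a replicate."""
    return frozenset.intersection(*trees)


def group_frequencies(replicates, use_strict_consensus):
    """Frequency of each group across replicates.

    Each replicate is a list of equally parsimonious trees, and each tree is
    a frozenset of groups. With use_strict_consensus=False the replicate is
    represented by its first tree only, mimicking a search that keeps a single
    tree and breaks ties the same way every time (an assumption of this sketch).
    """
    counts = Counter()
    for trees in replicates:
        groups = strict_consensus(trees) if use_strict_consensus else trees[0]
        counts.update(groups)
    return {group: count / len(replicates) for group, count in counts.items()}


# Hypothetical data: three tied resolutions in every one of 100 replicates.
AB, AC, BC = clade("A", "B"), clade("A", "C"), clade("B", "C")
tied_trees = [frozenset({AB}), frozenset({AC}), frozenset({BC})]
replicates = [tied_trees] * 100

single_tree = group_frequencies(replicates, use_strict_consensus=False)
consensus = group_frequencies(replicates, use_strict_consensus=True)

print("A+B frequency, first tree only:", single_tree.get(AB, 0.0))
print("A+B frequency, strict consensus:", consensus.get(AB, 0.0))
```

Under these assumptions the group A+B appears in every replicate when only the first tree is kept, although it is contradicted by equally parsimonious trees, and it appears in none when each replicate is represented by its strict consensus. Which group is favoured in the first case depends only on the order of the tied trees, which parallels the dependence on taxon order described above.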