Muscle5: High-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny.

Robert C Edgar
Published in: Nature Communications (2022)
Multiple sequence alignments are widely used to infer evolutionary relationships, enabling inferences of structure, function, and phylogeny. Standard practice is to construct one alignment by some preferred method and use it in further analysis; however, undetected alignment bias can be problematic. I describe Muscle5, a novel algorithm which constructs an ensemble of high-accuracy alignments with diverse biases by perturbing a hidden Markov model and permuting its guide tree. Confidence in an inference is assessed as the fraction of the ensemble which supports it. Applied to phylogenetic tree estimation, I show that ensembles can confidently resolve topologies with low bootstrap support according to standard methods, and conversely that some topologies with high bootstrap support are incorrect. Applied to the phylogeny of RNA viruses, ensemble analysis shows that recently adopted taxonomic phyla are probably polyphyletic. Ensemble analysis can improve confidence assessment in any inference from an alignment.
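The confidence measure described in the abstract (the fraction of the ensemble that supports an inference) can be illustrated with a minimal sketch. This is not Muscle5's implementation: the function name `ensemble_confidence`, the representation of each replicate as a set of clades, and the toy taxa are all assumptions made for illustration. Each replicate stands for the set of clades found in a tree estimated from one alignment in the ensemble.

```python
from typing import FrozenSet, Iterable, Set

# Hypothetical representation: a clade is the set of taxon labels it contains.
Clade = FrozenSet[str]


def ensemble_confidence(replicates: Iterable[Set[Clade]], clade: Clade) -> float:
    """Return the fraction of replicates whose tree contains `clade`.

    Each element of `replicates` is the set of clades in the tree inferred
    from one alignment of the ensemble (an assumed input format).
    """
    replicate_list = list(replicates)
    if not replicate_list:
        raise ValueError("the ensemble must contain at least one replicate")
    supporting = sum(1 for clades in replicate_list if clade in clades)
    return supporting / len(replicate_list)


# Toy ensemble of four replicates over taxa A-D (made-up data).
ab = frozenset({"A", "B"})
cd = frozenset({"C", "D"})
ac = frozenset({"A", "C"})
toy_ensemble = [{ab, cd}, {ab, cd}, {ab}, {ac}]

ab_support = ensemble_confidence(toy_ensemble, ab)  # 3 of 4 replicates
ac_support = ensemble_confidence(toy_ensemble, ac)  # 1 of 4 replicates
```

In this toy case the clade {A, B} appears in three of the four replicates, so its ensemble confidence is 0.75, while {A, C} appears in only one and scores 0.25.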
Keyphrases
  • healthcare
  • primary care
  • skeletal muscle
  • machine learning
  • neural network
  • single cell
  • gene expression
  • deep learning
  • dna methylation
  • amino acid