QR-STAR: A Polynomial-Time Statistically Consistent Method for Rooting Species Trees Under the Coalescent.

Yasamin Tabatabaee, Sebastien Roch
Published in: Journal of Computational Biology: A Journal of Computational Molecular Cell Biology (2023)
We address the problem of rooting an unrooted species tree given a set of unrooted gene trees, under the assumption that the gene trees evolve within the model species tree under the multispecies coalescent (MSC) model. Quintet Rooting (QR), a polynomial-time algorithm recently proposed for this problem, builds on the theory of Allman, Degnan, and Rhodes, who proved that rooted 5-taxon trees are identifiable from unrooted gene trees under the MSC. Although QR was accurate in simulations, its statistical consistency was left as an open problem. We present QR-STAR, a variant of QR with an additional step and a different cost function, and prove that it is statistically consistent under the MSC. We also derive sample complexity bounds for QR-STAR and show that a particular variant based on "short quintets" has polynomial sample complexity. Finally, a simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open-source form on GitHub.
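The following is a minimal sketch, not the QR or QR-STAR implementation. It illustrates only the combinatorial scaffolding of a quintet-based rooting method, under two stated assumptions: every edge of the unrooted binary species tree is a candidate root location (2n - 3 edges for n taxa), and each candidate rooting is examined through the rooted 5-taxon trees it induces. The toy tree, the internal node names x1 to x4, and the helper functions are hypothetical. QR-STAR's cost function, its additional step, and the gene-tree quintet frequencies it compares against are not reproduced.

```python
from itertools import combinations


def edges(tree):
    """Assumption: every edge of an unrooted binary tree is a candidate root."""
    return sorted({tuple(sorted((u, v))) for u in tree for v in tree[u]})


def leaves(tree):
    """Taxa are the degree-1 nodes."""
    return {u for u, nbrs in tree.items() if len(nbrs) == 1}


def clades_for_root(tree, root_edge):
    """Leaf sets of every subtree when the tree is rooted on root_edge."""
    leafset = leaves(tree)
    clades = []

    def below(node, parent):
        # Leaves reachable from node without crossing back over parent.
        if node in leafset:
            result = frozenset([node])
        else:
            result = frozenset().union(
                *(below(child, node) for child in tree[node] if child != parent)
            )
        clades.append(result)
        return result

    u, v = root_edge
    below(u, v)
    below(v, u)
    return clades


def restrict(clades, quintet):
    """Nontrivial clades of the rooted tree induced on a 5-taxon subset."""
    q = frozenset(quintet)
    return {c & q for c in clades if 2 <= len(c & q) < len(q)}


# Hypothetical toy species tree on six taxa (a caterpillar); x1..x4 are internal.
tree = {
    "A": {"x1"}, "B": {"x1"}, "C": {"x2"}, "D": {"x3"}, "E": {"x4"}, "F": {"x4"},
    "x1": {"A", "B", "x2"},
    "x2": {"x1", "C", "x3"},
    "x3": {"x2", "D", "x4"},
    "x4": {"x3", "E", "F"},
}

taxa = sorted(leaves(tree))
candidate_roots = edges(tree)            # 2n - 3 = 9 edges for n = 6 taxa
quintets = list(combinations(taxa, 5))   # C(6, 5) = 6 five-taxon subsets

for root_edge in candidate_roots:
    rooted = clades_for_root(tree, root_edge)
    for quintet in quintets:
        induced = restrict(rooted, quintet)
        # A QR-style method would score `induced` against quintet
        # frequencies estimated from the gene trees; that cost is omitted.
```

For example, rooting the toy tree on the edge ("x2", "x3") and restricting to the taxa A, B, C, D, E yields the clades {A, B}, {A, B, C}, and {D, E}. Enumerating all C(n, 5) quintets for each of the 2n - 3 candidate roots takes time polynomial in n; the "short quintets" variant mentioned in the abstract is not modeled here.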