Parameter Identifiability for a Profile Mixture Model of Protein Evolution.

Samaneh Yourdkhani, Elizabeth S Allman, John A Rhodes
Published in: Journal of Computational Biology: A Journal of Computational Molecular Cell Biology (2021)
A profile mixture (PM) model is a model of protein evolution, describing sequence data in which sites are assumed to follow many related substitution processes on a single evolutionary tree. The processes depend, in part, on different amino acid distributions, or profiles, varying over sites in aligned sequences. A fundamental question for any stochastic model, which must be answered positively to justify model-based inference, is whether the parameters are identifiable from the probability distribution they determine. Here, using algebraic methods, we show that a PM model has identifiable parameters under circumstances in which it is likely to be used for empirical analyses. In particular, for a tree relating 9 or more taxa, both the tree topology and all numerical parameters are generically identifiable when the number of profiles is less than 74.
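To make the abstract's terms concrete, the following is a minimal sketch of how a profile mixture and generic identifiability are commonly written. All notation here ($K$, $w_k$, $\pi^{(k)}$, $S$, $Q_k$, $\theta$) is assumed for illustration and is not taken from the paper, whose exact parameterization may differ. In particular, the GTR-style rate matrix built from shared exchangeabilities is an assumption.

```latex
% Assumed notation (not from the paper):
%   T         = tree topology with branch lengths
%   K         = number of profiles (mixture classes)
%   w_k       = weight of class k
%   \pi^{(k)} = k-th amino acid profile (a distribution on the 20 states)
%   S         = symmetric exchangeabilities, assumed shared across classes
%   \theta    = (T, S, w_1..w_K, \pi^{(1)}..\pi^{(K)})

% Probability of a site pattern x as a mixture over K profile classes:
\[
  P(x \mid \theta) \;=\; \sum_{k=1}^{K} w_k \, P\bigl(x \mid T, Q_k\bigr),
  \qquad w_k > 0, \quad \sum_{k=1}^{K} w_k = 1 .
\]

% One common (assumed) way a profile determines a substitution process:
\[
  (Q_k)_{ij} \;=\; S_{ij}\,\pi^{(k)}_j \quad (i \neq j),
  \qquad (Q_k)_{ii} \;=\; -\sum_{j \neq i} (Q_k)_{ij} .
\]

% Generic identifiability: outside a measure-zero subset of parameter space,
\[
  P(\,\cdot \mid \theta) \;=\; P(\,\cdot \mid \theta')
  \;\Longrightarrow\; \theta = \theta'
  \quad \text{(up to relabeling the $K$ classes)} .
\]
```

Read this way, the abstract's result says that for trees with 9 or more taxa and fewer than 74 profiles, the map from parameters to site-pattern distributions is injective for all but an exceptional, measure-zero set of parameter choices.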