Limits and convergence properties of the sequentially Markovian coalescent.
Thibaut Paul Patrick Sellinger, Diala Abu-Awad, Aurélien Tellier
Published in: Molecular ecology resources (2021)
Several methods based on the sequentially Markovian coalescent (SMC) make use of full genome sequence data from samples to infer population demographic history including past changes in population size, admixture, migration events and population structure. More recently, the original theoretical framework has been extended to allow the simultaneous estimation of population size changes along with other life history traits such as selfing or seed banking. The latter developments enhance the applicability of SMC methods to nonmodel species. Although convergence proofs have been given using simulated data in a few specific cases, an in-depth investigation of the limitations of SMC methods is lacking. In order to explore such limits, we first develop a tool inferring the best-case convergence of SMC methods assuming the true underlying coalescent genealogies are known. This tool can be used to quantify the amount and type of information that can be confidently retrieved from given data sets prior to the analysis of the real data. Second, we assess the inference accuracy when the assumptions of SMC approaches are violated due to departures from the model, namely the presence of transposable elements, variable recombination and mutation rates along the sequence, and SNP calling errors. Third, we deliver a new interpretation of SMC methods by highlighting the importance of the transition matrix, which we argue can be used as a set of summary statistics in other statistical inference methods, uncoupling the SMC from hidden Markov models (HMMs). We finally offer recommendations to better apply SMC methods and build adequate data sets under budget constraints.
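To make the transition-matrix idea concrete, the following is a minimal sketch of how such a matrix could be tabulated when the coalescence times along the sequence are known. It is not the authors' tool: the function names, the input format (one pairwise coalescence time per genomic position) and the time-interval boundaries are all assumptions made for illustration.

```python
import numpy as np

def discretize_tmrca(tmrca, boundaries):
    """Map each coalescence time to the index of its time interval.

    `boundaries` holds the interior interval edges in increasing order
    (hypothetical values; real analyses choose these from the model).
    """
    return np.digitize(tmrca, boundaries)

def empirical_transition_matrix(states, n_states):
    """Count transitions between consecutive hidden states along the
    sequence and row-normalise the counts into probabilities."""
    states = np.asarray(states)
    counts = np.zeros((n_states, n_states), dtype=float)
    # np.add.at accumulates correctly when an index pair occurs more than once.
    np.add.at(counts, (states[:-1], states[1:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows for states that are never left stay at zero.
    probs = np.divide(counts, row_sums,
                      out=np.zeros_like(counts), where=row_sums > 0)
    return counts, probs

# Toy input (assumed): coalescence times at seven consecutive positions.
tmrca = [0.2, 0.2, 0.9, 0.9, 0.9, 2.5, 0.2]
boundaries = [0.5, 1.5]          # three time intervals
states = discretize_tmrca(tmrca, boundaries)
counts, probs = empirical_transition_matrix(states, n_states=3)
print(counts)
print(probs)
```

In this sketch the count matrix, rather than the hidden Markov model built on top of it, is the object of interest: its entries are the kind of summary statistics the abstract suggests could be passed to other inference methods.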