Bayesian estimation of community size and overlap from random subsamples.

Erik K Johnson, Daniel B Larremore
Published in: PLoS Computational Biology (2022)
Counting the number of species, items, or genes that are shared between two groups, sets, or communities is a simple calculation when sampling is complete. However, when only partial samples are available, quantifying the overlap between two communities becomes an estimation problem. Furthermore, to calculate normalized measures of β-diversity, such as the Jaccard and Sørensen-Dice indices, one must also estimate the total sizes of the communities being compared. Previous efforts to address these problems have assumed knowledge of total community sizes and then used Bayesian methods to produce unbiased estimates with quantified uncertainty. Here, we address communities of unknown size and show that this produces systematically better estimates, both in terms of central estimates and in the quantification of uncertainty in those estimates. We further show how to use species, item, or gene count data to refine estimates of community size in a Bayesian joint model of community size and overlap.
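
To make the quantities in the abstract concrete, the following minimal Python sketch computes the Jaccard and Sørensen-Dice indices from two community sizes and their overlap, and then shows how the raw overlap seen in partial samples can fall far below the true overlap. It is only an illustration of why estimation is needed, not the authors' Bayesian method; the function names, the toy communities, and the deterministic subsampling rule are all assumptions made for this example.

```python
def jaccard(size_a: int, size_b: int, overlap: int) -> float:
    """Jaccard index: |A ∩ B| / |A ∪ B|, computed from sizes and overlap."""
    union = size_a + size_b - overlap
    return overlap / union if union else 0.0


def sorensen_dice(size_a: int, size_b: int, overlap: int) -> float:
    """Sørensen-Dice index: 2 |A ∩ B| / (|A| + |B|)."""
    total = size_a + size_b
    return 2 * overlap / total if total else 0.0


# Toy communities (hypothetical): 60 items each, 20 of them shared.
community_a = set(range(0, 60))
community_b = set(range(40, 100))
true_overlap = len(community_a & community_b)

# With complete sampling, both indices are simple arithmetic.
true_jaccard = jaccard(len(community_a), len(community_b), true_overlap)
true_dice = sorensen_dice(len(community_a), len(community_b), true_overlap)

# Partial samples (a deterministic stand-in for random subsampling):
# keep the even-numbered items of A and the multiples of 3 in B.
sample_a = {x for x in community_a if x % 2 == 0}
sample_b = {x for x in community_b if x % 3 == 0}
observed_overlap = len(sample_a & sample_b)

# Plugging sample counts straight into the index ignores unsampled items.
naive_jaccard = jaccard(len(sample_a), len(sample_b), observed_overlap)

print(f"true overlap = {true_overlap}, observed overlap = {observed_overlap}")
print(f"true Jaccard = {true_jaccard:.3f}, naive Jaccard = {naive_jaccard:.3f}")
print(f"true Sørensen-Dice = {true_dice:.3f}")
```

In this toy case the samples share only 3 items although the communities share 20, so the naive Jaccard value is well below the true value of 0.2. Correcting for this, and for the unknown total community sizes, is the estimation problem the paper addresses.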