Average nucleotide diversity should be weighted by per-site sample size.

Published in: Molecular Ecology Resources (2022)

Konopiński (2022) suggests that, when averaging nucleotide diversity over a sequence, ignoring variation in per-site sample size (i.e., using an unweighted mean) improves both precision (lower variation) and accuracy (reduced bias). Here, I argue that preserving the uncertainty due to variation in sample size is in line with best statistical practice, and that the observed increase in accuracy is not a general feature of the unweighted mean proposed by Konopiński (2022). I therefore conclude that the weighted mean, as employed by Korunes & Samuk (2020), remains the preferred method for averaging nucleotide diversity over multiple sites.
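The distinction between the two averages can be illustrated with a minimal Python sketch. It is not taken from either paper. It assumes biallelic sites, each described by a hypothetical `(alt_count, n)` pair (the number of alternate alleles and the number of genotyped alleles at that site), and it assumes that the weight for each site is its number of pairwise comparisons, so that the weighted mean is the total number of differences divided by the total number of comparisons. The function names and the example data are illustrative only.

```python
from math import comb


def site_counts(alt_count, n):
    """Return (pairwise differences, pairwise comparisons) for one biallelic site.

    alt_count: number of alternate alleles observed at the site (assumed input).
    n: number of genotyped alleles at the site (the per-site sample size).
    """
    differences = alt_count * (n - alt_count)
    comparisons = comb(n, 2)
    return differences, comparisons


def unweighted_mean_pi(sites):
    """Average the per-site diversity values, giving every site equal weight."""
    per_site = []
    for alt_count, n in sites:
        differences, comparisons = site_counts(alt_count, n)
        if comparisons == 0:
            continue  # fewer than two alleles: diversity is undefined here
        per_site.append(differences / comparisons)
    return sum(per_site) / len(per_site)


def weighted_mean_pi(sites):
    """Sum differences and comparisons over sites, then divide once.

    This equals a mean of per-site values weighted by the number of
    comparisons, so poorly sampled sites contribute proportionally less.
    """
    total_differences = 0
    total_comparisons = 0
    for alt_count, n in sites:
        differences, comparisons = site_counts(alt_count, n)
        total_differences += differences
        total_comparisons += comparisons
    return total_differences / total_comparisons


# Hypothetical data: two well-sampled sites and one with only two alleles.
sites = [(5, 10), (1, 2), (0, 10)]
print(unweighted_mean_pi(sites))
print(weighted_mean_pi(sites))
```

In this made-up example the site with only two genotyped alleles has a per-site diversity of 1.0 based on a single comparison. The unweighted mean treats that value as being as informative as the sites with 45 comparisons each, whereas the weighted mean lets it contribute one comparison out of 91.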
