Average nucleotide diversity should be weighted by per-site sample size.
Kieran Samuk. Published in: Molecular Ecology Resources (2022)
Konopiński (2022) suggests that when averaging nucleotide diversity over a sequence, ignoring per-site variation in sample size (i.e., using an unweighted mean) improves both precision (lower variation) and accuracy (reduced bias). Here, I argue that preserving the uncertainty due to variation in sample size is in line with best statistical practice, and that the increase in accuracy Konopiński (2022) observed is not a general feature of the proposed unweighted mean. I therefore conclude that the weighted mean, as employed by Korunes and Samuk (2020), remains the preferred method for averaging nucleotide diversity over multiple sites.
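To make the distinction concrete, here is a minimal sketch contrasting the two averages. It assumes biallelic sites and takes each site's weight to be its number of pairwise comparisons, n_i(n_i - 1)/2 for n_i sampled allele copies. Under that assumption the weighted mean reduces to total differences divided by total comparisons. The function names and the toy data are hypothetical and are not taken from any published implementation.

```python
from fractions import Fraction

def site_counts(n_alleles, n_alt):
    """Return (differences, comparisons) for one biallelic site (assumed model).

    n_alleles: number of non-missing allele copies sampled at the site.
    n_alt: number of those copies carrying the alternate allele.
    """
    comparisons = n_alleles * (n_alleles - 1) // 2  # all pairs of sampled copies
    differences = n_alt * (n_alleles - n_alt)        # pairs that carry different alleles
    return differences, comparisons

def _usable(sites):
    # Drop sites with fewer than two sampled copies: they allow no comparisons.
    counts = [site_counts(n, k) for n, k in sites]
    return [(d, c) for d, c in counts if c > 0]

def unweighted_mean_pi(sites):
    # Plain mean of per-site pi; every site counts equally regardless of sample size.
    per_site = [Fraction(d, c) for d, c in _usable(sites)]
    return sum(per_site) / len(per_site)

def weighted_mean_pi(sites):
    # Per-site pi weighted by number of comparisons, i.e. sum(d_i) / sum(c_i).
    counts = _usable(sites)
    return Fraction(sum(d for d, _ in counts), sum(c for _, c in counts))

# Hypothetical toy data: (sampled allele copies, alternate-allele copies) per site.
sites = [(10, 5), (2, 1), (10, 0)]
print(unweighted_mean_pi(sites), weighted_mean_pi(sites))
```

In this made-up example, the middle site has only two sampled copies yet gets the same influence as the well-sampled sites in the unweighted mean (14/27, about 0.52). In the comparison-weighted mean (2/7, about 0.29), it contributes in proportion to the single comparison it supports.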