Login / Signup

Permutation tests are robust and powerful at 0.5% and 5% significance levels.

Kimihiro NoguchiFrank KonietschkeFernando Marmolejo-RamosMarkus Pauly
Published in: Behavior research methods (2021)
Recent replication crisis has led to a number of ad hoc suggestions to decrease the chance of making false positive findings. Among them, Johnson (Proceedings of the National Academy of Sciences, 110, 19313-19317, 2013) and Benjamin et al. (Nature Human Behaviour, 2, 6-10 2018) recommend using the significance level of α = 0.005 (0.5%) as opposed to the conventional 0.05 (5%) level. Even though their suggestion is easy to implement, it is unclear whether or not the commonly used statistical tests are robust and/or powerful at such a small significance level. Therefore, the main aim of our study is to investigate the robustness and power curve behaviors of independent (unpaired) two-sample tests for metric and ordinal data at nominal significance levels of α = 0.005 and α = 0.05. Through an extensive simulation study, it is found that the permutation versions of the Welch t-test and the Brunner-Munzel test are particularly robust and powerful while the commonly used two-sample tests which utilize t-distribution tend to be either liberal or conservative, and have peculiar power curve behaviors under skewed distributions with variance heterogeneity.
Keyphrases
  • endothelial cells
  • public health
  • electronic health record
  • quality improvement
  • big data
  • induced pluripotent stem cells
  • artificial intelligence