Naturalgwas: An R package for evaluating genomewide association methods with empirical data.
Olivier François, Kevin Caye
Published in: Molecular Ecology Resources (2018)
Association studies of polygenic traits are notoriously difficult when those studies are conducted at large geographic scales. The difficulty arises because genotype frequencies often vary in geographic space and across distinct environments. Those large-scale variations are known to yield false positives in standard association testing approaches. Although several methods alleviate this problem, no tools have been proposed to evaluate the power that association tests could achieve for a specific study design and set of genotypes. Our goal here is to present an R program fulfilling this objective, by allowing users to simulate phenotypes from observed genotypes and to estimate upper bounds on achievable power. The simulation model can incorporate realistic features such as population structure and gene-by-environment interactions, and the package implements a gold-standard test that evaluates power using information on confounders. We illustrate the use of the program with example studies based on data for the plant species Arabidopsis thaliana. Simulated phenotypes were used to compare the ability of two recent association methods to correctly remove confounding factors, to evaluate power to detect causal variants, and to assess the influence of various parameters. For the simulated data, the new tests reached performance close to that of the gold-standard test and could reasonably be used with measured phenotypes. Power to detect causal variants was influenced by the number of variants and by the strength of their effect sizes, and specific thresholds were obtained from the simulation study. In conclusion, our program provides guidance on the choice of association test, as well as useful knowledge of test performance in a user-specific context.
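The idea of simulating a phenotype from observed genotypes can be illustrated with a minimal sketch. This is not the package's interface (the package is written in R, and its function names are not given here). It is a plain Python illustration under simple assumptions: an additive model in which the phenotype is the sum of causal-variant effects, a weighted confounder term standing in for population structure, and Gaussian noise. All names (`simulate_phenotype`, `causal_idx`, `conf_weight`, and so on) are hypothetical.

```python
import random


def simulate_phenotype(genotypes, causal_idx, effect_sizes,
                       confounder, conf_weight, noise_sd, rng):
    """Simulate one phenotype value per individual (illustrative model only).

    genotypes    : list of individuals, each a list of allele counts
    causal_idx   : column indices of the variants chosen as causal
    effect_sizes : one effect size per causal variant
    confounder   : one confounder score per individual (assumed to stand
                   in for population structure)
    conf_weight  : strength of the confounder's effect on the phenotype
    noise_sd     : standard deviation of the Gaussian noise
    rng          : a random.Random instance, for reproducibility
    """
    phenotypes = []
    for geno, conf in zip(genotypes, confounder):
        # Additive genetic contribution of the causal variants.
        genetic = sum(b * geno[j] for b, j in zip(effect_sizes, causal_idx))
        # Confounder contribution plus random noise.
        phenotypes.append(genetic + conf_weight * conf + rng.gauss(0.0, noise_sd))
    return phenotypes


# Toy usage: two individuals, three variants, variants 0 and 2 causal.
toy_genotypes = [[0, 1, 1], [1, 0, 1]]
toy_phenotypes = simulate_phenotype(
    toy_genotypes, causal_idx=[0, 2], effect_sizes=[2.0, -1.0],
    confounder=[0.5, 1.0], conf_weight=2.0, noise_sd=1.0,
    rng=random.Random(1),
)
```

Because the causal variants are known in such a simulation, an association test can be run on the simulated phenotypes and scored by how many causal variants it recovers, which is the basis of the power evaluation described above.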