The effect of misspecification of random effects distributions in clustered data settings with outcome-dependent sampling.
John M Neuhaus, Charles E McCulloch. Published in: The Canadian journal of statistics = Revue canadienne de statistique (2011)
Genetic epidemiologists often gather outcome-dependent samples of family data to measure within-family associations of genetic factors with disease outcomes. Generalized linear mixed models provide effective methods to estimate within-family associations but typically require parametric specification of the random effects distribution. Although misspecification of the random effects distribution often leads to little bias in estimated regression coefficients in standard, prospective clustered data settings, some recent studies suggest that such misspecification will affect parameter estimates from outcome-dependent cluster sampling designs. Using analytic results, simulation studies, and fits to example data, this study examines the effect of misspecification of random effects distributions on parameter estimates in clustered data settings with outcome-dependent sampling. We show that the effects are consistent with results from prospective cluster sampling settings. In particular, both ascertainment-corrected mixed model methods that assume normally distributed random intercepts and conditional likelihood approaches provide accurate estimates of within-family covariate effects, even under a misspecified random effects distribution.
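As a minimal illustration of why a conditional likelihood approach can be insensitive to the random effects distribution, consider a random-intercept logistic model. This is a sketch under stated assumptions, not necessarily the specification used in the paper: the binary outcome, the logit link, the single within-family covariate, and all symbols ($Y_{ij}$, $x_{ij}$, $b_i$, $G$, $\beta$, $s_i$, $\mathcal{D}$) are introduced here for illustration only.

```latex
% Assumed model: family i, member j = 1, ..., n_i, binary outcome Y_{ij},
% random intercept b_i drawn from an unspecified distribution G.
\[
  \operatorname{logit} \Pr(Y_{ij} = 1 \mid b_i, x_{ij})
    = \beta_0 + b_i + \beta x_{ij},
  \qquad b_i \sim G .
\]
% Conditioning on the number of cases in the family, s_i = \sum_j y_{ij},
% cancels every factor that involves \beta_0 + b_i:
\[
  \Pr\Bigl(Y_i = y_i \Bigm| \sum_{j} Y_{ij} = s_i,\; x_i\Bigr)
    = \frac{\exp\bigl(\beta \sum_{j} x_{ij}\, y_{ij}\bigr)}
           {\sum_{d \in \mathcal{D}(s_i)} \exp\bigl(\beta \sum_{j} x_{ij}\, d_j\bigr)},
  \qquad
  \mathcal{D}(s) = \Bigl\{ d \in \{0,1\}^{n_i} : \sum_{j} d_j = s \Bigr\}.
\]
```

In this assumed model the conditional probability involves neither $b_i$ nor $G$, so the form of the random effects distribution does not enter the conditional likelihood for $\beta$. The same expression holds under the further assumption that families are sampled based on their outcomes only through the case count $s_i$.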