Modeling count data in the addiction field: Some simple recommendations.

Stéphanie Baggio, Katia Iglesias, Valentin Rousson

Published in: International Journal of Methods in Psychiatric Research (2017)

Analyzing count data is frequent in addiction studies but can be cumbersome and time-consuming, and can lead to misleading inference if models are not correctly specified. We compared different statistical models in a simulation study to provide simple, yet valid, recommendations for analyzing count data. We used 2 simulation studies to test the performance of 7 statistical models (classical or quasi-Poisson regression, classical or zero-inflated negative binomial regression, classical or heteroskedasticity-consistent linear regression, and Mann-Whitney test) for predicting the differences between population means for 9 different population distributions (Poisson, negative binomial, zero- and one-inflated Poisson and negative binomial, uniform, left-skewed, and bimodal). We considered a large number of scenarios likely to occur in addiction research: presence of outliers, unbalanced design, and presence of confounding factors. In unadjusted models, the Mann-Whitney test performed best, followed closely by heteroskedasticity-consistent linear regression and quasi-Poisson regression. Poisson regression was by far the worst model. In adjusted models, quasi-Poisson regression performed best. If the goal is to compare 2 groups with respect to count data, a simple recommendation is to use quasi-Poisson regression, which was the most generally valid model in our extensive simulations.
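As an illustration of the recommended two-group comparison, the following is a minimal sketch and not the authors' code. It assumes a log-link model with a single binary group indicator, for which the Poisson fit has a closed form (the fitted mean of each group is its sample mean), and it estimates the dispersion from the Pearson chi-square statistic. The function name `quasi_poisson_two_groups` and the count vectors are hypothetical and chosen only for demonstration.

```python
import math


def quasi_poisson_two_groups(y0, y1):
    """Compare two groups of counts with a quasi-Poisson log-linear model.

    Assumption: one binary predictor and no covariates, so the fitted
    mean in each group equals that group's sample mean.
    """
    n0, n1 = len(y0), len(y1)
    m0, m1 = sum(y0) / n0, sum(y1) / n1

    # Group effect on the log scale (log of the ratio of means).
    log_rr = math.log(m1 / m0)

    # Pearson dispersion: sum of (y - mu)^2 / mu, divided by the
    # residual degrees of freedom (n - 2 for intercept + group).
    pearson = sum((y - m0) ** 2 / m0 for y in y0)
    pearson += sum((y - m1) ** 2 / m1 for y in y1)
    phi = pearson / (n0 + n1 - 2)

    # Classical Poisson standard error of the log ratio of means,
    # then the quasi-Poisson version inflated by sqrt(phi).
    se_poisson = math.sqrt(1 / sum(y0) + 1 / sum(y1))
    se_quasi = se_poisson * math.sqrt(phi)

    return {
        "log_rr": log_rr,
        "dispersion": phi,
        "se_poisson": se_poisson,
        "se_quasi": se_quasi,
        "stat_poisson": log_rr / se_poisson,
        "stat_quasi": log_rr / se_quasi,
    }


# Hypothetical overdispersed counts (e.g. drinks per week in two groups).
group_a = [0, 0, 1, 2, 0, 5, 0, 3, 1, 0]
group_b = [2, 0, 7, 4, 0, 9, 1, 3, 0, 12]

result = quasi_poisson_two_groups(group_a, group_b)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

When the estimated dispersion exceeds 1, the quasi-Poisson standard error is larger than the classical Poisson one, so the test statistic shrinks accordingly. For adjusted models with confounders, a general-purpose GLM routine would be needed in place of this closed form.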
