Adjusting for differential misclassification in matched case-control studies utilizing health administrative data.

Tanja Högg, Yinshan Zhao, Juxin Liu, John Petkau, John Fisk, Ruth Ann Marrie, Helen Tremlett

Published in: Statistics in Medicine (2019)

In epidemiological studies of secondary data sources, lack of accurate disease classifications often requires investigators to rely on diagnostic codes generated by physicians or hospital systems to identify case and control groups, resulting in a less-than-perfect assessment of the disease under investigation. Moreover, because of differences in coding practices by physicians, it is hard to determine the factors that affect the chance of an incorrectly assigned disease status. What results is a dilemma where assumptions of non-differential misclassification are questionable but, at the same time, necessary to proceed with statistical analyses. This paper develops an approach to adjust exposure-disease association estimates for disease misclassification, without the need for simplifying non-differentiality assumptions or prior information about a complicated classification mechanism. We propose to leverage rich temporal information on disease-specific healthcare utilization to estimate each participant's probability of being a true case and to use these estimates as weights in a Bayesian analysis of matched case-control data. The approach is applied to data from a recent observational study into the early symptoms of multiple sclerosis (MS), where MS cases were identified from Canadian health administrative databases and matched to population controls that are assumed to be correctly classified. A comparison of our results with those from non-differentially adjusted analyses reveals conflicting inferences and highlights that ill-suited assumptions of non-differential misclassification can exacerbate biases in association estimates.
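
The idea of using estimated case probabilities as weights can be illustrated with a deliberately simplified sketch. It is not the authors' model: it assumes 1:1 matched pairs, a single binary exposure, a conditional logistic likelihood in which each pair's contribution is multiplied by a hypothetical weight `w` (the estimated probability that the identified case is a true case), a normal prior on the log odds ratio, and a grid approximation in place of a full Bayesian sampler. All function names, parameters, and defaults below are assumptions made for illustration.

```python
import numpy as np


def weighted_pair_log_lik(beta, x_case, x_ctrl, w):
    """Weighted conditional logistic log-likelihood for 1:1 matched pairs.

    x_case, x_ctrl: exposure (0/1) of the case and its matched control.
    w: hypothetical per-pair weight, e.g. an estimated probability that
       the identified case is truly a case (assumption, not from the paper).
    """
    eta_case = beta * x_case
    eta_ctrl = beta * x_ctrl
    # log(exp(eta_case) + exp(eta_ctrl)), computed stably
    log_den = np.logaddexp(eta_case, eta_ctrl)
    return np.sum(w * (eta_case - log_den))


def grid_posterior(x_case, x_ctrl, w, prior_sd=10.0, grid=None):
    """Grid approximation to the posterior of the log odds ratio beta.

    Uses a Normal(0, prior_sd^2) prior; both the prior and the grid
    range are illustrative choices.
    """
    if grid is None:
        grid = np.linspace(-5.0, 5.0, 1001)
    log_post = np.array(
        [weighted_pair_log_lik(b, x_case, x_ctrl, w) for b in grid]
    )
    log_post = log_post - 0.5 * (grid / prior_sd) ** 2  # add log prior
    log_post = log_post - log_post.max()                # avoid underflow
    post = np.exp(log_post)
    post = post / post.sum()                            # normalize over grid
    return grid, post


def posterior_mode(grid, post):
    """Grid point with the highest posterior mass."""
    return grid[np.argmax(post)]
```

In this sketch, setting every weight to 1 recovers the usual matched-pair analysis, while weights below 1 shrink the influence of pairs whose case status is doubtful. Because the weights can vary with each participant's healthcare utilization history, no non-differentiality assumption is imposed on them.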
