Incorporating survival data into case-control studies with incident and prevalent cases

Soutrik Mandal, Jing Qin, Ruth M Pfeiffer
Published in: Statistics in Medicine (2021)
Typically, case-control studies that estimate odds ratios associating risk factors with disease incidence include only newly diagnosed cases. Recently proposed methods allow information on prevalent cases, individuals who survived from disease diagnosis to sampling, to be incorporated into cross-sectionally sampled case-control studies under parametric assumptions for the survival time after diagnosis. Here we propose and study methods that additionally use prospectively observed survival times from prevalent and incident cases to adjust logistic models for the time between diagnosis and sampling (the backward time) for prevalent cases. This adjustment yields unbiased odds-ratio estimates from case-control studies that include prevalent cases. We propose a computationally simple two-step generalized method-of-moments estimation procedure. First, we estimate the survival distribution under a semiparametric Cox model using an expectation-maximization algorithm that yields fully efficient estimates and accommodates both left truncation for prevalent cases and right censoring. Then, we use the estimated survival distribution in an extension of the logistic model to three groups (controls, incident cases, and prevalent cases) to adjust for the survival bias in prevalent cases. In simulations with modest amounts of censoring, odds ratios from the two-step procedure were as efficient as those estimated from a joint logistic and survival-data likelihood under parametric assumptions. This indicates that using the cases' prospective survival data reduces model dependence and improves the precision of association estimates in case-control studies with prevalent cases. We illustrate the methods by estimating associations between single nucleotide polymorphisms and breast cancer risk using controls, incident cases, and prevalent cases sampled from the US Radiologic Technologists Study cohort.
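The abstract notes that the survival step must handle left truncation for prevalent cases (they enter observation only at their backward time) and right censoring. The paper does this with a semiparametric Cox model fitted by EM. The sketch below does not reproduce that method. It swaps in the simpler nonparametric product-limit (Kaplan-Meier) estimator with delayed entry and no covariates, only to illustrate how truncation and censoring shape the risk sets. The function names `product_limit_left_truncated` and `survival_at` are hypothetical, and time is assumed to be measured from diagnosis.

```python
from typing import List, Sequence, Tuple


def product_limit_left_truncated(
    entry: Sequence[float], stop: Sequence[float], event: Sequence[int]
) -> List[Tuple[float, float]]:
    """Product-limit survival estimate with delayed entry (left truncation).

    entry[i]: time since diagnosis at which subject i enters observation
              (assumed 0 for incident cases, the backward time for prevalent cases)
    stop[i]:  time since diagnosis at death or censoring
    event[i]: 1 if death was observed at stop[i], 0 if right-censored
    Returns (t, S(t)) at each distinct observed death time.
    """
    n = len(entry)
    if len(stop) != n or len(event) != n:
        raise ValueError("entry, stop and event must have equal length")
    if any(a >= x for a, x in zip(entry, stop)):
        raise ValueError("each subject needs entry < stop")

    death_times = sorted({x for x, d in zip(stop, event) if d == 1})
    surv = 1.0
    curve = []
    for t in death_times:
        # Under left truncation a subject is at risk at t only if it entered
        # observation before t and had not yet died or been censored.
        at_risk = sum(1 for a, x in zip(entry, stop) if a < t <= x)
        deaths = sum(1 for x, d in zip(stop, event) if d == 1 and x == t)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Evaluate the right-continuous step function S(t) from the curve."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


if __name__ == "__main__":
    # Toy data: three incident cases (entry 0) and three prevalent cases.
    entry = [0, 0, 0, 2, 1, 3]
    stop = [5, 3, 8, 6, 4, 9]
    event = [1, 0, 1, 1, 1, 0]
    curve = product_limit_left_truncated(entry, stop, event)
    for t, s in curve:
        print(f"S({t}) = {s:.3f}")
    # Survival probability evaluated at a prevalent case's backward time of 2.
    print(survival_at(curve, 2))
```

On the toy data the estimate steps down at times 4, 5, 6 and 8 to approximately 0.8, 0.6, 0.4 and 0.2. A value such as S(a) at a prevalent case's backward time a is the kind of survival probability that a backward-time adjustment needs. How the paper enters the estimated survival distribution into its three-group logistic model is not shown here.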