Nonproportional hazards and unobserved heterogeneity in clustered survival data: When can we tell the difference?
Theodor Adrian Balan, Hein Putter. Published in: Statistics in Medicine (2019)
Multivariate survival data are frequently encountered in biomedical applications in the form of clustered failures (or recurrent events data). A popular way of analyzing such data is to use shared frailty models, which assume that the proportional hazards assumption holds conditional on an unobserved cluster-specific random effect. Such models are often incorporated into more complicated joint models in survival analysis. If the random effect distribution has finite expectation (as the gamma and inverse Gaussian distributions do, unlike the positive stable), then the conditional proportional hazards assumption does not carry over to the marginal models. It has been shown that, for univariate data, this makes it impossible to distinguish between the presence of unobserved heterogeneity (e.g., due to missing covariates) and marginal nonproportional hazards. We show that time-dependent covariate effects may falsely appear as evidence in favor of a frailty model also for clustered failures or recurrent events data, when the cluster size or number of recurrent events is small. When true unobserved heterogeneity is present, the presence of nonproportional hazards leads to overestimating the frailty effect. We show that this phenomenon is somewhat mitigated as the cluster size grows. We carry out a simulation study to assess the behavior of test statistics and estimators for frailty models in such contexts. The gamma, inverse Gaussian, and positive stable shared frailty models are contrasted using a novel software implementation for estimating semiparametric shared frailty models. Two main questions are addressed in the contexts of clustered failures and recurrent events: whether covariates with a time-dependent effect may appear as an indication of unobserved heterogeneity, and whether the additional presence of unobserved heterogeneity can be detected in this case. Finally, the practical implications are illustrated in a real-world data analysis example.
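To make the claim about finite-expectation frailties concrete, here is a minimal sketch. It is not the authors' software. It computes the marginal hazard ratio for one binary covariate under the textbook shared frailty formulas. The assumptions are a conditional hazard Z·λ0(t)·exp(βx), a gamma frailty with mean 1 and variance θ, and a positive stable frailty with Laplace transform exp(-s^α). The function names are hypothetical. Under the gamma frailty the marginal hazard ratio depends on the cumulative baseline hazard Λ0(t), so it is no longer proportional. Under the positive stable frailty it is constant.

```python
import math


def gamma_marginal_hr(beta: float, theta: float, cum_base_haz: float) -> float:
    """Marginal hazard ratio (x=1 vs x=0) under a gamma frailty (assumed mean 1, variance theta).

    Marginal survival: (1 + theta * Lambda0(t) * exp(beta*x)) ** (-1/theta), hence
    marginal hazard: lambda0(t) * exp(beta*x) / (1 + theta * Lambda0(t) * exp(beta*x)).
    The lambda0(t) factor cancels in the ratio; the Lambda0(t) dependence does not.
    """
    rr = math.exp(beta)  # conditional (within-cluster) hazard ratio
    return rr * (1.0 + theta * cum_base_haz) / (1.0 + theta * cum_base_haz * rr)


def positive_stable_marginal_hr(beta: float, alpha: float) -> float:
    """Marginal hazard ratio under a positive stable frailty, Laplace transform exp(-s**alpha).

    Marginal survival: exp(-(Lambda0(t) * exp(beta*x)) ** alpha), so the marginal
    hazard ratio is exp(alpha * beta) at every t: proportional, but attenuated.
    """
    return math.exp(alpha * beta)


if __name__ == "__main__":
    beta, theta = math.log(2.0), 1.0  # illustrative values, not taken from the paper
    for cum_haz in (0.0, 0.5, 1.0, 5.0, 50.0):
        hr = gamma_marginal_hr(beta, theta, cum_haz)
        print(f"Lambda0={cum_haz:5.1f}  gamma marginal HR={hr:.3f}")
    print(f"positive stable marginal HR (alpha=0.5): {positive_stable_marginal_hr(beta, 0.5):.3f}")
```

With a conditional hazard ratio of 2 and θ = 1, the gamma marginal hazard ratio starts at 2 when Λ0 = 0 and decreases toward 1 as Λ0 grows. A covariate whose true effect wanes over time produces this same marginal pattern. That is why, with small clusters, the two explanations are hard to tell apart.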