Synthesizing external aggregated information in the penalized Cox regression under population heterogeneity.

Ying Sheng, Yifei Sun, Chiung-Yu Huang, Grace E Kim
Published in: Statistics in medicine (2021)
Synthesizing external aggregated information has been proven useful in improving estimation efficiency when conducting statistical analysis using a limited amount of data. In this paper, we develop a unified framework for combining information from high-dimensional individual-level data and potentially low-dimensional external aggregate data under the Cox model. We summarize various forms of external aggregated information by population estimating equations and propose a penalized empirical likelihood approach to borrow information from these estimating equations. The proposed methods possess the flexibility to handle the case where individual-level data and external aggregate data are from heterogeneous populations. Specifically, a penalized empirical likelihood ratio test is developed to check for the potential heterogeneity, and a semiparametric density ratio model is postulated to account for the heterogeneity. Moreover, we study the impact of uncertainty in the auxiliary information on the efficiency gain and propose a modified variance estimator to adjust for the uncertainty. The proposed estimators enjoy the oracle property and are asymptotically more efficient than the penalized partial likelihood estimator that does not exploit the external aggregated information. Simulation studies show improvement in both estimation efficiency and variable selection over the competitors. The proposed approaches are applied to the analysis of a pediatric kidney transplant study for illustration.
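For orientation, the two ingredients named in the abstract can be written in generic textbook form. This is a sketch under assumed notation, not the paper's exact formulation: the observed time X_i, event indicator δ_i, covariate vector Z_i, penalty function p_λ, and estimating function g are all symbols introduced here for illustration. The abstract does not state how the penalty is combined with the constraints, so that step is left out.

```latex
% Baseline: penalized partial likelihood estimator (uses no external information)
\hat\beta_{\mathrm{PPL}}
  = \arg\min_{\beta}\Big\{
      -\frac{1}{n}\sum_{i=1}^{n}\delta_i\Big[\beta^\top Z_i
        - \log\sum_{j:\,X_j \ge X_i}\exp(\beta^\top Z_j)\Big]
      + \sum_{k=1}^{p} p_\lambda(|\beta_k|)
    \Big\}

% Generic empirical likelihood, with external aggregate information
% encoded as a population estimating equation E[g(X, \delta, Z; \theta)] = 0
\max_{p_1,\dots,p_n}\ \prod_{i=1}^{n} p_i
  \quad\text{subject to}\quad
  p_i \ge 0,\qquad \sum_{i=1}^{n} p_i = 1,\qquad
  \sum_{i=1}^{n} p_i\, g(X_i,\delta_i,Z_i;\theta) = 0
```

In the first display the inner sum runs over the risk set at time X_i. In the second, the weights p_i replace the uniform weights 1/n, and the constraint forces the reweighted sample to agree with the external summary.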
Keyphrases
  • electronic health record
  • health information
  • big data
  • single cell
  • machine learning