Robust data integration from multiple external sources for generalized linear models with binary outcomes.

Kyuseong Choi, Jeremy M G Taylor, Peisong Han
Published in: Biometrics (2024)
We aim to estimate parameters in a generalized linear model (GLM) for a binary outcome when, in addition to the raw data from the internal study, more than one external study provides summary information in the form of parameter estimates from fitting GLMs with varying subsets of the internal study covariates. We propose an adaptive penalization method that exploits the external summary information to gain estimation efficiency and that is both robust and computationally efficient. The robustness comes from exploiting the relationship between the parameters of a GLM and the parameters of a GLM with omitted covariates, and from using a penalty to downweight external summary information that is less compatible with the internal data. The computational burden of searching for the optimal tuning parameter of the penalty is reduced by using adaptive weights and by selecting the tuning parameter with an information criterion. Simulation studies show that the proposed estimator is robust against various types of population distribution heterogeneity and gains efficiency compared to direct maximum likelihood estimation on the internal data. The method is applied to improve a logistic regression model that predicts high-grade prostate cancer, making use of parameter estimates from two external models.
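The relationship between a GLM and the same GLM with omitted covariates can be illustrated with a small simulation. The sketch below is not the authors' estimator and does not implement the penalization; it only shows, on simulated data with made-up coefficients, that (a) a reduced logistic model reports a different coefficient for a shared covariate than the full model, and (b) the reduced-model parameters can be recovered from the full-model fit by solving the reduced score equations with the full-model fitted probabilities in place of the outcomes. This is one standard way to express the link and may differ from the paper's exact formulation. All names (`fit_logistic`, `solve`, the data-generating values) are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def solve(a, b):
    # Solve a small linear system by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def fit_logistic(rows, y, iters=25, tol=1e-10):
    # Newton-Raphson for logistic regression. `rows` already contain the
    # intercept column; `y` may be 0/1 outcomes or probabilities in [0, 1].
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, yi in zip(rows, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, x)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * x[j]
                for k in range(p):
                    hess[j][k] += w * x[j] * x[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Simulated "internal study": two correlated covariates, binary outcome.
# The true coefficients (-0.5, 1.0, 1.0) are arbitrary illustration values.
random.seed(0)
n = 5000
full_rows, y = [], []
for _ in range(n):
    x1 = random.gauss(0.0, 1.0)
    x2 = 0.5 * x1 + math.sqrt(0.75) * random.gauss(0.0, 1.0)
    full_rows.append([1.0, x1, x2])
    y.append(1.0 if random.random() < sigmoid(-0.5 + x1 + x2) else 0.0)
reduced_rows = [[r[0], r[1]] for r in full_rows]  # omit x2

full = fit_logistic(full_rows, y)        # full model: intercept, x1, x2
reduced = fit_logistic(reduced_rows, y)  # reduced model fitted to outcomes

# Reduced-model parameters implied by the full fit: solve
# sum_i (mu_full_i - mu_reduced_i) * x_reduced_i = 0.
mu_full = [sigmoid(sum(b * v for b, v in zip(full, r))) for r in full_rows]
implied = fit_logistic(reduced_rows, mu_full)

print("full:   ", full)
print("reduced:", reduced)
print("implied:", implied)
```

On the internal data the two routes to the reduced-model parameters agree because the full-model score equations already force the residuals to be orthogonal to every included covariate. An external estimate that sits far from the implied value would then signal incompatibility, which is the kind of discrepancy the abstract describes as being downweighted by the penalty.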
Keyphrases
  • prostate cancer
  • high grade
  • electronic health record
  • healthcare
  • adipose tissue
  • machine learning
  • radical prostatectomy
  • skeletal muscle
  • single cell
  • case control