
Integrated partially linear model for multi-center studies with heterogeneity and batch effect in covariates.

Lei Yang, Yongzhao Shao
Published in: Statistics (2023)
Multi-center study designs are increasingly used to borrow strength from multiple research groups and obtain broadly applicable, reproducible findings. Regression analysis is widely used for analyzing multi-group studies; however, in many large-scale collaborative studies, some of the many regression predictors have nonlinear effects and/or are measured with batch effects. Also, the group compositions of the nonlinear predictors are potentially heterogeneous across centers. Conventional pooled data analysis ignores the interplay among nonlinearity, batch effects, group composition heterogeneity, measurement error, and other data incoherence in the multi-center setting, which can cause biased regression estimates and misleading conclusions. In this paper, we propose an integrated partially linear regression model (IPLM) based analysis that accounts simultaneously for predictor nonlinearity, general batch effects, group composition heterogeneity, high-dimensional covariates, potential measurement error in covariates, and combinations of these complexities. A local linear regression based approach is employed to estimate the nonlinear component, and a regularization procedure is introduced to identify whether each predictor's effect is homogeneous or heterogeneous across groups. In particular, when the effects of all predictors are homogeneous across the study centers, the proposed IPLM automatically reduces to a single parsimonious partially linear model for all centers. The proposed method achieves asymptotic estimation and variable selection consistency, including with high-dimensional covariates. Moreover, it has a fast computing algorithm, and its effectiveness is supported by numerical simulation studies. An analysis of a multi-center Alzheimer's disease research project illustrates the proposed IPLM based analysis.
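To make the building block concrete, the following is a minimal sketch of a single-center partially linear model, y = Xβ + g(t) + ε, fitted with a local linear smoother and a Speckman-type profile least squares step. It is not the authors' IPLM: it omits the multi-center structure, the batch-effect and measurement-error handling, and the regularization step. The function names, the Gaussian kernel, the bandwidth `h`, and the simulated data are all assumptions chosen for illustration.

```python
import numpy as np


def local_linear_smoother(t, h):
    """Return the n x n local linear smoother matrix for points t.

    Row i holds the weights that map responses y to the local linear
    fit at t[i], using a Gaussian kernel with bandwidth h (assumed).
    """
    n = len(t)
    S = np.empty((n, n))
    for i in range(n):
        d = t - t[i]                      # distances to the target point
        w = np.exp(-0.5 * (d / h) ** 2)   # Gaussian kernel weights
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
        # Closed-form intercept weights of weighted least squares on [1, d]
        S[i] = w * (s2 - s1 * d) / (s0 * s2 - s1 ** 2)
    return S


def fit_partially_linear(X, t, y, h):
    """Profile least squares fit of y = X @ beta + g(t) + noise.

    Returns (beta_hat, g_hat), where g_hat is evaluated at the observed t.
    """
    S = local_linear_smoother(t, h)
    R = np.eye(len(t)) - S                # removes the smooth part in t
    beta_hat, *_ = np.linalg.lstsq(R @ X, R @ y, rcond=None)
    g_hat = S @ (y - X @ beta_hat)        # smooth the partial residuals
    return beta_hat, g_hat


def simulate(n=400, seed=0):
    """Simulated single-center data (hypothetical, for illustration only)."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, 1.0, n)
    X = rng.normal(size=(n, 2))
    beta = np.array([1.5, -2.0])
    g = np.sin(2 * np.pi * t)
    y = X @ beta + g + 0.1 * rng.normal(size=n)
    return X, t, y, beta, g


if __name__ == "__main__":
    X, t, y, beta, g = simulate()
    beta_hat, g_hat = fit_partially_linear(X, t, y, h=0.05)
    print("beta_hat:", beta_hat)
    print("mean |g_hat - g|:", np.mean(np.abs(g_hat - g)))
```

In a multi-center setting, one natural extension of this sketch would be to fit center-specific coefficients and penalize their differences so that homogeneous effects are fused; how the paper actually implements that step is not specified in the abstract.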
Keyphrases
  • data analysis
  • randomized controlled trial
  • single cell
  • clinical trial
  • quality improvement
  • risk assessment
  • adipose tissue
  • climate change
  • phase iii