Pursuing sources of heterogeneity in modeling clustered population.
Yan Li, Chun Yu, Yize Zhao, Weixin Yao, Robert H Aseltine, Kun Chen
Published in: Biometrics (2021)
Researchers often have to deal with heterogeneous populations with mixed regression relationships, increasingly so in the era of data explosion. In such problems, when there are many candidate predictors, it is of interest not only to identify the predictors that are associated with the outcome, but also to distinguish the true sources of heterogeneity, that is, to identify the predictors that have different effects among the clusters and thus are the true contributors to the formation of the clusters. We clarify the concept of a source of heterogeneity so that it accounts for potential scale differences among the clusters, and we propose a regularized finite mixture effects regression to achieve heterogeneity pursuit and feature selection simultaneously. We develop an efficient algorithm and show that our approach can achieve both estimation and selection consistency. Simulation studies further demonstrate the effectiveness of our method under various practical scenarios. Three applications are presented: an imaging genetics study linking genetic factors and brain neuroimaging traits in Alzheimer's disease, a public health study exploring the association between suicide risk among adolescents and their school district characteristics, and a sports analytics study of how the salary levels of baseball players are associated with their performance and contractual status.
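To make the setting concrete, the following is a minimal sketch of a plain, unpenalized EM fit for a two-cluster Gaussian mixture of linear regressions on simulated data. It is not the authors' regularized method or algorithm, and it performs no feature selection. The function name `em_mixture_regression`, the simulation settings, and the use of coefficients divided by the cluster standard deviation as a scale-adjusted comparison are all assumptions made for illustration; the abstract itself only states that scale differences among clusters are accounted for.

```python
import numpy as np

def em_mixture_regression(X, y, n_clusters=2, n_iter=200, seed=0):
    """Plain EM for a Gaussian mixture of linear regressions (no penalty)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Start from random soft cluster assignments.
    resp = rng.dirichlet(np.ones(n_clusters), size=n)
    beta = np.zeros((n_clusters, p))
    sigma = np.ones(n_clusters)
    for _ in range(n_iter):
        # M-step: weighted least squares and weighted residual scale per cluster.
        for k in range(n_clusters):
            w = resp[:, k]
            Xw = X * w[:, None]
            beta[k] = np.linalg.solve(Xw.T @ X + 1e-8 * np.eye(p), Xw.T @ y)
            resid_k = y - X @ beta[k]
            sigma[k] = max(np.sqrt((w * resid_k**2).sum() / w.sum()), 1e-6)
        pi = np.clip(resp.mean(axis=0), 1e-12, None)
        # E-step: posterior cluster probabilities, computed in the log domain.
        resid = y[:, None] - X @ beta.T  # shape (n, n_clusters)
        log_dens = np.log(pi) - np.log(sigma) - 0.5 * (resid / sigma) ** 2
        log_dens -= log_dens.max(axis=1, keepdims=True)
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp.mean(axis=0), beta, sigma, resp

# Simulated example (hypothetical settings): the first predictor has opposite
# effects in the two clusters, the second has the same effect in both.
rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 2))
labels = np.arange(n) % 2  # balanced clusters
slopes = np.where(labels == 0, 3.0, -3.0)
y = slopes * X[:, 0] + 1.0 * X[:, 1] + 0.3 * rng.normal(size=n)

pi_hat, beta_hat, sigma_hat, resp = em_mixture_regression(X, y)

# Assumed scale adjustment: compare coefficients divided by the cluster sigma.
scaled = beta_hat / sigma_hat[:, None]
spread = scaled.max(axis=0) - scaled.min(axis=0)
print("scaled coefficients per cluster:\n", scaled)
print("spread across clusters per predictor:", spread)
```

In this toy setting, a large spread for a predictor suggests that its effect differs between clusters, while a spread near zero suggests a common effect. A penalized approach such as the one described in the abstract would aim to make that distinction automatically and with many candidate predictors, rather than by inspection.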
Keyphrases
- public health
- single cell
- randomized controlled trial
- machine learning
- mental health
- systematic review
- physical activity
- big data
- deep learning
- climate change
- high resolution
- risk assessment
- multiple sclerosis
- gene expression
- cognitive decline
- blood brain barrier
- white matter
- artificial intelligence
- cerebral ischemia
- fluorescence imaging