Interactions of scores derived from two groups of variables: Alternating lasso regularization avoids overfitting and finds interpretable scores.

Philipp Doebler, Anna Doebler, Philip Buczak, Andreas Groll
Published in: Psychological Methods (2022)
Regression models with interaction terms are common models for moderating relationships. When effects of several predictors from one group (for example, genetic variables) are potentially moderated by several predictors from another (for example, environmental variables), many interaction terms result. This complicates model interpretation, especially when coefficient signs point in different directions. By first forming a score for each group of predictors, the interaction model's dimension is severely reduced. The hierarchical score model is an elegant one-step approach: Score weights and regression model coefficients are estimated simultaneously by an alternating optimization (AO) algorithm. Especially in high-dimensional settings, scores remain an effective technique to reduce interaction model dimension, and we propose regularization to ensure sparsity and interpretability of the score weights. A nontrivial extension of the original AO algorithm is presented, which adds a lasso penalty, resulting in the alternating lasso optimization algorithm (ALOA). The hierarchical score model with ALOA is an interpretable statistical learning technique for moderation in potentially high-dimensional applications, and encompasses generalized linear models for the main interaction model. In addition to the lasso regularization, a screening procedure called regularization and residualization (RR) is proposed to avoid spurious interactions. ALOA tuning parameter choice and the RR screening procedure are investigated by simulations, and two illustrative applications to depression risk are provided. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
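The general idea of alternating between score weights and regression coefficients can be illustrated with a small sketch. This is not the authors' ALOA implementation: it assumes a Gaussian linear outcome model of the form y = b0 + b1*s1 + b2*s2 + b3*s1*s2 + error with scores s1 = Xw and s2 = Zv, a plain coordinate-descent lasso for each weight vector, a main-effects regression as starting value, and unit-norm rescaling of the weights for identifiability. All function names, the penalty value `lam`, and the simulated data are hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(a, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def lasso_cd(D, r, lam, w_init, n_sweeps=50):
    """Coordinate descent for (1/(2n)) * ||r - D w||^2 + lam * ||w||_1."""
    n, p = D.shape
    w = w_init.astype(float).copy()
    col_sq = (D ** 2).sum(axis=0) / n
    resid = r - D @ w
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid = resid + D[:, j] * w[j]      # remove coordinate j from the fit
            rho = D[:, j] @ resid / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            resid = resid - D[:, j] * w[j]      # add the updated coordinate back
    return w

def unit_norm(w):
    """Rescale to Euclidean norm 1 (assumed identifiability constraint)."""
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

def outer_ols(s1, s2, y):
    """Least-squares fit of y on intercept, both scores, and their product."""
    B = np.column_stack([np.ones_like(s1), s1, s2, s1 * s2])
    b, *_ = np.linalg.lstsq(B, y, rcond=None)
    return b

def fit_score_interaction(X, Z, y, lam=0.05, n_iter=25):
    """Alternate between regression coefficients b and lasso-penalized weights w, v."""
    n, p = X.shape
    # Assumed starting values: main-effects regression of y on all predictors.
    start, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X, Z]), y, rcond=None)
    w, v = unit_norm(start[1:p + 1]), unit_norm(start[p + 1:])
    for _ in range(n_iter):
        b = outer_ols(X @ w, Z @ v, y)
        # With v and b fixed, y - b0 - b2*s2 is linear in w.
        s2 = Z @ v
        w = unit_norm(lasso_cd(X * (b[1] + b[3] * s2)[:, None],
                               y - b[0] - b[2] * s2, lam, w))
        # With w and b fixed, y - b0 - b1*s1 is linear in v.
        s1 = X @ w
        v = unit_norm(lasso_cd(Z * (b[2] + b[3] * s1)[:, None],
                               y - b[0] - b[1] * s1, lam, v))
    return w, v, outer_ols(X @ w, Z @ v, y)

if __name__ == "__main__":
    # Simulated data only; two active predictors per group, four inactive ones.
    rng = np.random.default_rng(0)
    n = 500
    X, Z = rng.normal(size=(n, 6)), rng.normal(size=(n, 6))
    w_true = unit_norm(np.array([1.0, 0.5, 0.0, 0.0, 0.0, 0.0]))
    v_true = np.array([0.8, 0.6, 0.0, 0.0, 0.0, 0.0])
    s1, s2 = X @ w_true, Z @ v_true
    y = 0.5 + s1 + s2 + s1 * s2 + 0.5 * rng.normal(size=n)
    w_hat, v_hat, b_hat = fit_score_interaction(X, Z, y)
    print("w:", np.round(w_hat, 2))
    print("v:", np.round(v_hat, 2))
    print("b:", np.round(b_hat, 2))
```

In this sketch the model has four outer coefficients plus one weight per predictor, instead of one coefficient for every pairwise product of predictors from the two groups. The lasso penalty is what can set individual score weights exactly to zero. Without some scale constraint on the weights (here the unit norm), the penalty could be evaded by shrinking the weights while inflating the outer coefficients.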
Keyphrases
  • machine learning
  • deep learning
  • magnetic resonance
  • emergency department
  • minimally invasive
  • gene expression
  • risk assessment
  • genome wide
  • single molecule
  • electronic health record