Imputation-based Q-learning for optimizing dynamic treatment regimes with right-censored survival outcome.

Lingyun Lyu, Yu Cheng, Abdus S. Wahed
Published in: Biometrics (2023)
Q-learning is one of the most commonly used methods for optimizing dynamic treatment regimes (DTRs) in multistage decision-making. Right-censored survival outcomes pose a significant challenge to Q-learning because it relies on parametric models for counterfactual estimation, and these models are subject to misspecification and sensitive to missing covariates. In this paper, we propose imputation-based Q-learning (IQ-learning), in which flexible nonparametric or semiparametric models are used to estimate the optimal treatment rule at each stage, and weighted hot-deck multiple imputation (MI) and direct-draw MI are then used to predict optimal potential survival times. Missing data are handled using inverse probability weighting and MI, and nonrandom treatment assignment among the observed subjects is accounted for with a propensity-score approach. Through extensive simulations, we show that IQ-learning is more robust to model misspecification than existing Q-learning methods, imputes only plausible potential survival times (unlike parametric models), and offers more flexibility in the shape of the baseline hazard. Using IQ-learning, we develop an optimal DTR for leukemia treatment based on a randomized trial with observational follow-up that motivated this study.
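The abstract does not spell out the imputation step, so the following is only a minimal sketch of the general idea behind weighted hot-deck imputation for a right-censored time; it is not the authors' IQ-learning implementation. It assumes that a censored subject's time is replaced by the observed event time of a donor who had an event strictly later than the censoring time, with donors drawn with probability proportional to inverse-probability-of-treatment weights derived from a propensity score. The function names `ipw_weights` and `weighted_hot_deck_impute` are hypothetical. Stage-specific outcome models, the choice of optimal treatments, and direct-draw MI are not shown.

```python
import numpy as np


def ipw_weights(treatment, propensity):
    """Hypothetical helper: inverse-probability-of-treatment weights,
    1/p for treated subjects and 1/(1 - p) for untreated subjects."""
    treatment = np.asarray(treatment, dtype=float)
    propensity = np.asarray(propensity, dtype=float)
    return treatment / propensity + (1.0 - treatment) / (1.0 - propensity)


def weighted_hot_deck_impute(time, event, weights, n_imputations=5, rng=None):
    """Hypothetical sketch of weighted hot-deck MI for right-censored times.

    Returns an (n_imputations, n) array of completed survival times. Each
    censored subject i (event == 0) receives the event time of a donor drawn
    from subjects with an observed event at a time strictly greater than
    time[i], with probability proportional to the donor's weight. Because
    every donor time exceeds the censoring time, each imputed value is
    plausible: it is never earlier than the time the subject is known to
    have survived.
    """
    rng = np.random.default_rng(rng)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    weights = np.asarray(weights, dtype=float)

    # Start every imputed dataset from the observed times.
    completed = np.tile(time, (n_imputations, 1))
    for i in np.flatnonzero(~event):
        # Risk-set donors: observed events that occur after subject i was censored.
        donors = np.flatnonzero(event & (time > time[i]))
        if donors.size == 0:
            continue  # no later event observed; keep the censored value unchanged
        p = weights[donors] / weights[donors].sum()
        completed[:, i] = time[rng.choice(donors, size=n_imputations, p=p)]
    return completed


# Small illustration with made-up inputs (not data from the study).
w = ipw_weights(treatment=[1, 0, 1, 0, 1], propensity=[0.5] * 5)
draws = weighted_hot_deck_impute(
    time=[2.0, 3.5, 4.0, 6.0, 7.5], event=[1, 0, 1, 1, 0],
    weights=w, n_imputations=3, rng=0,
)
print(draws)
```

In this toy input, the subject censored at 3.5 can only receive 4.0 or 6.0, while the subject censored at 7.5 has no later donor and keeps its censored value. Each row of `draws` is one completed dataset that could then be analyzed and combined across imputations in the usual MI fashion.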