Imputation-based Q-learning for optimizing dynamic treatment regimes with right-censored survival outcome.

Lingyun Lyu, Yu Cheng, Abdus S Wahed
Published in: Biometrics (2023)
Q-learning has been one of the most commonly used methods for optimizing dynamic treatment regimes (DTRs) in multi-stage decision making. A right-censored survival outcome poses a significant challenge to Q-learning because of its reliance on parametric models for counterfactual estimation, which are subject to misspecification and sensitive to missing covariates. In this paper, we propose an imputation-based Q-learning (IQ-learning) in which flexible nonparametric or semiparametric models are employed to estimate optimal treatment rules for each stage, and weighted hot-deck multiple imputation (MI) and direct-draw MI are then used to predict optimal potential survival times. Missing data are handled using inverse probability weighting and MI, and the non-random treatment assignment among the observed is accounted for using a propensity-score approach. We investigate the performance of IQ-learning via extensive simulations and show that it is more robust to model misspecification than existing Q-learning methods, imputes only plausible potential survival times, unlike parametric models, and provides more flexibility in terms of baseline hazard shape. Using IQ-learning, we developed an optimal DTR for leukemia treatment based on a randomized trial with observational follow-up that motivated this study. This article is protected by copyright. All rights reserved.
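For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of standard two-stage Q-learning by backward induction. It is not the authors' IQ-learning: it assumes a fully observed (uncensored) outcome, binary treatments coded 0/1, randomized assignment, and linear working models, and it omits imputation, inverse probability weighting, and propensity scores entirely. All function names, variable names, and the simulated data are hypothetical and exist only for illustration.

```python
import numpy as np


def design(x, a):
    """Design matrix: intercept, covariate, treatment, treatment-by-covariate."""
    return np.column_stack([np.ones_like(x), x, a, a * x])


def fit_linear_q(features, outcome):
    """Ordinary least-squares fit of a linear working Q-function."""
    coef, *_ = np.linalg.lstsq(features, outcome, rcond=None)
    return coef


def q_learning_two_stage(x1, a1, x2, a2, y):
    """Backward induction: fit stage 2 first, then stage 1 on a pseudo-outcome."""
    # Stage 2: regress the final outcome on the stage-2 covariate and treatment.
    beta2 = fit_linear_q(design(x2, a2), y)
    # Predicted outcome under each stage-2 option for every subject.
    q2_if_0 = design(x2, np.zeros_like(x2)) @ beta2
    q2_if_1 = design(x2, np.ones_like(x2)) @ beta2
    # Pseudo-outcome: predicted outcome if stage 2 were treated optimally.
    pseudo = np.maximum(q2_if_0, q2_if_1)
    # Stage 1: regress the pseudo-outcome on the stage-1 covariate and treatment.
    beta1 = fit_linear_q(design(x1, a1), pseudo)
    return beta1, beta2


def optimal_rule(beta, x):
    """Recommend treatment 1 when the estimated treatment contrast is positive."""
    return (beta[2] + beta[3] * x > 0).astype(int)


# Simulated toy data (assumption: both treatments are randomized with prob. 0.5).
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
a1 = rng.integers(0, 2, size=n)
x2 = 0.5 * x1 + 0.5 * a1 + rng.normal(size=n)
a2 = rng.integers(0, 2, size=n)
# True stage-2 treatment effect is 0.5 - x2, so treating helps only when x2 < 0.5.
y = x2 + a2 * (0.5 - x2) + rng.normal(size=n)

beta1, beta2 = q_learning_two_stage(x1, a1, x2, a2, y)
stage2_recommendation = optimal_rule(beta2, np.array([-1.0, 2.0]))
```

In this toy setup the stage-2 working model is correctly specified, which is exactly the condition the abstract identifies as fragile: with a censored survival outcome or a misspecified parametric model, the pseudo-outcome step above can produce implausible predicted values, and that is the gap the abstract says IQ-learning addresses.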
Keyphrases
  • decision making
  • combination therapy
  • bone marrow
  • magnetic resonance
  • acute myeloid leukemia
  • molecular dynamics
  • cross sectional
  • big data