Nonparametric Bayesian Q-learning for optimization of dynamic treatment regimes in the presence of partial compliance.

Indrabati Bhattacharya, Ashkan Ertefaie, Kevin G Lynch, James R McKay, Brent A Johnson

Published in: Statistical methods in medical research (2023)

Existing methods for estimation of dynamic treatment regimes are mostly limited to intention-to-treat analyses, which estimate the effect of randomization to a particular treatment regime without considering the compliance behavior of patients. In this article, we propose a novel nonparametric Bayesian Q-learning approach to construct optimal sequential treatment regimes that adjust for partial compliance. We consider the popular potential compliance framework, where some potential compliances are latent and need to be imputed. The key challenge is learning the joint distribution of the potential compliances, which we accomplish using a Dirichlet process mixture model. Our approach provides two kinds of treatment regimes: (1) conditional regimes that depend on the potential compliance values; and (2) marginal regimes where the potential compliances are marginalized. Extensive simulation studies highlight the usefulness of our method compared to intention-to-treat analyses. We apply our method to the Adaptive Treatment for Alcohol and Cocaine Dependence (ENGAGE) study, where the goal is to construct optimal treatment regimes to engage patients in therapy.
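
To make the distinction between conditional and marginal regimes concrete, the following is a rough sketch of how a two-stage Q-learning recursion might be written when potential compliances enter the Q-functions. The notation is an assumption introduced here for illustration and is not taken from the article: H_t denotes the patient history at stage t, A_t the assigned treatment, C the vector of potential compliances, Y the final outcome, and F the distribution of the potential compliances (the quantity the abstract says is modeled with a Dirichlet process mixture). The article's actual model may differ.

```latex
% Illustrative notation only; all symbols are assumptions, not the article's.
% Requires amsmath (align*) and amssymb (\mathbb).
\begin{align*}
Q_2(h_2, a_2, c) &= \mathbb{E}\left[ Y \mid H_2 = h_2,\ A_2 = a_2,\ C = c \right], \\
Q_1(h_1, a_1, c) &= \mathbb{E}\left[ \max_{a_2} Q_2(H_2, a_2, c) \mid H_1 = h_1,\ A_1 = a_1,\ C = c \right], \\
% Conditional regime: the decision rule depends on the compliance values.
d_t^{\mathrm{cond}}(h_t, c) &= \arg\max_{a_t} Q_t(h_t, a_t, c), \\
% Marginal regime: compliances are integrated out before maximizing.
d_t^{\mathrm{marg}}(h_t) &= \arg\max_{a_t} \int Q_t(h_t, a_t, c)\, dF(c \mid h_t).
\end{align*}
```

Under this sketch, a conditional regime requires values of the potential compliances (some of which are latent and must be imputed), whereas a marginal regime averages over them and so depends only on the observed history.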
