Outcome trajectory estimation for optimal dynamic treatment regimes with repeated measures.
Yuan Zhang, David M. Vock, Megan E. Patrick, Lizbeth H. Finestack, Thomas A. Murray
Published in: Journal of the Royal Statistical Society. Series C, Applied Statistics (2023)
In recent sequential multiple assignment randomized trials, outcomes were assessed at multiple time points to evaluate the longer-term impacts of dynamic treatment regimes (DTRs). Q-learning requires a scalar response to identify the optimal DTR. Inverse probability weighting may be used to estimate the optimal outcome trajectory, but it is inefficient, susceptible to model mis-specification, and unable to characterize how treatment effects manifest over time. We propose modified Q-learning with generalized estimating equations (GEE) to address these limitations and apply it to the M-bridge trial, which evaluates adaptive interventions to prevent problematic drinking among college freshmen. Simulation studies demonstrate that our proposed method improves both efficiency and robustness.