Optimising precision and power by machine learning in randomised trials with ordinal and time-to-event outcomes with an application to COVID-19.
Nicholas Williams, Michael D Rosenblum, Ivan Diaz. Published in: Journal of the Royal Statistical Society. Series A (Statistics in Society) (2022)
The rapid finding of effective therapeutics requires efficient use of available resources in clinical trials. Covariate adjustment can yield statistical estimates with improved precision, resulting in a reduction in the number of participants required to draw futility or efficacy conclusions. We focus on time-to-event and ordinal outcomes. When more than a few baseline covariates are available, a key question for covariate adjustment in randomised studies is how to fit a model relating the outcome and the baseline covariates to maximise precision. We present a novel theoretical result establishing conditions for asymptotic normality of a variety of covariate-adjusted estimators that rely on machine learning (e.g., ℓ1-regularisation, Random Forests, XGBoost, and Multivariate Adaptive Regression Splines [MARS]), under the assumption that outcome data are missing completely at random. We further present a consistent estimator of the asymptotic variance. Importantly, the conditions do not require the machine learning methods to converge to the true outcome distribution conditional on baseline variables, as long as they converge to some (possibly incorrect) limit. We conducted a simulation study to evaluate the performance of the aforementioned prediction methods in COVID-19 trials. Our simulation is based on resampling longitudinal data from over 1500 patients hospitalised with COVID-19 at Weill Cornell Medicine New York Presbyterian Hospital. We found that using ℓ1-regularisation led to estimators and corresponding hypothesis tests that control type 1 error and are more precise than an unadjusted estimator across all sample sizes tested. We also show that when covariates are not prognostic of the outcome, ℓ1-regularisation remains as precise as the unadjusted estimator, even at small sample sizes (n = 100). We give an R package adjrct that performs model-robust covariate adjustment for ordinal and time-to-event outcomes.
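The precision gain described in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' estimator and does not use the adjrct package: it assumes a continuous outcome with no missing data (the paper treats ordinal and time-to-event outcomes), simulated rather than resampled patient data, and a hand-written ℓ1-regularised linear regression fitted by coordinate descent. All function names, the data-generating model, and the penalty value `lam=0.1` are hypothetical choices made for illustration. The adjusted estimator fits an outcome model in each arm, predicts for every participant under each arm, and takes the difference of the mean predictions.

```python
import random
import statistics


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the l1 (lasso) coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=20):
    """Minimise (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 by coordinate descent.

    The intercept is left unpenalised. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centred columns, so the intercept can be recovered at the end.
    cols = [[row[j] - x_mean[j] for row in X] for j in range(p)]
    scale = [sum(c * c for c in col) / n for col in cols]
    resid = [yi - y_mean for yi in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if scale[j] == 0.0:
                continue
            col = cols[j]
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / scale[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def simulate_trial(n, rng, effect=1.0, p=5):
    """Hypothetical trial: 1:1 randomisation, two prognostic covariates out of p."""
    A = [0] * (n // 2) + [1] * (n - n // 2)
    rng.shuffle(A)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    Y = [effect * a + 2.0 * x[0] + 1.0 * x[1] + rng.gauss(0, 1)
         for x, a in zip(X, A)]
    return X, A, Y


def unadjusted_estimate(A, Y):
    """Difference in arm-specific sample means."""
    treated = [y for y, a in zip(Y, A) if a == 1]
    control = [y for y, a in zip(Y, A) if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def adjusted_estimate(X, A, Y, lam=0.1):
    """Fit a lasso outcome model per arm, then average predictions over everyone."""
    arm_mean = {}
    for arm in (0, 1):
        Xa = [x for x, a in zip(X, A) if a == arm]
        Ya = [y for y, a in zip(Y, A) if a == arm]
        b0, beta = lasso_fit(Xa, Ya, lam)
        preds = [b0 + sum(b * v for b, v in zip(beta, x)) for x in X]
        arm_mean[arm] = sum(preds) / len(preds)
    return arm_mean[1] - arm_mean[0]


def compare_estimators(n_reps=100, n=100, seed=1):
    """Return (sd of unadjusted, sd of adjusted, mean of adjusted) over n_reps trials."""
    rng = random.Random(seed)
    unadj, adj = [], []
    for _ in range(n_reps):
        X, A, Y = simulate_trial(n, rng)
        unadj.append(unadjusted_estimate(A, Y))
        adj.append(adjusted_estimate(X, A, Y))
    return statistics.stdev(unadj), statistics.stdev(adj), statistics.mean(adj)


if __name__ == "__main__":
    sd_unadj, sd_adj, mean_adj = compare_estimators()
    print(f"SD unadjusted: {sd_unadj:.3f}")
    print(f"SD adjusted:   {sd_adj:.3f}")
    print(f"Mean adjusted: {mean_adj:.3f} (simulated effect is 1.0)")
```

In this toy setting the covariates are strongly prognostic by construction, so the adjusted estimator should vary less across simulated trials than the unadjusted one. It says nothing about type 1 error control or about the variance estimator described in the abstract.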
Keyphrases
- machine learning
- clinical trial
- coronavirus disease
- sars cov
- big data
- artificial intelligence
- open label
- end stage renal disease
- newly diagnosed
- ejection fraction
- study protocol
- randomized controlled trial
- climate change
- deep learning
- chronic kidney disease
- small molecule
- respiratory syndrome coronavirus
- double blind
- data analysis
- emergency department
- phase ii
- adipose tissue
- placebo controlled
- phase iii
- quantum dots
- skeletal muscle
- peritoneal dialysis
- loop mediated isothermal amplification