
Using a Multi-Site RCT to Predict Impacts for a Single Site: Do Better Data and Methods Yield More Accurate Predictions?

Robert B Olsen, Larry L Orr, Stephen H Bell, Elizabeth Petraglia, Elena Badillo-Goicoechea, Atsushi Miyaoka, Elizabeth A Stuart
Published in: Journal of Research on Educational Effectiveness (2023)
Multi-site randomized controlled trials (RCTs) provide unbiased estimates of the average impact in the study sample. However, how accurately they can predict the impact for individual sites outside the study sample, to inform local policy decisions, is largely unknown. To extend prior research on this question, we analyzed six multi-site RCTs and tested two modern prediction methods, lasso regression and Bayesian Additive Regression Trees (BART), using a wide range of moderator variables. The main findings are: (1) all of the methods yielded accurate impact predictions when the variation in impacts across sites was close to zero (as expected); (2) none of the methods yielded accurate impact predictions when the variation in impacts across sites was substantial; and (3) BART typically produced "less inaccurate" predictions than either lasso regression or the Sample Average Treatment Effect (SATE). These results raise concerns that when the impact of an intervention varies considerably across sites, statistical modelling using the data commonly collected by multi-site RCTs will be insufficient to explain that variation and to accurately predict impacts for individual sites.
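The first two findings can be illustrated, for the simplest predictor only, with a toy simulation. This is a minimal sketch and not the authors' analysis; it does not implement lasso or BART. It assumes a hypothetical normal model in which true site impacts vary around a common mean with standard deviation `tau`, each site's impact is estimated with sampling error `se`, and each held-out site's impact is predicted by the average estimated impact of the other sites (a leave-one-site-out stand-in for the SATE). All function names and parameter values are illustrative assumptions.

```python
import math
import random
import statistics


def simulate_sites(n_sites, mean_impact, tau, se, rng):
    """Hypothetical data-generating process (an assumption, not from the study):
    true impact_j ~ N(mean_impact, tau^2); estimated impact_j adds N(0, se^2) noise."""
    true_impacts = [rng.gauss(mean_impact, tau) for _ in range(n_sites)]
    est_impacts = [t + rng.gauss(0.0, se) for t in true_impacts]
    return true_impacts, est_impacts


def loo_sate_rmse(true_impacts, est_impacts):
    """RMSE of predicting each held-out site's TRUE impact with the mean
    ESTIMATED impact of the remaining sites (a SATE-style predictor)."""
    n = len(est_impacts)
    total = sum(est_impacts)
    sq_errors = []
    for j in range(n):
        prediction = (total - est_impacts[j]) / (n - 1)  # leave site j out
        sq_errors.append((prediction - true_impacts[j]) ** 2)
    return math.sqrt(sum(sq_errors) / n)


def average_rmse(tau, n_sites=30, se=0.05, reps=200, seed=0):
    """Average the leave-one-site-out RMSE over many simulated trials."""
    rng = random.Random(seed)
    rmses = []
    for _ in range(reps):
        true_impacts, est_impacts = simulate_sites(n_sites, 0.10, tau, se, rng)
        rmses.append(loo_sate_rmse(true_impacts, est_impacts))
    return statistics.mean(rmses)


if __name__ == "__main__":
    # Cross-site impact SD: none, modest, substantial (illustrative values).
    for tau in (0.0, 0.05, 0.20):
        print(f"tau={tau:.2f}  mean leave-one-site-out RMSE={average_rmse(tau):.3f}")
```

Under these assumptions, with J sites the prediction error for a held-out site has variance of roughly tau^2 (1 + 1/(J - 1)) + se^2 / (J - 1). It therefore stays near zero when tau is near zero but approaches tau as more sites are pooled, rather than vanishing. In this toy model, a moderator-based predictor could reduce the error only to the extent that its moderators explain the variation in true site impacts.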