Evidence That Selecting an Appropriate Item Response Theory-Based Approach to Scoring Surveys Can Help Avoid Biased Treatment Effect Estimates.

James Soland
Published in: Educational and Psychological Measurement (2021)
Considerable thought is often put into designing randomized controlled trials (RCTs). From power analyses and complex sampling designs implemented preintervention to nuanced quasi-experimental models used to estimate treatment effects postintervention, RCT design can be quite complicated. Yet when psychological constructs measured using survey scales are the outcome of interest, measurement is often an afterthought, even in RCTs. The purpose of this study is to examine how choices about scoring and calibration of survey item responses affect recovery of true treatment effects. Specifically, simulation and empirical studies are used to compare the performance of sum scores, which are frequently used in RCTs in psychology and education, to that of approaches rooted in item response theory (IRT) that better account for the longitudinal, multigroup nature of the data. The results from this study indicate that selecting an IRT model that matches the nature of the data can substantially reduce bias in treatment effect estimates and shrink their standard errors.
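To make the contrast between the two scoring pipelines concrete, the following is a minimal sketch, not the paper's actual simulation design: it generates dichotomous item responses under a two-parameter logistic (2PL) model with a known treatment shift on the latent trait, then computes a treatment effect estimate from standardized sum scores and from expected a posteriori (EAP) theta estimates. The sample sizes, item parameters, effect size, and the use of known item parameters in a single calibration are all illustrative assumptions; the study itself considers longitudinal, multigroup IRT calibrations.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative settings (assumptions, not the paper's design) ---
n_per_group, n_items = 500, 10
true_effect = 0.3                    # treatment shift on the latent trait (SD units)
a = rng.uniform(0.8, 2.0, n_items)   # 2PL discriminations
b = rng.normal(0.0, 1.0, n_items)    # 2PL difficulties

# Latent traits for control and treatment groups
theta = np.concatenate([rng.normal(0.0, 1.0, n_per_group),
                        rng.normal(true_effect, 1.0, n_per_group)])

# Simulate dichotomous responses under the 2PL model
p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
y = (rng.uniform(size=p.shape) < p).astype(int)

# Pipeline 1: sum scores, standardized to the control group's metric
sum_scores = y.sum(axis=1)
z = (sum_scores - sum_scores[:n_per_group].mean()) / sum_scores[:n_per_group].std()

# Pipeline 2: EAP theta estimates under the (known) 2PL item parameters,
# computed by quadrature over a standard-normal prior
grid = np.linspace(-4, 4, 81)
prior = np.exp(-0.5 * grid**2)
pg = 1.0 / (1.0 + np.exp(-a[None, :] * (grid[:, None] - b[None, :])))  # grid x items
loglik = y @ np.log(pg).T + (1 - y) @ np.log(1 - pg).T                 # persons x grid
post = np.exp(loglik) * prior
eap = (post * grid).sum(axis=1) / post.sum(axis=1)

# Treatment effect estimate under each scoring approach
print(f"true effect:        {true_effect:.3f}")
print(f"sum-score estimate: {z[n_per_group:].mean() - z[:n_per_group].mean():.3f}")
print(f"EAP estimate:       {eap[n_per_group:].mean() - eap[:n_per_group].mean():.3f}")
```

One thing the sketch makes visible is that the sum-score estimate is expressed in sum-score standard deviation units rather than the latent metric, since sum scores are a nonlinear, unevenly spaced transformation of theta; this metric mismatch is one mechanism by which naive scoring can distort treatment effect estimates.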