Comparison of different rating scales for the use in Delphi studies: different scales lead to different consensus and show different test-retest reliability.
Toni Lange, Christian Kopkow, Jörg Lützner, Klaus-Peter Günther, Sascha Gravius, Hanns-Peter Scharf, Johannes Stöve, Richard Wagner, Jochen Schmitt
Published in: BMC Medical Research Methodology (2020)
This study provides evidence that, within a single population, consensus depends on both the rating scale and the consensus threshold. The test-retest reliability of the three rating scales investigated differs substantially between individual treatment goals. This variation in reliability is a potential source of bias in consensus studies. In our setting, aimed at capturing patients' treatment goals for total knee arthroplasty (TKA), the three-point scale proves to be the most reasonable choice, as its translation into the clinical context is the most straightforward among the scales. Researchers conducting Delphi studies should be aware that final consensus is substantially influenced by the choice of rating scale and consensus criteria.
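To make the dependence on scale and threshold concrete, the following is a minimal sketch and not the study's analysis. It assumes consensus is defined as the share of panelists whose rating falls in a set of "important" categories meeting or exceeding a threshold. The function names, the category cut-offs, the nine-point scale, and all ratings are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Sequence


def percent_agreement(ratings: Sequence[int], top_categories: Iterable[int]) -> float:
    """Return the share of ratings that fall into the 'important' categories."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    top = set(top_categories)
    return sum(1 for r in ratings if r in top) / len(ratings)


def reaches_consensus(
    ratings: Sequence[int], top_categories: Iterable[int], threshold: float
) -> bool:
    """Consensus is reached when the agreement share meets the threshold."""
    return percent_agreement(ratings, top_categories) >= threshold


# Made-up ratings from ten panelists for one treatment goal (NOT study data).
three_point = [3, 3, 3, 3, 3, 3, 3, 3, 2, 2]  # assumed: 3 is the top category
nine_point = [9, 8, 8, 7, 7, 7, 6, 6, 5, 4]   # assumed: 7 to 9 count as top

for threshold in (0.70, 0.90):
    print(
        f"threshold={threshold:.2f}",
        "three-point:", reaches_consensus(three_point, {3}, threshold),
        "nine-point:", reaches_consensus(nine_point, {7, 8, 9}, threshold),
    )
```

With these invented numbers, the same panel reaches consensus on the three-point scale at the lower threshold but not at the higher one, and on neither threshold with the nine-point scale. The outcome is therefore sensitive to both design choices.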