On measuring agreement with numerically bounded linguistic probability schemes: A re-analysis of data from Wintle, Fraser, Wills, Nicholson, and Fidler (2019).

David R. Mandel, Daniel Irwin
Published in: PLOS ONE (2021)
Across a wide range of domains, experts make probabilistic judgments under conditions of uncertainty to support decision-making. These judgments are often conveyed using linguistic expressions (e.g., x is likely). Seeking to foster shared understanding of these expressions between senders and receivers, the US intelligence community implemented a communication standard that prescribes a set of probability terms and assigns each term an equivalent numerical probability range. In an earlier PLOS ONE article, Wintle et al. [1] tested whether access to the standard improves shared understanding and also explored the efficacy of various enhanced presentation formats. Notably, they found that embedding numeric equivalents in text (e.g., x is likely [55-80%]) substantially outperformed the status-quo approach in terms of the percentage overlap between participants' interpretations of linguistic probabilities (defined in terms of the numeric range equivalents they provided for each term) and the numeric ranges in the standard. These results have important prescriptive implications, yet Wintle et al.'s percentage-overlap measure of agreement may be viewed as unfairly punitive because it penalizes individuals for being more precise than the stipulated guidelines, even when their interpretations fall entirely within the stipulated ranges. Arguably, participants' within-range precision is a positive attribute and should not be penalized in scoring interpretive agreement. Accordingly, in the present article, we reanalyzed Wintle et al.'s data using an alternative measure of percentage overlap that does not penalize in-range precision. Using the alternative measure, we find that percentage overlap is substantially elevated across conditions. More importantly, however, the effects of presentation format and probability level are highly consistent with the original study. By removing the ambiguity caused by Wintle et al.'s unduly punitive measure of agreement, these findings buttress Wintle et al.'s original claim that the methods currently used by intelligence organizations are ineffective at coordinating the meaning of uncertainty expressions between intelligence producers and intelligence consumers. Future studies examining agreement between senders and receivers are encouraged to reflect carefully on the most appropriate measures of agreement to employ and to explicate the bases for their methodological choices.
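
To make the measurement issue concrete, here is a minimal sketch of the two scoring approaches in Python. The abstract does not give the exact formulas, so this is illustrative only: it assumes the original, punitive measure normalizes the width of the intersection by the width of the union of the two ranges (so a response narrower than the standard loses credit even when it lies entirely inside it), while the alternative measure normalizes by the width of the participant's own range (so any in-range response earns full credit). All function names and the specific numeric example are hypothetical.

    # Illustrative sketch only: the exact overlap formulas are not stated in
    # the abstract; both definitions below are assumptions for demonstration.

    def intersection_width(a, b):
        """Width of the overlap between two closed ranges (lo, hi)."""
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    def overlap_punitive(response, standard):
        """Assumed original-style measure: intersection over union.
        Penalizes responses narrower than the standard, even when they
        fall entirely within it."""
        inter = intersection_width(response, standard)
        union = max(response[1], standard[1]) - min(response[0], standard[0])
        return inter / union if union else 0.0

    def overlap_nonpunitive(response, standard):
        """Assumed alternative measure: intersection over the width of the
        participant's own range, so in-range precision is not penalized."""
        inter = intersection_width(response, standard)
        width = response[1] - response[0]
        return inter / width if width else 0.0

    standard = (0.55, 0.80)   # e.g., the range assigned to "likely"
    response = (0.60, 0.70)   # a precise interpretation inside that range

    print(overlap_punitive(response, standard))     # 0.4 -- punished for precision
    print(overlap_nonpunitive(response, standard))  # 1.0 -- full credit

Under these assumed definitions, a precise response of (0.60, 0.70) against a standard range of (0.55, 0.80) scores only 0.4 on the punitive measure but 1.0 on the alternative, which captures the sense in which within-range precision is rewarded rather than penalized.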