Methods of assessing categorical agreement between correlated screening tests in clinical studies.

Thomas J Zhou, Sughra Raza, Kerrie P Nelson
Published in: Journal of Applied Statistics (2020)
Advances in breast imaging and other screening tests have prompted studies that evaluate and compare the consistency of experts' ratings between existing and new screening tests. In clinical settings, medical experts make subjective assessments of screening test results such as mammograms. Consistency between experts' ratings is evaluated by measures of inter-rater agreement or association. However, conventional measures, such as Cohen's and Fleiss' kappas, cannot be applied or may perform poorly when studies involve many experts, unbalanced data, or dependencies between experts' ratings. Here we assess the performance of existing approaches, including recently developed summary measures, for assessing the agreement between experts' binary and ordinal ratings when patients undergo two screening procedures. Methods to assess consistency between repeated measurements by the same experts are also described. We present applications to three large-scale clinical screening studies. Properties of these agreement measures are illustrated via simulation studies. Generally, a model-based approach provides several advantages over alternative methods, including the ability to flexibly accommodate various measurement scales (i.e. binary or ordinal), large numbers of experts and patients, and sparse data, as well as robustness to the prevalence of underlying disease.
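
As a rough illustration of why a conventional measure can be sensitive to unbalanced data, the sketch below computes the standard two-rater Cohen's kappa, (observed agreement - chance agreement) / (1 - chance agreement), on two invented sets of binary ratings. The function name `cohen_kappa` and all of the ratings are hypothetical and are not taken from the paper or its three clinical studies; the model-based approach the abstract favours is not implemented here.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each rated the same items once."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(ratings_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal category counts.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (observed - expected) / (1.0 - expected)


# Invented data (assumption): 100 patients, 1 = positive, 0 = negative.
# Balanced case: each rater calls 50 patients positive; they agree on 90.
balanced_a = [1] * 50 + [0] * 50
balanced_b = [1] * 45 + [0] * 5 + [1] * 5 + [0] * 45

# Rare-positive case: each rater calls only 5 patients positive, never the
# same ones; they still agree on 90 of the 100 patients.
rare_a = [1] * 5 + [0] * 95
rare_b = [0] * 5 + [1] * 5 + [0] * 90

kappa_balanced = cohen_kappa(balanced_a, balanced_b)
kappa_rare = cohen_kappa(rare_a, rare_b)
print(round(kappa_balanced, 2), round(kappa_rare, 2))
```

Both invented data sets have the same raw agreement (90 of 100 patients), yet kappa is about 0.80 in the balanced case and about -0.05 when positives are rare, because chance agreement rises from 0.5 to 0.905. This only illustrates the prevalence sensitivity of one conventional measure under these made-up numbers; it says nothing about the magnitude of the effect in the studies the paper analyses.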