Inter-Rater and Intra-Rater Agreement in Scoring Severity of Rodent Cardiomyopathy and Relation to Artificial Intelligence-Based Scoring.
Thomas J Steinbach, Debra A Tokarz, Caroll A Co, Shawn F Harris, Sandra J McBride, Keith R Shockley, Avinash Lokhande, Gargi Srivastava, Rajesh Ugalmugle, Arshad Kazi, Emily Singletary, Mark F Cesta, Heath C Thomas, Vivian S Chen, Kristen Hobbie, Torrie A Crabbs
Published in: Toxicologic Pathology (2024)
We previously developed a computer-assisted image analysis algorithm to detect and quantify the microscopic features of rodent progressive cardiomyopathy (PCM) in rat heart histologic sections and validated the results with a panel of five veterinary toxicologic pathologists using a multinomial logistic model. In this study, we assessed both the inter-rater and intra-rater agreement of the pathologists and compared the pathologists' ratings to the artificial intelligence (AI)-predicted scores. Pathologists and the AI algorithm were each presented with 500 slides of rodent heart and quantified the amount of cardiomyopathy in each slide. Of these slides, 200 were novel to this study, whereas 300 were intentionally selected for repetition from the previous study. The repeated slides were re-examined after a washout period of more than six months to assess each pathologist's intra-rater agreement. We found the intra-rater agreement to be substantial, with weighted Cohen's kappa values ranging from k = 0.64 to 0.80. Because the AI algorithm is deterministic, it returns the same score each time it is given the same slide, so intra-rater variability is not a concern for it. The inter-rater agreement across pathologists was moderate (Cohen's kappa k = 0.56). These results demonstrate the utility of AI algorithms as a tool for pathologists to increase sensitivity and specificity for the histopathologic assessment of the heart in toxicology studies.
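For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how a weighted Cohen's kappa can be computed for two sets of ordinal scores, such as one pathologist's first and second reading of the same slides. It is not the authors' analysis code. The abstract does not state the weighting scheme or the severity scale, so the linear weights, the optional quadratic weights, the 0 to 4 grade scale, and the function name `weighted_kappa` are all assumptions made for illustration, and the scores in the usage example are invented.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weights="linear"):
    """Weighted Cohen's kappa for two equal-length lists of ordinal ratings.

    `categories` lists every possible grade in ascending order.
    `weights` is "linear" or "quadratic" (an assumption; the abstract
    does not say which scheme was used).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    index = {grade: i for i, grade in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions: observed[i][j] is the share of slides
    # graded i in the first reading and j in the second.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each reading.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # Penalty grows with the distance between the two grades.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed_dis = sum(disagreement(i, j) * observed[i][j]
                       for i in range(k) for j in range(k))
    expected_dis = sum(disagreement(i, j) * row[i] * col[j]
                       for i in range(k) for j in range(k))
    if expected_dis == 0:
        # Both readings used a single identical grade; kappa is undefined.
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1.0 - observed_dis / expected_dis


# Hypothetical scores for eight slides on an assumed 0-4 severity scale.
first_reading = [0, 1, 1, 2, 3, 4, 2, 0]
second_reading = [0, 1, 2, 2, 3, 3, 2, 1]
kappa = weighted_kappa(first_reading, second_reading, categories=[0, 1, 2, 3, 4])
print(f"weighted kappa = {kappa:.2f}")
```

Weighting matters for ordinal severity grades because a one-grade disagreement is penalized less than a disagreement of several grades, which unweighted kappa would treat identically.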