AI-integrated Screening to Replace Double Reading of Mammograms: A Population-wide Accuracy and Feasibility Study.

Mohammad Talal Elhakim, Sarah W Stougaard, Ole Graumann, Mads Nielsen, Oke Gerke, Lisbet B Larsen, Benjamin Schnack Brandt Rasmussen

Published in: Radiology: Artificial Intelligence (2024)

"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content.

Mammography screening supported by deep learning-based artificial intelligence (AI) solutions can potentially reduce workload without compromising breast cancer detection accuracy, but the site of deployment in the workflow might be crucial. This retrospective study compared three simulated AI-integrated screening scenarios with standard double reading with arbitration in a sample of 249,402 mammograms from a representative screening population. A commercial AI system replaced the first reader (Scenario 1: Integrated AI first), the second reader (Scenario 2: Integrated AI second), or both readers for triaging of low- and high-risk cases (Scenario 3: Integrated AI triage). AI threshold values were chosen partly on the basis of previous validation and partly by fixing the screen-read volume reduction at approximately 50% across scenarios. Detection accuracy measures, including sensitivity, positive predictive value (PPV), negative predictive value (NPV), recall rate, and arbitration rate, were calculated. Compared with standard double reading, Integrated AI first showed no evidence of a difference in accuracy metrics except for a higher arbitration rate (+0.99%; P < .001). Integrated AI second had lower sensitivity (-1.58%; P < .001), NPV (-0.01%; P < .001), and recall rate (-0.06%; P = .04), but higher PPV (+0.03%; P < .001) and arbitration rate (+1.22%; P < .001). Integrated AI triage achieved higher sensitivity (+1.33%; P < .001), PPV (+0.36%; P = .03), and NPV (+0.01%; P < .001), but a lower arbitration rate (-0.88%; P < .001). Replacing one or both readers with AI seems feasible; however, the site of application in the workflow can have clinically relevant effects on accuracy and workload.
©RSNA, 2024.
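The reading logic compared in the study can be illustrated with a small simulation. The Python sketch below is hypothetical and is not the authors' code. It assumes that each exam record holds a ground-truth cancer label, binary recall decisions from both human readers and from an arbitrator, and a continuous AI score. The field names, the scenario labels, and the default thresholds (`ai_threshold`, `low`, `high`) are illustrative. It also assumes that in the triage scenario high-risk exams are recalled by the AI alone, a detail the abstract does not specify.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Exam:
    cancer: bool       # ground truth: cancer diagnosed (hypothetical field)
    reader1: bool      # True = first human reader flags for recall
    reader2: bool      # True = second human reader flags for recall
    arbitrator: bool   # decision the arbitrator makes if consulted (assumed known)
    ai_score: float    # hypothetical AI suspicion score in [0, 1]


def double_read(first: bool, second: bool, arbitrator: bool) -> tuple[bool, bool]:
    """Return (recalled, arbitrated): agreement decides, disagreement goes to arbitration."""
    if first == second:
        return first, False
    return arbitrator, True


def simulate(exams, scenario, ai_threshold=0.5, low=0.2, high=0.9):
    """Return one (recalled, arbitrated, human_reads) tuple per exam.

    human_reads counts first and second reads only, not arbitration reads.
    """
    results = []
    for e in exams:
        ai_flag = e.ai_score >= ai_threshold
        if scenario == "standard":
            recalled, arbitrated = double_read(e.reader1, e.reader2, e.arbitrator)
            reads = 2
        elif scenario == "ai_first":  # AI replaces the first reader
            recalled, arbitrated = double_read(ai_flag, e.reader2, e.arbitrator)
            reads = 1
        elif scenario == "ai_second":  # AI replaces the second reader
            recalled, arbitrated = double_read(e.reader1, ai_flag, e.arbitrator)
            reads = 1
        elif scenario == "ai_triage":
            if e.ai_score < low:  # low risk: AI alone classifies as normal
                recalled, arbitrated, reads = False, False, 0
            elif e.ai_score >= high:  # high risk: AI alone recalls (assumption)
                recalled, arbitrated, reads = True, False, 0
            else:  # intermediate risk: standard double reading
                recalled, arbitrated = double_read(e.reader1, e.reader2, e.arbitrator)
                reads = 2
        else:
            raise ValueError(f"unknown scenario: {scenario}")
        results.append((recalled, arbitrated, reads))
    return results


def _ratio(num: int, den: int) -> float:
    return num / den if den else math.nan


def metrics(exams, results) -> dict[str, float]:
    """Accuracy and workload measures for one simulated scenario."""
    pairs = [(e.cancer, r[0]) for e, r in zip(exams, results)]
    tp = sum(c and r for c, r in pairs)
    fn = sum(c and not r for c, r in pairs)
    fp = sum(not c and r for c, r in pairs)
    tn = sum(not c and not r for c, r in pairs)
    n = len(exams)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "recall_rate": _ratio(tp + fp, n),
        "arbitration_rate": _ratio(sum(r[1] for r in results), n),
        "read_reduction": 1 - _ratio(sum(r[2] for r in results), 2 * n),
    }


def triage_low_threshold(scores, high, target_reduction=0.5):
    """Pick `low` so that roughly `target_reduction` of double reads are removed.

    Exams scoring >= high also skip double reading, so only the remainder of
    the target is filled from the lowest-scoring exams (ties make the achieved
    reduction approximate).
    """
    n_high = sum(s >= high for s in scores)
    need_low = max(0, round(target_reduction * len(scores)) - n_high)
    ordered = sorted(scores)
    return ordered[need_low] if need_low < len(ordered) else math.inf
```

In this sketch, Scenarios 1 and 2 reduce the read volume by 50% by construction, because the AI replaces one of the two reads per exam. In the triage scenario the reduction depends on the share of exams falling outside `[low, high)`, which is why the hypothetical `triage_low_threshold` helper tunes `low` toward a target reduction.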
