Mammography Breast Cancer Screening Triage Using Deep Learning: A UK Retrospective Study.

Sarah E Hickman, Nicholas R Payne, Richard T Black, Yuan Huang, Andrew N Priest, Sue M Hudson, Bahman Kasmai, Arne Juette, Muzna Nanaa, Muhammad Iqbal Aniq, Anna Sienko, Fiona J Gilbert
Published in: Radiology (2023)
Background: Breast screening enables early detection of cancers; however, most women have normal mammograms, resulting in repetitive and resource-intensive reading tasks.

Purpose: To investigate if deep learning (DL) algorithms can be used to triage mammograms by identifying normal results to reduce workload or flag cancers that may be overlooked.

Materials and Methods: In this retrospective study, three commercial DL algorithms were investigated using consecutive mammograms from two UK Breast Screening Program sites from January 2015 to December 2017 and January 2017 to December 2018 on devices from two mammography vendors. Normal mammograms with a 3-year follow-up and histopathologically proven cancer detected at screening, the subsequent round, or in the 3-year interval were included. Two algorithm thresholds were set: in scenario A, 99.0% sensitivity for rule-out triage to a lone reader, and in scenario B, approximately 1.0% additional recall providing a rule-in triage for further assessment. Both thresholds were then applied to the screening workflow in scenario C. The sensitivity and specificity were used to assess the overall predictive performance of each DL algorithm.

Results: The data set comprised 78 849 patients (median age, 59 years [IQR, 53-63 years]) and 887 screening-detected, 439 interval, and 688 subsequent screening round-detected cancers. In scenario A (rule-out triage), models DL-1, DL-2, and DL-3 triaged 35.0% (27 565 of 78 849), 53.2% (41 937 of 78 849), and 55.6% (43 869 of 78 849) of mammograms, respectively, with 0.0% (0 of 887) to 0.1% (one of 887) of screening-detected cancers undetected. In scenario B, DL algorithms triaged in 4.6% (20 of 439) to 8.2% (36 of 439) of interval and 5.2% (36 of 688) to 6.1% (42 of 688) of subsequent-round cancers when applied after the routine double-reading workflow. Combining both approaches in scenario C resulted in an overall noninferior specificity (difference, -0.9%; P < .001) and superior sensitivity (difference, 2.7%; P < .001) for the adaptive workflow compared with routine double reading for all three algorithms.

Conclusion: Rule-out and rule-in DL-adapted triage workflows can improve the efficiency and efficacy of mammography breast cancer screening.

© RSNA, 2023. Supplemental material is available for this article. See also the editorial by Nishikawa and Lu in this issue.
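The rule-out idea in scenario A (fix a target sensitivity, then see how many mammograms fall below the resulting score threshold) can be illustrated with a minimal Python sketch. This is not the study's procedure, which the abstract does not describe in detail: the function names `rule_out_threshold` and `triage_summary` are hypothetical, the scores are made-up toy values rather than study data, and the assumption that a higher score means higher suspicion of cancer is ours.

```python
def rule_out_threshold(cancer_scores, target_sensitivity=0.99):
    """Return the highest threshold that keeps at least the target
    fraction of known cancers at or above it (hypothetical helper)."""
    ranked = sorted(cancer_scores, reverse=True)
    for kept, score in enumerate(ranked, start=1):
        # 'kept' cancers have a score >= this candidate threshold.
        if kept / len(ranked) >= target_sensitivity:
            return score
    raise ValueError("no cancer scores supplied")


def triage_summary(scores, is_cancer, threshold):
    """Rule out every case scoring below the threshold and report the
    fraction triaged plus the number of cancers ruled out in error."""
    ruled_out = [s < threshold for s in scores]
    missed = sum(1 for out, c in zip(ruled_out, is_cancer) if out and c)
    return {
        "triaged_fraction": sum(ruled_out) / len(scores),
        "missed_cancers": missed,
    }


# Toy data only (assumption): ten cases, five of them cancers.
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95]
is_cancer = [False, False, False, True, False, False, True, True, True, True]

cancer_scores = [s for s, c in zip(scores, is_cancer) if c]
# A low 80% target is used here only because the toy set is tiny.
threshold = rule_out_threshold(cancer_scores, target_sensitivity=0.8)
summary = triage_summary(scores, is_cancer, threshold)
print(threshold, summary)
```

In this toy run the threshold trades one missed cancer for ruling out six of ten cases. The study's scenario A makes the same kind of trade at a far stricter 99.0% sensitivity target.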