A Semiautonomous Deep Learning System to Reduce False-Positive Findings in Screening Mammography.
Stefano Pedemonte, Trevor Tsue, Brent Mombourquette, Yen Nhi Truong Vu, Thomas P Matthews, Rodrigo Morales Hoil, Meet Shah, Nikita Ghare, Naomi Zingman-Daniels, Susan Holley, Catherine M Appleton, Jason H Su, Richard L Wahl
Published in: Radiology: Artificial Intelligence (2024)
"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content.

Purpose: To evaluate the ability of a semiautonomous artificial intelligence (AI) model to identify screening mammograms not suspicious for breast cancer and reduce the number of false-positive examinations.

Materials and Methods: The deep learning algorithm was trained using 123,248 2D digital mammograms (6161 cancers), and a retrospective study was performed on three nonoverlapping datasets of 14,831 screening mammography examinations (1026 cancers) from two US institutions and one UK institution (2008-2017). The standalone performance of humans and AI was compared. Human+AI performance was simulated to examine reductions in the cancer detection rate, number of examinations, false-positive callbacks, and benign biopsies. Metrics were adjusted to mimic the natural distribution of a screening population, and bootstrapped confidence intervals (CIs) and P values were calculated.

Results: Retrospective evaluation on all datasets showed minimal changes to the cancer detection rate with use of the AI device (US Dataset 1, P = .02; US Dataset 2, P < .001; UK, P < .001; noninferiority margin of 0.25 cancers per 1000 examinations). On US Dataset 1 (11,592 mammograms, 101 cancers, 3810 female patients, mean age 57.3 years ± 10.0 [SD]), the device reduced screening examinations requiring radiologist interpretation by 41.6% [95% CI: 40.6%, 42.4%] (P < .001), diagnostic examination callbacks by 31.1% [28.7%, 33.4%] (P < .001), and benign needle biopsies by 7.4% [4.1%, 12.4%] (P < .001). US Dataset 2 (1362 mammograms, 330 cancers, 1293 female patients, mean age 55.4 years ± 10.5) had reductions of 19.5% [16.9%, 22.1%] (P < .001), 11.9% [8.6%, 15.7%] (P < .001), and 6.5% [0.0%, 19.0%] (P = .08), respectively. The UK dataset (1877 mammograms, 595 cancers, 1491 female patients, mean age 63.5 years ± 7.1) had reductions of 36.8% [34.4%, 39.7%] (P < .001), 17.1% [5.9%, 30.1%] (P < .001), and 5.9% [2.9%, 11.5%] (P < .001), respectively.

Conclusion: This work demonstrates the potential of a semiautonomous breast cancer screening system to reduce false positives, unnecessary procedures, patient anxiety, and medical expenses.

Published under a CC BY 4.0 license.
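
The abstract reports bootstrapped confidence intervals for the reductions in radiologist workload and callbacks. As a minimal sketch of how such an interval might be computed, the following applies a percentile bootstrap to synthetic data. The rule-out threshold, the synthetic score and callback distributions, and all function names are assumptions made for illustration; none of them come from the paper, and the authors' actual procedure (including the adjustment to a screening population) may differ.

```python
import random


def workload_reduction(scores, threshold):
    """Fraction of exams with an AI score below the (hypothetical) rule-out threshold."""
    return sum(s < threshold for s in scores) / len(scores)


def callback_reduction(scores, recalled, threshold):
    """Fraction of radiologist callbacks that fall on exams the AI would rule out."""
    total = sum(recalled)
    removed = sum(1 for s, r in zip(scores, recalled) if r and s < threshold)
    return removed / total if total else 0.0


def bootstrap_ci(n, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI. `statistic` maps a list of resampled exam
    indices (drawn with replacement) to a single number."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(statistic(idx))
    estimates.sort()
    lo_i = int(n_boot * alpha / 2)
    hi_i = n_boot - 1 - lo_i
    return estimates[lo_i], estimates[hi_i]


def demo():
    # Synthetic exams (assumption): uniform AI scores, callbacks more
    # likely when the score is high. Not real study data.
    rng = random.Random(42)
    n = 2000
    scores = [rng.random() for _ in range(n)]
    recalled = [rng.random() < 0.05 + 0.2 * s for s in scores]
    threshold = 0.4  # hypothetical operating point

    def stat(idx):
        return callback_reduction(
            [scores[i] for i in idx], [recalled[i] for i in idx], threshold
        )

    point = callback_reduction(scores, recalled, threshold)
    lo, hi = bootstrap_ci(n, stat)
    return point, lo, hi


if __name__ == "__main__":
    point, lo, hi = demo()
    print(f"callback reduction: {point:.1%} [95% CI: {lo:.1%}, {hi:.1%}]")
```

Resampling whole examinations, rather than callbacks alone, keeps the numerator and denominator of each reduction tied to the same resampled population, which is why the statistic takes exam indices.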
Keyphrases
- artificial intelligence
- deep learning
- machine learning
- big data
- healthcare
- convolutional neural network
- image quality
- patient safety
- neural network