Diagnostic performance of deep learning in ultrasound diagnosis of breast cancer: a systematic review.
Qing Dan, Ziting Xu, Hannah Burrows, Jennifer Bissram, Jeffrey S A Stringer, Yingjia Li
Published in: NPJ precision oncology (2024)
Deep learning (DL) has been widely investigated in breast ultrasound (US) for distinguishing between benign and malignant breast masses. This systematic review of diagnostic test accuracy examines the accuracy of DL, compared to human readers, for the diagnosis of breast cancer on US in clinical settings. We searched databases including PubMed, Embase, Scopus, and the Cochrane Library. Test accuracy outcomes were synthesized to compare the diagnostic performance of DL and human readers and to evaluate the assistive role of DL for human readers. A total of 16 studies involving 9238 female participants were included. No prospective studies compared the test accuracy of DL versus human readers in clinical workflows. Diagnostic test results varied across the included studies. Of the 14 studies employing standalone DL systems, DL showed significantly lower sensitivities than human readers at comparable specificities in 5 studies and outperformed human readers at higher specificities in another 4 studies; in the remaining studies, DL models and human readers showed equivalent test outcomes. Of the 12 studies that assessed assistive DL systems, none demonstrated that DL assistance improved the overall diagnostic performance of human readers. Current evidence is insufficient to conclude that DL outperforms human readers or enhances the accuracy of diagnostic breast US in a clinical setting. Standardization of study methodologies is required to improve the reproducibility and generalizability of DL research, which will aid clinical translation and application.