Doctor's Orders - Why Radiologists Should Consider Adjusting Commercial Machine Learning Applications in Chest Radiography to Fit Their Specific Needs.
Frank Philipp Schweikhard, Anika Kosanke, Sandra Lange, Marie-Luise Kromrey, Fiona Mankertz, Julie Gamain, Michael Kirsch, Britta Rosenberg, Norbert Hosten
Published in: Healthcare (Basel, Switzerland) (2024)
This retrospective study evaluated a commercial deep learning (DL) software application for chest radiographs and explored its performance in different scenarios. A total of 477 patients (284 male, 193 female, mean age 61.4 (44.7-78.1) years) were included. For the reference standard, two radiologists performed independent readings for seven diseases, reporting 226 findings in 167 patients. An autonomous DL reading was performed separately and evaluated against the reference standard for accuracy, sensitivity, and specificity using ROC analysis. The overall average AUC was 0.84 (95% CI 0.76-0.92), with an optimized DL sensitivity of 85% and specificity of 75.4%. The best results were seen for pleural effusion, with an AUC of 0.92 (0.885-0.955) and sensitivity and specificity of 86.4% each. The data also showed a significant influence of sex, age, and comorbidity on the level of agreement between the reference standard and the DL reading. In the exploratory analysis, about 40% of cases could be correctly ruled out at a sensitivity above 95% when screening for only one specific disease. For the combined reading of all abnormalities at once, only marginal workload reduction could be achieved because of insufficient specificity. DL applications like this one hold the prospect of autonomous, comprehensive reporting on chest radiographs but for now require human supervision. Radiologists need to consider possible bias in certain patient groups, e.g., elderly patients and women. With adjusted threshold values, commercial DL applications could already be deployed for a variety of tasks, e.g., ruling out certain conditions in screening scenarios, offering high potential for workload reduction.
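The threshold adjustment described in the abstract can be illustrated with a minimal sketch. It assumes the DL software returns one abnormality score per radiograph and that a score at or above the threshold counts as a positive call. The function names and the ten toy cases are hypothetical and invented purely for illustration; they are not the study's data, the vendor's API, or the authors' method.

```python
from typing import List, Tuple


def sens_spec(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def rule_out_threshold(scores: List[float], labels: List[int],
                       target_sensitivity: float = 0.95) -> float:
    """Highest observed score that, used as threshold, still meets the target sensitivity."""
    # Sensitivity can only rise as the threshold falls, so the first hit
    # when scanning from high to low is the highest acceptable threshold.
    for t in sorted(set(scores), reverse=True):
        sens, _ = sens_spec(scores, labels, t)
        if sens >= target_sensitivity:
            return t
    raise ValueError("no threshold reaches the target sensitivity")


def rule_out_fraction(scores: List[float], threshold: float) -> float:
    """Share of all cases scored below the threshold, i.e. reported as negative."""
    return sum(1 for s in scores if s < threshold) / len(scores)


# Hypothetical toy data: DL scores and reference-standard labels (1 = disease present).
toy_scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
toy_labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

threshold = rule_out_threshold(toy_scores, toy_labels, target_sensitivity=0.95)
sensitivity, specificity = sens_spec(toy_scores, toy_labels, threshold)
print(f"threshold={threshold:.2f} sensitivity={sensitivity:.2f} "
      f"specificity={specificity:.2f} "
      f"ruled out={rule_out_fraction(toy_scores, threshold):.0%}")
```

Lowering the threshold trades specificity for sensitivity. In a rule-out scenario the threshold is pushed down until very few diseased cases fall below it, and every case scored below it is one a radiologist might not need to prioritize. In practice the threshold would be chosen on one data set and checked on a separate one, since a threshold tuned and evaluated on the same cases tends to look better than it is.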
Keyphrases
- end stage renal disease
- machine learning
- artificial intelligence
- deep learning
- ejection fraction
- working memory
- newly diagnosed
- peritoneal dialysis
- climate change
- endothelial cells
- risk assessment
- emergency department
- big data
- electronic health record
- pregnant women
- magnetic resonance imaging
- computed tomography
- structural basis
- type diabetes
- insulin resistance
- adverse drug
- case report
- patient reported
- data analysis
- induced pluripotent stem cells
- community dwelling
- pluripotent stem cells
- cone beam computed tomography
- image quality
- breast cancer risk
- silver nanoparticles