Multitask Deep Learning for Segmentation and Classification of Primary Bone Tumors on Radiographs.
Claudio E von Schacky, Nikolas Jakob Wilhelm, Valerie S Schäfer, Yannik Leonhardt, Felix G Gassert, Sarah C Foreman, Florian Tilman Gassert, Matthias Jung, Pia M Jungmann, Maximilian Frederik Russe, Carolin Mogler, Carolin Knebel, Rüdiger von Eisenhart-Rothe, Marcus R Makowski, Klaus Wörtler, Rainer Burgkart, Alexandra S Gersing
Published in: Radiology (2021)
Background: An artificial intelligence model that assesses primary bone tumors on radiographs may assist in the diagnostic workflow.
Purpose: To develop a multitask deep learning (DL) model for simultaneous bounding box placement, segmentation, and classification of primary bone tumors on radiographs.
Materials and Methods: This retrospective study analyzed bone tumors on radiographs acquired prior to treatment and obtained from patient data from January 2000 to June 2020. Benign or malignant bone tumors were diagnosed in all patients by using the histopathologic findings as the reference standard. By using split-sample validation, 70% of the patients were assigned to the training set, 15% to the validation set, and 15% to the test set. The final performance was evaluated on an external test set by using geographic validation. Classification was assessed with accuracy, sensitivity, and specificity, each with 95% CIs; bounding box placement was assessed with the intersection over union (IoU); and segmentation was assessed with the Dice score.
Results: Radiographs from 934 patients (mean age, 33 years ± 19 [standard deviation]; 419 women) were evaluated in the internal data set, which included 667 benign bone tumors and 267 malignant bone tumors. Six hundred fifty-four patients were in the training set, 140 were in the validation set, and 140 were in the test set. One hundred eleven patients were in the external test set. The multitask DL model achieved 80.2% (89 of 111; 95% CI: 72.8, 87.6) accuracy, 62.9% (22 of 35; 95% CI: 47, 79) sensitivity, and 88.2% (67 of 76; 95% CI: 81, 96) specificity in the classification of bone tumors as malignant or benign. The model achieved an IoU of 0.52 ± 0.34 for bounding box placements and a mean Dice score of 0.60 ± 0.37 for segmentations. The model accuracy was higher than that of two radiologic residents (71.2% and 64.9%; P = .002 and P < .001, respectively) and was comparable with that of two musculoskeletal fellowship-trained radiologists (83.8% and 82.9%; P = .13 and P = .25, respectively) in classifying a tumor as malignant or benign.
Conclusion: The developed multitask deep learning model allowed for accurate and simultaneous bounding box placement, segmentation, and classification of primary bone tumors on radiographs.
© RSNA, 2021. Online supplemental material is available for this article. See also the editorial by Carrino in this issue.
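The evaluation metrics named in the abstract can be illustrated with a short Python sketch. It is not the authors' code. The confusion-matrix counts are taken from the reported external test set results (22 of 35 malignant and 67 of 76 benign tumors classified correctly). The abstract does not state how the 95% CIs were computed, so the normal-approximation (Wald) interval used here is an assumption, and the function names, box format, and example inputs are hypothetical.

```python
import math

import numpy as np


def proportion_with_ci(successes, total, z=1.96):
    """Proportion with a Wald (normal-approximation) interval.

    Assumption: the abstract does not name its CI method.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    overlap_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    overlap_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = overlap_w * overlap_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def dice_score(pred_mask, true_mask):
    """Dice score of two binary masks: 2 * |A and B| / (|A| + |B|)."""
    pred = np.asarray(pred_mask, dtype=bool)
    truth = np.asarray(true_mask, dtype=bool)
    denominator = pred.sum() + truth.sum()
    if denominator == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, truth).sum() / denominator)


# External test set counts from the abstract (malignant = positive class).
tp, fn = 22, 35 - 22  # malignant tumors: correctly classified, missed
tn, fp = 67, 76 - 67  # benign tumors: correctly classified, misclassified

accuracy = proportion_with_ci(tp + tn, tp + fn + tn + fp)
sensitivity = proportion_with_ci(tp, tp + fn)
specificity = proportion_with_ci(tn, tn + fp)

# Hypothetical toy inputs for the localization metrics.
example_iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
example_dice = dice_score([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```

Under the Wald assumption, the accuracy interval rounds to the reported 72.8 to 87.6. The other intervals may differ slightly from the published ones if the authors used a different method. In the study, IoU and Dice are reported as a mean ± standard deviation over all test cases rather than for a single pair of boxes or masks.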