Optimizing Performance of Transformer-based Models for Fetal Brain MR Image Segmentation.
Nicolò Pecco, Pasquale Anthony Della Rosa, Matteo Canini, Gianluca Nocera, Paola Scifo, Paolo Ivo Cavoretto, Massimo Candiani, Andrea Falini, Antonella Castellano, Cristina Baldoli
Published in: Radiology. Artificial intelligence (2024)
"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content.

Purpose: To test the performance of transformer-based models when manipulating pretraining weights, dataset size, and input size, and to compare the best model with reference standard and state-of-the-art models for a resting-state functional MRI (rs-fMRI) fetal brain extraction task.

Materials and Methods: An internal retrospective dataset (fetuses = 172; images = 519; collected from 2018 to 2022) was used to investigate the influence of dataset size, pretraining approach, and image input size on Swin-UNETR and UNETR models. The internal dataset and an external dataset (fetuses = 131; images = 561) were used to cross-validate and to assess the generalization capability of the best model against state-of-the-art models across scanner types and gestational weeks (GWs). The Dice similarity coefficient (DSC) and the balanced average Hausdorff distance (BAHD) were used as segmentation performance metrics. Generalized estimating equation (GEE) multifactorial models were used to assess significant model and interaction effects of interest.

Results: Swin-UNETR was not affected by pretraining approach or dataset size and performed best with the mean dataset image size, with a mean DSC of 0.92 and a BAHD of 0.097. Swin-UNETR was not affected by scanner type. Generalization results showed that Swin-UNETR performed worse than reference standard models on the internal dataset and comparably on the external dataset. Cross-validation on internal and external test sets demonstrated better and comparable performance of Swin-UNETR versus convolutional neural network architectures during the late-fetal period (GWs > 25) but lower performance during the midfetal period (GWs ≤ 25).

Conclusion: Swin-UNETR showed flexibility in dealing with smaller datasets, regardless of pretraining approach. For fetal brain extraction from rs-fMRI, Swin-UNETR showed performance comparable to that of reference standard models during the late-fetal period and lower performance during the early GW period. ©RSNA, 2024.
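
The abstract names the two segmentation metrics, DSC and BAHD, without giving their formulas. The sketch below illustrates one way they could be computed on binary masks with NumPy. It is not the authors' implementation. The function names, the optional `spacing` parameter, and the toy masks are hypothetical. The BAHD definition used here (both directed distance sums divided by the number of ground-truth points, then averaged) and the choice of all foreground voxels as the point sets are assumptions, since the abstract does not specify either.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom


def balanced_avg_hausdorff(pred, truth, spacing=None):
    """Balanced average Hausdorff distance (assumed definition).

    Both directed distance sums are normalized by the number of
    ground-truth points, then averaged. Uses brute-force pairwise
    distances, so it is only practical for small masks.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    s = np.argwhere(pred).astype(float)   # segmentation points
    g = np.argwhere(truth).astype(float)  # ground-truth points
    if len(s) == 0 or len(g) == 0:
        raise ValueError("both masks must contain at least one foreground voxel")
    if spacing is not None:
        # Convert voxel indices to physical units (hypothetical parameter).
        scale = np.asarray(spacing, dtype=float)
        s = s * scale
        g = g * scale
    # Pairwise Euclidean distances, shape (|G|, |S|).
    dist = np.linalg.norm(g[:, None, :] - s[None, :, :], axis=-1)
    g_to_s = dist.min(axis=1).sum()  # each ground-truth point to nearest prediction
    s_to_g = dist.min(axis=0).sum()  # each predicted point to nearest ground truth
    return 0.5 * (g_to_s + s_to_g) / len(g)


# Toy 2D example: the prediction over-segments the truth by one column.
truth_mask = np.zeros((4, 4), dtype=bool)
truth_mask[1:3, 1:3] = True
pred_mask = np.zeros((4, 4), dtype=bool)
pred_mask[1:3, 1:4] = True

print("DSC :", dice_coefficient(pred_mask, truth_mask))
print("BAHD:", balanced_avg_hausdorff(pred_mask, truth_mask))
```

A higher DSC and a lower BAHD both indicate closer agreement with the reference mask. For full-size 3D volumes, the pairwise distance matrix would be too large, and a distance-transform approach would likely be preferable.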