Overcoming data scarcity in biomedical imaging with a foundational multi-task model.
Raphael Schäfer, Till Nicke, Henning Höfener, Annkristin Lange, Dorit Merhof, Friedrich Feuerhake, Volkmar Schulz, Johannes Lotz, Fabian Kießling. Published in: Nature Computational Science (2024)
Foundational models, pretrained on a large scale, have demonstrated substantial success across non-medical domains. However, training these models typically requires large, comprehensive datasets, which contrasts with the smaller and more specialized datasets common in biomedical imaging. Here we propose a multi-task learning strategy that decouples the number of training tasks from memory requirements. We trained a universal biomedical pretrained model (UMedPT) on a multi-task database including tomographic, microscopic and X-ray images, with various labeling strategies such as classification, segmentation and object detection. The UMedPT foundational model outperformed ImageNet pretraining and previous state-of-the-art models. For classification tasks related to the pretraining database, it maintained its performance with only 1% of the original training data and without fine-tuning. For out-of-domain tasks it required only 50% of the original training data. In an external independent validation, imaging features extracted using UMedPT proved to set a new standard for cross-center transferability.
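The central technical claim above is that the multi-task training strategy decouples the number of pretraining tasks from memory requirements. One common way to achieve this (a sketch of the general idea, not UMedPT's actual implementation; all names here are illustrative) is to process tasks sequentially and accumulate their gradients, so that only one task's computation is live at any moment:

```python
# Illustrative sketch: per-task gradient accumulation so peak memory is
# independent of the number of pretraining tasks. Function and variable
# names are hypothetical, not taken from UMedPT's code.

def numeric_grad(loss_fn, w, eps=1e-6):
    """Central-difference gradient of a scalar loss at parameter w."""
    return (loss_fn(w + eps) - loss_fn(w - eps)) / (2 * eps)

def multitask_step(w, task_losses, lr=0.1):
    """One shared-parameter update: visit tasks one at a time and
    accumulate their gradients. Memory stays constant in the number
    of tasks because only one task is evaluated at once."""
    grad = 0.0
    for loss_fn in task_losses:
        grad += numeric_grad(loss_fn, w)  # one task live at a time
    return w - lr * grad / len(task_losses)

# Toy setup: three "tasks" share one scalar parameter; each task's loss
# pulls the parameter toward its own target value.
tasks = [lambda w, t=t: (w - t) ** 2 for t in (1.0, 2.0, 3.0)]
w = 0.0
for _ in range(200):
    w = multitask_step(w, tasks)
```

In this toy example the shared parameter converges to the mean of the task targets; in the real setting the same loop structure lets a shared encoder be trained against many classification, segmentation, and detection heads without holding all tasks in memory simultaneously.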
Keyphrases
- deep learning
- high resolution
- electronic health record
- machine learning
- big data
- convolutional neural network
- healthcare
- artificial intelligence
- magnetic resonance imaging
- computed tomography
- magnetic resonance
- optical coherence tomography
- electron microscopy