BAF-Net: bidirectional attention-aware fluid pyramid feature integrated multimodal fusion network for diagnosis and prognosis.
Huiqin WuLihong PengDongyang DuHui XuGuoyu LinZidong ZhouLijun LuWenbing LvPublished in: Physics in medicine and biology (2024)
Objective . To go beyond the deficiencies of the three conventional multimodal fusion strategies (i.e. input-, feature- and output-level fusion), we propose a bidirectional attention-aware fluid pyramid feature integrated fusion network (BAF-Net) with cross-modal interactions for multimodal medical image diagnosis and prognosis. Approach . BAF-Net is composed of two identical branches to preserve the unimodal features and one bidirectional attention-aware distillation stream to progressively assimilate cross-modal complements and to learn supplementary features in both bottom-up and top-down processes. Fluid pyramid connections were adopted to integrate the hierarchical features at different levels of the network, and channel-wise attention modules were exploited to mitigate cross-modal cross-level incompatibility. Furthermore, depth-wise separable convolution was introduced to fuse the cross-modal cross-level features to alleviate the increase in parameters to a great extent. The generalization abilities of BAF-Net were evaluated in terms of two clinical tasks: (1) an in-house PET-CT dataset with 174 patients for differentiation between lung cancer and pulmonary tuberculosis. (2) A public multicenter PET-CT head and neck cancer dataset with 800 patients from nine centers for overall survival prediction. Main results . On the LC-PTB dataset, improved performance was found in BAF-Net (AUC = 0.7342) compared with input-level fusion model (AUC = 0.6825; p < 0.05), feature-level fusion model (AUC = 0.6968; p = 0.0547), output-level fusion model (AUC = 0.7011; p < 0.05). On the H&N cancer dataset, BAF-Net (C-index = 0.7241) outperformed the input-, feature-, and output-level fusion model, with 2.95%, 3.77%, and 1.52% increments of C-index ( p = 0.3336, 0.0479 and 0.2911, respectively). The ablation experiments demonstrated the effectiveness of all the designed modules regarding all the evaluated metrics in both datasets. Significance . Extensive experiments on two datasets demonstrated better performance and robustness of BAF-Net than three conventional fusion strategies and PET or CT unimodal network in terms of diagnosis and prognosis.
Keyphrases
- pet ct
- working memory
- machine learning
- deep learning
- end stage renal disease
- pulmonary tuberculosis
- positron emission tomography
- healthcare
- ejection fraction
- computed tomography
- systematic review
- newly diagnosed
- chronic kidney disease
- prognostic factors
- randomized controlled trial
- squamous cell carcinoma
- clinical trial
- emergency department
- magnetic resonance imaging
- patient reported outcomes
- papillary thyroid
- double blind
- dual energy