BAF-Net: bidirectional attention-aware fluid pyramid feature integrated multimodal fusion network for diagnosis and prognosis.
Huiqin Wu, Lihong Peng, Dongyang Du, Hui Xu, Guoyu Lin, Zidong Zhou, Lijun Lu, Wenbing Lv
Published in: Physics in Medicine and Biology (2024)
Objective: To go beyond the deficiencies of the three conventional multimodal fusion strategies (i.e., input-, feature-, and output-level fusion), we propose a bidirectional attention-aware fluid pyramid feature integrated fusion network (BAF-Net) with cross-modal interactions for multimodal medical image diagnosis and prognosis.
Approach: BAF-Net comprises two identical branches that preserve the unimodal features and one bidirectional attention-aware distillation stream that progressively assimilates cross-modal complements and learns supplementary features in both bottom-up and top-down processes. Fluid pyramid connections integrate the hierarchical features at different levels of the network, and channel-wise attention modules mitigate cross-modal, cross-level incompatibility. Furthermore, depth-wise separable convolution is used to fuse the cross-modal, cross-level features, which substantially limits the increase in parameters. The generalization ability of BAF-Net was evaluated on two clinical tasks: (1) differentiation between lung cancer (LC) and pulmonary tuberculosis (PTB) on an in-house PET-CT dataset of 174 patients, and (2) overall survival prediction on a public multicenter PET-CT head and neck (H&N) cancer dataset of 800 patients from nine centers.
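The parameter saving attributed to depth-wise separable convolution can be illustrated with a short count. This is a minimal sketch, not the authors' implementation: the abstract does not state kernel sizes, channel widths, or whether the convolutions are 2D or 3D, so the function names, the 3D default, and the example values (128 input channels, 64 output channels, kernel size 3) are assumptions chosen purely for illustration. Bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int, dims: int = 3) -> int:
    """Weights in a standard convolution: every output channel has one
    k^dims kernel per input channel."""
    return c_in * c_out * k ** dims


def depthwise_separable_params(c_in: int, c_out: int, k: int, dims: int = 3) -> int:
    """Weights in a depth-wise separable convolution: one k^dims kernel per
    input channel (depth-wise step), then a 1x1 convolution that mixes
    channels (point-wise step)."""
    depthwise = c_in * k ** dims
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical fusion layer: 128 concatenated cross-modal channels -> 64 channels.
standard = standard_conv_params(128, 64, 3)
separable = depthwise_separable_params(128, 64, 3)
print(standard)   # 221184
print(separable)  # 11648
```

Under these assumed sizes the separable layer needs roughly 5% of the weights of the standard one; in general the ratio is 1/c_out + 1/k^dims, so the saving grows with the kernel size and the number of output channels.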
Main results: On the LC-PTB dataset, BAF-Net achieved a higher AUC (0.7342) than the input-level fusion model (AUC = 0.6825; p < 0.05), the feature-level fusion model (AUC = 0.6968; p = 0.0547), and the output-level fusion model (AUC = 0.7011; p < 0.05). On the H&N cancer dataset, BAF-Net (C-index = 0.7241) outperformed the input-, feature-, and output-level fusion models, with C-index increments of 2.95%, 3.77%, and 1.52% (p = 0.3336, 0.0479, and 0.2911, respectively). The ablation experiments demonstrated the effectiveness of all the designed modules on all evaluated metrics in both datasets.
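For readers unfamiliar with the C-index reported for the survival task, the following is a minimal sketch of Harrell's concordance index in plain Python. It is not the authors' evaluation code; the function name, the handling of tied risk scores (counted as half-concordant), and the toy data are assumptions made for illustration. A pair of patients is comparable when the one with the shorter follow-up time had an observed event, and it is concordant when that patient was also assigned the higher predicted risk.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable patient pairs in which the
    patient who died earlier was given the higher predicted risk.

    times  -- follow-up time per patient
    events -- 1 if death was observed, 0 if censored
    risks  -- predicted risk score (higher means worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # assumed convention for tied scores
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
print(concordance_index(times, events, risks))  # 0.8
```

In the toy data, five pairs are comparable and four of them are ordered correctly, which gives 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.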
Significance: Extensive experiments on two datasets demonstrated that BAF-Net achieves better performance and robustness than the three conventional fusion strategies and the PET-only or CT-only unimodal networks for both diagnosis and prognosis.
Keyphrases
- pet ct
- deep learning
- machine learning
- pulmonary tuberculosis
- healthcare
- positron emission tomography
- computed tomography
- mycobacterium tuberculosis
- neural network
- high resolution
- pet imaging
- squamous cell