Fusing multimodal brain image features to empower statistical analysis has attracted considerable research interest. Generally, a feature mapping is learned in the fusion process so that the cross-modality relationships in the multimodal data can be more effectively extracted in a common feature space. Most prior work achieves this goal with data-driven approaches that ignore the geometric properties of the feature spaces in which the data are embedded, leaving a substantial amount of information untapped. Here, we propose to fuse multimodal brain images through a novel geometric approach. The key idea is to encode various brain image features with the local metric change on brain shapes, so that the feature mapping can be efficiently solved by geometric mapping functions, i.e., quasiconformal and harmonic mappings. Our multimodal fusion framework (MFRM) proceeds in two steps: surface feature mapping and volumetric feature mapping. For each step, we design an informative Riemannian metric based on distinct brain anatomical features and achieve image fusion via diffeomorphic maps. We evaluate the proposed method on two brain image cohorts. The experimental results demonstrate the effectiveness of our framework, which yields better statistical performance than state-of-the-art data-driven methods.
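To make the quasiconformal machinery concrete, the sketch below estimates the Beltrami coefficient mu = f_zbar / f_z of a discrete planar map by finite differences; |mu| < 1 everywhere certifies that the map is orientation-preserving (quasiconformal), the property diffeomorphic feature maps rely on. This is a minimal illustration under our own assumptions (a regular grid, central differences), not the paper's implementation; the function name `beltrami_coefficient` is hypothetical.

```python
import numpy as np

def beltrami_coefficient(fx, fy, h=1.0):
    """Estimate mu = f_zbar / f_z of a planar map f = fx + i*fy
    sampled on a regular grid with spacing h (illustrative sketch only)."""
    f = fx + 1j * fy
    # finite-difference partial derivatives of f
    df_dx = np.gradient(f, h, axis=1)
    df_dy = np.gradient(f, h, axis=0)
    # Wirtinger derivatives: d/dz and d/dzbar
    f_z = 0.5 * (df_dx - 1j * df_dy)
    f_zbar = 0.5 * (df_dx + 1j * df_dy)
    return f_zbar / f_z

# Example: the affine stretch f(x, y) = (2x, y) has constant
# Beltrami coefficient mu = (2 - 1) / (2 + 1) = 1/3.
xs, ys = np.meshgrid(np.arange(8, dtype=float), np.arange(8, dtype=float))
mu = beltrami_coefficient(2 * xs, ys)
```

Since |mu| = 1/3 < 1 at every grid point, the stretch is confirmed to be orientation-preserving; a fusion pipeline of this kind would typically solve for a map whose mu matches a target dictated by the designed Riemannian metric.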