With the growing presence of multimodal content on the web, a specific category of fake news is rampant on popular social media outlets. In this category, real multimedia content (images, videos) is reused in a different but related context alongside manipulated text to mislead readers. The presence of seemingly unmanipulated multimedia content reinforces belief in the associated fabricated text. Detecting this category of misleading multimedia fake news is almost impossible without reference to prior knowledge. In addition, highly novel and emotion-evoking content can fuel the rapid dissemination of such fake news. To counter this problem, we first introduce a novel multimodal fake news dataset that includes background knowledge (from authentic sources) for the misleading articles. Second, we design a multimodal framework that uses Supervised Contrastive Learning (SCL)-based novelty detection and emotion prediction tasks for fake news detection. Extensive experiments show that our proposed model outperforms state-of-the-art (SOTA) models.