Reference Hallucination Score for Medical Artificial Intelligence Chatbots: Development and Usability Study.

Fadi Ajman, Mohammad-Hani Temsah, Ibraheem Altamimi, Ayman Al-Eyadhy, Amr A Jamal, Khalid A Alhasan, Tamer A Mesallam, Mohamed Farahat, Khalid H Malki

Published in: JMIR Medical Informatics (2024)

The variation in the Reference Hallucination Score (RHS) across chatbots underscores the need for a robust reference evaluation tool to improve the authenticity of AI chatbots, and it highlights the importance of verifying their output and citations. Elicit and SciSpace had negligible hallucination, while ChatGPT and Bing had critical hallucination levels. The proposed RHS for AI chatbots could contribute to ongoing efforts to enhance the general reliability of AI in medical research.

Keyphrases

artificial intelligence
machine learning
big data
deep learning
healthcare
electronic health record
quality improvement