Reference Hallucination Score for Medical Artificial Intelligence Chatbots: Development and Usability Study.
Fadi Ajman, Mohammad-Hani Temsah, Ibraheem Altamimi, Ayman Al-Eyadhy, Amr A Jamal, Khalid A Alhasan, Tamer A Mesallam, Mohamed Farahat, Khalid H Malki
Published in: JMIR medical informatics (2024)
The variation in the Reference Hallucination Score (RHS) across chatbots underscores the need for a robust reference evaluation tool to improve the authenticity of AI chatbots, and it highlights the importance of verifying their output and citations. Elicit and SciSpace had negligible hallucination, while ChatGPT and Bing had critical hallucination levels. The proposed RHS for AI chatbots could contribute to ongoing efforts to enhance AI's general reliability in medical research.