Hallucination Rates and Reference Accuracy of ChatGPT and Bard for Systematic Reviews: Comparative Analysis.

Mikaël Chelli, Jules Descamps, Vincent Lavoué, Christophe Trojani, Michel Azar, Marcel Deckert, Jean-Luc Raynier, Gilles Clowez, Pascal Boileau, Caroline Ruetsch

Published in: Journal of Medical Internet Research (2024)

Given their current performance, LLMs should not be deployed as the primary or exclusive tool for conducting systematic reviews. Any references generated by such models warrant thorough validation by researchers. The high occurrence of hallucinations in LLMs highlights the need to refine their training and functionality before they can be used with confidence for rigorous academic purposes.

Keyphrases

systematic review
meta-analyses
risk assessment
virtual reality