Hallucination Rates and Reference Accuracy of ChatGPT and Bard for Systematic Reviews: Comparative Analysis.

Mikaël Chelli, Jules Descamps, Vincent Lavoué, Christophe Trojani, Michel Azar, Marcel Deckert, Jean-Luc Raynier, Gilles Clowez, Pascal Boileau, Caroline Ruetsch
Published in: Journal of Medical Internet Research (2024)
Given their current performance, LLMs should not be deployed as the primary or exclusive tool for conducting systematic reviews, and any references they generate warrant thorough validation by researchers. The high rate of hallucinations in LLM output highlights the need to refine their training and functionality before they can be relied on for rigorous academic work.
Keyphrases
  • systematic review
  • meta-analyses
  • risk assessment
  • randomized controlled trial
  • medical students