Evaluating Large Language Models in Extracting Cognitive Exam Dates and Scores.

Hao Zhang, Neil Jethani, Simon Jones, Nicholas Genes, Vincent J Major, Ian S Jaffe, Anthony B Cardillo, Noah Heilen, Nadia Fazal Ali, Luke J Bonanni, Andrew J Clayburn, Zain Khera, Erica C Sadler, Jaideep Prasad, Jamie Schlacter, Kevin Liu, Benjamin A Silva, Sophie Montgomery, Eric J Kim, Jacob Lester, Theodore M Hill, Alba Avoricani, Ethan Chervonski, James Davydov, William Small, Eesha Chakravartty, Himanshu Grover, John A Dodson, Abraham A Brody, Yindalon Aphinyanaphongs, Arjun V Masurkar, Narges Razavian

Published in: medRxiv: the preprint server for health sciences (2024)

In this diagnostic/prognostic study of ChatGPT and LLaMA-2 for extracting cognitive exam dates and scores from clinical notes, ChatGPT exhibited high accuracy and outperformed LLaMA-2. The use of LLMs could benefit dementia research and clinical care by identifying patients eligible for treatment initiation or clinical trial enrollment. Rigorous evaluation of LLMs is crucial to understanding their capabilities and limitations.
