Evaluating Large Language Models in Extracting Cognitive Exam Dates and Scores.
Hao Zhang, Neil Jethani, Simon Jones, Nicholas Genes, Vincent J Major, Ian S Jaffe, Anthony B Cardillo, Noah Heilen, Nadia Fazal Ali, Luke J Bonanni, Andrew J Clayburn, Zain Khera, Erica C Sadler, Jaideep Prasad, Jamie Schlacter, Kevin Liu, Benjamin A Silva, Sophie Montgomery, Eric J Kim, Jacob Lester, Theodore M Hill, Alba Avoricani, Ethan Chervonski, James Davydov, William Small, Eesha Chakravartty, Himanshu Grover, John A Dodson, Abraham A Brody, Yindalon Aphinyanaphongs, Arjun V Masurkar, Narges Razavian
Published in: medRxiv : the preprint server for health sciences (2024)
In this diagnostic/prognostic study of ChatGPT and Llama 2 for extracting cognitive exam dates and scores from clinical notes, ChatGPT achieved high accuracy and outperformed Llama 2. The use of LLMs could benefit dementia research and clinical care by identifying patients eligible for treatment initiation or clinical trial enrollment. Rigorous evaluation of LLMs is crucial to understanding their capabilities and limitations.
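The abstract reports accuracy but does not state the scoring rule. The following is a minimal sketch of one way extraction output could be scored against a clinician-annotated reference. It assumes that each extraction is an (exam name, exam date, score) tuple and that a prediction counts as correct only when all three fields exactly match a reference record. The `ExamRecord` type, the `extraction_metrics` function, and the example records are hypothetical and do not represent the authors' evaluation protocol. MMSE and MoCA are used only as illustrative exam names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ExamRecord:
    """Hypothetical container for one extracted cognitive exam result."""
    exam: str          # illustrative exam name, e.g. "MMSE" or "MoCA"
    exam_date: date    # date the exam was administered
    score: int         # total score reported in the note


def normalize(records):
    """Map records to comparable tuples; exam names are case-insensitive.

    Using a set means duplicate extractions of the same result count once.
    """
    return {(r.exam.strip().upper(), r.exam_date, r.score) for r in records}


def extraction_metrics(predicted, reference):
    """Record-level precision, recall, and F1 under exact tuple matching."""
    pred = normalize(predicted)
    gold = normalize(reference)
    tp = len(pred & gold)  # predictions matching a reference record exactly
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Usage with made-up records: the MMSE result matches, the MoCA score does not.
gold = [
    ExamRecord("MMSE", date(2021, 3, 4), 24),
    ExamRecord("MoCA", date(2022, 1, 10), 19),
]
pred = [
    ExamRecord("mmse", date(2021, 3, 4), 24),
    ExamRecord("MoCA", date(2022, 1, 10), 21),
]
metrics = extraction_metrics(pred, gold)
print(metrics)  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Exact matching on all three fields is a strict choice. A study could instead score dates and scores separately, or allow small date tolerances, and each choice would yield different accuracy figures.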