Performance of ChatGPT on the Peruvian National Licensing Medical Examination: Cross-Sectional Study.
Javier Alejandro Flores-Cohaila, Abigaíl García-Vicente, Sonia F Vizcarra-Jiménez, Janith P De la Cruz-Galán, Jesús D Gutiérrez-Arratia, Blanca Geraldine Quiroga Torres, Álvaro Taype-Rondan. Published in: JMIR Medical Education (2023)
Our study found that ChatGPT (GPT-3.5 and GPT-4) can achieve expert-level performance on the ENAM, outperforming most of our examinees. We found fair agreement between GPT-3.5 and GPT-4. Incorrect answers were associated with question difficulty, a pattern that may resemble human performance. Furthermore, when questions that initially received incorrect answers were reinput with different prompts containing additional roles and context, ChatGPT achieved improved accuracy.
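The reported "fair" agreement between the two models is the kind of result commonly quantified with Cohen's kappa; the excerpt above does not restate the exact statistic, so the following is a minimal, hypothetical sketch rather than the authors' analysis. The answer lists and the use of scikit-learn are assumptions for illustration only.

```python
# Hypothetical sketch: quantifying agreement between two models' multiple-choice
# answers with Cohen's kappa. The answer lists below are illustrative only,
# not data from the study.
from sklearn.metrics import cohen_kappa_score

# Each element is the option (A-E) a model chose for one exam question.
gpt35_answers = ["A", "C", "B", "D", "A", "E", "C", "B", "A", "D"]
gpt4_answers  = ["A", "C", "D", "D", "A", "E", "B", "B", "A", "C"]

kappa = cohen_kappa_score(gpt35_answers, gpt4_answers)
print(f"Cohen's kappa: {kappa:.2f}")
# By the Landis-Koch convention, kappa values of 0.21-0.40 are read as "fair" agreement.
```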