Pure Wisdom or Potemkin Villages? A Comparison of ChatGPT 3.5 and ChatGPT 4 on USMLE Step 3 Style Questions: Quantitative Analysis.

Leonard Knoedler, Michael G Alfertshofer, Leonard Knoedler, Cosima C Hoch, Paul F Funk, Sebastian Cotofana, Bhagvat J Maheta, Konstantin Frank, Vanessa Brébant, Lukas Prantl, Philipp Lamby

Published in: JMIR medical education (2024)

In this study, ChatGPT 4 demonstrated remarkable proficiency in taking the USMLE Step 3, with an accuracy rate of 84.7% (194/229), outshining ChatGPT 3.5 with an accuracy rate of 56.9% (1047/1840). Although ChatGPT 4 performed exceptionally, it encountered difficulties in questions requiring the application of theoretical concepts, particularly in cardiology and neurology. These insights are pivotal for the development of examination strategies that are resilient to AI and underline the promising role of AI in the realm of medical education and diagnostics.
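The reported accuracy rates follow directly from the correct/total counts given above. As a minimal sketch (the function name `accuracy_percent` and the variable names are illustrative assumptions, not taken from the study), the arithmetic can be reproduced as follows:

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return correct/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be a positive number of questions")
    return round(100 * correct / total, 1)


# Counts as reported in the abstract (correct answers / questions asked).
gpt4_accuracy = accuracy_percent(194, 229)      # ChatGPT 4
gpt35_accuracy = accuracy_percent(1047, 1840)   # ChatGPT 3.5

# Gap between the two models in percentage points (not a significance test).
gap_points = round(gpt4_accuracy - gpt35_accuracy, 1)

print(f"ChatGPT 4:   {gpt4_accuracy}%")
print(f"ChatGPT 3.5: {gpt35_accuracy}%")
print(f"Difference:  {gap_points} percentage points")
```

Note that the two models were evaluated on different numbers of questions (229 versus 1840), so the gap in percentage points is a descriptive comparison only.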

Keyphrases

medical education
artificial intelligence
cardiac surgery
machine learning