The Large Language Model GPT-4 Compared to Endocrinologist Responses on Initial Choice of Antidiabetic Medication under Conditions of Clinical Uncertainty.
James H Flory, Jessica S Ancker, Scott Y H Kim, Gilad Kuperman, Aleksandr Petrov, Andrew Vickers
Published in: Diabetes Care (2024)
In clinical scenarios with no single right answer, GPT-4's responses were reasonable, but differed from endocrinologists' responses in clinically important ways. Value judgments are needed to determine when these differences should be addressed by adjusting the model. We recommend against reliance on LLM output until it is shown to align not just with clinical guidelines but also with patient and clinician preferences, or it demonstrates improvement in clinical outcomes over standard of care.