Can Large Language Models (LLMs) Predict the Appropriate Treatment of Acute Hip Fractures in Older Adults? Comparing Appropriate Use Criteria With Recommendations From ChatGPT.
Katrina S Nietsch, Nancy Shrestha, Laura C Mazudie Ndjonko, Wasil Ahmed, Mateo Restrepo Mejia, Bashar Zaidat, Renee Ren, Akiro H Duey, Samuel Q Li, Jun S Kim, Krystin A Hidden, Samuel K Cho
Published in: Journal of the American Academy of Orthopaedic Surgeons. Global research & reviews (2024)
ChatGPT-4.0 scores were not concordant with AAOS scores: they overestimated the appropriateness of total hip arthroplasty, hemiarthroplasty, and long cephalomedullary nails and underestimated that of the other three treatments. ChatGPT-4.0 was inadequate at selecting a treatment deemed acceptable, most reasonable, and most likely to improve patient outcomes.