On the limitations of large language models in clinical diagnosis.
Justin T ReeseDaniel DanisJ Harry CaufieldElena CasiraghiGiorgio ValentiniChristopher John MungallPeter Nick RobinsonPublished in: medRxiv : the preprint server for health sciences (2023)
We consider the feature-based queries to be a more appropriate test of the performance of GPT-4 in diagnostic tasks, since it is unlikely that the narrative approach can be used in actual clinical practice. Future research and algorithmic development is needed to determine the optimal approach to leveraging LLMs for clinical diagnosis.