Evaluation of GPT-4's Chest X-Ray Impression Generation: A Reader Study on Performance and Perception.
Sebastian Ziegelmayer, Alexander W Marka, Nicolas Lenhart, Nadja Nehls, Philipp-Alexander Neumann, Felix N Harder, Andreas Philipp Sauter, Marcus R Makowski, Markus Graf, Joshua Gawlitza
Published in: Journal of Medical Internet Research (2023)
Our study explored the generative capabilities of the multimodal GPT-4 for chest x-ray impression generation. It uncovered significant differences between radiological assessments and automatic evaluation metrics, and it revealed radiological bias.