Shortcut learning in medical AI hinders generalization: method for estimating AI model generalization without external data.
Cathy Ong Ly, Balagopal Unnikrishnan, Tony Tadic, Tirth Patel, Joseph Duhamel, Sonja Kandel, Yasbanoo Moayedi, Michael Brudno, Andrew J Hope, Heather Ross, Christopher McIntosh
Published in: npj Digital Medicine (2024)
Healthcare datasets are becoming larger and more complex, necessitating the development of accurate and generalizable AI models for medical applications. Unstructured datasets, including medical imaging, electrocardiograms, and natural language data, are gaining attention with advancements in deep convolutional neural networks and large language models. However, estimating the generalizability of these models to new healthcare settings without extensive validation on external data remains challenging. In experiments across 13 datasets, including X-rays, CTs, ECGs, clinical discharge summaries, and lung auscultation data, our results demonstrate that model performance is frequently overestimated, by up to 20% on average, due to shortcut learning of hidden data acquisition biases (DAB). Shortcut learning refers to a phenomenon in which an AI model learns to solve a task based on spurious correlations present in the data rather than features directly related to the task itself. We propose an open-source, bias-corrected external accuracy estimate, P_Est, that estimates external accuracy to within 4% on average by measuring and calibrating for DAB-induced shortcut learning.
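The core idea, measuring how detectable an acquisition bias is and discounting the internal accuracy estimate accordingly, can be illustrated with a minimal sketch. This is not the authors' actual P_Est procedure; the synthetic data, the nearest-centroid site probe, and the `penalty` weight are all illustrative assumptions introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic features from two "hospitals": the task signal is identical,
# but hospital B's scanner shifts one feature (a hidden acquisition bias).
n = 200
site = rng.integers(0, 2, n)        # 0 = hospital A, 1 = hospital B
features = rng.normal(size=(n, 5))
features[:, 0] += 2.0 * site        # DAB: feature 0 leaks the site

def nearest_centroid_acc(X, y):
    """Accuracy of a nearest-centroid probe predicting y from X."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)
    return float((pred == y).mean())

# Step 1: how detectable is the acquisition source? (chance = 0.5)
dab_acc = nearest_centroid_acc(features, site)

# Step 2: discount internal test accuracy in proportion to how far
# DAB detectability sits above chance. The penalty weight here is a
# free parameter chosen for illustration, not a calibrated value.
internal_acc = 0.90                 # accuracy on held-out internal data
penalty = 0.5
p_est = internal_acc - penalty * max(0.0, dab_acc - 0.5)

print(f"DAB detectability: {dab_acc:.2f}")
print(f"Bias-corrected estimate: {p_est:.2f}")
```

Because the probe can recover the site well above chance, the corrected estimate lands below the internal accuracy, mirroring the paper's finding that internal performance overestimates external performance when shortcuts are available.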