A Multimodal Generative AI Copilot for Human Pathology.
Ming Y. Lu, Bowen Chen, Drew F. K. Williamson, Richard J. Chen, Melissa Zhao, Aaron K. Chow, Kenji Ikemura, Ahrong Kim, Dimitra Pouli, Ankush Patel, Amr Soliman, Chengkuan Chen, Tong Ding, Judy J. Wang, Georg K. Gerber, Ivy Liang, Long Phi Le, Anil V. Parwani, Luca L. Weishaupt, Faisal Mahmood
Published in: Nature (2024)
The field of computational pathology[1,2] has witnessed remarkable progress in the development of both task-specific predictive models and task-agnostic self-supervised vision encoders[3,4]. However, despite the explosive growth of generative artificial intelligence (AI), there has been limited study on building general-purpose, multimodal AI assistants and copilots[5] tailored to pathology. Here we present PathChat, a vision-language generalist AI assistant for human pathology. We build PathChat by adapting a foundational vision encoder for pathology, combining it with a pretrained large language model, and fine-tuning the whole system on over 456,000 diverse visual-language instructions consisting of 999,202 question-and-answer turns. We compare PathChat against several multimodal vision-language AI assistants and GPT-4V, which powers the commercially available multimodal general-purpose AI assistant ChatGPT-4[7]. PathChat achieved state-of-the-art performance on multiple-choice diagnostic questions from cases with diverse tissue origins and disease models. Furthermore, using open-ended questions and human expert evaluation, we found that overall PathChat produced more accurate and pathologist-preferable responses to diverse queries related to pathology. As an interactive, general vision-language AI copilot that can flexibly handle both visual and natural language inputs, PathChat can potentially find impactful applications in pathology education, research, and human-in-the-loop clinical decision-making.
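The abstract describes PathChat as a pathology vision encoder coupled to a pretrained large language model and then instruction-tuned end to end. The sketch below is only an illustration of that general composition pattern, not the authors' implementation: the toy encoder, toy language model, dimensions, and module names are all assumptions made for the example, and the real system would use a pathology-pretrained encoder and a full pretrained LLM.

```python
import torch
import torch.nn as nn

# Illustrative sketch, not PathChat's actual code: a vision encoder produces patch
# features, a linear projector maps them into the language model's embedding space,
# and the combined token sequence is fed to a decoder for instruction tuning.
VISION_DIM, LLM_DIM, VOCAB = 512, 768, 32_000  # assumed dimensions for the toy example

class ToyVisionEncoder(nn.Module):
    """Stand-in for a pathology-pretrained vision encoder (assumption)."""
    def __init__(self):
        super().__init__()
        self.patchify = nn.Conv2d(3, VISION_DIM, kernel_size=16, stride=16)
    def forward(self, images):                        # (B, 3, H, W)
        x = self.patchify(images)                     # (B, VISION_DIM, H/16, W/16)
        return x.flatten(2).transpose(1, 2)           # (B, N_patches, VISION_DIM)

class ToyLLM(nn.Module):
    """Stand-in for a pretrained large language model that accepts token embeddings."""
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=LLM_DIM, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(LLM_DIM, VOCAB)
    def forward(self, token_embeds):                  # (B, T, LLM_DIM)
        return self.head(self.blocks(token_embeds))   # (B, T, VOCAB) next-token logits

class VisionLanguageAssistant(nn.Module):
    """Vision encoder + projector + language model, trained jointly on instructions."""
    def __init__(self):
        super().__init__()
        self.vision_encoder = ToyVisionEncoder()
        self.projector = nn.Linear(VISION_DIM, LLM_DIM)  # image features -> LLM token space
        self.llm = ToyLLM()
    def forward(self, images, text_embeds):
        img_tokens = self.projector(self.vision_encoder(images))
        return self.llm(torch.cat([img_tokens, text_embeds], dim=1))

# Usage: two 224x224 images, each paired with 16 text-token embeddings.
model = VisionLanguageAssistant()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 16, LLM_DIM))
print(logits.shape)  # torch.Size([2, 212, 32000]): 196 image tokens + 16 text tokens
```

Under this (assumed) design, fine-tuning on visual-language instructions would optimize the projector, and typically the encoder and language model as well, against a next-token objective over the interleaved image-and-text sequence.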