Evaluating Cellularity Estimation Methods: Comparing AI Counting with Pathologists' Visual Estimates.
Tomoharu KiyunaEric CosattoKanako C HatanakaTomoyuki YokoseKoji TsutaNoriko MotoiKeishi MakitaAi ShimizuToshiya ShinoharaAkira SuzukiEmi TakakuwaYasunari TakakuwaTakahiro TsujiMitsuhiro TsujiwakiMitsuru YanaiSayaka YuzawaMaki OguraYutaka HatanakaPublished in: Diagnostics (Basel, Switzerland) (2024)
The development of next-generation sequencing (NGS) has enabled the discovery of cancer-specific driver gene alternations, making precision medicine possible. However, accurate genetic testing requires a sufficient amount of tumor cells in the specimen. The evaluation of tumor content ratio (TCR) from hematoxylin and eosin (H&E)-stained images has been found to vary between pathologists, making it an important challenge to obtain an accurate TCR. In this study, three pathologists exhaustively labeled all cells in 41 regions from 41 lung cancer cases as either tumor, non-tumor or indistinguishable, thus establishing a "gold standard" TCR. We then compared the accuracy of the TCR estimated by 13 pathologists based on visual assessment and the TCR calculated by an AI model that we have developed. It is a compact and fast model that follows a fully convolutional neural network architecture and produces cell detection maps which can be efficiently post-processed to obtain tumor and non-tumor cell counts from which TCR is calculated. Its raw cell detection accuracy is 92% while its classification accuracy is 84%. The results show that the error between the gold standard TCR and the AI calculation was significantly smaller than that between the gold standard TCR and the pathologist's visual assessment (p<0.05). Additionally, the robustness of AI models across institutions is a key issue and we demonstrate that the variation in AI was smaller than that in the average of pathologists when evaluated by institution. These findings suggest that the accuracy of tumor cellularity assessments in clinical workflows is significantly improved by the introduction of robust AI models, leading to more efficient genetic testing and ultimately to better patient outcomes.