Assistive AI in Lung Cancer Screening: A Retrospective Multinational Study in the United States and Japan.
Atilla P Kiraly, Corbin A Cunningham, Ryan Najafi, Zaid Nabulsi, Jie Yang, Charles Lau, Joseph R Ledsam, Wenxing Ye, Diego Ardila, Scott Mayer McKinney, Rory Pilgrim, Yuan Liu, Hiroaki Saito, Yasuteru Shimamura, Mozziyar Etemadi, David Melnick, Sunny Jansen, Greg S Corrado, Lily Peng, Daniel Tse, Shravya Shetty, Shruthi Prabhakara, David P Naidich, Neeral Beladia, Krish Eswaran
Published in: Radiology. Artificial intelligence (2024)
"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content.
Purpose
To evaluate the impact of an artificial intelligence (AI) assistant for lung cancer screening (LCS) on multinational clinical workflows.
Materials and Methods
An AI assistant for LCS was evaluated in two retrospective randomized multireader multicase studies, in which 627 (141 cancer-positive) low-dose chest CT cases were each read twice (with and without AI assistance) by experienced thoracic radiologists (6 US-based or 6 Japan-based), resulting in a total of 7,524 interpretations. Positive cases were defined as those within two years before a pathology-confirmed lung cancer diagnosis. Negative cases were defined as those without any subsequent cancer diagnosis for at least two years and were enriched for a spectrum of diverse nodules. The studies measured the readers' level of suspicion (LoS, on a 0-100 scale), country-specific screening system scoring categories, and management recommendations. Evaluation metrics included the area under the receiver operating characteristic curve (AUC) for LoS and the sensitivity and specificity of recall recommendations.
Results
With AI assistance, the radiologists' AUC increased by 0.023 (0.70 to 0.72, P = .02) for the US study and by 0.023 (0.93 to 0.96, P = .18) for the Japan study. Scoring system specificity for actionable findings increased by 5.5% (57%-63%, P < .001) for the US study and by 6.7% (23%-30%, P < .001) for the Japan study. There was no evidence of a difference in corresponding sensitivity between unassisted and AI-assisted reads for the US (67.3%-67.5%, P = .88) and Japan (98%-100%, P > .99) studies. The corresponding standalone AI system AUC was 0.75 (95% CI: 0.70-0.81) for the US-based dataset and 0.88 (95% CI: 0.78-0.97) for the Japan-based dataset.
Conclusion
The concurrent AI interface improved LCS specificity in both the US-based and Japan-based reader studies, meriting further study in additional international screening environments. ©RSNA, 2024.
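The evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the study's analysis code: the function names and the toy reads below are hypothetical, and the sketch ignores the reader and case variance that a multireader multicase analysis would model, as well as any significance testing. It only shows how an AUC can be derived from level-of-suspicion (LoS) scores and how sensitivity and specificity follow from binary recall recommendations.

```python
from typing import Sequence, Tuple


def auc_from_scores(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random cancer-positive case receives a
    higher score than a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    recalled: Sequence[bool], labels: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity = recalled positives / all positives;
    specificity = non-recalled negatives / all negatives."""
    tp = sum(1 for r, y in zip(recalled, labels) if y == 1 and r)
    fn = sum(1 for r, y in zip(recalled, labels) if y == 1 and not r)
    tn = sum(1 for r, y in zip(recalled, labels) if y == 0 and not r)
    fp = sum(1 for r, y in zip(recalled, labels) if y == 0 and r)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy reads (NOT study data): 3 positive and 5 negative cases.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
los = [90, 60, 20, 70, 30, 10, 5, 40]  # level of suspicion, 0-100
recalled = [True, True, False, True, False, False, False, False]

auc = auc_from_scores(los, labels)
sens, spec = sensitivity_specificity(recalled, labels)
print(f"AUC={auc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

As a side check on the reported design, 627 cases x 2 reading conditions x 6 readers = 7,524, which matches the stated total of interpretations if each case was read by six radiologists under both conditions.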
Keyphrases
- artificial intelligence
- machine learning
- big data
- low dose
- deep learning
- randomized controlled trial
- spinal cord
- emergency department
- systematic review
- squamous cell carcinoma
- cross sectional
- computed tomography
- magnetic resonance
- placebo controlled
- rna seq
- case control
- study protocol
- pet ct
- psychometric properties
- drug induced