
The Use of Artificial Intelligence in the Differentiation of Malignant and Benign Lung Nodules on Computed Tomograms Proven by Surgical Pathology.

Yung-Liang Wan, Patricia Wanping Wu, Pei-Ching Huang, Pei-Kwei Tsay, Kuang-Tse Pan, Nguyen Ngoc Trang, Wen-Yu Chuang, Ching-Yang Wu, ShihChung Benedict Lo
Published in: Cancers (2020)
This work evaluated the performance of a commercially available artificial intelligence (AI) software system in differentiating malignant from benign lung nodules. The AI tool consisted of a vessel-suppression function and a deep-learning-based computer-aided-detection (VS-CAD) analyzer. Fifty patients (32 females, mean age 52 years) with 75 lung nodules (47 malignant and 28 benign) underwent low-dose computed tomography (LDCT) followed by surgical excision and pathological analysis of all 75 nodules within a 3-month time frame. All 50 cases were then processed by the AI software to generate corresponding vessel-suppressed (VS) images and CAD outcomes. All 75 pathologically proven lung nodules were well delineated on the VS images. The AI did not detect 3 (6.4%) of the 47 malignant nodules and 11 (39.3%) of the 28 benign nodules, and showed no CAD analysis summary for them. In distinguishing malignant from benign nodules, sensitivity was 93.6% for the AI system and 89.4% for the radiologists, and specificity was 39.3% and 82.1%, respectively. AI sensitivity was higher than that of the radiologists, but the difference was not statistically significant (p = 0.712). Specificity obtained by the radiologists was significantly higher than that of the VS-CAD AI (p = 0.003). There was no significant difference between malignant and benign lesions with respect to age, gender, pure ground-glass pattern, nodule diameter and location, or nodule size (<6 vs. ≥6 mm). However, more part-solid nodules were proven to be malignant than benign (90.9% vs. 9.1%), and more solid nodules were proven to be benign than malignant (86.7% vs. 13.3%), both with statistical significance (p = 0.001 and <0.001, respectively). A larger cohort and a prospective study are required to validate the AI performance.
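As a rough check on the reported rates, the short sketch below recomputes sensitivity and specificity from nodule counts. The AI counts follow from the abstract (3 of 47 malignant nodules missed, 11 of 28 benign nodules not flagged), on the reading that an undetected benign nodule counts as a true negative. The radiologist counts (42 of 47 and 23 of 28) are not stated in the abstract; they are assumptions inferred from the reported percentages. The helper name `rate` is hypothetical.

```python
def rate(hits: int, total: int) -> float:
    """Return hits / total as a percentage rounded to one decimal place."""
    return round(100.0 * hits / total, 1)


# Pathologically proven nodule counts from the abstract.
MALIGNANT, BENIGN = 47, 28

# AI system: 3 malignant nodules were missed, so 44 were detected.
# The 11 undetected benign nodules are treated as true negatives.
ai_sensitivity = rate(MALIGNANT - 3, MALIGNANT)
ai_specificity = rate(11, BENIGN)

# Radiologists: counts are ASSUMED, back-calculated from 89.4% and 82.1%.
rad_sensitivity = rate(42, MALIGNANT)
rad_specificity = rate(23, BENIGN)

print(f"AI:           sensitivity {ai_sensitivity}%, specificity {ai_specificity}%")
print(f"Radiologists: sensitivity {rad_sensitivity}%, specificity {rad_specificity}%")
```

The p-values in the abstract cannot be reproduced this way, since they depend on per-nodule paired readings that the abstract does not give.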
Keyphrases