Evaluating the performance of artificial intelligence software for lung nodule detection on chest radiographs in a retrospective real-world UK population.

Ahmed Maiter, Katherine Hocking, Suzanne Matthews, Jonathan Taylor, Michael Sharkey, Peter Metherall, Samer Alabed, Krit Dwivedi, Yousef Shahin, Elizabeth Anderson, Sarah Holt, Charlotte Rowbotham, Mohamed A Kamil, Nigel Hoggard, Saba P Balasubramanian, Andrew Swift, Christopher S Johns

Published in: BMJ Open (2023)

The software considerably underperformed in this real-world patient cohort. Failure analysis suggested that limited generalisability of the training and testing datasets was a potential factor. The low positive predictive value (PPV) carries a risk of over-investigation and limits translation of the software into clinical practice. Our findings highlight the importance of training and testing software on representative datasets, with broader implications for the implementation of AI tools in imaging.

Keyphrases

artificial intelligence
machine learning
big data
data analysis
clinical practice
deep learning
primary care
healthcare
cross sectional
virtual reality
case report
single cell