Enrichment of lung cancer computed tomography collections with AI-derived annotations.
Deepa Krishnaswamy, Dennis Bontempi, Vamsi Krishna Thiriveedhi, Davide Punzo, David A Clunie, Christopher P Bridge, Hugo J W L Aerts, Ron Kikinis, Andrey Fedorov
Published in: Scientific Data (2024)
Public imaging datasets are critical for the development and evaluation of automated tools in cancer imaging. Unfortunately, many such datasets do not include annotations or image-derived features, complicating downstream analysis. Artificial intelligence-based annotation tools have been shown to achieve acceptable performance and can be used to automatically annotate large datasets. As part of the effort to enrich public data available within the NCI Imaging Data Commons (IDC), here we introduce AI-generated annotations for two collections containing computed tomography images of the chest: NSCLC-Radiomics and a subset of the National Lung Screening Trial. Using publicly available AI algorithms, we derived volumetric annotations of thoracic organs-at-risk, their corresponding radiomics features, and slice-level annotations of anatomical landmarks and regions. The resulting annotations are publicly available within IDC, where the DICOM format is used to harmonize the data and satisfy the FAIR (Findable, Accessible, Interoperable, Reusable) data principles. The annotations are accompanied by cloud-enabled notebooks demonstrating their use. This study reinforces the need for large, publicly accessible curated datasets and demonstrates how AI can aid in cancer imaging.
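To make the link between a volumetric annotation and an image-derived feature concrete, the sketch below computes the simplest shape feature, organ volume, from a binary segmentation mask and its voxel spacing. It is an illustration only and not the pipeline used in the study; the abstract does not name the feature-extraction tool. The function name `mask_volume_ml`, the toy mask, and the spacing values are all hypothetical.

```python
from typing import Sequence, Tuple

# A mask is a nested list indexed as mask[slice][row][column],
# where nonzero values mark voxels belonging to the organ.
Mask = Sequence[Sequence[Sequence[int]]]


def mask_volume_ml(mask: Mask, spacing_mm: Tuple[float, float, float]) -> float:
    """Return the volume of a binary mask in millilitres.

    spacing_mm is (slice spacing, row spacing, column spacing) in millimetres.
    Volume is estimated by counting labelled voxels and multiplying by the
    volume of a single voxel.
    """
    dz, dy, dx = spacing_mm
    voxel_mm3 = dz * dy * dx
    n_voxels = sum(1 for sl in mask for row in sl for value in row if value)
    return n_voxels * voxel_mm3 / 1000.0  # 1 mL = 1000 mm^3


# Hypothetical toy data: two 3x3 slices containing 6 labelled voxels in total.
toy_mask = [
    [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
]
toy_spacing = (3.0, 1.0, 1.0)  # assumed spacing, in mm

volume = mask_volume_ml(toy_mask, toy_spacing)
print(f"{volume:.3f} mL")
```

In practice the mask and spacing would be read from the DICOM segmentation object and the referenced CT series rather than typed in by hand, and a dedicated radiomics library would compute many more features than volume alone.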
Keyphrases
- artificial intelligence
- big data
- deep learning
- machine learning
- computed tomography
- high resolution
- electronic health record
- healthcare
- small cell lung cancer
- papillary thyroid
- magnetic resonance imaging
- convolutional neural network
- lymph node metastasis
- emergency department
- rna seq
- clinical trial
- positron emission tomography
- squamous cell
- fluorescence imaging
- spinal cord injury
- mass spectrometry
- dual energy