The Challenge of Choosing the Best Classification Method in Radiomic Analyses: Recommendations and Applications to Lung Cancer CT Images.
Federica Corso, Giulia Tini, Giuliana Lo Presti, Noemi Garau, Simone Pietro De Angelis, Federica Bellerba, Lisa Rinaldi, Francesca Botta, Stefania Maria Rita Rizzo, Daniela Origgi, Chiara Paganelli, Marta Cremonesi, Cristiano Rampinelli, Massimo Bellomi, Luca Mazzarella, Pier Giuseppe Pelicci, Sara Gandini, Sara Raimondi
Published in: Cancers (2021)
Radiomics uses high-dimensional sets of imaging features to predict biological characteristics of tumors and clinical outcomes. The choice of the algorithm used to analyze radiomic features and perform predictions strongly affects the results, so identifying adequate machine learning methods for radiomic applications is crucial. In this study we aim to identify suitable analysis approaches for radiomic-based binary predictions, according to sample size, outcome balancing, and the strength of the features-outcome association. Simulated data were obtained by reproducing the correlation structure among 168 radiomic features extracted from Computed Tomography images of 270 Non-Small-Cell Lung Cancer (NSCLC) patients and their association with lymph node status. The performance of six classifiers combined with six feature selection (FS) methods was assessed on the simulated data using AUC (Area Under the Receiver Operating Characteristic Curve), sensitivity, and specificity. For all the FS methods and regardless of the association strength, the tree-based classifiers Random Forest and Extreme Gradient Boosting performed well (AUC ≥ 0.73), showing the best trade-off between sensitivity and specificity. On small samples, performance was generally lower and more variable than on medium and large samples. FS methods generally did not improve performance. Thus, in radiomic studies, we suggest evaluating the choice of FS and classifiers, considering the specific sample size, balancing, and association strength.
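The three evaluation metrics named above can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made only for illustration. AUC is computed here through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one.

```python
def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity and specificity after thresholding the scores.

    The 0.5 default is an illustrative assumption, not a value from the study.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("Both outcome classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


# Toy, imbalanced example (3 positives, 5 negatives); values are made up.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.45]

auc = auc_score(y_true, scores)
sens, spec = sensitivity_specificity(y_true, scores)
print(f"AUC={auc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Reporting sensitivity and specificity alongside AUC matters when the outcome is imbalanced, since a classifier can reach a reasonable AUC while missing most positive cases at a fixed threshold.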
Keyphrases
- machine learning
- deep learning
- computed tomography
- lymph node
- big data
- end stage renal disease
- convolutional neural network
- artificial intelligence
- electronic health record
- climate change
- newly diagnosed
- small cell lung cancer
- magnetic resonance imaging
- ejection fraction
- positron emission tomography
- contrast enhanced
- image quality
- chronic kidney disease
- dual energy
- radiation therapy
- peritoneal dialysis
- patient reported outcomes
- mass spectrometry
- lymph node metastasis
- structural basis
- fluorescence imaging