A Novel Data Augmentation Method for Radiomics Analysis Using Image Perturbations.
F Lo Iacono, Riccardo Maragna, Gianluca Pontone, Valentina D A Corino
Published in: Journal of imaging informatics in medicine (2024)
Radiomics extracts hundreds of features from medical images to quantitatively characterize a region of interest (ROI). When applying radiomics, imbalanced or small datasets are commonly addressed using under- or over-sampling, the latter being applied directly to the extracted features. The aim of this study is to propose a novel balancing and data augmentation technique that applies perturbations (erosion, dilation, contour randomization) to the ROI in cardiac computed tomography images. Radiomic features are then extracted from the perturbed ROIs, thus creating additional samples. This approach was tested on the clinical problem of distinguishing cardiac amyloidosis (CA) from aortic stenosis (AS) and hypertrophic cardiomyopathy (HCM). Twenty-one CA, thirty-two AS, and twenty-one HCM patients were included in the study. From each original and perturbed ROI, 107 radiomic features were extracted. The CA-AS dataset was balanced using the perturbation-based method as well as random over-sampling, adaptive synthetic sampling (ADASYN), and the synthetic minority oversampling technique (SMOTE). The same methods were tested for data augmentation on the CA-HCM dataset. Features were submitted to robustness, redundancy, and relevance analysis, testing five feature selection methods (p-value, least absolute shrinkage and selection operator (LASSO), semi-supervised LASSO, principal component analysis (PCA), semi-supervised PCA). A support vector machine performed the classification tasks, and its performance was evaluated by means of 10-fold cross-validation. The perturbation-based approach provided the best performance in terms of F1 score and balanced accuracy in both the CA-AS (F1 score: 80%, AUC: 0.91) and CA-HCM (F1 score: 86%, AUC: 0.92) classifications. These results suggest that ROI perturbations represent a powerful approach to address both data balancing and augmentation issues.
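The three ROI perturbations named in the abstract can be illustrated with a minimal sketch on a 2D binary mask. This is not the authors' implementation: the 4-connected structuring element, the function names (`erode`, `dilate`, `randomize_contour`, `perturbed_rois`), and the pixel-wise contour randomization are all assumptions made for illustration, and the abstract does not state how the perturbations were parameterized. In practice a library routine such as `scipy.ndimage.binary_erosion` would normally replace the hand-written loops.

```python
import random
from typing import List

Mask = List[List[int]]  # 2D binary ROI mask, 1 = pixel inside the ROI

# Assumption: 4-connected neighbourhood as the structuring element.
NEIGHBOURS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def _neighbour_values(mask: Mask, r: int, c: int) -> List[int]:
    """Values of the 4-connected neighbours; pixels outside the image count as 0."""
    rows, cols = len(mask), len(mask[0])
    values = []
    for dr, dc in NEIGHBOURS:
        rr, cc = r + dr, c + dc
        inside = 0 <= rr < rows and 0 <= cc < cols
        values.append(mask[rr][cc] if inside else 0)
    return values


def erode(mask: Mask) -> Mask:
    """Shrink the ROI: keep a pixel only if it and all its neighbours are in the ROI."""
    return [[1 if mask[r][c] and all(_neighbour_values(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]


def dilate(mask: Mask) -> Mask:
    """Grow the ROI: add every pixel that touches the ROI."""
    return [[1 if mask[r][c] or any(_neighbour_values(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]


def randomize_contour(mask: Mask, p: float = 0.5, seed=None) -> Mask:
    """Hypothetical contour randomization: the eroded core is always kept, and
    each pixel in the band between the eroded and dilated masks is kept with
    probability p."""
    rng = random.Random(seed)
    inner, outer = erode(mask), dilate(mask)
    out = []
    for r in range(len(mask)):
        row = []
        for c in range(len(mask[0])):
            if inner[r][c]:
                row.append(1)                              # stable core
            elif outer[r][c]:
                row.append(1 if rng.random() < p else 0)   # uncertain boundary band
            else:
                row.append(0)                              # background
        out.append(row)
    return out


def area(mask: Mask) -> int:
    """Number of ROI pixels."""
    return sum(sum(row) for row in mask)


def perturbed_rois(mask: Mask, n_random: int = 3, seed: int = 0) -> List[Mask]:
    """Return the eroded, dilated, and n_random contour-randomized versions of one ROI."""
    randomized = [randomize_contour(mask, seed=seed + i) for i in range(n_random)]
    return [erode(mask), dilate(mask)] + randomized
```

Each mask returned by `perturbed_rois` would then go through the same radiomic feature extraction as the original ROI, yielding one additional sample per perturbation. Perturbing only the minority-class ROIs corresponds to the balancing use case, and perturbing ROIs of both classes corresponds to the augmentation use case.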
Keyphrases
- hypertrophic cardiomyopathy
- left ventricular
- deep learning
- aortic stenosis
- machine learning
- ejection fraction
- electronic health record
- big data
- computed tomography
- protein kinase
- lymph node metastasis
- heart failure
- healthcare
- transcatheter aortic valve replacement
- soft tissue
- contrast enhanced
- end stage renal disease
- aortic valve replacement
- newly diagnosed
- magnetic resonance imaging
- convolutional neural network
- artificial intelligence
- optical coherence tomography
- chronic kidney disease
- squamous cell carcinoma
- transcatheter aortic valve implantation
- positron emission tomography
- data analysis
- coronary artery disease
- working memory
- magnetic resonance
- atrial fibrillation
- patient reported outcomes