The impact of training sample size on deep learning-based organ auto-segmentation for head-and-neck patients.
Yingtao Fang, Jiazhou Wang, Xiaomin Ou, Hongmei Ying, Chaosu Hu, Zhen Zhang, Weigang Hu. Published in: Physics in Medicine and Biology (2021)
To investigate how training sample size affects the performance of deep learning-based organ auto-segmentation for head-and-neck cancer patients, 1160 patients with head-and-neck cancer who received radiotherapy were enrolled in this study. Planning CT images and region-of-interest (ROI) delineations were collected for the brainstem, spinal cord, eyes, lenses, optic nerves, temporal lobes, parotids, larynx and body. An evaluation dataset of 200 patients was randomly selected, and model performance was measured with the Dice similarity coefficient. Eleven training datasets of different sizes were randomly drawn from the remaining 960 patients to train auto-segmentation models. All models used the same data augmentation methods, network architecture and training hyperparameters. A model estimating performance as a function of training sample size was then built on an inverse power law function. Performance changed with sample size in different patterns for different organs. Six organs performed best with 800 training samples, and the others performed best with 600 or 400 samples. The benefit of enlarging the training dataset diminished as the sample size grew. Optic nerves and lenses reached 95% of their best performance with 200 training samples, and the other organs did so with only 40. For the inverse power law fit, the root mean square errors of all ROIs were below 0.03 (left eye: 0.024; all others: <0.01), and the R² of every ROI except the body was greater than 0.5. Sample size has a significant impact on the performance of deep learning-based auto-segmentation, and the relationship between sample size and performance depends on the inherent characteristics of each organ. In some cases, relatively small training sets can achieve satisfactory performance.
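The evaluation and learning-curve steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact inverse power law parameterization or fitting procedure, so the sketch assumes the form `Dice(n) = a - b * n**(-c)` and fits it by profile least squares (a grid over the exponent `c`, with `a` and `b` solved linearly for each candidate). The function names, the training sizes and the noise-free synthetic Dice values are hypothetical and chosen only to exercise the code. The helper `samples_for_fraction` solves the fitted curve for the sample size at which predicted performance reaches a target, such as 95% of the best observed Dice.

```python
import numpy as np


def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for binary masks."""
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * np.logical_and(pred, ref).sum() / denom


def inverse_power_law(n, a, b, c):
    """Assumed learning-curve form: performance approaches a as n grows."""
    return a - b * np.power(n, -c)


def fit_inverse_power_law(n, y, c_grid=np.linspace(0.01, 3.0, 300)):
    """Least-squares fit of y ~ a - b * n**(-c).

    For each candidate exponent c the model is linear in (a, b), so (a, b)
    are solved with lstsq and the c giving the smallest residual is kept.
    """
    n = np.asarray(n, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for c in c_grid:
        design = np.column_stack([np.ones_like(n), -np.power(n, -c)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((design @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, coef[0], coef[1], c)
    _, a, b, c = best
    return a, b, c


def goodness_of_fit(y, y_hat):
    """Root mean square error and coefficient of determination (R^2)."""
    resid = np.asarray(y) - np.asarray(y_hat)
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    r2 = 1.0 - float(np.sum(resid ** 2) / np.sum((y - np.mean(y)) ** 2))
    return rmse, r2


def samples_for_fraction(a, b, c, target):
    """Sample size n where a - b * n**(-c) equals target (requires target < a)."""
    return (b / (a - target)) ** (1.0 / c)


# Hypothetical training-set sizes and synthetic, noise-free mean Dice values
sizes = np.array([10, 20, 40, 80, 120, 200, 300, 400, 600, 800, 960])
mean_dice = inverse_power_law(sizes, 0.90, 0.50, 0.50)

a, b, c = fit_inverse_power_law(sizes, mean_dice)
rmse, r2 = goodness_of_fit(mean_dice, inverse_power_law(sizes, a, b, c))
# Sample size at which the fitted curve reaches 95% of the best observed Dice
n95 = samples_for_fraction(a, b, c, 0.95 * mean_dice.max())
```

Because the exponent enters nonlinearly while `a` and `b` enter linearly, searching only over `c` keeps the fit simple and dependency-free; a general nonlinear optimizer would be an equally valid choice for real, noisy per-organ data.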