Are We There Yet? The Value of Deep Learning in a Multicenter Setting for Response Prediction of Locally Advanced Rectal Cancer to Neoadjuvant Chemoradiotherapy.
Barbara Daria Wichtmann, Steffen Albert, Wenzhao Zhao, Angelika Schmitt, Claus Rödel, Ralf-Dieter Hofheinz, Jürgen Hesser, Frank G Zöllner, Ulrike I Attenberger
Published in: Diagnostics (Basel, Switzerland) (2022)
This retrospective study aims to evaluate the generalizability of a promising state-of-the-art multitask deep learning (DL) model for predicting the response of locally advanced rectal cancer (LARC) to neoadjuvant chemoradiotherapy (nCRT) using a multicenter dataset. To this end, we retrained and validated a Siamese network with two U-Nets joined at multiple layers using pre- and post-therapeutic T2-weighted (T2w) images, diffusion-weighted (DW) images, and apparent diffusion coefficient (ADC) maps of 83 LARC patients acquired under study conditions at four different medical centers. To assess the predictive performance of the model, the trained network was then applied to an external clinical routine dataset of 46 LARC patients imaged without study conditions. The training and test datasets differed significantly in their composition, e.g., T-/N-staging and the time interval between initial staging/nCRT/re-staging and surgery, as well as in acquisition parameters such as resolution, echo/repetition time, flip angle, and field strength. We found that even after dedicated data pre-processing, the predictive performance dropped significantly in this multicenter setting compared to a previously published single- or two-center setting. Testing the network on the external clinical routine dataset yielded an area under the receiver operating characteristic curve of 0.54 (95% confidence interval [CI]: 0.41, 0.65) when using only pre- and post-therapeutic T2w images as input, and 0.60 (95% CI: 0.48, 0.71) when using the combination of pre- and post-therapeutic T2w images, DW images, and ADC maps as input. Our study highlights the importance of data quality and harmonization in clinical trials using machine learning. Only through a joint, cross-center effort involving a multidisciplinary team can we generate sufficiently large curated and annotated datasets and develop the pre-processing pipelines for data harmonization that are needed to apply DL models successfully in clinical practice.
Keyphrases
- rectal cancer
- locally advanced
- deep learning
- neoadjuvant chemotherapy
- squamous cell carcinoma
- diffusion weighted
- phase ii study
- radiation therapy
- lymph node
- convolutional neural network
- end stage renal disease
- clinical trial
- diffusion weighted imaging
- ejection fraction
- contrast enhanced
- newly diagnosed
- artificial intelligence
- electronic health record
- magnetic resonance
- big data
- prognostic factors
- healthcare
- clinical practice
- double blind
- mass spectrometry
- peritoneal dialysis
- patient reported outcomes
- randomized controlled trial
- quality improvement
- coronary artery disease
- atrial fibrillation
- optical coherence tomography
- rna seq
- resistance training
- open label