Multitask Learning with Convolutional Neural Networks and Vision Transformers Can Improve Outcome Prediction for Head and Neck Cancer Patients.
Sebastian Starke, Alexander Zwanenburg, Karoline Leger, Fabian Lohaus, Annett Linge, Goda Kalinauskaite, Inge Tinhofer, Nika Guberina, Maja Guberina, Panagiotis Balermpas, Jens von der Grün, Ute Ganswindt, Claus Belka, Jan C Peeken, Stephanie E Combs, Simon Boeke, Daniel Zips, Christian Richter, Esther G C Troost, Mechthild Krause, Michael Baumann, Steffen Loeck
Published in: Cancers (2023)
Neural-network-based outcome prediction may enable further treatment personalization for patients with head and neck cancer. However, developing neural networks can prove challenging when only a limited number of cases is available. We therefore investigated whether multitask learning strategies, implemented through the simultaneous optimization of two distinct outcome objectives (multi-outcome) and combined with a tumor segmentation task, can improve the performance of convolutional neural networks (CNNs) and vision transformers (ViTs). Model training was conducted on two distinct multicenter datasets for the endpoints loco-regional control (LRC) and progression-free survival (PFS), respectively. The first dataset consisted of pre-treatment computed tomography (CT) imaging for 290 patients; the second contained combined positron emission tomography (PET)/CT data of 224 patients. Discriminative performance was assessed by the concordance index (C-index), and risk stratification was evaluated using log-rank tests. Across both datasets, CNN and ViT model ensembles achieved similar results. Multitask approaches showed favorable performance in most investigations, and multi-outcome CNN models trained with segmentation loss were identified as the optimal strategy across cohorts. On the PET/CT dataset, an ensemble of multi-outcome CNNs trained with segmentation loss achieved the best discrimination (C-index: 0.29, 95% confidence interval (CI): 0.22-0.36) and successfully stratified patients into groups with low and high risk of disease progression (p=0.003). On the CT dataset, ensembles of multi-outcome CNNs and of single-outcome ViTs trained with segmentation loss performed best (C-index: 0.26 and 0.26, CI: 0.18-0.34 and 0.18-0.35, respectively), both with significant risk stratification for LRC in independent validation (p=0.002 and p=0.011).
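The abstract does not state the exact training objective, so as a hedged illustration only: a common choice for neural survival models is the negative Cox partial log-likelihood per outcome endpoint, combined with a weighted segmentation term. The function names and the `seg_weight` hyperparameter below are assumptions for this sketch, not details taken from the paper.

```python
import math

def neg_cox_partial_log_likelihood(times, events, scores):
    """Negative Cox partial log-likelihood, a common training loss for
    neural survival models (the paper's exact loss is not specified here).
    `scores` are the network's predicted log-risks; higher score should
    correspond to earlier events."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    loss = 0.0
    n_events = 0
    for k, i in enumerate(order):
        if not events[i]:
            continue  # censored patients contribute only via risk sets
        # Risk set: all patients still under observation at times[i].
        risk_set = order[k:]
        log_denom = math.log(sum(math.exp(scores[j]) for j in risk_set))
        loss += log_denom - scores[i]
        n_events += 1
    return loss / max(n_events, 1)

def multitask_loss(outcome_losses, seg_loss, seg_weight=1.0):
    """Multi-outcome objective combined with a segmentation task:
    sum of the per-endpoint outcome losses plus a weighted segmentation
    term (seg_weight is a hypothetical hyperparameter)."""
    return sum(outcome_losses) + seg_weight * seg_loss
```

A lower `neg_cox_partial_log_likelihood` indicates that predicted risks rank patients consistently with their observed event times; the multitask combination lets the segmentation task act as an auxiliary regularizer during training.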
Further validation of the developed multitask-learning models is planned based on a prospective validation study, which has recently completed recruitment.
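The concordance index reported above measures how well predicted risks rank patients by their observed outcomes under right-censoring. A minimal pure-Python sketch of Harrell's C-index (function and variable names are illustrative, not from the paper):

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable if the patient with the shorter observed
    time actually experienced the event. The pair is concordant if that
    patient was also assigned the higher predicted risk; ties in risk
    count as half-concordant."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs (all censored)")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking; values below 0.5 indicate an inverted risk ordering.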
Keyphrases
- convolutional neural network
- positron emission tomography
- computed tomography
- pet ct
- deep learning
- neural network
- newly diagnosed
- magnetic resonance imaging
- machine learning
- patient reported outcomes
- image quality
- clinical trial
- dual energy
- artificial intelligence