Letter to the Editor Regarding Article "Prior to Initiation of Chemotherapy, Can We Predict Breast Tumor Response? Deep Learning Convolutional Neural Networks Approach Using a Breast MRI Tumor Dataset".

Published in: Journal of Imaging Informatics in Medicine (2024)

The cited article reports on a convolutional neural network trained to predict response to neoadjuvant chemotherapy from pre-treatment breast MRI scans. The proposed algorithm attains impressive performance on the test dataset, with a mean area under the receiver operating characteristic curve (AUC) of 0.98 and a mean accuracy of 88%. In this letter, I raise the concern that the reported results can be explained by inadvertent data leakage between the training and test datasets. More precisely, I conjecture that the random split of the full dataset into training and test sets was performed not at the patient level, but at the level of individual 2D MRI slices. Under such a split, slices from the same patient can appear in both sets, which allows the neural network to "memorize" a patient's anatomy and their treatment outcome instead of discovering features that are useful for predicting treatment response. To provide evidence for this conjecture, I present results of similar experiments that I conducted on a public breast MRI dataset, in which I demonstrate that the suspected data leakage mechanism closely reproduces the results reported in the cited work.
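The difference between the two splitting strategies discussed above can be illustrated with a minimal sketch. It is not the code of the cited article or of the experiments in this letter; the toy cohort size, the 20% test fraction, and all function names (`make_slices`, `slice_level_split`, `patient_level_split`, `shared_patients`) are assumptions chosen purely for illustration.

```python
import random


def make_slices(n_patients=20, slices_per_patient=30):
    """Build a toy list of (patient_id, slice_index) records (hypothetical cohort)."""
    return [(p, s) for p in range(n_patients) for s in range(slices_per_patient)]


def slice_level_split(records, test_fraction=0.2, seed=0):
    """Shuffle individual 2D slices and split them, ignoring patient identity."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def patient_level_split(records, test_fraction=0.2, seed=0):
    """Shuffle patients and assign all slices of a patient to a single set."""
    rng = random.Random(seed)
    patients = sorted({p for p, _ in records})
    rng.shuffle(patients)
    n_test = max(1, int(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [r for r in records if r[0] not in test_patients]
    test = [r for r in records if r[0] in test_patients]
    return train, test


def shared_patients(train, test):
    """Return the patients that contribute slices to both sets."""
    return {p for p, _ in train} & {p for p, _ in test}


records = make_slices()
leaky_train, leaky_test = slice_level_split(records)
clean_train, clean_test = patient_level_split(records)

# Slice-level split: patients typically appear in both training and test sets.
print("patients shared (slice-level split):", len(shared_patients(leaky_train, leaky_test)))
# Patient-level split: no patient is shared, by construction.
print("patients shared (patient-level split):", len(shared_patients(clean_train, clean_test)))
```

A check of this kind, applied to the actual patient identifiers of a dataset, is one simple way to verify that a reported split is free of patient overlap before the test performance is interpreted.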
