Classification performance bias between training and test sets in a limited mammography dataset.

Rui Hou, Joseph Y Lo, Jeffrey R Marks, E Shelley Hwang, Lars J Grimm

Published in: medRxiv : the preprint server for health sciences (2023)

In medical imaging, clinical datasets are often relatively small. Models built from different training sets may not be representative of the whole dataset. Depending on the selected data split and model, performance bias could lead to inappropriate conclusions that might influence the clinical significance of the findings. Optimal strategies for test set selection should be developed to ensure that study conclusions are appropriate.
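
The dependence of measured performance on the chosen data split can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' method or data: the dataset is synthetic (one Gaussian feature, 100 balanced cases), the classifier is a simple midpoint threshold, accuracy is used as the metric, and the helper names (make_dataset, fit_threshold, split_accuracies) are hypothetical. The sketch repeats a random 80/20 split many times and reports how far test accuracy spreads across splits.

```python
import random
import statistics

def make_dataset(n=100, seed=0):
    """Synthetic stand-in for a small clinical dataset (assumption, not real data)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2  # balanced classes
        feature = rng.gauss(1.0 if label else 0.0, 1.0)  # overlapping class distributions
        data.append((feature, label))
    return data

def fit_threshold(train):
    """Toy classifier: threshold at the midpoint between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2

def accuracy(threshold, test):
    """Fraction of test cases whose predicted class matches the label."""
    correct = sum((x > threshold) == bool(y) for x, y in test)
    return correct / len(test)

def split_accuracies(data, n_splits=50, test_fraction=0.2, seed=1):
    """Test accuracy for repeated random train/test splits of the same data."""
    rng = random.Random(seed)
    n_test = int(len(data) * test_fraction)
    scores = []
    for _ in range(n_splits):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(accuracy(fit_threshold(train), test))
    return scores

scores = split_accuracies(make_dataset())
print(f"min={min(scores):.2f} max={max(scores):.2f} "
      f"mean={statistics.mean(scores):.2f} sd={statistics.stdev(scores):.2f}")
```

With only 20 test cases per split, each score moves in steps of 0.05, so a single split can look noticeably better or worse than the average across splits. Reporting the spread, rather than one split, is one simple way to make that variability visible.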

Keyphrases

virtual reality
high resolution
machine learning
deep learning
electronic health record
big data
magnetic resonance imaging
contrast enhanced
magnetic resonance
rna seq
single cell
data analysis