Impact of Gold-Standard Label Errors on Evaluating Performance of Deep Learning Models in Diabetic Retinopathy Screening: Nationwide Real-World Validation Study.
Yueye Wang, Xiaotong Han, Cong Li, Lixia Luo, Qiuxia Yin, Jian Zhang, Guankai Peng, Danli Shi, Mingguang He
Published in: Journal of Medical Internet Research (2024)
Label errors in the human-graded gold standard, although they affect only a small percentage of images, can significantly distort the performance evaluation of deep learning (DL) algorithms in real-world diabetic retinopathy (DR) screening.
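
The arithmetic behind this kind of effect can be sketched in a few lines of Python. This is an illustration only, not the study's method: the function name `measured_performance`, the prevalence, the model accuracy, and the grader error rates are all hypothetical, and the sketch assumes grader errors are independent of the model output given the true disease status.

```python
def measured_performance(prevalence, sensitivity, specificity,
                         label_fp_rate=0.0, label_fn_rate=0.0):
    """Return (sensitivity, specificity) as measured against an imperfect label.

    Hypothetical helper. Assumes grader errors are independent of the model
    output once the true disease status is fixed.
    """
    # Joint probability of each (true status, reference label) combination.
    true_pos_labeled_pos = prevalence * (1 - label_fn_rate)
    true_pos_labeled_neg = prevalence * label_fn_rate
    true_neg_labeled_pos = (1 - prevalence) * label_fp_rate
    true_neg_labeled_neg = (1 - prevalence) * (1 - label_fp_rate)

    labeled_pos = true_pos_labeled_pos + true_neg_labeled_pos
    labeled_neg = true_neg_labeled_neg + true_pos_labeled_neg

    # The model flags truly diseased eyes at rate `sensitivity` and truly
    # healthy eyes at rate 1 - `specificity`, whatever the grader wrote down.
    flagged_and_labeled_pos = (true_pos_labeled_pos * sensitivity
                               + true_neg_labeled_pos * (1 - specificity))
    cleared_and_labeled_neg = (true_neg_labeled_neg * specificity
                               + true_pos_labeled_neg * (1 - sensitivity))

    return (flagged_and_labeled_pos / labeled_pos,
            cleared_and_labeled_neg / labeled_neg)


# Hypothetical inputs: 5% prevalence, a model with 95% true sensitivity and
# specificity, and graders who mislabel 1% of healthy eyes as diseased.
sens, spec = measured_performance(0.05, 0.95, 0.95, label_fp_rate=0.01)
print(f"measured sensitivity: {sens:.3f}, measured specificity: {spec:.3f}")
```

With these hypothetical inputs, the measured sensitivity falls from 0.95 to roughly 0.81 while the measured specificity stays at 0.95. When disease is rare, even a 1% error rate among the many healthy eyes adds enough spurious "positive" reference labels to dilute the positive class.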