Integration of improved YOLOv5 for face mask detector and auto-labeling to generate dataset for fighting against COVID-19.

Thi-Ngot PhamViet-Hoan NguyenJun-Ho Huh

Published in: The Journal of supercomputing (2023)

One of the most effective deterrent methods is using face masks to prevent the spread of the virus during the COVID-19 pandemic. Deep learning face mask detection networks have been implemented into COVID-19 monitoring systems to provide effective supervision for public areas. However, previous works have limitations: the challenge of real-time performance (i.e., fast inference and low accuracy) and training datasets. The current study aims to propose a comprehensive solution by creating a new face mask dataset and improving the YOLOv5 baseline to balance accuracy and detection time. Particularly, we improve YOLOv5 by adding coordinate attention (CA) module into the baseline backbone following two different schemes, namely YOLOv5s-CA and YOLOV5s-C3CA. In detail, we train three models with a Kaggle dataset of 853 images consisting of three categories: without a mask "NM," with mask "M," and incorrectly worn mask "IWM" classes. The experimental results show that our modified YOLOv5 with CA module achieves the highest accuracy mAP@0.5 of 93.9% compared with 87% of baseline and detection time per image of 8.0 ms (125 FPS). In addition, we build an integrated system of improved YOLOv5-CA and auto-labeling module to create a new face mask dataset of 7110 images with more than 3500 labels for three categories from YouTube videos. Our proposed YOLOv5-CA and the state-of-the-art detection models (i.e., YOLOX, YOLOv6, and YOLOv7) are trained on our 7110 images dataset. In our dataset, the YOLOv5-CA performance enhances with mAP@0.5 of 96.8%. The results indicate the enhancement of the improved YOLOv5-CA model compared with several state-of-the-art works.

Keyphrases