YOLO Series for Human Hand Action Detection and Classification from Egocentric Videos

Hung-Cuong Nguyen, Thi-Hao Nguyen, Rafał Scherer, Van-Hung Le
Published in: Sensors (Basel, Switzerland) (2023)
Hand detection and classification is an important pre-processing step in building applications based on three-dimensional (3D) hand pose estimation and hand activity recognition. To automatically delimit the hand region in egocentric vision (EV) datasets, and in particular to trace the development and performance of the "You Only Look Once" (YOLO) network over the past seven years, we propose a study comparing the efficiency of hand detection and classification based on YOLO-family networks. The study addresses the following problems: (1) systematizing the architectures, advantages, and disadvantages of YOLO-family networks from version 1 (v1) to v7; (2) preparing ground-truth data for fine-tuning pre-trained models and for evaluating hand detection and classification models on EV datasets (FPHAB, HOI4D, RehabHand); (3) fine-tuning hand detection and classification models based on YOLO-family networks and evaluating them on the EV datasets. Hand detection and classification results for the YOLOv7 network and its variants were the best across all three datasets. At an Intersection over Union (IoU) threshold of 0.5, YOLOv7-w6 achieves a precision (P) of 97% on FPHAB, 95% on HOI4D, and above 95% on RehabHand. YOLOv7-w6 runs at 60 fps at a resolution of 1280 × 1280 pixels, and YOLOv7 runs at 133 fps at 640 × 640 pixels.
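Precision at an IoU threshold of 0.5 counts a predicted hand box as correct only if it overlaps a ground-truth box of the same class with an IoU of at least 0.5. The following is a minimal Python sketch of that calculation, not the evaluation code used in this study. It assumes boxes are (x1, y1, x2, y2) pixel tuples and that predictions are matched greedily in descending confidence order, which is a common convention but not necessarily the exact protocol the authors followed. The helper names `iou` and `precision_at_iou` and the class labels "left" and "right" are hypothetical.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_at_iou(
    preds: List[Tuple[Box, str, float]],  # (box, class label, confidence)
    gts: List[Tuple[Box, str]],           # (box, class label)
    iou_thresh: float = 0.5,
) -> float:
    """Precision = TP / (TP + FP), where a TP is a prediction matched to an
    unused ground-truth box of the same class with IoU >= iou_thresh."""
    used = [False] * len(gts)
    tp = fp = 0
    # Assumed greedy matching: most confident predictions claim ground truths first.
    for box, label, _conf in sorted(preds, key=lambda p: p[2], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, (gt_box, gt_label) in enumerate(gts):
            if used[j] or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thresh:
            used[best_j] = True
            tp += 1
        else:
            fp += 1
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical frame with one left hand and one right hand.
    gts = [((10, 10, 50, 50), "left"), ((60, 10, 100, 50), "right")]
    preds = [
        ((12, 12, 50, 50), "left", 0.9),      # IoU 0.9025 with the left hand: TP
        ((60, 10, 100, 50), "left", 0.8),     # right place, wrong class: FP
        ((200, 200, 220, 220), "right", 0.3), # no overlap: FP
    ]
    print(precision_at_iou(preds, gts))
```

In this toy frame, one of three predictions is a true positive, so precision is 1/3. A misclassified hand counts as a false positive even when its box is well placed, which is why the reported P reflects detection and left/right classification together.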