
Localization and recognition of human action in 3D using transformers.

Jiankai Sun, Linjiang Huang, Hongsong Wang, Chuanyang Zheng, Jianing Qiu, Md Tauhidul Islam, Enze Xie, Bolei Zhou, Lei Xing, Arjun Chandrasekaran, Michael J Black
Published in: Communications Engineering (2024)
Understanding a person's behavior from their 3D motion sequence is a fundamental problem in computer vision with many applications. An important component of this problem is 3D action localization, which involves recognizing what actions a person is performing and when those actions occur in the sequence. To promote progress in the 3D action localization community, we introduce a new, challenging, and more complex benchmark dataset, BABEL-TAL (BT), for 3D action localization. Important baselines and evaluation metrics, as well as human evaluations, are carefully established on this benchmark. We also propose a strong baseline model, Localizing Actions with Transformers (LocATe), that jointly localizes and recognizes actions in a 3D sequence. LocATe shows superior performance on BABEL-TAL as well as on the large-scale PKU-MMD dataset, achieving state-of-the-art performance using only 10% of the labeled training data. Our research could advance the development of more accurate and efficient systems for human behavior analysis, with potential applications in areas such as human-computer interaction and healthcare.
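To make the "what and when" of action localization concrete, the following is a minimal sketch of how a predicted action segment might be compared with a ground-truth segment using temporal intersection-over-union (IoU), a measure commonly used in temporal action localization. The abstract does not say which metrics the benchmark uses, so the IoU criterion, the 0.5 threshold, and the names `Segment`, `temporal_iou`, and `is_correct` are all assumptions made for illustration and are not taken from the paper or from LocATe.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical container for one action span: what happened and when."""
    label: str    # action class, e.g. "walk"
    start: float  # start time (seconds or frames)
    end: float    # end time, same unit as start


def temporal_iou(a: Segment, b: Segment) -> float:
    """Overlap of two time spans divided by the length of their union."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


def is_correct(pred: Segment, truth: Segment, threshold: float = 0.5) -> bool:
    """Assumed criterion: labels match and the spans overlap enough."""
    return pred.label == truth.label and temporal_iou(pred, truth) >= threshold


if __name__ == "__main__":
    truth = Segment("walk", 0.0, 4.0)
    pred = Segment("walk", 2.0, 6.0)
    print(temporal_iou(pred, truth))
    print(is_correct(pred, truth, threshold=0.5))
```

Under this assumed criterion, a prediction counts as correct only if both the action label and the time span agree with the annotation, which reflects the joint localization and recognization task described above.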
Keyphrases
  • endothelial cells
  • healthcare
  • induced pluripotent stem cells
  • pluripotent stem cells
  • mental health
  • machine learning
  • social media
  • computed tomography
  • amino acid
  • artificial intelligence
  • pet ct
  • human health