Joint Dynamic Pose Image and Space Time Reversal for Human Action Recognition from Videos

Mengyuan Liu; Fanyang Meng; Chen Chen; Songtao Wu

doi:10.1609/aaai.v33i01.33018762

Authors

Mengyuan Liu, Nanyang Technological University
Fanyang Meng, Peking University
Chen Chen, University of North Carolina at Charlotte
Songtao Wu, Shenzhen University

DOI:

https://doi.org/10.1609/aaai.v33i01.33018762

Abstract

Human action recognition aims to classify a given video according to the type of action it contains. Disturbances from cluttered backgrounds and unrelated motions make the task challenging for methods based on video frames. To address this problem, this paper uses pose estimation to improve the performance of video-frame features. First, we present a pose feature called the dynamic pose image (DPI), which describes a human action as the aggregation of a sequence of joint estimation maps. Unlike traditional pose features that use joint positions alone, DPI is less affected by disturbances and provides richer information about human body shape and movement. Second, we present attention-based dynamic texture images (att-DTIs) as pose-guided video-frame features. Specifically, a video is treated as a space-time volume, and DTIs are obtained by observing the volume from different views. To alleviate the effect of disturbances on DTIs, we accumulate joint estimation maps into an attention map and extend DTIs to attention-based DTIs (att-DTIs). Finally, we fuse DPI and att-DTIs using multi-stream deep neural networks and a late-fusion scheme for action recognition. Experiments on the NTU RGB+D, UTD-MHAD, and Penn-Action datasets show the effectiveness of DPI and att-DTIs, as well as their complementarity.
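The abstract names four building blocks: aggregating per-frame joint estimation maps into a DPI, accumulating those maps into an attention map, viewing the attention-weighted space-time volume from several directions to obtain att-DTIs, and late fusion of per-stream scores. The Python sketch below illustrates these steps on tiny nested lists. It is not the authors' implementation, and every function name is hypothetical. Several choices are assumptions: joint maps within a frame are merged with a pixel-wise max; the temporal aggregation uses a linear weighting alpha_t = 2t - T - 1 in the spirit of approximate rank pooling from dynamic image networks; the three views are plain averages along one axis of the volume; and the deep network streams are replaced by precomputed class scores.

```python
from typing import Dict, List

Map = List[List[float]]   # one H x W image
Volume = List[Map]        # T frames, each H x W


def rank_pool_weights(num_frames: int) -> List[float]:
    # Assumed linear temporal weights alpha_t = 2t - T - 1 for t = 1..T
    # (rank-pooling-style; not taken from the paper).
    return [float(2 * t - num_frames - 1) for t in range(1, num_frames + 1)]


def fuse_joints(joint_maps: List[Map]) -> Map:
    # Merge the J joint estimation maps of one frame into a single pose map
    # with a pixel-wise max (assumed merge rule).
    h, w = len(joint_maps[0]), len(joint_maps[0][0])
    return [[max(m[i][j] for m in joint_maps) for j in range(w)] for i in range(h)]


def dynamic_pose_image(pose_maps: Volume) -> Map:
    # DPI sketch: temporally weighted sum of the per-frame pose maps.
    alphas = rank_pool_weights(len(pose_maps))
    h, w = len(pose_maps[0]), len(pose_maps[0][0])
    return [[sum(a * m[i][j] for a, m in zip(alphas, pose_maps)) for j in range(w)]
            for i in range(h)]


def attention_map(pose_maps: Volume) -> Map:
    # Accumulate pose maps over time, then scale to [0, 1] by the peak value.
    h, w = len(pose_maps[0]), len(pose_maps[0][0])
    acc = [[sum(m[i][j] for m in pose_maps) for j in range(w)] for i in range(h)]
    peak = max(max(row) for row in acc) or 1.0  # avoid division by zero
    return [[v / peak for v in row] for row in acc]


def att_dtis(frames: Volume, att: Map) -> Dict[str, Map]:
    # Weight every frame by the attention map, then observe the T x H x W
    # volume from three directions by averaging along one axis (assumed).
    t_len, h, w = len(frames), len(frames[0]), len(frames[0][0])
    vol = [[[frames[t][i][j] * att[i][j] for j in range(w)] for i in range(h)]
           for t in range(t_len)]
    front = [[sum(vol[t][i][j] for t in range(t_len)) / t_len for j in range(w)]
             for i in range(h)]                                        # H x W
    side = [[sum(vol[t][i][j] for j in range(w)) / w for t in range(t_len)]
            for i in range(h)]                                         # H x T
    top = [[sum(vol[t][i][j] for i in range(h)) / h for t in range(t_len)]
           for j in range(w)]                                          # W x T
    return {"front": front, "side": side, "top": top}


def late_fusion(stream_scores: List[List[float]]) -> int:
    # Average class scores across streams and return the predicted class index.
    n, c = len(stream_scores), len(stream_scores[0])
    fused = [sum(s[k] for s in stream_scores) / n for k in range(c)]
    return max(range(c), key=fused.__getitem__)


# Usage: a single joint moving diagonally across a 2 x 2 frame over 3 frames.
poses = [[[1.0, 0.0], [0.0, 0.0]],
         [[0.0, 1.0], [0.0, 0.0]],
         [[0.0, 0.0], [0.0, 1.0]]]
print(dynamic_pose_image(poses))
print(late_fusion([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]))
```

Because the assumed weights sum to zero over the clip, a region that stays constant contributes nothing to this DPI sketch, so only moving joints leave a trace. Replacing the averages in `att_dtis` with the same weighted sum along the time axis would be an equally plausible variant under these assumptions.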
