Graph CNNs with Motif and Variable Temporal Block for Skeleton-Based Action Recognition

  • Yu-Hui Wen Chinese Academy of Sciences
  • Lin Gao Chinese Academy of Sciences
  • Hongbo Fu City University of Hong Kong
  • Fang-Lue Zhang Victoria University of Wellington
  • Shihong Xia Chinese Academy of Sciences


The hierarchical structure and the different semantic roles of joints in the human skeleton convey important information for action recognition. Conventional graph convolution methods for modeling skeleton structure consider only the physically connected neighbors of each joint and joints of the same type, and thus fail to capture high-order information. In this work, we propose a novel model with motif-based graph convolution to encode the hierarchical spatial structure, and a variable temporal dense block to exploit local temporal information over different ranges of human skeleton sequences. Moreover, we employ a non-local block to capture global dependencies in the temporal domain through an attention mechanism. Our model achieves improvements over state-of-the-art methods on two large-scale datasets.
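
As a rough illustration of what a non-local operation over the temporal domain computes, the sketch below lets every frame attend to every other frame and adds the attention-weighted result back to the input. It is not the model described above: the function name `temporal_non_local` is hypothetical, the learned projections that a trained block would apply are omitted, and each frame is assumed to be a plain feature vector.

```python
import math


def temporal_non_local(frames):
    """Dot-product non-local operation over the time axis (illustrative only).

    frames: list of T feature vectors (lists of floats of equal length).
    Returns T vectors, each equal to the input frame plus an
    attention-weighted average of all frames (a residual connection).
    Assumption: no learned embeddings; similarity is a scaled dot product.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    output = []
    for query in frames:
        # Similarity between this frame and every frame in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale for key in frames
        ]
        # Numerically stable softmax across time.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Global context: weighted sum of all frames, regardless of distance.
        context = [
            sum(w * frame[d] for w, frame in zip(weights, frames))
            for d in range(dim)
        ]
        # Residual connection keeps the original per-frame feature.
        output.append([x + c for x, c in zip(query, context)])
    return output
```

Because the weights are computed between all pairs of frames, a frame at the start of a sequence can draw on one at the end in a single step, which is the sense in which such a block captures global rather than local temporal dependencies.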

AAAI Technical Track: Vision