  • Published: 2025-03-05
Graph Convolutional Network Integrating Skeleton Large-Kernel Operators and Global Context Information

WU Zhize¹, WAN Long¹, HONG Fanghua², TANG Zhengdao³, SUN Fei¹, ZOU Le¹, WANG Xiaofeng¹ (1. Hefei University; 2. Hefei Municipal Public Security Bureau; 3. Anhui Province Product Quality Supervision and Inspection Research Institute)

Abstract
Objective Skeleton data is not only lightweight, but its intrinsic topological structure is also highly compatible with graph convolutional networks (GCNs); as a result, GCN-based skeleton human action recognition has attracted wide attention in the action recognition field. However, traditional graph convolutions struggle to model relationships between distant nodes effectively, which limits their performance on complex action recognition. Method To address this problem, we propose a skeleton graph convolutional network that integrates a skeleton large-kernel operator with contextual information, termed Skeleton Large-Kernel and Contextual GCN (SLK-GCN), which enhances spatial features from two distinct perspectives. First, we design a novel skeleton large-kernel convolution (SLKC) operator that strengthens spatial feature extraction by expanding the receptive field and enhancing channel adaptability. Specifically, SLKC introduces large-kernel convolutions to model long-range dependencies between nodes, improving the model's handling of spatial complexity, and it exploits the enlarged receptive field to capture more global information, increasing both the depth and breadth of feature extraction. In addition, we introduce a lightweight global context modeling (GCM) module that automatically learns and adapts to the skeleton topology and integrates contextual features from a global perspective. By capturing global relationships among nodes, GCM further improves the model's representational capacity and robustness. Result The proposed SLK-GCN achieves top accuracies of 96.8%, 91.0%, and 96.8% on the NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA datasets, respectively; the experimental results show that SLK-GCN offers significant advantages and effectiveness in human action recognition. Conclusion The introduction and combination of SLKC and GCM allow SLK-GCN to extract and exploit spatial features more effectively when processing complex skeleton data.
Keywords
Graph Convolutional Network Integrating Skeleton Large Kernel Operators and Global Context Information

(Hefei University)

Abstract
Objective Skeleton data, owing to its lightweight nature and intrinsic topological structure, has become a prime candidate for use with Graph Convolutional Networks (GCNs). The alignment between skeleton data and GCNs has sparked significant interest in developing skeleton-based human action recognition techniques. These techniques leverage the strengths of GCNs to interpret the skeletal structures and movements inherent in human actions. However, traditional graph convolution methods face challenges in effectively modeling long-range node relationships, which are crucial for accurately recognizing complex actions. This limitation stems from the inherent design of conventional graph convolutions, which typically focus on local neighborhood information and struggle with capturing dependencies between distant nodes in the graph.
Method To overcome this challenge, we propose a novel approach termed Skeleton Large-Kernel and Contextual GCN (SLK-GCN). This network aims to enhance spatial features from two distinct perspectives, thereby improving the ability to model long-range dependencies and capture the complexity of human actions more effectively. The first key component of SLK-GCN is the Skeleton Large-Kernel Convolution (SLKC) operator. This operator is designed to expand the receptive field and enhance channel adaptability, leading to improved spatial feature extraction. Traditional convolutional kernels are limited in their ability to capture extensive spatial relationships due to their relatively small receptive fields. In contrast, SLKC employs large-kernel convolutions, which significantly broaden the receptive field and allow the model to simulate long-range dependencies between nodes more effectively. In doing so, SLKC enhances the model's capacity to handle the spatial complexities inherent in human action recognition tasks. The large-kernel approach not only captures a wider array of spatial information but also ensures that the extracted features are more comprehensive and nuanced, contributing to better overall model performance. In addition to SLKC, we introduce a lightweight Global Context Modeling (GCM) module as the second key component of SLK-GCN. The GCM module is designed to automatically learn and adapt to the topological structure of the skeleton, integrating contextual features from a global perspective. Traditional models often fail to account for the global context, focusing instead on local node interactions. However, capturing global relationships between nodes is essential for understanding the full scope of human actions, especially those involving complex movements that span multiple joints and limbs. The GCM module addresses this gap by capturing global relationships and contextual information across the entire skeleton. This integration of global context enhances the model's representational capacity and robustness, allowing it to more accurately interpret and classify a wide range of human actions.
Result To validate the effectiveness of the proposed SLK-GCN, we conducted extensive experiments on several widely used datasets, including NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA. These datasets are well regarded in the field of human action recognition and provide a diverse set of scenarios and action types for comprehensive evaluation. The experimental results demonstrate that SLK-GCN exhibits significant advantages and effectiveness in human action recognition tasks. Specifically, the incorporation and combination of SLKC and GCM enable SLK-GCN to more effectively extract and utilize spatial features when processing complex skeleton data. This enhanced feature extraction capability translates to improved accuracy and robustness in recognizing a variety of human actions. The success of SLK-GCN can be attributed to several factors. First, the SLKC operator's ability to simulate long-range dependencies ensures that the model captures the intricate spatial relationships between different parts of the skeleton. This capability is particularly important for recognizing actions that involve coordinated movements across multiple joints. Second, the GCM module's integration of global context provides a holistic view of the skeleton, enabling the model to consider the broader context in which individual movements occur. This holistic perspective is crucial for accurately interpreting complex actions that cannot be understood solely through local interactions. Furthermore, the combination of SLKC and GCM in SLK-GCN represents a synergistic approach to feature enhancement. While SLKC focuses on expanding the receptive field and capturing long-range dependencies, GCM complements this by integrating global contextual information. Together, these components ensure that SLK-GCN has a comprehensive understanding of both local and global spatial features, leading to superior performance in human action recognition tasks. The implications of this work extend beyond the immediate scope of skeleton-based human action recognition. The principles underlying SLK-GCN, particularly the emphasis on large receptive fields and global context modeling, can be applied to other domains where capturing complex spatial relationships is essential, such as gesture recognition, sign language interpretation, and broader applications in computer vision and robotics where understanding spatial dependencies is critical.
Conclusion The Skeleton Large-Kernel and Contextual GCN (SLK-GCN) represents a significant advancement in human action recognition. By addressing the limitations of traditional graph convolutions in modeling long-range node relationships, SLK-GCN offers a robust solution for capturing the complexity of human actions. The combination of Skeleton Large-Kernel Convolution (SLKC) and Global Context Modeling (GCM) enables SLK-GCN to effectively extract and utilize spatial features, resulting in improved accuracy and robustness. Extensive experimental validation on multiple datasets underscores the effectiveness of this approach and highlights its potential for broader applications in understanding and interpreting complex spatial data.
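The two components described above, a large-kernel operation over skeleton joints and a lightweight global context module, can be sketched in a simplified form as follows. This is an illustrative reconstruction rather than the authors' implementation: the function names, the fixed averaging kernel, the kernel size of 9, and the softmax-pooled context vector are all assumptions made for the sketch, and a real skeleton GCN would use learned kernels defined over the graph structure rather than a flat joint ordering.

```python
import numpy as np

def large_kernel_conv(x, kernel_size=9):
    """Depthwise 1D convolution with a large kernel along the joint axis.

    x: (C, V) feature map with C channels and V joints. Illustrative only:
    the large kernel enlarges each joint's receptive field so distant
    joints can influence one another in a single layer.
    """
    C, V = x.shape
    pad = kernel_size // 2
    # One kernel per channel; a fixed averaging kernel stands in for
    # the learned weights of a trained model.
    w = np.full((C, kernel_size), 1.0 / kernel_size)
    xp = np.pad(x, ((0, 0), (pad, pad)), mode="edge")
    out = np.empty_like(x)
    for v in range(V):
        out[:, v] = np.sum(xp[:, v:v + kernel_size] * w, axis=1)
    return out

def global_context(x):
    """Lightweight global context: softmax attention over joints pools a
    single context vector, which is added back to every joint."""
    attn = np.exp(x.mean(axis=0))
    attn /= attn.sum()              # (V,) attention weights over joints
    context = x @ attn              # (C,) global context vector
    return x + context[:, None]     # broadcast context back to all joints

# Toy skeleton features: 4 channels, 25 joints (NTU RGB+D uses 25 joints).
x = np.random.default_rng(0).normal(size=(4, 25))
y = global_context(large_kernel_conv(x, kernel_size=9))
print(y.shape)  # (4, 25)
```

The sketch only shows the shape of the computation the abstract describes: first enlarge the per-joint receptive field, then fold a globally pooled context vector back into every joint's features.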
Keywords
