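Gait recognition method combining deep residual networks and multi-feature fusion
Luo Yabo, Liang Xinyu, Zhang Feng, Li Cunrong (Wuhan University of Technology) Abstract
Objective Gait recognition is a key technology in traffic management and surveillance security. Existing gait recognition algorithms cannot fully capture and exploit human biometric features, and their accuracy degrades under covariate interference. To address this, this paper proposes a high-precision gait recognition method that deeply extracts and fuses gait features and body shape features. Method The method first uses a high resolution network (HRNet) to extract human skeleton keypoints. With a 50-layer residual network (ResNet-50) as the backbone, it exploits the complex feature learning capacity of deep residual modules to extract, from the skeleton information, relatively stable body shape features and gait features that give an explicit and efficient expression of the underlying motion. A multi-branch feature fusion (MFF) module is designed to perform size alignment and weight optimization across channels, adjust the contribution of each branch through a dynamic weight matrix, and fuse the branches into an overall feature with stronger discriminative power. Result For the indoor setting, the cross-view, multi-state CASIA-B (Institute of Automation, Chinese Academy of Sciences) dataset is used, and the method performs robustly in the cross-view experiments. In the multi-state experiments, the recognition accuracy for the normal group is 94.52%, and the recognition performance for the coat-interference group is the best among comparable algorithms. On the open-scene dataset, the model likewise shows strong generalization, improving accuracy by 4.1% over the latest existing algorithm. Conclusion The proposed gait recognition method makes full use of the feature extraction capacity of deep residual modules and the complementary advantages of multi-feature fusion, and retains high recognition accuracy and generalization ability in complex recognition scenarios.
Keywords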
Gait recognition combining deep residual and multi-feature fusion
Luo Yabo, Liang Xinyu, Zhang Feng, Li Cunrong (Wuhan University of Technology) Abstract
Objective Gait recognition, as an emerging biometric identification technology, is difficult to disguise and works at long range, so it is widely used in criminal investigation, industry, surveillance systems, and social security. In practical applications, however, factors such as observation angle and complicated backgrounds mean that the accuracy of existing gait recognition algorithms still leaves room for improvement. Gait recognition methods mainly rely on feature extraction to obtain distinctive feature maps for identification, but existing algorithms do not adequately capture and utilize human biometric features. The deep residual module, by contrast, is capable of extracting high-level features. To achieve consistent recognition in practical applications, this research proposes a high-precision gait recognition method that extracts multiple features through deep residual modules and combines gait features with body shape features. Method Using a 50-layer residual network (ResNet-50) as the backbone, this research constructs a multi-branch feature fusion network based on human skeleton information. For reliable and accurate gait identification, the network combines deep feature extraction and feature fusion modules. A high resolution network (HRNet) extracts the human skeleton keypoints, exchanging information across parallel subnetworks of different resolutions to improve localization accuracy and the ability to handle low-resolution images. The gait cycle is extracted based on the similarity of leg movement features and serves as a hyperparameter of the network, reducing computational load while retaining important information. Following data augmentation, features are modeled from the skeleton information produced by the preceding network.
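The gait cycle extraction step can be illustrated with a minimal sketch. It assumes, for illustration only, that the leg movement feature is the signed horizontal offset between the two ankles per frame, and that similarity is measured by the mean lagged product of the centered signal. The function name `estimate_period` and the lag bounds are hypothetical and are not taken from the paper.

```python
import math

def estimate_period(signal, min_lag, max_lag):
    """Return the lag (in frames) at which the centered signal is most self-similar."""
    n = len(signal)
    if not (0 < min_lag <= max_lag < n):
        raise ValueError("lag bounds must satisfy 0 < min_lag <= max_lag < len(signal)")
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        overlap = n - lag
        # Mean lagged product: averaging over the overlap keeps long lags comparable.
        score = sum(centered[i] * centered[i + lag] for i in range(overlap)) / overlap
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic ankle offset (assumed data): one full gait cycle every 24 frames.
frames = 96
offset = [math.sin(2 * math.pi * t / 24) for t in range(frames)]
period = estimate_period(offset, min_lag=10, max_lag=40)
print(period)
```

In a setup like this, the estimated period could then fix the number of frames fed to the network per sample, which is one way to read the statement that the cycle acts as a hyperparameter.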
Optimal residual modules are then employed in the deep residual module for scale alignment and information transfer, allowing deep extraction of gait and body shape features. The features are organized into three branches: skeletal motion, gait speed, and body proportion. To enhance the network structure and the information interaction mechanism, the multi-branch feature fusion (MFF) module uses an information transfer and weight allocation mechanism akin to attention. The module first concatenates the proportion and skeleton branches at the input, merging them into spatial information, and maps the concatenated features to a low-dimensional space to produce fused features. The fused features are then mapped to the spatial and velocity branches through an activation function, with the activation values acting as weights on each branch. The module thus adjusts the weights of each branch's feature maps according to their importance to the task, combining branches with different recognition strengths so that they complement one another. In this way the identity information of the target is captured more comprehensively, mitigating the low discriminability and low recognition accuracy of single features and improving the network's generalizability. Result The MPII human pose dataset (MPII) comprises 6619 test samples, 14679 training samples, and 2726 validation samples. HRNet demonstrates excellent performance in keypoint localization. For input images of 256 × 256 pixels with a threshold of 0.01, the average head-normalized probability of correct keypoint (PCKh) exceeds 83% for every keypoint. Moreover, keypoint localization for the lower extremities is accurate, with PCKh values above 95% for both the ankle and the knee, meeting the precision requirements of the subsequent experiments.
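For readers unfamiliar with the metric, PCKh can be sketched as follows, assuming the common definition in which a predicted keypoint counts as correct when its distance to the ground truth is at most a threshold times the head segment length. The function name `pckh`, the toy coordinates, and the threshold used in the example are illustrative assumptions, not the authors' evaluation code or settings.

```python
import math

def pckh(pred, truth, head_sizes, threshold):
    """Fraction of keypoints whose error is within threshold * head size.

    pred, truth: one list of (x, y) keypoints per sample.
    head_sizes: head segment length in pixels, one value per sample.
    """
    correct = 0
    total = 0
    for pred_kps, true_kps, head in zip(pred, truth, head_sizes):
        for (px, py), (tx, ty) in zip(pred_kps, true_kps):
            error = math.hypot(px - tx, py - ty)  # Euclidean distance in pixels
            if error <= threshold * head:
                correct += 1
            total += 1
    return correct / total

# Toy data (assumed): two samples with two keypoints each.
truth = [[(10, 10), (20, 20)], [(30, 30), (40, 40)]]
pred = [[(10, 11), (20, 26)], [(33, 30), (40, 40)]]
head_sizes = [10, 10]
score = pckh(pred, truth, head_sizes, threshold=0.5)
print(score)
```

With these toy values the errors are 1, 6, 3, and 0 pixels against a 5-pixel limit, so three of the four keypoints count as correct.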
In the ablation experiment, the sequences cover three walking conditions: normal walking (NM), walking with a backpack (BG), and walking in a coat or jacket (CL). Gait recognition from a single feature has relatively low accuracy, especially when the subject wears a thick coat. Combining the three types of features significantly improves recognition accuracy for the NM group, which reaches 94.52%. The combination is also highly resistant to interference, with an overall recognition rate 4.50% higher than that of the second-best option. To test the model's generalization across viewing angles, the cross-view Institute of Automation, Chinese Academy of Sciences (CASIA-B) dataset was chosen for training and experimentation. The initial training parameters are: batch size=64, learning rate=0.0001, learning decay rate=0.01, and dropout=0.35. The method performs robustly in the cross-view experiments, with better performance at 36 degrees, 126 degrees, and nearby angles. In the multi-state experiment, the recognition rate for the NM group reaches 97.36%, and the method performs best among similar algorithms for the CL group. Gait recognition technology is applicable in both indoor and outdoor contexts. Interference in outdoor environments includes lighting changes, viewing-angle changes, and dynamic backgrounds, so results obtained on indoor datasets do not transfer directly to outdoor situations. This study therefore validates the algorithm on a self-built outdoor dataset. Most existing gait recognition algorithms lose more than 15% of their accuracy when moving from indoor samples to open scenes. With a 4.1% accuracy gain over the second-best gait recognition algorithm, the proposed approach shows strong generalization potential.
Conclusion The gait recognition method presented in this study effectively exploits the robustness of skeleton information and the recognition benefits of multi-feature fusion. It reduces interference under difficult conditions such as complicated backgrounds, thick clothing, and variable viewing angles, yielding stable and high-precision recognition of target identities that meets the requirements of practical applications.
Keywords
biometric identification; gait recognition; high-resolution network; fusion algorithm; residual neural network