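Human action recognition algorithm based on adaptive skeleton center

Ran Xianyu, Liu Kai, Li Guang, Ding Wenwen, Chen Bin (School of Computer Science, Xidian University, Xi'an 710071, China)

Abstract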
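Objective Action recognition based on 3D skeletons has long been an active topic in computer vision and has produced many results in surveillance, video games, robotics, human-computer interaction, and health care. Most current action recognition algorithms use a fixed joint as the coordinate center, which leads to a low recognition rate. To address this low accuracy, an adaptive skeleton center algorithm for human action recognition is proposed. Method The algorithm first obtains 3D skeleton sequences from a skeleton dataset and preprocesses them to obtain the original coordinate matrix of each action. Features are then extracted from the original coordinate matrix, the coordinate center is selected adaptively according to changes in the feature values, and the original coordinate matrix is renormalized. Finally, dynamic time warping is applied to denoise the action coordinate matrix, the Fourier temporal pyramid representation is used to reduce temporal misalignment and noise in the matrix, and a support vector machine classifies the action coordinate matrix. The algorithm is validated on two widely used public datasets, UTKinect-Action and MSRAction3D. Result On the UTKinect-Action dataset, the recognition rate of the proposed algorithm is 4.28% higher than that of the HO3D J2 algorithm and 3.48% higher than that of the CRF algorithm. On the MSRAction3D dataset, it is 9.57% higher than that of the HOJ3D algorithm, 2.07% higher than that of the Profile HMM algorithm, and 6.17% higher than that of the Eigenjoints algorithm. Conclusion This paper addresses the low recognition rate of current action recognition algorithms, identifies the use of a fixed joint as the coordinate center as the cause, and proposes an adaptive skeleton center action recognition algorithm. Simulations show that the algorithm effectively improves the accuracy of human action recognition.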
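The feature extraction and re-centering steps described above can be illustrated with a short Python sketch. It is not the authors' implementation: it shows one way to compute a joint angle, to re-express a frame relative to a chosen joint, and to pick that joint from a list of candidates. The abstract does not say how the adaptive value is computed, so the rule in `choose_center` (take the candidate joint that moves least over the sequence) is an assumption made purely for illustration, and the joint names, the dictionary layout, and all function names are hypothetical.

```python
import math

def joint_angle(a, b, c):
    """Angle in radians at joint b, between the segments b->a and b->c."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    # Clamp the cosine to [-1, 1] to guard against rounding error.
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def recenter(frame, center):
    """Express every joint of one frame relative to the joint named `center`."""
    cx, cy, cz = frame[center]
    return {name: (x - cx, y - cy, z - cz) for name, (x, y, z) in frame.items()}

def total_motion(sequence, joint):
    """Sum of the frame-to-frame displacement of one joint over a sequence."""
    return sum(math.dist(prev[joint], cur[joint])
               for prev, cur in zip(sequence, sequence[1:]))

def choose_center(sequence, candidates):
    """ASSUMED stand-in rule (not from the paper): least-moving candidate joint."""
    return min(candidates, key=lambda joint: total_motion(sequence, joint))

# Hypothetical two-frame sequence with three joints.
sequence = [
    {"hip": (0.0, 0.0, 0.0), "shoulder": (0.0, 1.0, 0.0), "hand": (1.0, 1.0, 0.0)},
    {"hip": (0.0, 0.0, 0.0), "shoulder": (0.1, 1.0, 0.0), "hand": (1.0, 2.0, 0.0)},
]
center = choose_center(sequence, ["hip", "shoulder"])
normalized = [recenter(frame, center) for frame in sequence]
elbow_like_angle = joint_angle(sequence[0]["hip"],
                               sequence[0]["shoulder"],
                               sequence[0]["hand"])
```

In this toy sequence the hip does not move while the shoulder shifts slightly, so the stand-in rule selects the hip as the center. Any other selection criterion could be substituted for `choose_center` without changing `recenter`.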
Keywords
Human action recognition algorithm based on adaptive skeleton center

Ran Xianyu, Liu Kai, Li Guang, Ding Wenwen, Chen Bin (Computer Science and Technology, Xidian University, Xi'an 710071, China)

Abstract
Objective Human action recognition based on 3D skeletons is a popular topic in computer vision; its goal is to automatically segment, capture, and recognize human actions. It has been explored by researchers since the 1960s and, over the past several decades, has been applied in surveillance, video games, robotics, human-human interaction, human-computer interaction, and health care. Three-dimensional data can be obtained in four ways: marker-based motion capture systems, reconstruction of 3D information from multiview 2D image sequences, range sensors, and RGB videos. However, extracting data with motion capture systems or multiview reconstruction is inconvenient. Range sensors are expensive and difficult to use in human environments; they also acquire data slowly and estimate distance poorly. RGB images usually provide only the appearance of the objects in a scene. With such limited information, certain problems, such as separating foreground from background when the two have similar colors and textures, are difficult if not impossible to solve. Moreover, RGB data are highly sensitive to factors such as illumination, viewpoint, occlusion, clutter, and dataset diversity, so RGB video sensor data cannot capture the information that is needed. The rapid development of depth sensors in recent years, such as the Microsoft Kinect 3D sensor, has made available not only color image data but also 3D depth images. Depth images record the distance between object and body, thereby providing considerable additional information. Real-time skeletal tracking combined with support vector machines can recognize various postures and extract key information.
The investigation of computer vision algorithms based on 3D skeletons has thus attracted significant attention in the last few years, and researchers studying skeleton-based algorithms have reported numerous achievements and contributions. However, most present action recognition algorithms select a fixed joint as the coordinate center, which leads to a low recognition rate. An adaptive skeleton center algorithm for human action recognition is proposed to solve this problem of low accuracy. Method Frames of skeleton action sequences are loaded from a human action dataset, redundant frames are removed, and the original coordinate matrix is obtained by preprocessing the sequences. Rigid vector and joint angle features are extracted from the original coordinate matrix. An adaptive value is determined from the changes in the rigid vector and joint angle values, and the coordinate center is selected adaptively according to this value and used to renormalize the original matrix. The action coordinate matrix is then denoised by dynamic time warping, and the Fourier temporal pyramid representation is used to reduce temporal misalignment and noise in the matrix. Finally, the matrix is classified by a support vector machine. Result Compared with existing algorithms, such as histogram of 3D joints (HO3DJ), conditional random field (CRF), EigenJoints, profile hidden Markov model (HMM), relation matrix of 3D rigid bodies+principal geodesic distance, and actionlet algorithms, the proposed algorithm exhibits improved performance on different datasets. On the UTKinect dataset, the action recognition rate of the proposed algorithm is 4.28% higher than that of the HO3DJ algorithm and 3.48% higher than that of the CRF algorithm.
On the MSRAction3D dataset, the action recognition rate of the proposed algorithm is 9.57% higher than that of the HO3DJ algorithm, 2.07% higher than that of the profile HMM algorithm, and 6.17% higher than that of the EigenJoints algorithm. Action Set (AS)1, AS2, and AS3 are subsets of the MSRAction3D dataset. On AS2, the recognition rate of the proposed algorithm is lower than those of the other algorithms, but on AS1 and AS3 it is high. Conclusion The proposed algorithm addresses the low accuracy of existing action recognition algorithms, whose cause is identified as the use of a fixed joint as the coordinate center. Simulation results show that the proposed algorithm effectively improves the accuracy of human action recognition and that its recognition rate is higher than those of existing algorithms. On the UTKinect dataset, the recognition rate of the proposed algorithm is at least 3% higher than those of the other algorithms, and the single-action recognition rate is as high as 90%. On the MSRAction3D dataset, the proposed algorithm shows advantages on the AS1 and AS3 subsets, but its recognition rate on AS2 is not ideal, particularly for upper-limb actions. The algorithm therefore needs improvement. It is generally efficient for single-action recognition; the next research direction is complex action recognition.
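The two temporal processing steps named in the Method can be illustrated on a single 1D feature track. The Python sketch below is a generic textbook version under stated assumptions, not the authors' code: `dtw_distance` is the classic dynamic-programming alignment cost that underlies time warping methods, and `fourier_temporal_pyramid` keeps the magnitudes of the first `k` low-frequency DFT coefficients of each segment at each pyramid level. The function names, the parameters `levels` and `k`, and the use of a naive DFT instead of an FFT library are all choices made for illustration. The classification step is omitted.

```python
import cmath

def dtw_distance(a, b):
    """Alignment cost between two 1D sequences under classic dynamic programming."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def low_freq_magnitudes(segment, k):
    """Magnitudes of the first k DFT coefficients of a segment (naive DFT)."""
    n = len(segment)
    mags = []
    for f in range(k):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * f * t / n)
                    for t, x in enumerate(segment))
        mags.append(abs(coeff))
    return mags

def fourier_temporal_pyramid(series, levels=2, k=2):
    """Concatenate low-frequency magnitudes over 1, 2, 4, ... equal segments.

    Assumes len(series) >= 2 ** (levels - 1); trailing frames that do not
    fill a whole segment are dropped.
    """
    features = []
    for level in range(levels):
        parts = 2 ** level
        size = len(series) // parts
        for p in range(parts):
            features.extend(low_freq_magnitudes(series[p * size:(p + 1) * size], k))
    return features

# The same motion performed at two speeds aligns with zero cost.
same_shape = dtw_distance([0, 1, 2], [0, 1, 1, 2])
different = dtw_distance([0, 1, 2], [0, 2])

# A constant track: only the zero-frequency coefficient is non-zero.
features = fourier_temporal_pyramid([1.0, 1.0, 1.0, 1.0], levels=2, k=2)

# DFT magnitudes ignore a circular shift of the same pulse.
early = low_freq_magnitudes([0.0, 1.0, 0.0, 0.0], 3)
late = low_freq_magnitudes([0.0, 0.0, 1.0, 0.0], 3)
```

Discarding the phase and the high-frequency coefficients is what makes this kind of representation tolerant of small temporal shifts and high-frequency noise. The resulting fixed-length feature vector could then be passed to any standard classifier, such as a support vector machine.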
Keywords
