Action recognition in videos based on action segmentation and manifold metric learning

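Luo Huilan1, Lai Zeyun1, Kong Fansheng2 (1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China; 2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China)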

Abstract
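Objective: To improve the accuracy of action recognition in videos, a video action recognition algorithm based on action segmentation and manifold metric learning is proposed.
Method: First, the actions in a video are segmented by a method that analyzes how far the actor's limbs are extended, so that each recognition target becomes a concrete action segment. Then, normalized global temporal and spatial features, optical flow features, and intraframe local curl and divergence features are extracted from each action segment, and a 7×7 covariance matrix descriptor is constructed to fuse these features. Finally, manifold metric learning is used to find, in a supervised manner, a better distance metric that improves action classification.
Result: Statistics from segmentation experiments on the Weizmann public video dataset show that the proposed segmentation method segments actions well and provides sound preprocessing for recognition. Comparing recognition with and without manifold metric learning on the Weizmann dataset shows that manifold metric learning raises recognition accuracy by 2.8 percentage points. The average recognition rates on the Weizmann and KTH public video datasets are 95.6% and 92.3%, respectively, and comparisons with existing methods show that the proposed method achieves better recognition performance.
Conclusion: Repeated experiments show that the algorithm segments actions well during preprocessing and that the covariance matrix constructed to describe actions fuses multiple features effectively. Adding optical flow, curl, and divergence information describes the motion direction of each body part in finer detail and thus strengthens the descriptive power of the covariance matrix, and combining the descriptor with manifold metric learning clearly improves recognition accuracy.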
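The covariance fusion step can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the column order (x, y, t, u, v, curl, divergence) and the normalization of x, y, and t by frame width, frame height, and segment length are assumptions chosen so that the seven listed features yield a 7×7 matrix, and the helper names `segment_features` and `covariance_descriptor` are hypothetical.

```python
import numpy as np


def segment_features(x, y, t, u, v, curl, div, width, height, length):
    """Stack per-interest-point features into an (N, 7) matrix.

    Assumed layout: normalized x, y (spatial), normalized t (temporal),
    optical-flow components u, v, local curl, local divergence.
    """
    return np.column_stack([
        np.asarray(x, dtype=float) / width,    # spatial, normalized to [0, 1]
        np.asarray(y, dtype=float) / height,
        np.asarray(t, dtype=float) / length,   # temporal, normalized to [0, 1]
        u, v, curl, div,
    ])


def covariance_descriptor(features):
    """Unbiased sample covariance of the rows of an (N, d) feature matrix."""
    features = np.asarray(features, dtype=float)
    if features.ndim != 2 or features.shape[0] < 2:
        raise ValueError("expected an (N, d) array with N >= 2")
    centered = features - features.mean(axis=0)
    return centered.T @ centered / (features.shape[0] - 1)


# Usage with synthetic data standing in for one action segment
rng = np.random.default_rng(0)
n = 500
feats = segment_features(
    x=rng.uniform(0, 160, n), y=rng.uniform(0, 120, n), t=rng.integers(0, 40, n),
    u=rng.normal(size=n), v=rng.normal(size=n),
    curl=rng.normal(size=n), div=rng.normal(size=n),
    width=160, height=120, length=40,
)
C = covariance_descriptor(feats)
print(C.shape)  # (7, 7)
```

In practice a small multiple of the identity is often added to such a matrix so that it stays strictly positive definite even when one feature is constant over a segment.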
Keywords
Action recognition in videos based on action segmentation and manifold metric learning

Luo Huilan1, Lai Zeyun1, Kong Fansheng2(1.School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China;2.College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China)

Abstract
Objective: A video action recognition algorithm based on action segmentation and manifold metric learning is proposed to improve the accuracy of action recognition in videos.
Method: First, a video action segmentation algorithm based on analyzing the spreading area of the actor's limbs is proposed to divide the video into segments, each containing a single action. Segmentation allows each action in the video to be recognized quickly and reduces mutual interference between adjacent actions. The actor's silhouette in each frame is extracted by background subtraction, and a bounding box is generated from the silhouette. Because silhouette extraction is affected by the background, the area function of the bounding boxes contains noise that can obscure its regularity. After the bounding-box area is computed for every frame, the area function is smoothed with a robust weighted smoothing method. Then, after all local minima of the smoothed area function are extracted, a second filter removes spurious local minima. The minima that survive both filtering steps are used as the segmentation positions in the video. The action recognition algorithm is then applied to each segment independently. For feature extraction and description of each segment, the Lucas-Kanade optical flow field is first computed to obtain the velocity of the pixels in every frame of the segment. Pixels with non-zero optical flow magnitude are treated as interest points. Intraframe local curl and divergence, which are derived from the Lucas-Kanade optical flow field, describe the motion relationship between interest points within a frame. For each action segment, a covariance matrix is formed to fuse the features, namely normalized global temporal features, normalized spatial features, optical flow, intraframe local curl, and divergence. The final covariance matrix is 7×7, so the descriptor has a relatively low dimension. In this feature space, the action segments lie on a manifold. Several methods for measuring distance on such a manifold have been proposed; generally, the distance between two points on a manifold is the geodesic distance between them. In this study, a distance measure obtained through supervised manifold metric learning is proposed to further improve the accuracy of action classification. The LogDet divergence is used, and the action class labels are used to construct the constraints. Manifold metric learning yields a tangent space transfer matrix, which maps the distance computation into the tangent space of a new latent manifold. Finally, a nearest neighbor classifier is used to recognize the actions.
Result: The experiments have three parts. First, the effectiveness of the action segmentation algorithm is evaluated on the Weizmann public video dataset; the results show that the proposed segmentation method segments actions acceptably well. Second, recognition with and without manifold metric learning is compared on the Weizmann dataset to assess the contribution of manifold metric learning: accuracy is 92.8% without it and 95.6% with it, an improvement of 2.8 percentage points. Finally, experiments on the KTH public video dataset verify the robustness of the proposed algorithm, with an average recognition accuracy of 92.3%. On both the Weizmann and KTH datasets, the comparisons indicate that the proposed algorithm outperforms several state-of-the-art methods.
Conclusion: The proposed action segmentation method, based on analyzing the spreading area of the actor's limbs, places segment boundaries at the frames where the limbs are closest to the body. Smoothing and the second filtering step applied to the area function of the human bounding box make segmentation robust to noise, so the method provides a desirable preprocessing effect. The covariance matrix fuses multiple features effectively and describes video actions appropriately. Adding optical flow, curl, and divergence information, which describes the motion direction of individual body parts in detail, further improves the representation capability of the covariance matrix descriptor. Manifold metric learning clearly improves action recognition accuracy, and adding class-label information during metric learning further improves the performance of the proposed action recognition algorithm. All the experimental results show that the proposed video action recognition algorithm achieves high accuracy and desirable robustness.
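The segmentation pipeline (bounding-box area per frame, robust smoothing, local minima, a second filter) can be sketched as follows. The abstract does not specify the smoother or the criteria of the second filter, so both are assumptions here: the smoother is a moving average re-weighted with Tukey bisquare weights, and the second filter discards minima that are too shallow or closer than `min_gap` frames to a deeper one. All function names and parameter values are hypothetical.

```python
import numpy as np


def bbox_area(mask):
    """Area of the tightest axis-aligned box around the non-zero silhouette pixels."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return 0
    return int((ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1))


def robust_smooth(signal, window=5, iterations=2):
    """Moving average with bisquare re-weighting so that outlier frames count less."""
    y = np.asarray(signal, dtype=float)
    n, half = y.size, window // 2
    weights = np.ones(n)
    smooth = y.copy()
    for _ in range(iterations + 1):
        for i in range(n):
            lo, hi = max(0, i - half), min(n, i + half + 1)
            w = weights[lo:hi]
            smooth[i] = np.average(y[lo:hi], weights=w) if w.sum() > 0 else y[i]
        resid = y - smooth
        mad = np.median(np.abs(resid))
        if mad == 0:
            break
        r = resid / (6.0 * mad)
        weights = np.where(np.abs(r) < 1.0, (1.0 - r ** 2) ** 2, 0.0)
    return smooth


def segmentation_points(areas, window=5, min_gap=8, min_depth=0.1):
    """Frame indices of local minima of the smoothed area function that
    survive a second filter on depth and spacing."""
    s = robust_smooth(areas, window)
    span = s.max() - s.min()
    candidates = [i for i in range(1, s.size - 1)
                  if s[i] < s[i - 1] and s[i] <= s[i + 1]]
    kept = []
    for i in candidates:
        lo, hi = max(0, i - min_gap), min(s.size, i + min_gap + 1)
        if span > 0 and s[lo:hi].max() - s[i] < min_depth * span:
            continue  # too shallow: likely silhouette noise
        if kept and i - kept[-1] < min_gap:
            if s[i] < s[kept[-1]]:
                kept[-1] = i  # keep the deeper of two nearby minima
            continue
        kept.append(i)
    return kept


# Synthetic area function: limbs extend and retract with a 20-frame period
frames = np.arange(60)
areas = 1000 + 300 * np.cos(2 * np.pi * frames / 20)
print(segmentation_points(areas))  # [10, 30, 50]
```

Here the area values are generated directly rather than from silhouette masks; with real video, `bbox_area` would be applied to each frame's background-subtraction mask first. The returned frames are those where the box is smallest, i.e. where the limbs are closest to the body.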
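For the Lucas-Kanade step, a single-scale dense variant can be written with NumPy alone: for every pixel, the 2×2 system built from spatial and temporal image derivatives over a small window is solved in closed form. This is a textbook sketch rather than the authors' implementation; the window size, the eigenvalue threshold `min_eig`, the function name, and the absence of an image pyramid are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def lucas_kanade_dense(prev, curr, win=5, min_eig=1e-4):
    """Dense single-scale Lucas-Kanade flow (u, v) between two grayscale frames.

    Pixels whose structure tensor is ill-conditioned, and a border of
    win // 2 pixels, are assigned zero flow.
    """
    if win < 3 or win % 2 == 0:
        raise ValueError("win must be an odd integer >= 3")
    i0 = np.asarray(prev, dtype=float)
    i1 = np.asarray(curr, dtype=float)
    iy, ix = np.gradient((i0 + i1) / 2.0)   # spatial derivatives (axis 0 = y)
    it = i1 - i0                            # temporal derivative

    def window_sum(a):
        # Sum over each win x win neighbourhood; shape (H - win + 1, W - win + 1)
        return sliding_window_view(a, (win, win)).sum(axis=(-2, -1))

    sxx, syy, sxy = window_sum(ix * ix), window_sum(iy * iy), window_sum(ix * iy)
    sxt, syt = window_sum(ix * it), window_sum(iy * it)

    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = -[sxt, syt] in closed form
    det = sxx * syy - sxy ** 2
    half_trace = (sxx + syy) / 2.0
    lam_min = half_trace - np.sqrt(np.maximum(half_trace ** 2 - det, 0.0))
    ok = lam_min > min_eig
    safe_det = np.where(ok, det, 1.0)
    u_in = np.where(ok, (sxy * syt - syy * sxt) / safe_det, 0.0)
    v_in = np.where(ok, (sxy * sxt - sxx * syt) / safe_det, 0.0)

    h = win // 2
    u = np.zeros_like(i0)
    v = np.zeros_like(i0)
    u[h:-h, h:-h] = u_in
    v[h:-h, h:-h] = v_in
    return u, v


# Usage: a smooth pattern shifted 0.5 pixel to the right between frames
yy, xx = np.mgrid[0:40, 0:40].astype(float)
frame0 = np.sin(0.3 * xx) + np.cos(0.2 * yy)
frame1 = np.sin(0.3 * (xx - 0.5)) + np.cos(0.2 * yy)
u, v = lucas_kanade_dense(frame0, frame1)
print(np.median(u[5:-5, 5:-5]), np.median(v[5:-5, 5:-5]))  # close to 0.5 and 0.0
```

The small bias away from exactly 0.5 comes from the finite-difference derivatives; pyramidal versions are usually preferred when motions exceed about a pixel.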
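Given a flow field (u, v), the intraframe curl and divergence can be approximated with finite differences, and the interest points are the pixels with non-zero flow magnitude. The difference scheme and the function names below are assumptions for illustration; the abstract does not say how the local derivatives are computed.

```python
import numpy as np


def curl_divergence(u, v):
    """Return (curl, divergence) of a 2-D flow field sampled on a pixel grid.

    Arrays are indexed [row, col] = [y, x]. np.gradient uses central
    differences inside the image and one-sided differences at the border.
    curl = dv/dx - du/dy,  divergence = du/dx + dv/dy.
    """
    du_dy, du_dx = np.gradient(np.asarray(u, dtype=float))
    dv_dy, dv_dx = np.gradient(np.asarray(v, dtype=float))
    return dv_dx - du_dy, du_dx + dv_dy


def interest_mask(u, v):
    """Pixels whose optical-flow magnitude is non-zero."""
    return np.hypot(u, v) > 0


# Usage: a pure rotation has curl 2 and divergence 0 everywhere;
# a pure expansion has curl 0 and divergence 2.
yy, xx = np.mgrid[-10:11, -10:11].astype(float)
rot_curl, rot_div = curl_divergence(-yy, xx)
exp_curl, exp_div = curl_divergence(xx, yy)
print(rot_curl.mean(), rot_div.mean(), exp_curl.mean(), exp_div.mean())  # 2.0 0.0 0.0 2.0
print(interest_mask(-yy, xx).sum())  # 440 (only the centre pixel is static)
```

Curl responds to rotational motion (e.g. a swinging arm) and divergence to expanding or contracting motion, which is why both add directional detail beyond the raw flow components.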
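The distance computations on the manifold of 7×7 covariance matrices can be illustrated with the log-Euclidean distance (the geodesic distance under the log-Euclidean metric on symmetric positive-definite matrices), the LogDet divergence, and a 1-nearest-neighbor rule. This is only a sketch: the metric learning itself is not implemented, and `w` is a hypothetical placeholder for the learned tangent space transfer matrix, applied here as `w.T @ (log X - log Y) @ w`, which is one plausible form and not necessarily the authors' formulation.

```python
import numpy as np


def spd_log(x):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, q = np.linalg.eigh(x)
    return (q * np.log(w)) @ q.T


def log_euclidean_distance(x, y, w=None):
    """||log(x) - log(y)||_F, optionally after the hypothetical map d -> w.T @ d @ w."""
    d = spd_log(x) - spd_log(y)
    if w is not None:
        d = w.T @ d @ w
    return float(np.linalg.norm(d, "fro"))


def logdet_divergence(x, y):
    """LogDet (Burg) divergence tr(x y^-1) - log det(x y^-1) - n; zero iff x == y."""
    m = np.linalg.solve(y, x)  # y^-1 x has the same trace and determinant as x y^-1
    _, logdet = np.linalg.slogdet(m)
    return float(np.trace(m) - logdet - x.shape[0])


def nearest_neighbor(query, train, labels, w=None):
    """Label of the training descriptor closest to the query."""
    dists = [log_euclidean_distance(query, t, w) for t in train]
    return labels[int(np.argmin(dists))]


# Usage with toy 7x7 descriptors
eye = np.eye(7)
train = [eye, 4.0 * eye]
labels = ["walk", "run"]
print(nearest_neighbor(1.1 * eye, train, labels))         # walk
print(round(log_euclidean_distance(eye, np.e * eye), 6))   # 2.645751 (= sqrt(7))
print(round(logdet_divergence(2.0 * eye, eye), 6))         # 2.14797 (= 7 * (1 - ln 2))
```

Because the log map turns each covariance matrix into an ordinary symmetric matrix, nearest-neighbor search under this distance reduces to Euclidean comparisons of precomputed matrix logarithms.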
Keywords
