Gesture recognition combining aggregate channel features and the dual-tree complex wavelet transform
Abstract
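Objective Existing gesture recognition methods lose accuracy under variations in environment, illumination, rotation, scale, and skin color. To address this problem, a gesture recognition method for complex backgrounds is proposed that combines gesture detection based on aggregate channel features (ACF) with the dual-tree complex wavelet transform (DTCWT). Method Aggregate channel features are introduced in the gesture image preprocessing stage, and an Adaboost classifier together with the non-maximum suppression (NMS) algorithm is used to detect the target gesture. DTCWT is then applied to the target gesture image for multi-scale, multi-directional decomposition, and histogram of oriented gradient (HOG) and local binary pattern (LBP) features are extracted for each block of the high- and low-frequency coefficients, respectively. Finally, the high- and low-frequency features from all directions are fused and classified by a support vector machine (SVM). Result Images covering multiple scenes, multiple subjects, and different angles and distances were selected as the training set and annotated to separate foreground from background. Recognition experiments were conducted on 20 gestures, and the method was compared experimentally with traditional skin color detection, HOG-feature gesture recognition, and Hausdorff-like distance gesture recognition algorithms. Under illumination and distance conditions within an acceptable range, the method recognizes gestures more accurately and in real time, and the average precision reaches 95.1%. Conclusion In the image preprocessing stage, introducing aggregate channel features enables accurate gesture detection, and the DTCWT-based extraction and subsequent fusion of frequency-domain features effectively addresses the low recognition accuracy of traditional single-feature methods on ordinary images under varying illumination and complex backgrounds.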
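The detection step above relies on non-maximum suppression to reduce overlapping detector responses to one box per hand. The following is a minimal sketch of greedy NMS, not the authors' implementation: the box format `(x1, y1, x2, y2)`, the function names `iou` and `nms`, and the default IoU threshold of 0.5 are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already kept box by more than the
    threshold. Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detector output: two overlapping windows and one separate window.
candidate_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
candidate_scores = [0.9, 0.8, 0.7]
kept = nms(candidate_boxes, candidate_scores)
```

In this sketch the scores would come from the Adaboost classifier; a lower threshold suppresses more aggressively, which may suit scenes that contain a single hand.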
Keywords
Gesture recognition based on aggregate channel feature and dual-tree complex wavelet transform
Bao Wenxia, Xie Dongwen, Zhu Ming, Liang Dong (College of Electronic Information Engineering, Anhui University, Hefei 230601, China)
Abstract
Objective As society develops, people expect greater convenience in daily life, and human-computer interaction has become an increasingly important tool for work, daily life, and entertainment. Traditional human-computer interaction devices, such as keyboards, mice, and touch screens, require precise operation, which restricts how people use them and limits the forms that interaction can take. Gestures are more natural and flexible than these traditional I/O devices, which has made gesture recognition based on images or video streams a major research topic in computer vision. Such methods process input images or videos with techniques such as machine learning and image processing to achieve real-time gesture interaction: hand feature information is detected in the extracted image or video stream, and the category of the gesture is then determined, providing technical support for applications in these fields. In practice, the background behind the human body is often complex and varied, and because people move freely, the illumination, distance, and angle at which the hand appears to the camera also vary. Gesture recognition in complex environments is therefore an important problem. Current gesture recognition methods are affected by environment, illumination, rotation, scaling, and skin color, which lowers their accuracy and speed. To solve this problem, a gesture recognition method for complex backgrounds is proposed that combines gesture detection based on aggregate channel features (ACF) with frequency-domain feature extraction based on the dual-tree complex wavelet transform (DTCWT). The aggregate channel feature comprises 10 image channels; the pixel features of each channel are processed, filtered, and fused to obtain the ACF.
Method During gesture image preprocessing, a gesture target detection method based on multi-channel feature fusion is introduced as the first step of gesture recognition. An Adaboost classifier and the non-maximum suppression (NMS) algorithm are used to detect target gestures. The target gesture image cropped after detection is then processed by DTCWT: a multi-scale, multi-directional decomposition yields the high- and low-frequency coefficients. Histogram of oriented gradient (HOG) and local binary pattern (LBP) features are extracted for each block of the high- and low-frequency coefficients, respectively. Finally, the fused high- and low-frequency features are classified by a trained support vector machine (SVM) model. The recognition problem is thus divided into two stages. The first stage detects the target region and discards the background, which significantly improves the efficiency of gesture recognition and paves the way for accurate classification in the second stage.
Result Images of multiple scenes and subjects, taken at different angles and distances, were selected as the training set and annotated to distinguish foreground from background. A total of 20 gesture types were recognized, and the method was compared experimentally with traditional skin color detection, HOG-feature gesture recognition, and Hausdorff-like distance gesture recognition algorithms. For illumination and distance within an acceptable range, the method recognizes gestures accurately and in real time, and the average precision reaches 95.1%.
Conclusion This algorithm has three advantages. First, the introduced gesture target detection algorithm accurately locates and crops the hand region even when skin-colored objects interfere in a complex background, and normalizing the cropped region to a fixed size addresses the scale variation of gestures. Second, DTCWT is used to extract the high- and low-frequency coefficients of the image in the frequency domain, and features are computed on the high and low frequencies separately. Extracting features from the different signal components removes the influence of illumination and rotation, reduces redundancy and feature dimensionality, and improves the efficiency of feature extraction. Third, DTCWT offers translation invariance, directional selectivity, and limited redundancy. The method is fast and uses little memory, so it can run in real time. When the gesture region is accurately detected, the proposed algorithm achieves satisfactory results. In future work, we will further improve the accuracy of hand detection and classification. Deep neural networks will be applied to additional datasets and gesture types to address the minor factors that may cause misrecognition, improve recognition efficiency, and make gesture recognition more practical.
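The block-wise feature extraction and fusion described above can be illustrated with a small sketch. It covers only the LBP histogram and the fusion step, and it rests on several assumptions that the abstract does not specify: each sub-band block is taken to be a 2D list of real values (for example, coefficient magnitudes), the basic 8-neighbour LBP with a `>=` comparison is used, the neighbour ordering is an arbitrary convention, and fusion is modeled as plain concatenation. The names `lbp_code`, `lbp_histogram`, and `fuse` are hypothetical, and the DTCWT decomposition itself is not computed here.

```python
from typing import List, Sequence

# Clockwise 8-neighbour offsets starting at the top-left pixel.
# The ordering is a convention chosen for this sketch.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(block: Sequence[Sequence[float]], r: int, c: int) -> int:
    """Basic 8-bit LBP code at an interior position (r, c): a bit is set
    when the corresponding neighbour is >= the centre value."""
    center = block[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(_OFFSETS):
        if block[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(block: Sequence[Sequence[float]]) -> List[float]:
    """Normalized 256-bin histogram of LBP codes over the interior
    positions of one block (border positions are skipped)."""
    hist = [0] * 256
    rows, cols = len(block), len(block[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(block, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * 256


def fuse(features: Sequence[Sequence[float]]) -> List[float]:
    """Fuse per-block feature vectors by concatenation (an assumption;
    the abstract does not state the fusion rule)."""
    fused: List[float] = []
    for vec in features:
        fused.extend(vec)
    return fused


# Two hypothetical 3x3 blocks standing in for sub-band coefficient magnitudes.
flat_block = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
corner_block = [[9, 1, 1], [1, 5, 1], [1, 1, 1]]
fused_vector = fuse([lbp_histogram(flat_block), lbp_histogram(corner_block)])
```

In a full pipeline the fused vector would also include the HOG descriptors and would be passed to the SVM; those parts are omitted so that the sketch stays self-contained.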
Keywords
aggregate channel feature; dual-tree complex wavelet transform (DTCWT); histogram of oriented gradient (HOG) features; local binary pattern (LBP) features; feature fusion; support vector machine (SVM)