Real-time expression recognition combining pixel patterns and feature point patterns
Abstract
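Objective Current 2D expression recognition methods achieve low recognition rates on highly confusable expressions and are easily affected by face pose and illumination changes. Using 3D facial feature point data captured with the RGBD camera Kinect, this paper proposes a real-time expression recognition method that combines 2D pixel features with 3D feature point features. Method First, three classic operators, LBP (local binary patterns), Gabor filters, and HOG (histogram of oriented gradients), are used to extract 2D pixel features of facial expressions. Because 2D pixel features are limited in their ability to describe facial expressions, three kinds of 3D expression features (the angles, distances, and normal vectors between facial feature points) are further extracted to describe the changes of different expressions in finer detail. To improve recognition of highly confusable expressions and to increase robustness, three groups of random forest models are trained on the 2D pixel features and three on the 3D feature point features, and the final expression class is obtained by a weighted combination of the classification results of the six groups of random forest classifiers. Result The algorithm is evaluated on nine different expressions from the 3D expression dataset Face3D. The results show that combining 2D pixel features with 3D feature point features benefits expression recognition: the average recognition rate reaches 84.7%, which is 4.5% higher than the best method proposed in recent years and 3.0% and 5.8% higher than using the fused 2D features or the fused 3D features alone, respectively. The recognition rates for the highly confusable expressions of anger, sadness, and fear all exceed 80%, and the method runs at 10 to 15 frames/s. Conclusion The method combines the 2D pixel features and 3D feature point features of expression images, which improves the ability of the algorithm to describe facial expression changes. For the classification of highly confusable expressions, the weighted average of the results of multiple groups of random forest classifiers effectively reduces the interference between confusable expressions and improves the robustness of the algorithm. The experimental results show that, compared with ordinary 2D or 3D features, the method offers better expression recognition while preserving real-time performance.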
Keywords
Real-time expression recognition method based on pixel and feature point patterns
Liang Huagang, Yi Sheng, Ru Feng (School of Electronics and Control Engineering, Chang'an University, Xi'an 710064, China)
Abstract
Objective Facial expression recognition (FER) is a major research topic in pattern recognition and computer vision and has wide application prospects. At present, FER methods usually extract features from 2D face images and analyze local facial texture and contours to recognize facial expressions. Given the complexity and subtlety of facial expressions, distinguishing them accurately using only features extracted from 2D face images is difficult, and recognition performance decreases drastically on non-database images or when face pose and ambient light change. Traditional 2D FER methods are thus easily influenced by factors such as posture and illumination and cannot effectively recognize a few confusing expressions. In this study, a method that combines 2D pixel features (2D PF) and 3D feature point features (3D FPF) based on the Kinect sensor is proposed to achieve robust real-time FER and overcome these disadvantages of previous methods. Method First, 3D data of facial feature points are obtained with Kinect. The face image is segmented by the enclosing rectangle around the eyebrow and mouth areas. As a result, the segmented face image excludes the background, the forehead and chin blocks, and other irrelevant areas that do not reflect expression changes. Then, the classic LBP, Gabor, and HOG operators are used to extract 2D PF from the segmented face images. LBP, Gabor, and HOG feature extraction is generally computationally expensive, which hinders real-time operation of the algorithm. Accordingly, the extraction of LBP, HOG, and Gabor features is adjusted to reduce the computation cost, and the dimensionality of the feature vectors is reduced to ensure the real-time performance of the algorithm.
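As an illustration of the 2D pixel features mentioned above, the following is a minimal sketch of the basic 3x3 LBP operator and its normalized histogram. It is not the adjusted, reduced-cost variant the abstract refers to (whose details are not given here); the function names `lbp_code` and `lbp_histogram`, the clockwise neighbour order, and the 256-bin histogram are assumptions made for this example.

```python
def lbp_code(img, r, c):
    """Basic 8-neighbour LBP code of the pixel at row r, column c.

    img is a grayscale image given as a list of rows of intensities.
    The neighbour order (clockwise from top-left) is an assumption;
    any fixed order yields a valid LBP variant.
    """
    center = img[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the center sets its bit.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes over interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    # Images smaller than 3x3 have no interior pixels; return zeros.
    return [h / total for h in hist] if total else hist
```

In practice the histogram would be computed per block of the segmented face image and the block histograms concatenated, which is where the dimensionality reduction mentioned above becomes relevant.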
However, 2D PF have difficulty describing the changes in facial expression and are sensitive to various extraneous factors. Thus, three types of 3D features (angle, distance, and normal vector) are proposed to describe the deformation of the face in detail and improve recognition of confusing expressions. Facial expression information is mainly carried by the eyebrows, eyes, mouth, and other local areas. Excluding the few feature points that are unrelated to facial expression reduces interference and improves the efficiency of feature extraction. Accordingly, the angles, distances, and plane normal vectors formed by the lines connecting different feature points in the eyebrow and mouth areas are selected as feature vectors to describe the changes in facial expression. However, the small number of feature points in the eyebrow and mouth areas and the low precision of the 3D data acquired with Kinect lead to poor recognition performance when the 3D features are used on their own. Accordingly, 2D PF and 3D FPF are integrated to complete the recognition task and balance recognition accuracy against real-time performance. Finally, three sets of random forest models are trained on the 2D PF and three on the 3D FPF, and the weighting factors of the six sets of random forest classifiers are assigned according to feature dimension size to mitigate the influence of the difference between 2D PF and 3D FPF. The final classification result is determined by the weighted average of the six sets of random forests, which improves the robustness and recognition capability of the algorithm. Result The algorithm is evaluated on nine expressions (calmness, smile, laugh, surprise, fear, anger, sadness, meditation, and disgust) from the 3D expression database Face 3D, which contains nine types of facial expressions from 10 individuals, with a total of 9,000 sets of images and feature point data.
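The three 3D feature point descriptors named in the Method part (distance, angle, and plane normal vector) can be sketched with elementary vector geometry. This is one plausible reading, not the authors' exact definitions: the choice of point pairs and triples, the helper names, and the use of radians are all assumptions of this example. Points are `(x, y, z)` tuples such as Kinect would supply.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def distance(p, q):
    """Euclidean distance between two 3D feature points."""
    return _norm(_sub(p, q))


def angle(p, vertex, q):
    """Angle in radians at `vertex` between the lines to p and to q."""
    u, v = _sub(p, vertex), _sub(q, vertex)
    cos_value = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to [-1, 1] so rounding error cannot break acos.
    return math.acos(max(-1.0, min(1.0, cos_value)))


def unit_normal(p, q, r):
    """Unit normal of the plane through three non-collinear points."""
    n = _cross(_sub(q, p), _sub(r, p))
    length = _norm(n)
    if length == 0.0:
        raise ValueError("points are collinear; the plane is undefined")
    return (n[0] / length, n[1] / length, n[2] / length)
```

A feature vector would then be built by evaluating these functions over a fixed, hypothetical list of eyebrow and mouth point pairs and triples and concatenating the results.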
Experimental results show that combining 2D PF and 3D FPF helps discriminate facial expressions. The average recognition rate for the nine expressions on the Face 3D database is 84.7%, which is 4.5% higher than that of the best method proposed in recent years, a method that combines only a high-dimensional 2D HOG feature with a 3D angle feature. The average recognition rate is also 3.0% and 5.8% higher than the rates obtained with the fused 2D features or the fused 3D features alone, respectively. The recognition rate exceeds 80% for a few confusing expressions, such as anger, sadness, and fear, and the method runs at 10 to 15 frames per second owing to the high-speed data acquisition with Kinect. Conclusion By combining 2D PF and 3D FPF, the proposed method improves the ability of the algorithm to describe expression features, and by taking the weighted average of the random forest classification results it effectively reduces the interference between confusing expressions and enhances the robustness of the algorithm. The proposed method recognizes facial expressions better than ordinary 2D or 3D features while preserving real-time performance.
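The weighted combination of the six random forest outputs can be sketched as a weighted average of per-class probability vectors. The abstract states only that the weights are assigned using feature dimension size; the exact formula is not given, so the proportional rule in `weights_from_dims` is an assumption of this example, as are all function names.

```python
def weights_from_dims(dims):
    """Hypothetical rule: weight each classifier in proportion to the
    dimension of its feature vector, normalized to sum to 1."""
    total = sum(dims)
    return [d / total for d in dims]


def fuse(prob_vectors, weights):
    """Weighted average of per-class probability vectors.

    prob_vectors holds one vector per classifier, all of equal length;
    weights holds one weight per classifier.
    """
    fused = [0.0] * len(prob_vectors[0])
    for probs, w in zip(prob_vectors, weights):
        for k, p in enumerate(probs):
            fused[k] += w * p
    return fused


def predict(prob_vectors, weights):
    """Index of the class with the highest fused score."""
    fused = fuse(prob_vectors, weights)
    return max(range(len(fused)), key=fused.__getitem__)
```

With six classifiers, `prob_vectors` would hold the nine-class outputs of the LBP, Gabor, HOG, angle, distance, and normal vector forests, and `weights` would be derived once from the six feature dimensions.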
Keywords
multi-feature extraction; real-time facial expression recognition; random forest; Kinect depth sensor; multi-expression classification