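Face detection using a two-layer cascaded convolutional neural network
Abstract
Objective Traditional face detection methods perform poorly because of multi-pose variation of faces and incomplete facial features. To address these problems, a two-layer cascaded convolutional neural network (TC_CNN) face detection method is proposed. Method First, a two-layer convolutional neural network model is constructed. The front-end convolutional neural network performs coarse feature extraction on the face image, max pooling then reduces the dimension of the coarsely extracted face features, and multiple suspected face windows are output. Second, the face windows obtained by the front end are fed to the back-end convolutional neural network for fine feature extraction, and pooling produces new feature maps. Finally, a fully connected layer discriminates and outputs the best detection window, completing the face detection process. Result Images from the FDDB face detection dataset that contain multi-pose variation and incomplete facial feature information were selected for testing. The TC_CNN method achieves a face detection rate of 96.39% and a false detection rate as low as 3.78%. Compared with currently popular methods, it improves the detection rate while maintaining algorithm efficiency. Conclusion The two-layer cascaded convolutional neural network face detection method achieves accurate detection under multi-pose variation and incomplete facial feature information, maintains a high detection rate, effectively reduces the false detection rate, and shows good robustness and generalization capability.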
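The abstract attributes dimension reduction in the front-end network to max pooling. As a rough illustration only, the following Python sketch applies 2x2 max pooling with stride 2 to a single-channel feature map. The window size, stride, and the `max_pool2d` helper are assumptions for illustration; the paper's actual pooling parameters are not given here.

```python
from typing import List

Matrix = List[List[float]]


def max_pool2d(fmap: Matrix, size: int = 2, stride: int = 2) -> Matrix:
    """Max pooling over one single-channel feature map, without padding.

    Each output cell is the maximum of a size x size window; the window moves
    by `stride`. Rows or columns that do not fill a full window are dropped.
    """
    h, w = len(fmap), len(fmap[0])
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r0, c0 = i * stride, j * stride
            window = [fmap[r][c]
                      for r in range(r0, r0 + size)
                      for c in range(c0, c0 + size)]
            row.append(max(window))  # keep only the strongest response
        out.append(row)
    return out


fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [0, 2, 5, 7],
    [1, 1, 3, 8],
]
print(max_pool2d(fmap))  # [[6, 2], [2, 8]]
```

A 4x4 map becomes 2x2, so each pooling layer of this shape cuts the number of feature values by a factor of four while keeping the strongest local activations.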
Keywords
Two-layer cascaded convolutional neural network for face detection
Zhang Haitao, Li Meilin, Dong Shuaihan (College of Software, Liaoning Technical University, Huludao 125105, China)
Abstract
Objective As an important part of face recognition, face detection has attracted considerable attention in computer vision and has been widely studied. Face detection determines the location and size of human faces in an image. Traditional face detection methods are hampered by multi-pose variation and incomplete facial features, which leads to poor detection performance. Modern face detectors can easily detect near-frontal faces. Recent research has focused on the unconstrained face detection problem, in which factors such as multi-pose variation and incomplete facial features cause large visual variations in face appearance and can severely degrade the robustness of a face detector. A convolutional neural network can select facial features automatically and quickly discard large amounts of non-face background information, achieving good face detection results. However, a single convolutional neural network must perform three functions: facial feature extraction, feature dimension reduction to lower the computational complexity, and feature classification. This results in a complex network structure, limited detection speed, and overfitting. To solve these problems, this study presents a face detection method based on a two-layer cascaded convolutional neural network (TC_CNN). Method First, a two-layer convolutional neural network model is constructed. The first convolutional neural network extracts features from the face image, and max pooling reduces the dimension of those features, yielding multiple suspected face windows. Second, these face windows are used as the inputs of the second convolutional neural network for fine feature extraction, and a new feature map is obtained by pooling. Finally, the best detection window is output after discrimination by a fully connected layer.
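The two-stage control flow can be pictured as a filter chain: a cheap first stage prunes windows, and only the survivors reach the more expensive second stage. In this hypothetical Python sketch, `coarse_score` and `fine_score` stand in for the two networks, and the thresholds 0.5 and 0.8 are arbitrary; none of these names or values come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), hypothetical window format


@dataclass
class Detection:
    box: Box
    score: float


def cascade_detect(
    windows: List[Box],
    coarse_score: Callable[[Box], float],  # stands in for the front-end CNN
    fine_score: Callable[[Box], float],    # stands in for the back-end CNN + FC layer
    coarse_thresh: float = 0.5,
    fine_thresh: float = 0.8,
) -> List[Detection]:
    # Stage 1: keep only "suspected face" windows from the cheap coarse pass.
    candidates = [w for w in windows if coarse_score(w) >= coarse_thresh]
    # Stage 2: re-score the survivors; windows below fine_thresh are
    # treated as non-face and dropped.
    detections = []
    for w in candidates:
        s = fine_score(w)
        if s >= fine_thresh:
            detections.append(Detection(w, s))
    return detections


# Toy usage with made-up scores in place of real networks.
windows = [(0, 0, 24, 24), (10, 10, 24, 24), (40, 40, 24, 24)]
coarse = {windows[0]: 0.9, windows[1]: 0.7, windows[2]: 0.2}
fine = {windows[0]: 0.95, windows[1]: 0.6, windows[2]: 0.99}

fine_calls = []


def fine_fn(w: Box) -> float:
    fine_calls.append(w)  # record which windows reach stage 2
    return fine[w]


dets = cascade_detect(windows, lambda w: coarse[w], fine_fn)
print([d.box for d in dets])  # [(0, 0, 24, 24)]
print(len(fine_calls))        # 2: the third window never reaches stage 2
```

The point of the design is that the second stage runs only on windows the first stage accepts, which is where a cascade saves computation; the third window is never re-scored even though its (made-up) fine score is high.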
If the classification result is a face, the face is detected and its window is returned; otherwise, the non-face window is discarded. Non-maximum suppression then selects the optimal face detection window, the size and position of the face in the input image are returned based on the location of that window, and the face detection process is complete. To train TC_CNN, we use 10 000 images from the Labeled Faces in the Wild (LFW) dataset, containing near-frontal faces, multi-pose variation, and incomplete facial features, as positive training samples, and 1 000 images as negative training samples. To test the TC_CNN model, we use the authoritative FDDB dataset and evaluate the model's validity with four indexes: detection rate, false detection rate, missed detection rate, and detection time. TC_CNN is compared with well-established face detection algorithms, namely AdaBoost, fast LBP, NPD+AdaBoost, and SPP+CNN. Result Images with multi-pose variation and incomplete facial feature information in the FDDB face detection dataset are selected for testing. The TC_CNN method reaches a face detection rate of 96.39%, a false detection rate as low as 3.78%, and a detection time of 0.451 s. Its detection rate is 7.63% higher than that of the traditional cascade-based AdaBoost method, 3.57% higher than fast LBP, 0.50% higher than NPD+AdaBoost, and 6.04% higher than SPP+CNN. Its false detection rate is 2.44% lower than AdaBoost, 4.47% lower than fast LBP, 0.59% lower than NPD+AdaBoost, and 5.09% lower than SPP+CNN.
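Non-maximum suppression, which the method uses to pick the optimal window, is commonly implemented greedily: keep the highest-scoring box, drop every remaining box that overlaps it by more than an IoU threshold, and repeat. The sketch below assumes corner-format boxes `(x1, y1, x2, y2)` and an IoU threshold of 0.3; the paper does not state which NMS variant or threshold TC_CNN uses.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.3) -> List[int]:
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress boxes that overlap the current best too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The first two boxes cover nearly the same face (IoU about 0.82), so only the higher-scoring one survives; the distant third box is kept as a separate face.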
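The abstract reports detection, false detection, and missed detection rates without defining them. One plausible set of definitions is sketched below, assuming that each detection is greedily matched to an unmatched ground-truth box when their IoU exceeds 0.5, that detection rate = matched faces / annotated faces, false detection rate = unmatched detections / all detections, and missed detection rate = unmatched faces / annotated faces. These definitions, the 0.5 threshold, and the `evaluate` helper are assumptions, not the paper's stated protocol.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def evaluate(detections: List[Box], ground_truth: List[Box],
             iou_thresh: float = 0.5) -> Tuple[float, float, float]:
    """Return (detection rate, false detection rate, missed detection rate).

    Detections are assumed to be sorted by confidence, highest first.
    """
    matched = set()
    tp = 0
    for d in detections:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(ground_truth):
            if j in matched:
                continue  # each annotated face may be matched only once
            v = iou(d, g)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j >= 0 and best_iou > iou_thresh:
            matched.add(best_j)
            tp += 1
    fp = len(detections) - tp
    n_faces = len(ground_truth)
    detection_rate = tp / n_faces if n_faces else 0.0
    false_rate = fp / len(detections) if detections else 0.0
    missed_rate = (n_faces - tp) / n_faces if n_faces else 0.0
    return detection_rate, false_rate, missed_rate


gt = [(10, 10, 50, 50), (100, 100, 140, 140)]
dets = [(12, 12, 52, 52), (200, 200, 240, 240)]
print(evaluate(dets, gt))  # (0.5, 0.5, 0.5)
```

Under these definitions the detection rate and missed detection rate always sum to 1, so reporting both mainly aids readability; the false detection rate is normalized by detections rather than by faces.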
In detection time, TC_CNN is markedly faster than SPP+CNN and slightly faster than AdaBoost, fast LBP, and NPD+AdaBoost. Compared with current methods, TC_CNN improves the detection rate while maintaining algorithm efficiency. To verify the robustness of TC_CNN under multi-pose variation and incomplete facial features, representative images of these two cases are selected from the FDDB dataset for four groups of comparative experiments: multi-pose variation in single-face images, multi-pose variation in multi-face images, incomplete facial features in single-face images, and incomplete facial features in multi-face images. The results show that, under these different interference conditions, TC_CNN is more effective and robust than the four compared algorithms across all four groups of experiments. Conclusion The TC_CNN face detection model achieves accurate detection under multi-pose variation and incomplete facial feature information. It obtains a high detection rate, effectively reduces the false detection rate, and has good robustness and generalization capability. Drawing on the cascade concept of AdaBoost, TC_CNN cascades two convolutional neural networks, which avoids the complex network structure that results when a single network must extract, reduce, and classify features simultaneously, a structure that easily leads to overfitting and other problems. However, choosing the number of cascaded convolutional neural networks and their parameters so as to improve model performance and detection remains difficult.
In future research, we will study how to determine the number of cascaded convolutional neural networks and their parameters to optimize the model, and will attempt to locate the size and position of faces more accurately.
Keywords
face detection; convolutional neural network; ten-fold cross validation; two-layer cascaded convolutional neural network; max pooling