Deep fusion neural network for facial age estimation

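Sun Ning1, Gu Zhengdong1,2, Liu Jixin1, Han Guang1 (1. Engineering Research Center of Wideband Wireless Communication Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003; 2. School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003)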

Abstract
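Objective To improve the accuracy of age estimation from face images, an end-to-end trainable deep neural network model is proposed for facial age estimation. Method The network model is built by stacking multiple convolutional neural networks (CNNs) and one deep belief network (DBN), and is called the deep fusion network (DFN). Multiple parallel CNNs first extract appearance features from multiple regions of the face image; the resulting features are concatenated and fed into a DBN for nonlinear fusion. To enable end-to-end training of the whole DFN, an iterative net-wise training (INWT) mechanism is proposed. To reduce overfitting, the CNNs that correspond to local face patches are trained for the facial age estimation task through multiple iterations of transfer learning. After all CNNs and the DBN in the DFN have been pretrained, the whole network is fine-tuned end to end. Result The method is tested on two facial age image databases, MORPH Ⅱ and FG-NET. The DFN-based method achieves a mean absolute error (MAE) of 3.42 on MORPH Ⅱ and 4.14 on FG-NET. Compared with current mainstream age estimation algorithms, such as the shallow-learning-based CA-SVR method (MAE of 5.88 and 4.75 on the two databases, respectively), the deep-learning-based DeepRank+ method (MAE of 3.49 on MORPH Ⅱ), and the Deep-CS-LBMFL method (MAE of 4.22 on FG-NET), the estimation accuracy is clearly improved. Conclusion The proposed DFN-based facial age estimation method has a clear advantage over most current mainstream algorithms based on deep neural networks.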
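All results above are reported as MAE, the mean of |predicted age - true age| over the test faces, in years. The following is a minimal sketch of that metric in plain Python. The function name and the sample ages are illustrative assumptions; they are not taken from the paper or from the MORPH Ⅱ and FG-NET databases.

```python
from typing import Sequence


def mean_absolute_error(true_ages: Sequence[float],
                        predicted_ages: Sequence[float]) -> float:
    """Average of |predicted - true| over all test faces, in years."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("both sequences must have the same length")
    if len(true_ages) == 0:
        raise ValueError("at least one sample is required")
    total = sum(abs(p - t) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


# Made-up ages for illustration only (assumption, not benchmark data).
true_ages = [23, 35, 41, 60]
predicted_ages = [25, 32, 41, 66]

# Absolute errors are 2, 3, 0 and 6, so the mean is 11 / 4.
mae = mean_absolute_error(true_ages, predicted_ages)
print(mae)  # 2.75
```

A lower MAE means a smaller average error in years, which is why 3.42 is read as better than 3.49 on the same database.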
Keywords
End-to-end trainable deep fusion network for facial age estimation

Sun Ning1, Gu Zhengdong1,2, Liu Jixin1, Han Guang1 (1. Engineering Research Center of Wideband Wireless Communication Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; 2. School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)

Abstract
Objective We propose a facial age estimation (FAE) method based on an end-to-end trainable deep neural network called the deep fusion network (DFN), which stacks multiple convolutional neural networks (CNNs) and a deep belief network (DBN) to extract and fuse facial features for age estimation. Method The DFN-based method for FAE comprises image preprocessing, feature extraction, feature fusion, and age estimation. In image preprocessing, faces are cropped from images by a face detector. Face alignment then warps each face image to a fixed size and position based on landmark points, which reduces the adverse effects of varying face poses and noise on subsequent processing. Several multiscale local patches are cropped from the aligned face image based on the facial landmark points. We employ a CNN as the feature extraction module (FEM), which extracts deep features from the local face patches obtained by image preprocessing. There are 37 FEMs, one for each local face patch. The 37 parallel FEMs simultaneously extract global and local facial features from the face patches, yielding 37 CNN features, each of size 160. These 37 deep features are concatenated to form a feature vector, and a DBN model fuses them. Two challenges exist in DFN training. The first is implementing the end-to-end training of the DFN, which comprises multiple parallel CNNs and one stacked DBN. The other is training large-scale deep neural networks on limited local face patches. To address these issues, a scheme of iterative net-wise training (INWT) is proposed to train the DFN. The term "net-wise" means that all neural networks in the DFN, including the multiple CNNs and the DBN, are pretrained network by network, after which the entire DFN undergoes global end-to-end fine-tuning.
The term "iterative" means that we use multiple iterations of transfer learning to train the FEM networks on limited local face patches. CNNs corresponding to patches that contain only a small portion of the face are gradually fine-tuned through multiple iterations of transfer learning to reduce overfitting. After all CNNs and the DBN are pretrained, the DFN is globally fine-tuned to perform regression for face age estimation. Result We conduct extensive experiments to evaluate the proposed FAE method on two well-known benchmarks, the FG-NET and MORPH Ⅱ databases. First, we evaluate the method with different numbers of transfer learning iterations. Results show that the proposed multiple iterative transfer learning significantly improves the accuracy of age estimation. Second, we evaluate the method with different patch combinations. Results show that local patches at various scales provide complementary information for FAE and that they all contribute to lowering the mean absolute error (MAE). Third, we evaluate the method with four fusion methods. In comparison with LR, SVR, and RA, the DBN-based method achieves the best MAE in all experiments. Finally, the proposed method is compared with state-of-the-art methods. Experimental results on the two databases show that the proposed DFN-based method is an effective deep architecture for FAE and achieves competitive performance (MAE = 3.42 on MORPH Ⅱ and 4.14 on FG-NET). Conclusion We propose a deep neural network called DFN for FAE. Multiple CNNs are trained to extract deep facial age features, and one DBN is stacked on them for feature fusion, which makes the DFN a globally trainable end-to-end deep learning model and enlarges the scale of the neural network for better age estimation performance. An INWT scheme is then developed to train the DFN on limited multiscale local face patches.
Experimental results on the MORPH Ⅱ and FG-NET databases show that the DFN is an effective deep learning model for FAE and achieves results competitive with state-of-the-art methods.
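To make the sizes in the Method part concrete, the sketch below concatenates 37 per-patch features of length 160 into the single vector that would be passed to the fusion stage. It illustrates the dimensions only: the random feature extractor is a hypothetical stand-in for a trained CNN, the function names are assumptions, and no DBN is implemented.

```python
import random
from typing import List

NUM_PATCHES = 37   # one FEM (CNN) per local face patch, as stated in the abstract
FEATURE_DIM = 160  # length of each CNN feature, as stated in the abstract


def extract_patch_feature(rng: random.Random) -> List[float]:
    """Hypothetical stand-in for one FEM: returns a random 160-d vector."""
    return [rng.random() for _ in range(FEATURE_DIM)]


def concatenate_features(features: List[List[float]]) -> List[float]:
    """Join the per-patch features end to end, preserving patch order."""
    fused_input: List[float] = []
    for feature in features:
        fused_input.extend(feature)
    return fused_input


rng = random.Random(0)  # fixed seed so the sketch is repeatable
patch_features = [extract_patch_feature(rng) for _ in range(NUM_PATCHES)]
fusion_input = concatenate_features(patch_features)

# 37 patches * 160 values per patch = 5920 values in the fusion input.
print(len(fusion_input))  # 5920
```

Under these assumptions the fusion stage sees a 5920-dimensional input, in which the first 160 entries come from the first patch and the last 160 from the thirty-seventh.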
Keywords
