Facial expression recognition improved by continual learning

Jiang Jing, Deng Weihong (School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China)

Abstract
Objective Large amounts of labeled data and deep learning methods have greatly improved image recognition performance. However, labeled data for facial expression recognition are scarce, and the trained deep models are prone to overfitting. Research has shown that using a network pre-trained for face recognition can alleviate this problem, but the pre-trained face network may retain a large amount of identity information that is detrimental to expression recognition. This paper explores how to effectively exploit a face recognition pre-trained network to improve expression recognition performance. Method We introduce the idea of continual learning and use the relationship between face recognition and expression recognition to guide expression recognition. We observe that the parameters contributing most to the decrease of the overall face recognition loss are related to capturing common facial features; they are important for expression recognition and help the network perceive facial characteristics. The method consists of two stages: first, a face recognition network is trained while the importance of each parameter is computed and recorded; then the pre-trained model is trained for expression recognition, where changes to important parameters are restricted to preserve the model's strong ability to perceive facial features, while the less important parameters are allowed to change more freely to learn additional expression-specific information. We call this method parameter-wise importance regularization. Result The method is evaluated on three datasets: RAF-DB (real-world affective faces database), CK+ (the extended Cohn-Kanade database), and Oulu-CASIA. On the mainstream RAF-DB dataset, it achieves an accuracy of 88.04%, a 1.83% improvement over direct fine-tuning of the pre-trained network. Results on the other datasets also demonstrate its effectiveness. Conclusion The proposed parameter-wise importance regularization exploits the relationship between face recognition and expression recognition to make full use of the pre-trained face recognition model, yielding a more robust expression recognition model.
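As an illustration only (the notation below is ours, not taken from the paper), the second-stage objective that combines the softmax classification term with the parameter-wise importance regularization can be written as

$$\mathcal{L}(\theta) = \mathcal{L}_{\text{softmax}}(\theta) + \lambda \sum_i \Omega_i \,(\theta_i - \theta_i^{*})^2,$$

where $\theta^{*}$ denotes the parameters of the pre-trained face recognition network, $\Omega_i$ is the recorded importance of parameter $i$, and $\lambda$ is an assumed trade-off weight controlling how strongly important parameters are anchored.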
Keywords
Facial expression recognition improved by continual learning

Jiang Jing, Deng Weihong (School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China)

Abstract
Objective Facial expression recognition (FER) has become an important research topic in the field of computer vision and plays an important role in human-computer interaction. Most studies focus on classifying basic discrete expressions (i.e., anger, disgust, fear, happiness, sadness, and surprise) using static image-based approaches. Recognition performance of deep learning-based methods has progressed considerably: deep neural networks, especially convolutional neural networks (CNNs), achieve outstanding performance in image classification tasks. However, a large amount of labeled data is needed for training deep networks, and the insufficient samples in many widely used FER datasets lead to overfitting in the trained model. Fine-tuning a network that has been well pre-trained on a large face recognition dataset is commonly used to compensate for the shortage of samples in FER datasets and prevent overfitting. The pre-trained network can capture facial information, and the similarity between the face recognition (FR) and FER domains facilitates the transfer of features. Although this transfer learning strategy demonstrates satisfactory performance, the fine-tuned FR network may still contain identity-dominated information, which can weaken the network's ability to represent different expressions. On the one hand, we expect to preserve the strong ability of the FR network to capture important facial information, such as face contour, and to guide the FER network training in real cases. On the other hand, we want the network to learn additional expression-specific information. We therefore propose to train the FER model with a continual learning approach, which effectively utilizes the close relationship between FR and FER and exploits the ability of the pre-trained FR network. Method This study aims to train an expression recognition network with auxiliary information from the face recognition network instead of using a fine-tuning approach alone. We first introduce a continual learning approach into the field of FER. Continual learning addresses the problem of learning from an infinite stream of data, with the objective of gradually extending the acquired knowledge and using it for future learning. Synaptic intelligence consolidates important parameters of previous tasks to solve the problem of catastrophic forgetting; it alleviates the reduction in performance by preventing those important parameters from changing in future tasks. Similar to continual learning, we conduct the FR task before the FER task is added. However, we only focus on the performance of the latter task, whereas continual learning also aims to alleviate the catastrophic forgetting of the original task. Sequential tasks in continual learning commonly contain a small number of classes, so important parameters are related to the current classes. In the FR task, by contrast, the large number of categories means that the important parameters, those contributing most to the decrease of the total loss, are more likely to capture common facial features than features of specific classes. Hence, a two-stage training strategy is proposed in this study. In the first stage, we train an FR network and compute each parameter's importance during training. In the second stage, we refine the pre-trained network under the supervision of expression labels while preventing important parameters from changing excessively. The loss function for expression classification is composed of two parts, namely, softmax loss and parameter-wise importance regularization.
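A minimal sketch of the first stage follows, in the style of synaptic intelligence and written in PyTorch. It is an illustration under our own assumptions: fr_model, fr_loader, criterion, optimizer, and the damping constant xi are hypothetical names, not identifiers from the paper's implementation.

import torch

def estimate_importance(fr_model, fr_loader, criterion, optimizer, device, xi=0.1):
    # Assumed sketch of stage 1: train the face recognition (FR) network
    # while accumulating each parameter's contribution to the loss decrease.
    params = [p for p in fr_model.parameters() if p.requires_grad]
    theta_init = [p.detach().clone() for p in params]   # parameters before training
    omega = [torch.zeros_like(p) for p in params]       # running path integrals

    fr_model.train()
    for images, labels in fr_loader:
        images, labels = images.to(device), labels.to(device)
        theta_before = [p.detach().clone() for p in params]

        optimizer.zero_grad()
        loss = criterion(fr_model(images), labels)
        loss.backward()
        grads = [p.grad.detach().clone() for p in params]
        optimizer.step()

        # omega_i accumulates -g_i * delta_theta_i: parameters whose updates
        # reduced the FR loss the most receive the largest credit.
        for w, g, p, p0 in zip(omega, grads, params, theta_before):
            w.add_(-g * (p.detach() - p0))

    # Normalize by the total squared displacement; xi avoids division by zero.
    importance = [w / ((p.detach() - p0).pow(2) + xi)
                  for w, p, p0 in zip(omega, params, theta_init)]
    return importance

Parameters with a large accumulated credit are those that contributed most to lowering the overall FR loss, which the paper identifies with common facial features worth preserving.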
Result We conduct experiments on three widely used FER datasets: CK+ (the extended Cohn-Kanade database), Oulu-CASIA, and RAF-DB (real-world affective faces database). RAF-DB is an in-the-wild database, while the other two are laboratory-controlled. On RAF-DB, our method achieves an accuracy of 88.04%, which improves on direct fine-tuning by 1.83% and surpasses the state-of-the-art self-cure network (SCN) by 1.01%. On CK+, the method improves on the fine-tuning baseline by 1.1%. The experiment on Oulu-CASIA also indicates that the network generalizes well with the addition of parameter-wise importance regularization. Meanwhile, the regularization improves performance more markedly on in-the-wild datasets, whose faces are more complex due to occlusion and pose variations. Conclusion In this study, we exploit the relationship between FR and FER and adopt the idea and algorithms of continual learning in FER to avoid overfitting. The main purpose and effect of this approach is to preserve the powerful feature extraction ability of the FR network via parameter-wise importance regularization while allowing less important parameters to learn additional expression-specific information. The experimental results show that our training strategy helps the FER network learn additional discriminative features and thus improves recognition performance.
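To make the two-part loss concrete, here is a hedged sketch of the second-stage objective; fer_loss and lam are our own names, and the lists importance and theta_star are assumed to be aligned with the trainable parameters in the same order as in the first stage.

import torch
import torch.nn.functional as F

def fer_loss(model, images, labels, importance, theta_star, lam=1.0):
    # Softmax (cross-entropy) classification term for expression labels.
    ce = F.cross_entropy(model(images), labels)

    # Parameter-wise importance regularization: penalize drift away from the
    # pre-trained FR parameters theta*, weighted by each parameter's importance,
    # so important parameters stay put while the rest are free to change.
    params = [p for p in model.parameters() if p.requires_grad]
    reg = torch.zeros((), device=images.device)
    for p, omg, p_star in zip(params, importance, theta_star):
        reg = reg + (omg * (p - p_star).pow(2)).sum()

    return ce + lam * reg

The regularization weight lam trades off expression-specific learning against preservation of the facial features captured during face recognition training.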
Keywords
