Small-sample facial attribute recognition combining self-supervised learning and generative adversarial networks

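Shu Ying1,2, Mao Longbiao1,2, Chen Si3, Yan Yan1,2 (1. School of Informatics, Xiamen University, Xiamen 361005, China; 2. Fujian Key Laboratory of Sensing and Computing for Smart City, Xiamen University, Xiamen 361005, China; 3. School of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China)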

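Abstract
Objective Facial attribute recognition is an important research topic in fields such as computer vision and emotion sensing. With the continuous development of deep learning, facial attribute recognition has made great progress. Most current deep learning-based facial attribute recognition methods rely on large-scale datasets with complete attribute label information. However, when attribute labels are missing in small-sample datasets, the accuracy of facial attribute recognition methods remains low. To address this problem, this paper proposes a method that combines self-supervised learning and a generative adversarial network to improve the accuracy of facial attribute recognition on small-sample datasets. Method A rotation-based self-supervised learning technique is used for pretraining to obtain the initial attribute recognition network; an attention mechanism-based generative adversarial network is used to obtain a facial attribute synthesis model, which edits the attributes of facial images to augment the training dataset; the augmented training dataset is then used to train the attribute recognition network to obtain the final model. Result Experiments are conducted on the small-sample dataset UMD-AED (University of Maryland attribute evaluation dataset) and compared with the traditional supervised learning method. The traditional supervised learning method achieves an average accuracy of 63.24%, whereas the proposed method achieves an average accuracy of 69.01%, an improvement of 5.77 percentage points. In addition, comparative experiments with and without self-supervised learning are conducted on the CelebA (CelebFaces attributes dataset), LFWA (labeled faces in the wild attributes dataset), and UMD-AED datasets, verifying the effectiveness of self-supervised learning on small-sample datasets. Conclusion The proposed facial attribute recognition method, which combines self-supervised learning and a generative adversarial network, effectively improves the accuracy of attribute recognition on small-sample datasets.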
Keywords
Self-supervised learning and generative adversarial network-based facial attribute recognition with small sample size training

Shu Ying1,2, Mao Longbiao1,2, Chen Si3, Yan Yan1,2 (1. School of Informatics, Xiamen University, Xiamen 361005, China; 2. Fujian Key Laboratory of Sensing and Computing for Smart City, Xiamen University, Xiamen 361005, China; 3. School of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China)

Abstract
Objective Facial attribute recognition is an important research topic in the fields of computer vision and emotion sensing. The face, an important biological feature of human beings, conveys a large number of attributes, such as expression, age, and gender. Facial attribute recognition aims to predict the different attributes present in a given facial image, and it has progressed considerably with the remarkable development of deep learning. State-of-the-art deep learning-based facial attribute recognition methods typically rely on large-scale facial training data with complete attribute labels. However, the amount of facial training data may be limited in some real-world applications, and several attribute labels of a facial image may be unavailable, mainly because attribute labeling is a time-consuming and labor-intensive task. Moreover, defining a standard labeling criterion is difficult for some subjective attributes. As a result, the accuracy of these methods is poor when attribute labels are missing in small sample size training. Previous methods attempted to find samples that match the required label in an unlabeled dataset and then added these samples to the corresponding category of the training set to augment the training data. However, the unlabeled dataset is typically of low quality, which degrades the final performance of the model, and the selection of matching samples is time-consuming. Other methods directly take advantage of similar data to augment the original dataset. However, deciding whether two datasets are similar and finding similar datasets are still challenging. Facial attribute recognition under small sample size training therefore requires further investigation.
A self-supervised learning and generative adversarial network (GAN)-based method is proposed in this study to solve the above-mentioned problems and improve the accuracy of facial attribute recognition for small sample size training with missing attribute labels. Method First, we adopt a rotation-based self-supervised learning technique to pretrain the attribute classification network. We use ResNet50 as the backbone of the network and set the number of output nodes of the last fully connected layer to four to predict the rotation angle of the input image (0°, 90°, 180°, or 270°). Second, we concatenate the rotated and original images along the channel dimension and use them as the input of the self-supervised learning network. Third, we utilize an attention mechanism-based GAN as the facial attribute synthesis model. Specifically, the feature map is passed through a 1×1 convolution layer and then multiplied with its own transpose to form the attention map, and attention features are obtained by multiplying the attention map with the feature map. Fourth, we use this model to edit facial attributes, which augments both the attribute labels and the training data. Finally, we use the augmented training data to train the attribute classification network initialized with self-supervised learning. We use the stochastic gradient descent algorithm during the training process. In the facial attribute synthesis experiment, we select the attributes “baldness”, “bangs”, “black hair”, “blonde”, “brown hair”, “dense eyebrows”, “wearing glasses”, “male”, “slightly open mouth”, “mustache”, “no beard”, “pale skin”, and “young”. Both the encoder and the decoder of the GAN generator contain five layers, and the discriminator also consists of five layers. We set the batch size to 64 and use a learning rate of 0.0002.
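The rotation pretext task described above can be illustrated with a minimal sketch in plain Python. It only shows how four rotated copies of one image and their class labels (0 to 3 for 0°, 90°, 180°, 270°) might be generated; the `Image` type, `rotate90`, and `make_rotation_samples` are hypothetical names introduced here, not the authors' code. The ResNet50 classifier and the channel-wise concatenation with the original image are omitted.

```python
from typing import List, Tuple

# Hypothetical image type: a single-channel image stored as rows of pixel values.
Image = List[List[int]]

# Rotation classes of the pretext task; the label is the index into this tuple.
ANGLES = (0, 90, 180, 270)


def rotate90(img: Image) -> Image:
    """Rotate an image by 90 degrees counter-clockwise."""
    # zip(*img) transposes the image; reversing the row order of the
    # transpose gives a counter-clockwise quarter turn.
    return [list(row) for row in zip(*img)][::-1]


def make_rotation_samples(img: Image) -> List[Tuple[Image, int]]:
    """Return (rotated image, label) pairs for the four rotation classes."""
    samples = []
    current = [row[:] for row in img]  # copy so the caller's image is not aliased
    for label in range(len(ANGLES)):
        samples.append((current, label))
        current = rotate90(current)
    return samples


samples = make_rotation_samples([[1, 2], [3, 4]])
```

A classifier with four output nodes would then be trained to predict the label of each rotated copy, which requires no attribute annotations.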
Result In the experiments, we use one-tenth of the training data of the CelebFaces attributes dataset (CelebA), the labeled faces in the wild attributes dataset (LFWA), and the University of Maryland attribute evaluation dataset (UMD-AED). Compared with the traditional supervised learning-based method, the accuracy of the proposed method is improved by 2.42% on CelebA and by 3.17% on LFWA. On UMD-AED, the accuracy is improved by 5.77% (from 63.24% to 69.01%). We also conduct experiments on training sets of different sizes using CelebA, LFWA, and UMD-AED to further verify the effectiveness of the self-supervised learning technique on small datasets. Experimental results show that self-supervised learning significantly improves performance when the training set is small. When the training set is reduced from the complete training data to one-tenth of it, the performance of the supervised model on CelebA decreases from 90.86% to 81.97%, while the performance of the self-supervised model decreases from 90.72% to 83.57%. On LFWA, the performance of the supervised model decreases by 6.11%, while that of the self-supervised model decreases by 3.90%. On UMD-AED, the performance of the supervised model decreases by 16.95%, while that of the self-supervised model decreases by 11.50%. Conclusion The proposed method utilizes self-supervised learning to pretrain the initial model and uses a GAN for data augmentation. Experimental results show that the proposed method effectively improves the accuracy of facial attribute recognition for small sample size training with missing attribute labels.
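The accuracy drops reported above can be compared directly with a short calculation. The sketch below uses only the figures quoted in the abstract and assumes that all drops are differences in accuracy percentage; the names `drops` and `reduction` are hypothetical.

```python
# Accuracy drops (in accuracy percentage) when the training set shrinks from
# the full data to one-tenth, as quoted in the abstract.
# Each entry is (supervised drop, self-supervised drop).
drops = {
    # CelebA is reported as start and end accuracies, so the drops are derived.
    "CelebA": (round(90.86 - 81.97, 2), round(90.72 - 83.57, 2)),
    "LFWA": (6.11, 3.90),
    "UMD-AED": (16.95, 11.50),
}

# How much smaller the drop is with self-supervised pretraining.
reduction = {
    name: round(supervised - self_supervised, 2)
    for name, (supervised, self_supervised) in drops.items()
}
```

Under this reading, the self-supervised model loses less accuracy than the supervised model on all three datasets, and the gap is largest on UMD-AED.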
Keywords
