Self-correcting noise labels for facial beauty prediction
Abstract
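Objective Facial beauty prediction studies how to give computers a human-like ability to judge or predict facial beauty. However, deep neural networks used for facial beauty prediction tend to overfit samples with noisy labels, which impairs their generalization. This paper therefore proposes a self-correcting noisy label method for facial beauty prediction. Method The method comprises a self-training teacher model mechanism and a relabeling retraining mechanism. The self-training teacher model mechanism obtains the teacher model through self-training: the teacher helps the student model select clean samples and train until the student's generalization ability exceeds the teacher's, at which point the student becomes the new teacher, and the process is repeated. The relabeling retraining mechanism corrects noisy labels by comparing the maximum predicted probability with the predicted probability of the labeled class. The self-training teacher model mechanism is then executed repeatedly on the corrected data. Result Experiments were conducted on the large scale facial beauty database (LSFBD) and the SCUT-FBP5500 database. The results show that the method reduces the negative impact of noisy labels under synthetic label noise, and that it achieves accuracies of 60.8% and 75.5% on the original LSFBD and SCUT-FBP5500 databases, respectively, which is higher than conventional methods. Conclusion Experiments on the LSFBD and SCUT-FBP5500 databases, both with synthetic label noise and in their original form, show that the proposed self-correcting noisy label method selects clean samples for learning while making full use of all data, and that it can to some extent reduce the negative impact of noisy labels in facial beauty prediction and improve prediction accuracy.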
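The self-training teacher model mechanism can be sketched roughly as the loop below. This is a minimal illustration and not the authors' implementation: `train_one_epoch` and `evaluate` are hypothetical caller-supplied callables, the models are stand-in dictionaries, and how the teacher selects clean samples for the student is left abstract because the abstract does not specify it.

```python
import copy


def self_training_teacher(student, teacher, train_one_epoch, evaluate, epochs):
    """Promote the student to teacher whenever it generalizes better.

    train_one_epoch(student, teacher) and evaluate(model) are hypothetical
    callables supplied by the caller; they are assumptions of this sketch.
    """
    best_score = evaluate(teacher)
    for _ in range(epochs):
        # The teacher is assumed to guide clean-sample selection for the student.
        train_one_epoch(student, teacher)
        score = evaluate(student)
        if score > best_score:
            # The student shares its parameters: a snapshot becomes the new teacher.
            teacher = copy.deepcopy(student)
            best_score = score
    return teacher, best_score


# Toy stand-ins: each "model" is a dict holding an integer validation accuracy.
student = {"acc": 50}
teacher = {"acc": 55}
gains = iter([2, 4, -1, 3])


def toy_train_one_epoch(student, teacher):
    student["acc"] += next(gains)


def toy_evaluate(model):
    return model["acc"]


final_teacher, best = self_training_teacher(
    student, teacher, toy_train_one_epoch, toy_evaluate, epochs=4
)
print(best)  # 58
```

In the toy run the student is promoted twice (at 56 and at 58), and the returned teacher is a snapshot that later student updates cannot alter.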
Keywords
Self-correcting noise labels for facial beauty prediction
Gan Junying, Wu Bicheng, Zhai Yikui, He Guohui, Mai Chaoyun, Bai Zhenfeng (Department of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, China)
Abstract
Objective Facial beauty prediction studies how to give computers a human-like ability to judge or predict the beauty of faces. However, facial beauty prediction based on deep neural networks suffers from overfitting to samples with noisy labels. Noisy labels are incorrect labels in the database; they disturb the training of deep neural network models and thus reduce their generalizability. To reduce the negative impact of noisy labels on deep neural networks in facial beauty prediction, we propose a self-correcting noisy label method that selects clean samples for learning and makes full use of all data. Method Our method is composed of a self-training teacher model mechanism and a relabeling retraining mechanism. First, two deep convolutional neural networks (CNNs) with the same structure are initialized simultaneously; the network with the stronger generalization ability is used as the teacher model, and the other is used as the student model. The teacher model can be specified arbitrarily at initialization. Second, mini-batches of training data are fed to the teacher and student models together. The student model receives the indices of the selected samples, retrieves the corresponding samples and labels, and is trained by back-propagation until its generalization ability exceeds that of the teacher model. The student model then shares its optimal parameters with the teacher model, i.e., the original student model becomes the new teacher model; this is called the self-training teacher model mechanism. After several training iterations, mini-batches of data are fed into the teacher model with the strongest generalization ability among all previous training epochs, and its predicted probability for each category is calculated.
If the maximum probability that the teacher model predicts for a sample is higher, by a certain threshold, than the probability it assigns to the sample's given label, the label is considered noisy and is corrected. The self-training teacher model mechanism is then executed iteratively on the corrected data; this process is called the relabeling retraining mechanism. Finally, the teacher model is output as the final model. Result ResNet-18 pre-trained on the ImageNet database is used as the backbone network; trained with the cross-entropy loss, it serves as the baseline method. The experiments on the large scale facial beauty database (LSFBD) and the SCUT-FBP5500 database consist of two parts. 1) The first part is performed under synthetic noisy label conditions: 10%, 20%, and 30% of the training data are selected from each class of the two databases, and their labels are randomly changed. On the LSFBD database, the accuracy of our method exceeds that of the baseline by 5.8%, 4.1%, and 3.7% at noise rates of 30%, 20%, and 10%, respectively. On the SCUT-FBP5500 database, it exceeds the baseline by 3.1%, 2.8%, and 2.5%, respectively. This demonstrates that our method can reduce the negative impact of noisy labels under synthetic noisy label conditions. 2) The second part is carried out on the original LSFBD and SCUT-FBP5500 databases, where our method exceeds the prediction accuracy of the baseline by 2.7% and 1.2%, respectively. This demonstrates that our method can also reduce the negative impact of noisy labels on the original data.
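The synthetic noise setup in the first experiment (a fixed fraction of each class has its labels randomly changed) can be sketched as follows. This is only an illustration under stated assumptions: the function name `inject_label_noise`, the five-class toy data, and the choice to always flip to a different class drawn uniformly are not taken from the paper.

```python
import random
from collections import defaultdict


def inject_label_noise(labels, noise_rate, num_classes, seed=0):
    """Return a copy of `labels` with `noise_rate` of each class flipped.

    Assumption of this sketch: a flipped label is drawn uniformly from the
    classes other than the original one.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    noisy = list(labels)
    for y, indices in by_class.items():
        n_flip = round(noise_rate * len(indices))
        other_classes = [c for c in range(num_classes) if c != y]
        # Sample without replacement so exactly n_flip labels per class change.
        for idx in rng.sample(indices, n_flip):
            noisy[idx] = rng.choice(other_classes)
    return noisy


# Toy data: 5 hypothetical classes with 20 samples each.
labels = [c for c in range(5) for _ in range(20)]
noisy = inject_label_noise(labels, noise_rate=0.3, num_classes=5)
changed = sum(a != b for a, b in zip(labels, noisy))
print(changed)  # 30
```

Flipping per class rather than over the whole training set keeps the noise rate identical in every class, which matches the "selected from each class" wording above.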
Conclusion The proposed self-correcting noisy label method can, to some extent, reduce the negative impact of noisy labels in facial beauty prediction and improve prediction accuracy, both on the LSFBD and SCUT-FBP5500 databases with synthetic noisy labels and on the original LSFBD and SCUT-FBP5500 databases.
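The relabeling rule of the relabeling retraining mechanism compares the teacher's maximum predicted probability with the probability it assigns to the given label. A minimal sketch is shown below, assuming the comparison is a difference against a fixed margin; the function name `relabel` and the margin value 0.2 are hypothetical, and the paper's exact threshold form may differ.

```python
def relabel(probs, label, margin=0.2):
    """Return a possibly corrected label for one sample.

    probs  : class probabilities predicted by the teacher model
    label  : the sample's current (possibly noisy) class index
    margin : hypothetical threshold; the difference form is an assumption
    """
    predicted = max(range(len(probs)), key=probs.__getitem__)
    if probs[predicted] - probs[label] > margin:
        # The teacher is much more confident in another class: correct the label.
        return predicted
    return label


# The teacher strongly prefers class 1 over the given label 2, so the label changes.
print(relabel([0.1, 0.7, 0.2], label=2))    # 1
# The teacher only slightly prefers class 1 over the given label 0, so it is kept.
print(relabel([0.4, 0.45, 0.15], label=0))  # 0
```

A larger margin makes relabeling more conservative, which limits the risk of the teacher overwriting correct labels with its own mistakes.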
Keywords
deep learning; noise labels; facial beauty prediction; characteristics classification; deep neural networks