A hierarchical B-CNN model guided by classification error for fine-grained classification
Abstract
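Objective Fine-grained classification has attracted growing attention from researchers in recent years; its difficulty lies in the very small differences between target categories. To address this, we propose a hierarchical bilinear convolutional neural network model guided by classification error. Method The core idea of the model is to retrain and reclassify the categories that the bilinear convolutional neural network (B-CNN) algorithm tends to misclassify or confuse. First, to find the error-prone categories, we propose a clustering algorithm guided by classification error. The algorithm is based on the constrained Laplacian rank (CLR) clustering model, whose core "affinity matrix" is constructed from the "classification error matrix". Second, based on the clustering results, we build a new hierarchical B-CNN model. Result We evaluated the classification-error-guided hierarchical B-CNN model on three standard datasets: CUB-200-2011, FGVC-Aircraft-2013b, and Stanford-cars. Compared with the single-layer B-CNN model, classification accuracy rose from 84.35%, 83.56%, and 89.45% to 84.67%, 84.11%, and 89.78%, respectively, which verifies the effectiveness of the proposed algorithm. Conclusion We propose a method that uses the classification error matrix to guide clustering and then performs reclassification. Compared with an affinity matrix constructed from feature similarity, the classification error matrix targets the classification problem directly and can effectively improve accuracy on easily confused categories. The method is aimed at closely related targets, especially cases with very similar targets. By grouping the targets that are easily misclassified or confused and then retraining and reclassifying them, it achieves better classification results and is well suited to fine-grained classification.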
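The two-layer idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. The names `hierarchical_predict`, `cluster_of`, and `second_layer`, as well as the toy threshold classifiers, are hypothetical stand-ins for the trained B-CNN feature extractors and SVM classifiers.

```python
from typing import Callable, Dict, Sequence

# A classifier maps a feature vector to a class label (hypothetical interface).
Classifier = Callable[[Sequence[float]], int]


def hierarchical_predict(
    x: Sequence[float],
    first_layer: Classifier,
    cluster_of: Dict[int, int],
    second_layer: Dict[int, Classifier],
) -> int:
    """Classify x with the first layer, then reclassify it inside the
    group of confusable classes that contains the first-layer prediction."""
    coarse = first_layer(x)
    cluster = cluster_of.get(coarse)
    if cluster is None or cluster not in second_layer:
        # The predicted class is not in any confusable group: keep the result.
        return coarse
    return second_layer[cluster](x)


# Toy stand-ins (assumed for illustration): classes 0 and 1 are confusable,
# class 2 is not.
def toy_first_layer(x: Sequence[float]) -> int:
    # Separates class 2 well but answers 0 for every sample of classes 0 and 1.
    return 2 if x[0] >= 1.0 else 0


def toy_group_classifier(x: Sequence[float]) -> int:
    # Trained only on classes 0 and 1, so it uses the feature that splits them.
    return 1 if x[1] >= 0.5 else 0


cluster_of = {0: 0, 1: 0}                 # class label -> group id
second_layer = {0: toy_group_classifier}  # group id -> group classifier

print(hierarchical_predict([0.2, 0.9], toy_first_layer, cluster_of, second_layer))
```

In this sketch the second layer can only correct errors that stay inside a group: a sample that the first layer assigns to a class outside its true group is never revisited.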
Keywords
Hierarchical B-CNN model guided by classification error for fine-grained classification
Shen Haihong1, Yang Xing1, Wang Lingfeng2, Pan Chunhong2
(1. School of Information Engineering, China University of Geosciences (Beijing), Beijing 100083, China; 2. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)
Abstract
Objective Fine-grained classification has gained increasing attention in recent years. The subtle differences among categories remain the main difficulty. One remedy is to localize the parts of the object, but this step requires a considerable amount of manual work. In this regard, bilinear convolutional neural network (B-CNN) models have been established, which use two feature extractors to represent an image. B-CNN needs only image labels and yields accurate results. However, B-CNN has difficulty distinguishing easily confused categories because the networks and classifiers are trained on all training images. We propose a hierarchical B-CNN model guided by classification error that groups easily confused categories and then retrains and reclassifies each group. The model can distinguish these categories and improve the classification accuracy on fine-grained classification targets. Method This work mainly aims to retrain and reclassify confusing categories. First, we propose a clustering algorithm guided by classification error to obtain clusters containing frequently misclassified categories. The algorithm is built on the constrained Laplacian rank (CLR) method, and the "affinity matrix" is constructed from the "classification error matrix." Because the labels of the test images are unknown, we conduct these experiments on validation images. The classification error matrix is obtained by comparing the classification results with the real labels of the validation images. Second, we propose a new hierarchical B-CNN model. In the first layer, the networks and classifiers are trained on the entire training sets, and the test images are preliminarily classified. In the second layer, the networks and classifiers are trained on each cluster, and the test sets are reclassified. We select three datasets, namely, CUB-200-2011, FGVC-Aircraft-2013b, and Stanford-cars.
First, we train the networks and classifiers on the entire training sets and obtain the classification results of the validation images. The CUB-200-2011 and Stanford-cars datasets do not have validation sets; therefore, a part of the training set is randomly assigned as the validation set. We obtain the "classification error matrix" from the classification errors on the validation set. The matrix comprises two columns, one for the classification result and one for the real label. Second, we construct the "affinity matrix" of size c×c for a dataset containing c categories, where entry (i, j) is the frequency at which samples of the ith category are misclassified as the jth category. We also normalize the "affinity matrix" to obtain improved clustering results. All samples are divided into groups by using the CLR algorithm. Each group contains only a few categories that are easily confused with one another. Finally, we extract training and testing sets by group for retraining and reclassification. We retrain the convolutional neural networks and the SVM classifiers on each group of the training set. We then re-extract the features of the corresponding testing set and reclassify them. We conduct two additional experiments to verify the effectiveness of the proposed algorithm. First, we retrain only the SVM classifiers without retraining the convolutional neural networks for simplicity. Second, we retrain the SVM classifiers guided by the distance of the features instead of the classification error. Result The classification accuracies of the single B-CNN model for CUB-200-2011, FGVC-Aircraft-2013b, and Stanford-cars are 84.35%, 83.56%, and 89.45%, respectively. They increase to 84.48%, 84.01%, and 89.66%, respectively, after retraining the SVM classifiers and reclassifying the test samples guided by the classification error, and to 84.67%, 84.11%, and 89.78%, respectively, when the hierarchical B-CNN model is used.
However, the accuracy obtained after retraining the SVM classifiers and reclassifying the test samples guided by the distance of the features is lower than that of the single B-CNN model. The experimental results show that retraining the SVM classifiers guided by classification error, and additionally retraining the networks, improves the classification accuracy, whereas guidance by the distance of the features does not. Conclusion In B-CNN models, the networks and classifiers are constructed from all training samples, which leaves some confusing categories difficult to classify. In this paper, we propose a new hierarchical B-CNN model guided by classification error. In this model, confusing categories are collected into clusters, and each cluster is retrained and reclassified to distinguish the categories within it. The classification error matrix is directly related to the classification problem and yields higher classification accuracy than feature similarity. The experimental results on the three datasets confirm that the proposed model can effectively improve the classification accuracy but requires a considerable amount of time. The model is suitable for fine-grained classification tasks, especially when dealing with similar targets. In future work, we will extend the model in two directions. First, the number of subsets produced by a single clustering operation is relatively high, and the subsets can be clustered further; we will attempt to deepen the model from two layers to more. Second, this paper adopts a clustering method for automatic selection; other effective methods will be explored in future studies.
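The construction of the affinity matrix from validation errors, described in the Method part above, can be illustrated with a minimal Python sketch. Several details here are assumptions rather than the authors' choices: the abstract does not specify the normalization, so each row is divided by the number of validation samples of that class and the matrix is then symmetrized by averaging. The grouping step is not CLR. It is a much simpler stand-in that takes connected components of the thresholded affinity graph, and it is used only to show how confusable classes end up in the same group. The function names `error_affinity` and `group_confusable` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def error_affinity(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int
) -> List[List[float]]:
    """Build a symmetric c x c affinity matrix from validation misclassifications.

    Entry (i, j) starts as the fraction of class-i samples predicted as class j
    (assumed normalization), then the matrix is symmetrized by averaging.
    """
    per_class = Counter(y_true)
    rate = [[0.0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        if t != p:  # only errors contribute; the diagonal stays zero
            rate[t][p] += 1.0 / per_class[t]
    return [
        [0.5 * (rate[i][j] + rate[j][i]) for j in range(num_classes)]
        for i in range(num_classes)
    ]


def group_confusable(
    affinity: List[List[float]], threshold: float = 0.0
) -> List[List[int]]:
    """Stand-in for CLR: connected components of the graph whose edges are
    the class pairs with affinity above the threshold (union-find)."""
    n = len(affinity)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if affinity[i][j] > threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy validation set (made up): classes 0 and 1 are confused with each other.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 0, 2, 2, 3, 3]
A = error_affinity(y_true, y_pred, num_classes=4)
print(group_confusable(A))
```

Groups with more than one class would then be the ones selected for retraining and reclassification; singleton groups keep their first-layer prediction.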
Keywords
fine-grained classification; classification error; hierarchical model; bilinear convolutional neural network (B-CNN); constrained Laplacian rank (CLR)