Nearest Neighbor Convex Hull Classifier Based on Sample Selection

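Jiang Wenhan1, Zhou Xiaofei1, Yang Jingyu1 (School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094)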

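Abstract
The nearest neighbor convex hull classification algorithm is a nearest neighbor classification algorithm that uses the distance from a test point to the convex hull of each class's samples as its classification measure. However, the high computational complexity of solving its convex quadratic programming problems limits its application to larger data sets. This paper proposes a sample selection method called subclass convex hull growth. It iteratively selects the point farthest from the convex hull of the samples already selected, until a termination condition is met, and thereby reduces the data set effectively. Experimental results on the ORL database and the MIT-CBCL face recognition training-synthetic database show that the convex hull generated from the small number of samples selected by subclass convex hull growth represents the training set well, and that the method greatly increases computation speed without degrading the performance of the nearest neighbor convex hull classifier.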
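The classification measure described above can be written out explicitly. The formulation below is a sketch in notation assumed here for illustration (the symbols $x$, $x_{c,i}$, $n_c$, $\alpha$ and $d$ do not come from the paper):

```latex
% Assumed notation: x is the test point and
% x_{c,1}, \dots, x_{c,n_c} are the training samples of class c.
\[
d(x, c) = \min_{\alpha}\ \Bigl\| x - \sum_{i=1}^{n_c} \alpha_i x_{c,i} \Bigr\|_2
\quad \text{s.t.} \quad \sum_{i=1}^{n_c} \alpha_i = 1,\ \ \alpha_i \ge 0,
\]
\[
\hat{c}(x) = \arg\min_{c}\ d(x, c).
\]
```

Squaring the objective gives a convex quadratic program in the $n_c$ coefficients $\alpha_i$, one per class and per test point, so reducing the number of samples kept per class directly shrinks each problem.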
Keywords
A Nearest Neighbor Convex Hull Classifier with Sample Selection


Abstract
The nearest neighbor convex hull (NNCH) classification algorithm is a nearest neighbor classification method that uses, as its discriminant measure, the error of approximating the test point by the convex hull of each class's samples, that is, the distance from the test point to that hull. However, the high computational cost of solving the associated quadratic optimization problems limits its application to large data sets. This paper therefore proposes a sample selection method for NNCH named subclass convex hull growth. For the data of one class, the two points farthest from each other are selected first as the initial chosen set. Then the distance from each remaining point to the convex hull of the chosen set is computed, and the farthest point is added to the chosen set. This procedure is repeated until a termination condition is met. The convex hull of the selected samples is taken as an approximation of the convex hull of all samples in the class. The more samples are selected, the smaller the approximation error, so the sample distribution is estimated effectively. Experiments on the ORL database and the MIT-CBCL face recognition training-synthetic database show that the method reduces the training data and accelerates computation while maintaining the generalization performance of NNCH.
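The selection procedure summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the termination condition, so the `max_size` and `eps` parameters are hypothetical, and the point-to-hull distance is approximated with a simple Frank-Wolfe iteration swapped in for a proper quadratic programming solver. All function names are illustrative.

```python
from itertools import combinations
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def hull_distance(x, points, iters=500, tol=1e-12):
    """Approximate distance from x to the convex hull of `points`.

    Uses Frank-Wolfe with exact line search (an assumption of this
    sketch; the paper solves a convex quadratic program instead).
    """
    # Start from the hull vertex closest to x.
    p = list(min(points, key=lambda v: dot(sub(x, v), sub(x, v))))
    for _ in range(iters):
        r = sub(x, p)  # residual pointing from p towards x
        # Vertex that moves p farthest along the residual direction.
        v = max(points, key=lambda q: dot(r, sub(q, p)))
        d = sub(v, p)
        gap = dot(r, d)  # Frank-Wolfe gap; near zero at the optimum
        if gap <= tol:
            break
        t = min(1.0, gap / dot(d, d))  # exact line search on segment [p, v]
        p = [a + t * b for a, b in zip(p, d)]
    return sqrt(dot(sub(x, p), sub(x, p)))


def grow_hull(points, max_size, eps=1e-4):
    """Greedy sample selection for one class; returns chosen indices.

    Hypothetical termination: stop at `max_size` samples, or when the
    farthest remaining point is within `eps` of the current hull.
    """
    n = len(points)
    if n < 2:
        return list(range(n))
    # Initial chosen set: the two points farthest from each other.
    i0, j0 = max(
        combinations(range(n), 2),
        key=lambda ij: dot(sub(points[ij[0]], points[ij[1]]),
                           sub(points[ij[0]], points[ij[1]])),
    )
    chosen = [i0, j0]
    while len(chosen) < max_size:
        hull = [points[i] for i in chosen]
        rest = [i for i in range(n) if i not in chosen]
        if not rest:
            break
        dists = {i: hull_distance(points[i], hull) for i in rest}
        far = max(dists, key=dists.get)  # farthest point from current hull
        if dists[far] <= eps:
            break
        chosen.append(far)
    return chosen
```

For example, given the four corners of the unit square plus a few interior points, `grow_hull` keeps only the corners, since every interior point already lies inside their hull. Keeping `eps` well above the solver tolerance avoids selecting points whose nonzero distance is only numerical error.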
Keywords
