Semi-supervised locality dimensionality reduction

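Yin Xuesong1,2, Hu Enliang1 (1. College of Information Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016; 2. School of Information and Engineering, Zhejiang Radio and Television University, Hangzhou 310030)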

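Abstract
In tasks that mine and analyze high-dimensional data, sometimes only limited pairwise constraint information (must-link and cannot-link constraints) is available; because class label information is lacking, supervised dimensionality reduction methods often fail to produce satisfactory results. In this situation, using a large number of unlabeled samples can improve algorithm performance. Drawing on pairwise constraints and a large number of unlabeled samples, this paper proposes a semi-supervised locality dimensionality reduction method (SLDR). SLDR integrates the local information of the data with the pairwise constraints to find an optimal projection: when the data are projected into the low-dimensional space, sample pairs in cannot-link constraints are pushed farther apart, sample pairs in must-link constraints are pulled closer together, and the intrinsic geometric information of the data is preserved. Moreover, SLDR can be generalized to a nonlinear method, so that it can handle dimensionality reduction of nonlinear data. Experimental results on various datasets confirm the effectiveness of the proposed algorithm.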
Keywords
Semi-supervised locality dimensionality reduction


Abstract
When mining and analyzing high-dimensional data, sometimes only a small number of pairwise constraints (must-link and cannot-link) are available. Because class labels are lacking, supervised dimensionality reduction methods tend to perform poorly. In such cases, unlabeled samples can help improve performance. In this paper, we propose a novel semi-supervised locality dimensionality reduction algorithm (SLDR) that exploits pairwise constraints together with abundant unlabeled samples. Specifically, SLDR combines local information of the data with the pairwise constraints to find a projection. After the data are projected onto a low-dimensional space, instances linked by cannot-link constraints are far apart, while instances linked by must-link constraints are close to each other; moreover, the intrinsic geometric information of the data is preserved. In addition, SLDR can be extended to nonlinear dimensionality reduction via the kernel trick, making it applicable to highly nonlinear data.
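
The abstract does not state the SLDR objective function, so the following is only a minimal sketch of the general idea it describes, not the authors' formulation. It assumes a linear projection obtained from a generalized eigenproblem that enlarges the scatter of cannot-link pairs relative to the scatter of must-link pairs plus a k-nearest-neighbor graph Laplacian term standing in for "local information". The function name `sldr_sketch` and the parameters `k`, `alpha`, `beta`, and `reg` are hypothetical and chosen for illustration.

```python
import numpy as np


def sldr_sketch(X, must_link, cannot_link, n_components=1,
                k=5, alpha=1.0, beta=1.0, reg=1e-6):
    """Illustrative constraint-plus-locality projection (an assumption,
    not the paper's exact objective).

    X            : array of shape (n_samples, n_features)
    must_link    : list of (i, j) index pairs that should end up close
    cannot_link  : list of (i, j) index pairs that should end up far apart
    Returns W of shape (n_features, n_components); project with X @ W.
    """
    n, d = X.shape

    # Scatter matrices built from the pairwise constraints.
    S_c = np.zeros((d, d))  # cannot-link scatter (to be enlarged)
    for i, j in cannot_link:
        diff = X[i] - X[j]
        S_c += np.outer(diff, diff)
    S_m = np.zeros((d, d))  # must-link scatter (to be shrunk)
    for i, j in must_link:
        diff = X[i] - X[j]
        S_m += np.outer(diff, diff)

    # Symmetric k-nearest-neighbor graph over all samples, labeled or not.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    neighbors = np.argsort(sq_dist, axis=1)[:, 1:k + 1]  # skip self
    A = np.zeros((n, n))
    for i in range(n):
        A[i, neighbors[i]] = 1.0
    A = np.maximum(A, A.T)
    L = np.diag(A.sum(axis=1)) - A  # unnormalized graph Laplacian

    # Denominator: must-link scatter + locality penalty + small ridge
    # (the ridge keeps B positive definite).
    B = beta * S_m + alpha * (X.T @ L @ X) + reg * np.eye(d)

    # Solve S_c w = lambda * B w by whitening with B^{-1/2}.
    evals_B, evecs_B = np.linalg.eigh(B)
    B_inv_sqrt = evecs_B @ np.diag(evals_B ** -0.5) @ evecs_B.T
    M = B_inv_sqrt @ S_c @ B_inv_sqrt
    _, evecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    top = evecs[:, ::-1][:, :n_components]  # largest eigenvalues first
    return B_inv_sqrt @ top
```

Calling `W = sldr_sketch(X, must_link, cannot_link)` and then `Z = X @ W` gives the low-dimensional embedding. A kernelized variant, as mentioned in the abstract, would replace `X` with a kernel matrix; that step is not shown here.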
Keywords
