A Minimum Spanning Tree Clustering Method with Control Nodes

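Wang Min1, Zhou Chenghu1, Pei Tao1, Han Zhijun1, Qin Chengzhi1, Cai Qiang1 (State Key Laboratory of Resources and Environmental Information System, Chinese Academy of Sciences, Beijing 100101)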

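Abstract
Considering two factors that influence clustering, namely the relative distance between objects and the aggregating effect of higher-level objects on lower-level ones, this paper proposes a minimum spanning tree clustering method with control nodes. The method builds a minimum spanning tree weighted by the distances between the clustering objects and uses the higher-level nodes in the tree as the controlling factor when choosing which edge to break. Each split must leave both resulting subtrees containing a control node, and the broken edge is the longest edge that satisfies this condition, so that in the end every subtree contains one and only one control node. Clustering results on self-constructed data and on earthquake data show that, in some cases, the method reveals the true distribution pattern of the data well.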
Keywords
An MST-Based Clustering Method with Controlling Vertices


Abstract
This paper proposes a new clustering method, based on a minimum spanning tree (MST) with controlling vertices, that takes into account two clustering factors: the mutual distance between clustering objects and the centralizing effect of higher-level objects on lower-level ones. The MST is built with the distances between clustering objects as edge weights, and the choice of which edge to split is controlled by the higher-level vertices. Each split edge must be the longest edge for which both resulting subtrees contain at least one controlling vertex, and by the end of the algorithm each subtree contains one and only one controlling vertex. Clustering experiments on self-constructed data and on earthquake data show that this method, with simple input and little intervention, can in some cases better reveal the true distribution pattern of the data. To meet the needs of data mining, the selection criteria for the controlling vertices, the handling of 'inconsistent edges', and the efficiency of the algorithm still need to be improved.
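
The splitting rule summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`build_mst`, `reachable`, `cluster_with_controls`) are hypothetical, the edge weights are assumed to be Euclidean distances, and the controlling vertices are assumed to be given as point indices (at least one). Instead of repeatedly searching each subtree, the sketch visits the MST edges once from longest to shortest and cuts an edge only if both sides still contain a controlling vertex, which is intended as a simplification of the repeated splitting.

```python
import math
from collections import defaultdict


def build_mst(points):
    """Prim's algorithm on the complete graph; weight = Euclidean distance (assumed)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def reachable(adj, start):
    """Return the set of vertices connected to `start` in the current forest."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def cluster_with_controls(points, controls):
    """Map each point index to the controlling vertex of its subtree."""
    controls = set(controls)
    edges = build_mst(points)
    adj = defaultdict(set)
    for _, u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Try edges from longest to shortest; keep a cut only when both
    # resulting subtrees contain at least one controlling vertex.
    for _, u, v in sorted(edges, reverse=True):
        adj[u].discard(v)
        adj[v].discard(u)
        if not (reachable(adj, u) & controls and reachable(adj, v) & controls):
            adj[u].add(v)   # invalid cut: restore the edge
            adj[v].add(u)
    return {p: c for c in controls for p in reachable(adj, c)}


points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
labels = cluster_with_controls(points, controls=[0, 5])
```

With controlling vertices 0 and 5, the long edge between the two groups is cut and each group is assigned to its own controlling vertex. If the controlling vertices are instead 0 and 1, the long edge cannot be cut (one side would have no controlling vertex), so the split falls on a short edge, which illustrates how the controlling vertices override pure distance.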
Keywords
