地球信息科学学报 ›› 2018, Vol. 20 ›› Issue (7): 887-894.doi: 10.12082/dqxxkx.2018.180081

• 地球信息科学理论与方法 • 上一篇    下一篇

改进的最小生成树自适应空间点聚类算法

颜金彪1,2(), 郑文武1,2, 段晓旗1,2,*(), 邓运员1,2, 郭元军1,2, 胡最1,2   

  1. 1. 传统村镇文化数字化保护与创意利用技术国家地方联合工程实验室, 衡阳 421002
    2. 湖南省古村古镇文化遗产数字化传承协同创新中心, 衡阳 421002
  • 收稿日期:2018-01-30 修回日期:2018-04-15 出版日期:2018-07-20 发布日期:2018-07-13
  • 通讯作者: 段晓旗 E-mail:715829216@qq.com;201997125@qq.com
  • 作者简介:

    作者简介:颜金彪(1987-),男,湖南衡阳人,硕士生,主要从事时空数据挖掘研究。E-mail: 715829216@qq.com

  • 基金资助:
    国家自然科学基金项目(41471118、41771150、41771188);衡阳师范学院青年项目(16A01、17A02)

Improved Adaptive Spatial Points Clustering Algorithm Based on Minimum Spanning Tree

YAN Jinbiao1,2(), ZHENG Wenwu1,2, DUAN Xiaoqi1,2,*(), DENG Yunyuan1,2, GUO Yuanjun1,2, HU Zui1,2   

  1. 1. National-Local Joint Engineering Laboratory on Digital Preservation and Innovative Technologies for the Culture of Traditional Villages and Towns, Hengyang 421002, China
    2. Cooperative Innovation Center for Digitalization of Cultural Heritage in Traditional Villages and Towns, Hengyang 421002, China
  • Received:2018-01-30 Revised:2018-04-15 Online:2018-07-20 Published:2018-07-13
  • Contact: DUAN Xiaoqi E-mail:715829216@qq.com;201997125@qq.com
  • Supported by:
    National Natural Science Foundation of China, No.41471118, 41771150, 41771188; Youth Project of Hengyang Normal university, No.16A01,17A02

摘要:

针对传统的最小生成树聚类算法存在使用全局不变阈值确定噪声边,聚类需要用户根据经验确定初始化聚类参数,如“边权值倍数容差”,“边长变化因子”等,聚类不能发现局部噪声的问题,本文提出了一种改进的最小生成树自适应空间点聚类算法。该算法在无需用户输入参数的前提下,克服主观因素的影响,根据最小生成树边长的数理统计特征定义裁剪因子。算法首先从宏观层面对最小生成树进行首轮删枝操作,消除全局环境下的噪声边,进而根据各子树的边长统计情况,自适应设定局部裁剪因子,进行第二轮删枝操作,消除局部环境下的噪声边。最后,采用1个模拟数据和1个实际应用验证算法的有效性,结果表明本文提出的改进算法在无需人为提供经验参数的环境下能够发现任意形状、不同密度的簇,能够准确的识别出空间点中的噪声数据,从而能够实现空间点数据背后隐藏信息的自动挖掘。

关键词: 最小生成树, 全局裁剪, 局部裁剪, 自适应, 聚类

Abstract:

In this paper, we proposed an improved adaptive spatial point clustering algorithm based on minimum spanning tree (MSTAA in abbreviation) to solve the problems existed in the traditional clustering algorithms. The first problem of these classical clustering algorithms is that the noise edges are determined using the global invariant. Another one is that the initial clustering parameters such as edge weight tolerance, edge variation factor, the number of clusters and initial clustering centers are determined by the users empirically. Furthermore, these algorithms can't find the noise edges at the local level. Based on these problems mentioned above, the algorithm put forward in this article aims to overcome the influence of subjective factors by defining two clipping factors. These trimming factors do not need to be determined by the users and can be automatically obtained according to the statistical features of the side length in the minimum spanning tree. The detailed realization process is as follows. In the first place, the pruning operation on the minimum spanning tree from the global level is carried out, which can eliminate the noises in the global environment. After the first round of tailoring, the initial minimum spanning tree becomes sub-tree collections. In the second place, in order to eliminate the noises at the local level, the algorithm performs the second round of pruning operation by setting the adaptive local cutting factor in the light of the side length statistics of each sub-tree. After the above two rounds of cutting, the MSTAA algorithm will get the final clustering result. In order to validate the effectiveness of the algorithm, both a simulated data and a practical application are adopted. By comparing with 4 classical clustering algorithms (k-means, DBSCAN, SEMST, HEMST), we find that the improved algorithm presented in this paper could find clusters of arbitrary shape and density in the environment where no one provides experience parameters. At the same time, not only does the MSTAA algorithm eliminate the obvious global noise points, but also it can distinguish noise points at the local environment so as to ensure a high similarity degree of cluster point set. All of the features of the MSTAA algorithm mentioned above make it possible to automatically mine hidden information behind spatial point data.

Key words: Minimum Spanning Tree, global clipping, local clipping, adaptive, clustering