改进的最小生成树自适应空间点聚类算法
作者简介:颜金彪(1987-),男,湖南衡阳人,硕士生,主要从事时空数据挖掘研究。E-mail: 715829216@qq.com
收稿日期: 2018-01-30
要求修回日期: 2018-04-15
网络出版日期: 2018-07-13
基金资助
国家自然科学基金项目(41471118、41771150、41771188);衡阳师范学院青年项目(16A01、17A02)
Improved Adaptive Spatial Points Clustering Algorithm Based on Minimum Spanning Tree
Received date: 2018-01-30
Request revised date: 2018-04-15
Online published: 2018-07-13
Supported by
National Natural Science Foundation of China, No.41471118, 41771150, 41771188; Youth Project of Hengyang Normal university, No.16A01,17A02
Copyright
针对传统的最小生成树聚类算法存在使用全局不变阈值确定噪声边,聚类需要用户根据经验确定初始化聚类参数,如“边权值倍数容差”,“边长变化因子”等,聚类不能发现局部噪声的问题,本文提出了一种改进的最小生成树自适应空间点聚类算法。该算法在无需用户输入参数的前提下,克服主观因素的影响,根据最小生成树边长的数理统计特征定义裁剪因子。算法首先从宏观层面对最小生成树进行首轮删枝操作,消除全局环境下的噪声边,进而根据各子树的边长统计情况,自适应设定局部裁剪因子,进行第二轮删枝操作,消除局部环境下的噪声边。最后,采用1个模拟数据和1个实际应用验证算法的有效性,结果表明本文提出的改进算法在无需人为提供经验参数的环境下能够发现任意形状、不同密度的簇,能够准确的识别出空间点中的噪声数据,从而能够实现空间点数据背后隐藏信息的自动挖掘。
颜金彪 , 郑文武 , 段晓旗 , 邓运员 , 郭元军 , 胡最 . 改进的最小生成树自适应空间点聚类算法[J]. 地球信息科学学报, 2018 , 20(7) : 887 -894 . DOI: 10.12082/dqxxkx.2018.180081
In this paper, we proposed an improved adaptive spatial point clustering algorithm based on minimum spanning tree (MSTAA in abbreviation) to solve the problems existed in the traditional clustering algorithms. The first problem of these classical clustering algorithms is that the noise edges are determined using the global invariant. Another one is that the initial clustering parameters such as edge weight tolerance, edge variation factor, the number of clusters and initial clustering centers are determined by the users empirically. Furthermore, these algorithms can't find the noise edges at the local level. Based on these problems mentioned above, the algorithm put forward in this article aims to overcome the influence of subjective factors by defining two clipping factors. These trimming factors do not need to be determined by the users and can be automatically obtained according to the statistical features of the side length in the minimum spanning tree. The detailed realization process is as follows. In the first place, the pruning operation on the minimum spanning tree from the global level is carried out, which can eliminate the noises in the global environment. After the first round of tailoring, the initial minimum spanning tree becomes sub-tree collections. In the second place, in order to eliminate the noises at the local level, the algorithm performs the second round of pruning operation by setting the adaptive local cutting factor in the light of the side length statistics of each sub-tree. After the above two rounds of cutting, the MSTAA algorithm will get the final clustering result. In order to validate the effectiveness of the algorithm, both a simulated data and a practical application are adopted. By comparing with 4 classical clustering algorithms (k-means, DBSCAN, SEMST, HEMST), we find that the improved algorithm presented in this paper could find clusters of arbitrary shape and density in the environment where no one provides experience parameters. At the same time, not only does the MSTAA algorithm eliminate the obvious global noise points, but also it can distinguish noise points at the local environment so as to ensure a high similarity degree of cluster point set. All of the features of the MSTAA algorithm mentioned above make it possible to automatically mine hidden information behind spatial point data.
Key words: Minimum Spanning Tree; global clipping; local clipping; adaptive; clustering
Fig. 1 The flow chart of MSTAA algorithm图1 MSTAA算法流程图 |
Fig. 2 Remove the noises in the minimum spanning tree图2 剔除最小生成树中的噪声边 |
Fig. 3 Clustering results图3 算法聚类结果 |
Fig. 4 The epicenter of the earthquake图4 地震震中位置 |
Fig. 5 Spatial clustering results on seismic data by MSTAA图5 MSTAA对地震数据的聚类结果 |
The authors have declared that no competing interests exist.
[1] |
|
[2] |
|
[3] |
|
[4] |
|
[5] |
|
[6] |
[
|
[7] |
|
[8] |
|
[9] |
[
|
[10] |
|
[11] |
|
[12] |
|
[13] |
[
|
[14] |
|
[15] |
[
|
[16] |
[
|
[17] |
[
|
[18] |
[
|
[19] |
[
|
[20] |
|
[21] |
|
[22] |
[
|
[23] |
|
[24] |
|
[25] |
|
[26] |
|
/
〈 |
|
〉 |