Journal of Geo-information Science ›› 2013, Vol. 15 ›› Issue (6): 854-861. doi: 10.3724/SP.J.1047.2013.00854

• Geographic Models and Algorithms •

Fuzzy C-means Clustering for GIS Data Based on Spatial Weighted Distance

WANG Haiqi, ZHANG Teng, PENG Jiaqi, DONG Qiannan

  1. School of Geosciences, China University of Petroleum (East China), Qingdao 266580, China
  • Received: 2013-11-11 Revised: 2013-12-02 Online: 2013-12-25 Published: 2013-12-25
  • About the author: WANG Haiqi (b. 1972), male, Ph.D., associate professor; his work focuses on research and applications in geographic information systems and spatial data mining. E-mail: wanghaiqi@upc.edu.cn
  • Funding:

    Natural Science Foundation of Shandong Province (ZR2012DM010); National Natural Science Foundation of China (40701138).

Abstract:

Fuzzy C-means (FCM) clustering usually measures similarity with the ordinary Euclidean distance, although in the distance formula different attribute features should carry different weights according to their importance. Moreover, for geospatial objects, clustering should consider not only the similarity of attribute features but also the spatial proximity of the objects. Building on the ordinary Euclidean distance, this paper proposes several forms of spatial weighted distance. The different formulas apply weights to the two coordinate directions and to each attribute feature. The weight vector measures the relative influence of spatial location features and attribute features in similarity-based clustering, and it also measures the degree of isotropy or anisotropy of the location distance along the X and Y coordinate directions. A fuzzy evaluation function derived from the similarity matrix of the spatial objects serves as the optimization objective, and the weight vector is learned by a gradient-descent algorithm with a dynamic learning rate. The spatial weighted distance then replaces the ordinary Euclidean distance in fuzzy C-means clustering. The Meuse spatial dataset is used as the application example: it is clustered by FCM with each form of spatial weighted distance, with the number of clusters set from 2 to 10. The clustering results are evaluated and compared using cluster validity indices including PC, PE, and Xie-Beni. The comparison indicates that clustering based on the spatial weighted distance performs better than clustering based on the ordinary Euclidean distance or the spatial common distance. Furthermore, the spatial distribution of the clustering results shows that, besides attribute features, spatial features such as location also play an important role in spatial data clustering.

Key words: Fuzzy C-means clustering, gradient-descent learning algorithm, spatial weighted distance, GIS data
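
The pipeline summarized in the abstract (a weighted distance over coordinates and attributes, FCM with that distance, and validity indices) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the distance form sum_k w_k^2 (p_k - q_k)^2, the function names `weighted_sq_dist`, `fcm`, and `validity`, the hand-picked weights, and the toy records. The gradient-descent weight learning is not reproduced, and the Xie-Beni index is written in its generalized form with exponent m (equal to the original index when m = 2). PC and PE are taken to be the partition coefficient and partition entropy.

```python
import math
import random


def weighted_sq_dist(p, q, w):
    """Squared weighted Euclidean distance (assumed form): sum_k (w_k * (p_k - q_k))^2."""
    return sum((wk * (a - b)) ** 2 for a, b, wk in zip(p, q, w))


def fcm(points, w, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy C-means using the weighted distance above."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by membership^m.
        centers = []
        for j in range(c):
            um = [u[i][j] ** m for i in range(n)]
            tot = sum(um)
            centers.append([sum(um[i] * points[i][k] for i in range(n)) / tot
                            for k in range(d)])
        # Membership update: u_ij proportional to d2_ij^(-1/(m-1)).
        for i in range(n):
            d2 = [max(weighted_sq_dist(points[i], ctr, w), 1e-12) for ctr in centers]
            inv = [dj ** (-1.0 / (m - 1.0)) for dj in d2]
            s = sum(inv)
            u[i] = [v / s for v in inv]
    return u, centers


def validity(points, u, centers, w, m=2.0):
    """Return (partition coefficient, partition entropy, Xie-Beni index)."""
    n = len(points)
    pc = sum(v * v for row in u for v in row) / n
    pe = -sum(v * math.log(v) for row in u for v in row if v > 0) / n
    compact = sum(u[i][j] ** m * weighted_sq_dist(points[i], centers[j], w)
                  for i in range(n) for j in range(len(centers)))
    sep = min(weighted_sq_dist(a, b, w)
              for x, a in enumerate(centers) for b in centers[x + 1:])
    return pc, pe, compact / (n * sep)


# Toy records (x, y, attribute): made-up values, not the Meuse data.
data = [(0.0, 0.0, 1.0), (0.1, 0.2, 1.1), (0.2, 0.1, 0.9),
        (5.0, 5.0, 4.0), (5.1, 4.9, 4.2), (4.9, 5.2, 3.9)]
weights = [1.0, 1.0, 0.5]  # hypothetical hand-picked weights for x, y, attribute
u, centers = fcm(data, weights, c=2)
pc, pe, xb = validity(data, u, centers, weights)
print(pc, pe, xb)
```

Under the usual reading of these indices, a higher PC and a lower PE and Xie-Beni value suggest a crisper, better-separated partition. Comparing them across candidate weight vectors and cluster counts is one way to mirror the kind of evaluation the abstract describes.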