Journal of Geo-information Science ›› 2011, Vol. 13 ›› Issue (3): 383-390. doi: 10.3724/SP.J.1047.2011.00383

• Research on Spatial Patterns of Urban and Rural Development •

Design and Application of a Distributed Spatial Outlier Mining Architecture in a Grid Environment

YAO Mingjing1, LIN Jiaxiang1, CHEN Chongcheng1, MA Hengbing2

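  1. Spatial Information Research Center of Fujian and Key Lab of Spatial Data Mining and Information Sharing of MOE, Fuzhou University, Fuzhou 350002;
    2. Fujian Economic Information Center, Fuzhou 350001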
  • Received: 2010-11-01 Revised: 2011-03-29 Online: 2011-06-25 Published: 2011-06-15
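  • Corresponding author: CHEN Chongcheng (1968-), male, from Minqing County, Fujian; Ph.D., professor. His main research interests are spatial data mining and knowledge grids, geoscience visualization, and virtual geographic environments. E-mail: chencc@fzu.edu.cn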
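  • About the author: YAO Mingjing (1984-), male, from Yidu, Hubei; master's student. His main research interests are spatial data mining and information visualization. E-mail: mingjingyao@yahoo.com.cn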
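  • Supported by:

    National Natural Science Foundation of China (30972299); China-Hungary Intergovernmental Science and Technology Cooperation Project (Guo Ke Wai Zi [2008] No. 333); EU Seventh Framework Programme project (FP7-2009-People-IRSES, No. 247608); Key Science and Technology Project of Fujian Province (2010I0008).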

Service and Application of Grid Based Distributed Spatial Outliers Mining

YAO Mingjing1, LIN Jiaxiang1, CHEN Chongcheng1, MA Hengbing2

  1. Key Lab of Spatial Data Mining and Information Sharing of MOE, Spatial Information Research Center of Fujian, Fuzhou University, Fuzhou 350002, China;
    2. Fujian Economic Information Center, Fuzhou 350001, China
  • Received:2010-11-01 Revised:2011-03-29 Online:2011-06-25 Published:2011-06-15

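Abstract: A spatial outlier is a spatial object in a spatial dataset whose non-spatial attribute values differ markedly from those of the other spatial objects in its neighborhood. Spatial data are generally stored in a geographically distributed manner and are massive in volume, and the traditional centralized processing model cannot meet the requirements of efficient massive-data processing or of the security of the spatial data themselves. Therefore, building on GeoKS-Grid, the geographical knowledge service grid platform developed by our research group, this paper proposes a grid-based distributed architecture for distributed spatial outlier mining, formulates a strategy for distributed spatial outlier mining in a grid environment, and implements a concrete distributed spatial outlier mining algorithm. In addition, following the general process of distributed spatial data mining and the principles that grid services should be general, reusable, and composable, the algorithm is decomposed at a reasonable granularity and encapsulated as several basic atomic services. These services are then discovered and composed through a grid workflow to carry out distributed spatial outlier mining, including both local and global outlier mining. Finally, a case study of outlier analysis on soil data from the eco-geochemical survey of Fujian Province verifies the soundness and effectiveness of the services and the system.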

Key words: knowledge grid, spatial outlier, distributed mining, atomic service, service composition
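
To make the definition of a spatial outlier in the abstract concrete, the following is a minimal single-site sketch in Python. It is not the distributed algorithm of the paper, whose details the abstract does not give. Instead it uses a common neighborhood-difference z-score test: each object's attribute value is compared with the mean of its k nearest neighbors, and objects whose standardized difference exceeds a threshold are flagged. The function names, the parameters `k` and `threshold`, and the sample data are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def knn_indices(points, i, k):
    """Return indices of the k points nearest to points[i], excluding i itself."""
    xi, yi = points[i]
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi))
    return others[:k]


def spatial_outliers(points, values, k=4, threshold=2.0):
    """Flag objects whose attribute differs markedly from their neighbors.

    points: list of (x, y) coordinates (the spatial attributes)
    values: list of non-spatial attribute values, one per point
    k, threshold: illustrative parameters, not taken from the paper
    """
    # Difference between each value and the mean of its k nearest neighbors.
    diffs = []
    for i in range(len(points)):
        neighbor_values = [values[j] for j in knn_indices(points, i, k)]
        diffs.append(values[i] - mean(neighbor_values))

    # Standardize the differences and keep those beyond the threshold.
    mu, sigma = mean(diffs), pstdev(diffs)
    if sigma == 0:
        return []  # every object agrees with its neighborhood
    return [i for i, d in enumerate(diffs) if abs((d - mu) / sigma) > threshold]


# Hypothetical sample: a 5 x 5 grid of sites with a uniform attribute value,
# except for one anomalous reading at the centre cell (index 12).
points = [(float(c), float(r)) for r in range(5) for c in range(5)]
values = [10.0] * 25
values[12] = 50.0
print(spatial_outliers(points, values))
```

On this sample only the centre cell is flagged. Its four direct neighbors also differ from their own neighborhood means, because the anomalous value pulls those means up, but their standardized differences stay below the threshold.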

Abstract: A spatial outlier is a spatial object whose non-spatial attribute values deviate significantly from those of the other objects in the dataset. Identifying spatial outliers can lead to the discovery of unexpected knowledge and has a number of practical applications. Massive spatial data are maintained at geographically distributed sites across wide area networks (WANs), so it is necessary to analyse and process the data with a high-performance distributed parallel processing system. The grid is one of the most effective approaches to meeting this requirement. The geographical knowledge grid platform (GeoKS-Grid) established by our research group is an application of the knowledge grid in geo-information science. It integrates grid computing, web services, WebGIS, data mining, information visualization, ontology-based knowledge bases and knowledge reasoning, online analytical processing, decision analysis, data warehousing, and workflow technologies to form a geographical problem-solving environment. In this paper, a grid-based distributed framework and the corresponding strategy for a distributed spatial data mining system are discussed, and a distributed algorithm for spatial outlier mining is designed and implemented. In general, the process of distributed spatial outlier mining can be viewed as a series of services, including atomic services and composite services. Accordingly, following the principles of web service reusability and composability, the distributed spatial outlier mining algorithm is decomposed into several grid atomic services. Distributed spatial outlier mining, including local and global spatial outlier mining, is then realized through a grid workflow that discovers and composes the atomic knowledge grid services provided by the knowledge grid. Finally, a demonstration application is carried out on soil geochemistry data from the Ecological Geochemistry Survey of the Fujian Coastal Economic Belt, which verifies the efficiency and validity of the distributed spatial outlier mining service and system.

Key words: spatial outlier, distributed data mining, knowledge grid, atomic service, service composition