Journal of Geo-information Science ›› 2018, Vol. 20 ›› Issue (1): 28-36. doi: 10.12082/dqxxkx.2018.170266

• Theory and Methods of Geo-information Science •

Discovery of Association Patterns Based on a Gridded Representation of Resource and Environment Data

XU Zhen, JING Yaodong, BI Rutian*, GAO Yang, WANG Peng

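  1. College of Resources and Environment, Shanxi Agricultural University, Taigu 030801, China
  • Received: 2017-06-13 Revised: 2017-09-23 Published: 2018-01-20 Released: 2018-01-20
  • Corresponding author: BI Rutian E-mail: 384989273@qq.com; brt@sxau.edu.cn
  • About the author:
    XU Zhen (1991-), male, from Linyi, Shandong Province; master's student; research interest: 3S technology. E-mail: 384989273@qq.com

  • Supported by:
    Public Welfare Profession Project of the Ministry of Land and Resources (201411007)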

The Discovery of Spatial Association Patterns of Resource and Environment Information Based on Grid Data

XU Zhen, JING Yaodong, BI Rutian*, GAO Yang, WANG Peng

  1. College of Resources and Environment, Shanxi Agricultural University, Taigu 030801, China
  • Received: 2017-06-13 Revised: 2017-09-23 Online: 2018-01-20 Published: 2018-01-20
  • Contact: BI Rutian E-mail: 384989273@qq.com; brt@sxau.edu.cn
  • Supported by:
    Public Welfare Profession Project of the Ministry of Land and Resources of the People's Republic of China, No. 201411007.

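Abstract (from the Chinese original):

Traditional spatial association pattern mining uses spatial predicates as its discovery logic. As a result, the discovered patterns emphasize associations of spatial location, and the mining results are limited by the predicate table built beforehand, so the patterns found are fixed and offer little freedom of interpretation. This paper proposes an association pattern discovery method that does not depend on spatial predicates. The method represents spatial data as grids, converts the gridded results into transactions under multiple constraints using a smoothly moving N×N mask, removes attribute self-joins from the traditional Apriori algorithm, and then explores the resulting spatial transaction database to extract valuable association patterns. Finally, taking the Changhe basin in Jincheng City, Shanxi Province as the empirical study area, a spatial transaction database of coal, land, and water was built, the quantitative error of the gridded representation was reported, the implicit spatial association patterns were explored, and the accuracy of the transaction results was verified with co-location patterns. Gridding generated 28 434 grid cells of 64 m covering the study area, and the gridding error of every data layer was within 5%. With cultivated land as the main factor, the transaction conversion yielded 38 310 records. Analysis of some of the extracted association patterns shows that the results agree with prior knowledge about cultivated land in the mining-agricultural composite area of the Changhe basin, and that the method can effectively extract latent association patterns from spatial data and their attribute information, improving both the freedom of the mining process and the interestingness of the results.

Keywords: spatial data mining, grid data, spatial association patterns, Apriori algorithm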
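
The moving-mask step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not spell out the constraints applied during transaction conversion, so the sketch assumes the simplest reading, in which an N×N window is centered on every cell where the main factor is present and all distinct (layer, class) pairs inside the window form one transaction. The function name `mask_transactions`, the layer names, and the toy grids are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Grid = List[List[str]]      # one categorical layer; "" marks an empty cell
Item = Tuple[str, str]      # (layer name, class value)


def mask_transactions(layers: Dict[str, Grid], main_layer: str,
                      main_value: str, n: int = 3) -> List[Set[Item]]:
    """Build one transaction per main-factor cell from an n x n window.

    Assumption: every layer shares the same grid shape, and a transaction
    is the set of distinct (layer, class) pairs found inside the window.
    """
    if n % 2 == 0:
        raise ValueError("mask size n must be odd so the window has a center")
    main = layers[main_layer]
    rows, cols = len(main), len(main[0])
    half = n // 2
    transactions: List[Set[Item]] = []
    for r in range(rows):
        for c in range(cols):
            if main[r][c] != main_value:
                continue  # the mask is anchored only on main-factor cells
            items: Set[Item] = set()
            # Clip the window at the grid border instead of padding.
            for rr in range(max(0, r - half), min(rows, r + half + 1)):
                for cc in range(max(0, c - half), min(cols, c + half + 1)):
                    for name, layer in layers.items():
                        if layer[rr][cc] != "":
                            items.add((name, layer[rr][cc]))
            transactions.append(items)
    return transactions


# Hypothetical 3 x 3 toy layers (not data from the study area).
toy_layers = {
    "land_use": [["cultivated", "cultivated", "forest"],
                 ["cultivated", "forest", "forest"],
                 ["forest", "forest", "forest"]],
    "coal": [["", "", "mined"],
             ["", "", ""],
             ["", "", ""]],
}
toy_tx = mask_transactions(toy_layers, "land_use", "cultivated", n=3)
```

Each of the three cultivated cells yields one transaction; only the window centered on the middle cell of the top row reaches the mined cell, so only that transaction contains the coal item.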

Abstract:

Spatial association patterns comprise location patterns, which emphasize spatial data, and structure patterns, which emphasize attribute data. Traditional methods operate on conventional spatial data and use spatial predicates as the mining logic, which leads to two problems. First, they rely on the boundaries of spatial phenomena and do not take the area of those phenomena into account. Second, the results are strongly restricted by the spatial predicate table built before mining. Based on Tobler's First Law of Geography, this study proposes a new method for extracting spatial association patterns without spatial predicates. According to the specific data content and format, the method converts spatial data into grid data in which all layers share the same spatial coordinates and the same cell size. It then applies a smoothly moving mask to derive a transaction database from the grid data, and an Apriori algorithm without attribute self-joins is used to explore the latent association patterns in that database. Finally, an experiment was conducted to verify the accuracy of the method. The experimental data comprised coal mining area data, land use data, water system data, and terrain data for the Changhe basin in Jincheng City, Shanxi Province. The grid transformation error of each data layer was kept within 5%, and the accuracy of the transaction conversion was verified with co-location patterns. Grid transformation generated 28 434 grid cells with a cell size of 64 m. With cultivated land set as the main factor, the transaction database contained 38 310 records. Analysis of some association patterns with higher confidence showed that the results were consistent with prior knowledge about cultivated land in the mining-agricultural composite area. The method can therefore effectively extract meaningful association patterns and improve the interestingness of the results. It increases the freedom of the mining process by allowing different grid sizes, main factors, and mask sizes. Because it works on grid data instead of conventional spatial data, it does not rely on the boundaries of spatial phenomena and takes the area factor into account.

Key words: spatial data mining, grid data, spatial association patterns, Apriori algorithm
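
As an illustration of mining the transaction database, the following Python sketch shows one possible reading of "Apriori without self-connection of attributes". The assumption here is that a candidate itemset may not contain two items from the same attribute (layer), so that, for example, two land-use classes are never joined with each other. The abstract does not give the exact rule, so this is a hedged interpretation rather than the authors' algorithm. The function names and toy transactions are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Item = Tuple[str, str]  # (attribute / layer name, class value)


def apriori_no_self_join(transactions: List[Set[Item]],
                         min_support: float) -> Dict[FrozenSet[Item], float]:
    """Return frequent itemsets with their support.

    Assumption: candidates that pair two items of the same attribute
    are skipped (the "no attribute self-join" restriction).
    """
    total = len(transactions)

    def support(itemset: FrozenSet[Item]) -> float:
        return sum(1 for t in transactions if itemset <= t) / total

    singles = {frozenset([item]) for t in transactions for item in t}
    current = {s: support(s) for s in singles}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent: Dict[FrozenSet[Item], float] = {}
    k = 1
    while current:
        frequent.update(current)
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) != k + 1:
                continue
            attrs = [attr for attr, _ in union]
            if len(set(attrs)) != len(attrs):
                continue  # two items share an attribute: skip the self-join
            # Standard Apriori pruning: every k-subset must be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k)):
                candidates.add(union)
        scored = {c: support(c) for c in candidates}
        current = {c: v for c, v in scored.items() if v >= min_support}
        k += 1
    return frequent


def confidence(frequent: Dict[FrozenSet[Item], float],
               antecedent: FrozenSet[Item],
               consequent: FrozenSet[Item]) -> float:
    """confidence(A -> B) = support(A and B together) / support(A)."""
    return frequent[antecedent | consequent] / frequent[antecedent]


# Hypothetical toy transactions (not data from the study area).
CULT, FOREST = ("land_use", "cultivated"), ("land_use", "forest")
MINED, RIVER = ("coal", "mined"), ("water", "river")
toy = [{CULT, MINED, RIVER}, {CULT, MINED}, {CULT, FOREST},
       {CULT, FOREST, MINED}]
freq = apriori_no_self_join(toy, min_support=0.5)
conf_mined_to_cult = confidence(freq, frozenset([MINED]), frozenset([CULT]))
```

In this toy run the pair of two land-use classes is never generated as a candidate even though both classes are individually frequent, while the cross-attribute pair of mined coal and cultivated land is kept.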