Journal of Geo-information Science ›› 2015, Vol. 17 ›› Issue (7): 837-845. doi: 10.3724/SP.J.1047.2015.00837


Application of a Multi-Time-Scale Density Clustering Algorithm to Crime Case Analysis

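WU Wenhao, WU Sheng*

  1. Spatial Information Research Center of Fujian Province, Fuzhou University; Key Laboratory of Spatial Data Mining & Information Sharing of Ministry of Education, Fuzhou 350002
  • Received: 2014-12-09 Revised: 2015-03-16 Online: 2015-12-10 Published: 2015-07-08
  • Corresponding author: WU Sheng E-mail: wwh1295@gmail.com; ws0110@163.com
  • About the author: WU Wenhao (born 1989), male, from Rugao, Jiangsu Province, master's student. His research interests are spatio-temporal data mining and the network sharing and services of spatial information. E-mail: wwh1295@gmail.com
  • Funding: Subproject of a Major Project of the National "863" Program (2012AA12A208)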

Application of Density-Based Clustering Algorithm in Crime Cases Analysis Considering Multiple Time Scale

WU Wenhao, WU Sheng*

  1. Spatial Information Research Center of Fujian Province, Fuzhou University, Key Laboratory of Spatial Data Mining & Information Sharing, Ministry of Education, Fuzhou 350002, China
  • Received: 2014-12-09 Revised: 2015-03-16 Online: 2015-12-10 Published: 2015-07-08
  • Contact: WU Sheng E-mail: wwh1295@gmail.com; ws0110@163.com
  • About author: WU Wenhao, E-mail: wwh1295@gmail.com

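Abstract (Chinese version):

Spatio-temporal clustering is one of the main topics of data mining research and has important applications in environmental protection, disease prevention and control, and crime prevention and suppression. Existing spatio-temporal clustering methods all treat temporal "distance" as the actual elapsed interval. Crime cases, however, carry social attributes and show pronounced periodicity at different time scales; ignoring these characteristics makes it difficult to capture their true spatio-temporal patterns. This paper jointly considers the temporal attributes at multiple time scales, constructs an equivalent spatio-temporal neighborhood, and, drawing on the classical density-based clustering algorithm, proposes a multi-time-scale equivalent spatio-temporal neighborhood density clustering algorithm (MTS-ESTN DBSCAN). A cluster analysis of 2013 crime case data from the urban area of Fuzhou shows that the method is feasible for spatio-temporal clustering of crime cases and has theoretical and practical value for further research on urban crime geography.

Keywords: spatio-temporal clustering, multiple time scales, density-based clustering, crime cases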

Abstract:

Space-time clustering is one of the main research topics in data mining and has important applications in environmental protection, disease prevention and control, and crime prevention and control. Existing space-time clustering methods treat the temporal "distance" between events as the actual elapsed interval. However, crime cases have social attributes and show obvious cyclical characteristics at different time scales; if these characteristics are ignored, it is difficult to find the real spatio-temporal patterns of crime cases. Therefore, building on DBSCAN, we propose an algorithm that considers multiple time scales and an equivalent spatio-temporal neighborhood (MTS-ESTN DBSCAN). The algorithm considers the time attributes at multiple time scales, builds the equivalent spatio-temporal neighborhood, and borrows the concept of the classical density-based clustering algorithm. Within the equivalent spatio-temporal neighborhood, the Euclidean distance (L2-norm) measures spatial proximity, and similarity in the time domain is defined with an improved version of the HDsim function, a method for measuring the unified similarity of high-dimensional data. Cluster analysis was conducted on crime case data from the urban area of Fuzhou for 2013, and the clustering quality was evaluated with several indicators: CH (Calinski-Harabasz), Sil (Silhouette), DB (Davies-Bouldin) and KL (Krzanowski-Lai). The results show that the method is feasible for space-time cluster analysis of crime cases. Compared with the traditional ST-DBSCAN algorithm, it produced better clustering quality. In addition, it can reveal aggregation characteristics that follow the rhythms of people's work, rest and other social activities over a long period. The method has theoretical significance and practical value for further study of urban crime geography.

Key words: space-time clustering, multiple time scales, density-based clustering, crime cases
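
The idea of a neighborhood that is periodic in time can be illustrated with a minimal Python sketch. It is not the authors' MTS-ESTN DBSCAN: the abstract does not give the improved HDsim formula, so the sketch swaps in a simple per-scale threshold on circular time distance as a stand-in. The names `TIME_SCALES`, `cyclic_gap`, `is_neighbor` and `dbscan`, the choice of scales (hour of day, day of week) and all tolerances are assumptions made only for illustration.

```python
import math

# Assumed time scales: (name, period in hours, tolerance in hours).
# These values are illustrative and do not come from the paper.
TIME_SCALES = [("hour_of_day", 24.0, 2.0), ("day_of_week", 168.0, 24.0)]


def cyclic_gap(t1, t2, period):
    """Shortest distance between two timestamps on a cycle of length `period`."""
    d = abs(t1 - t2) % period
    return min(d, period - d)


def is_neighbor(p, q, eps_space, scales=TIME_SCALES):
    """p and q are (x, y, t): x, y in metres, t in hours from a common origin.

    Stand-in rule (not HDsim): close in Euclidean space AND close on every
    time cycle listed in `scales`.
    """
    if math.hypot(p[0] - q[0], p[1] - q[1]) > eps_space:
        return False
    return all(cyclic_gap(p[2], q[2], period) <= tol for _, period, tol in scales)


def dbscan(points, eps_space, min_pts, scales=TIME_SCALES):
    """Plain DBSCAN over the neighborhood above; label -1 means noise."""
    labels = [None] * len(points)  # None = not yet visited

    def region(i):
        return [j for j in range(len(points))
                if is_neighbor(points[i], points[j], eps_space, scales)]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            expanded = region(j)
            if len(expanded) >= min_pts:
                queue.extend(expanded)  # j is a core point: keep growing
    return labels


# Made-up events: three late-evening cases one week apart at nearly the same
# place, one morning case at that place, one evening case far away.
events = [
    (0.0, 0.0, 22.0),
    (30.0, 40.0, 22.5 + 168.0),
    (10.0, 0.0, 21.0 + 336.0),
    (5.0, 5.0, 10.0 + 168.0),
    (5000.0, 5000.0, 22.0),
]

periodic = dbscan(events, eps_space=100.0, min_pts=3)
print(periodic)  # [0, 0, 0, -1, -1]

# With one very long "cycle" the gap reduces to plain elapsed time, so the
# weekly recurrence is no longer seen as temporal closeness.
elapsed = dbscan(events, eps_space=100.0, min_pts=3,
                 scales=[("elapsed", 1e9, 2.0)])
print(elapsed)  # [-1, -1, -1, -1, -1]
```

In this toy data the three evening events fall a week apart, so a rule based on elapsed time finds no cluster, while the periodic rule groups them. That is the contrast the abstract draws between treating time distance as a real interval and accounting for periodicity at several time scales.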