一种逐级合并OD流向时空联合聚类算法
项秋亮(1995— ),男,安徽黄山人,硕士生,研究方向为空间数据挖掘。E-mail: qiuliangxiang@outlook.com |
收稿日期: 2019-06-03
要求修回日期: 2020-02-13
网络出版日期: 2020-08-25
基金资助
国家自然科学基金项目(41471333)
中央引导地方科技发展专项项目(2017L3012)
版权
An OD Flow Spatio-temporal Joint Clustering Algorithm based on Step-by-step Merge Strategy
Received date: 2019-06-03
Request revised date: 2020-02-13
Online published: 2020-08-25
Supported by
National Natural Science Foundation of China(41471333)
The Central Guided Local Development of Science and Technology Project(2017L3012)
Copyright
现有OD流向聚类多将O点和D点相分离或者将OD流向看作4维空间的数据点进行聚类处理,忽视了流向长度、方向、时间对流向聚类的影响。本文以流向作为研究对象,提出一种基于流向间相似性度的逐级合并OD流向时空联合聚类算法。首先在充分研究OD流向的空间信息和时间信息的基础上,构建合理的OD流向间时空相似性度量方法,对OD流向间的时空相似性进行量化;然后提出逐级合并OD流向聚类策略,优化类簇合并的顺序,以减少层次聚类的时间开销,实现OD流向的时空联合聚类。以成都市的滴滴出行OD数据和纽约市出租车数据为例对本文方法进行了验证,结果表明:① 本算法聚类获得的流向类簇不仅带有空间特征还具备时间特征;② 在不同参数下本方法可以得到不同时空尺度的聚类结果;③ 与现有较高水平的流向聚类算法相对比,本文方法的聚类效果更好。这体现在流向类簇内部的流向之间有着充分的相似性,以及本文方法不仅可以提取出显著的流向类簇,还可以提取出非热点区域之间的流向类簇。本算法顾及空间因素和时间因素,可以通过调整时空相似性度量方法中的时间参数和空间参数以实现不同时空尺度的流向聚类,这使得从不同时空角度研究城市居民出行模式成为可能。本文提出的OD流向时空联合聚类算法从联合时间信息和空间信息的角度获得对运动数据的新见解,有助于合理全面地研究居民的移动模式、区域之间的空间联系、已知出行结构的确定以及出行目的的探索,是后续一系列分析工作的基础。
项秋亮 , 邬群勇 , 张良盼 . 一种逐级合并OD流向时空联合聚类算法[J]. 地球信息科学学报, 2020 , 22(6) : 1394 -1405 . DOI: 10.12082/dqxxkx.2020.190276
Most of the existing OD flow clustering methods adopt the strategy of dividing the OD flow into O point and D point or considering flow as the four-dimensional point to implement flow clustering, which ignores the effects caused by the length, direction and time information on the clustering process. In this paper, we proposedabrand-new spatio-temporal flow clustering method based on the similarity between flows with a strategy of merging flow clusters under different grading. Firstly, a reasonablespatio-temporal similarity measurement formula of OD flow was constructed to quantify the spatio-temporal similarity between OD flows on the basis of full stydy of OD flow's spatial information and temporal information. Then, with the purpose of optimizing the order of merging flow clusters, reducing the time consumption of clustering process, a strategy of merging flow clusters under different grading was used to complete flow clustering. In this method, both of time information and spatial information weretaken into consideration. By modifying the parameters of the spatio-temporal similarity measurement formula, our method can obtain clustering results for different time scales and spatial scales, which makes it possible to analyze the movement patterns from a multi-scale perspective. To verify the effective of our method, a series of experiments on real dataset was executed. The clustering results demonstrate that: ①flow clusters discovered by our method not only hadspatial characteristic but also hadtemporal characteristic; ② our method can discover different spatio-temporal OD flow cluster under different spatio-temporal parameters; ③ by comparingthe clustering results of our method with previous work of advanced technology level, it turnedout that our method hada better clustering performance, which was reflected in the fact that flows within the same flow cluster satisfied the similarity relationship and our method can not only find the obvious movements patterns but also capture inconspicuous movements patterns between non-hot zones. Thespatio-temporal joint OD flow clustering method proposed in this paper obtains new insights into motion from the perspective of joint temporal and spatial information, which is conducive to a reasonable and comprehensive study of residents' movement patterns, spatial linkage between regions, the determination of the known travel structure, and the exploration of the purpose of travel. The process of OD flow clutsering is the beginning of a series of subsequent analysis.
表1 模拟数据的OD流向之间的相似性数值Tab. 1 The similarity value between the OD flows of the synthesized sample data |
flow1 | flow2 | 0.85 | flow2 | flow6 | <0 |
flow1 | flow3 | 0.55 | flow3 | flow4 | 0.30 |
flow1 | flow4 | 0.10 | flow3 | flow5 | 0.10 |
flow1 | flow5 | <0 | flow3 | flow6 | 0.05 |
flow1 | flow6 | <0 | flow4 | flow5 | 0.55 |
flow2 | flow3 | 0.60 | flow4 | flow6 | 0.50 |
flow2 | flow4 | 0.15 | flow5 | flow6 | 0.80 |
flow2 | flow5 | <0 |
算法:逐级合并策略的OD流向聚类 |
---|
输入:OD流向数据,时间参数,距离参数,多等级高相似参数,多等级类簇间高相似度 输出:OD流向类簇 function FLOW_CLUSTER() step 1: //将每条OD流向组织成一个流向类簇 , step 2: for in do for in do step 2.1: //计算高相似参数下所有类簇组合的类簇间高相似度 Calculate step 2.1: //计算高相似参数下所有类簇组合的类簇间高相似度 Calculate step 2.1: //寻找满足所有合并条件的类簇组合并合并 if and do update C end if end for end for end function |
[1] |
|
[2] |
|
[3] |
|
[4] |
王祖超, 袁晓如. 轨迹数据可视分析研究[J]. 计算机辅助设计与图形学学报, 2015,27(1):9-25.
[
|
[5] |
|
[6] |
|
[7] |
|
[8] |
|
[9] |
|
[10] |
邬群勇, 张良盼, 吴祖飞. 顾及空间异质性的出租载客与公交客流回归分析[J]. 地球信息科学学报, 2019,21(3):337-345.
[
|
[11] |
邬群勇, 苏克云, 邹智杰. 基于MapReduce的海量公交乘客OD并行推算方法[J]. 地球信息科学学报, 2018,20(5):647-655.
[
|
[12] |
信睿, 艾廷华, 杨伟, 等. 顾及出租车OD点分布密度的空间Voronoi剖分算法及OD流可视化分析[J]. 地球信息科学学报, 2015,17(10):1187-1195.
[
|
[13] |
|
[14] |
|
[15] |
|
[16] |
|
[17] |
|
[18] |
|
[19] |
|
[20] |
|
[21] |
盖亚开发数据计划:https://outreach.didichuxing.com/research/opendata/DB/OL].
[ GAIA Open Dataset:https://outreach.didichuxing.com/research/opendata/[DB/OL]. ]
|
/
〈 | 〉 |