一种基于空间-拓扑结构相似性的复杂轨迹聚类算法
孙 勇(1986-),男,江西吉安人,博士生,研究方向为地理信息系统开发与应用,数据挖掘。 E-mail: ttsunyong@163.com |
收稿日期: 2019-07-15
要求修回日期: 2019-08-12
网络出版日期: 2019-12-11
基金资助
国家自然科学基金项目(No.41671445)
国家自然科学基金项目(No.41471330)
版权
Complex Trajectory Clustering based on a Spatial-Topological Similarity Measurement
Received date: 2019-07-15
Request revised date: 2019-08-12
Online published: 2019-12-11
Supported by
National Science Foundation of China(No.41671445)
National Science Foundation of China(No.41471330)
Copyright
复杂的面状空间实体如海洋涡旋、环流和降雨过程在运动过程中会产生更复杂的轨迹,即具有分支结构的复杂轨迹。为了挖掘这类复杂轨迹的运动模式特征,本文从复杂轨迹的拓扑结构和空间特征出发,创新性地提出复杂轨迹的空间-拓扑结构相似性度量算法(Spatial-Topological Similarity Measurement, STSM),该算法是基于图同构算法VF2改进的。首先STSM算法将复杂轨迹用带有节点和边的图结构表达,并将空间信息融入图结构的节点属性中,通过匹配复杂轨迹之间所有最大公共子结构,找到匹配结构中节点之间一一对应的关系,利用加权的欧式距离计算复杂轨迹匹配结构中点对之间的空间距离。然后,基于STSM相似性算法进行层次聚类分析,旨在发现复杂轨迹之间相似的拓扑结构在空间上的聚集模式。最后,利用1993-2016年长时间序列的中国南海冷涡复杂轨迹验证方法的有效性,并对比分析复杂轨迹拓扑结构相似性算法CSM。结果表明:单纯用拓扑结构相似性算法CSM进行聚类分析,不能充分挖掘空间的聚集模式,因为不同空间位置也存在拓扑结构相似的轨迹。而本文提出的STSM算法将南海冷涡复杂轨迹分为5类,第一类分布在南海北部、第二类分布在南海中部、其他三类交错在南海南部。这种聚集模式在一定程度上反映了冷涡的生成和演化过程在南海北部、中部、南部的差异性,同时也表明了冷涡移动在南海南部存在更为复杂的异质性。因此,本文提出的方法可以有效地从复杂轨迹数据中发现其演化过程的潜在聚集模式,为认识这类复杂动态现象的时空演化特征提供了一种新的方法。
孙勇 , 王会蒙 , 靳奉祥 , 杜云艳 , 季民 , 易嘉伟 . 一种基于空间-拓扑结构相似性的复杂轨迹聚类算法[J]. 地球信息科学学报, 2019 , 21(11) : 1669 -1678 . DOI: 10.12082/dqxxkx.2019.190375
Complex spatial entities such as ocean eddies, circulation, and rainfall processes that can move produce much more complex movement data, namely, complex trajectories. Complex trajectories have nonlinear structures and bear at least one split and/or merger branch. To mine the motion pattern of such complex trajectories, this paper proposed a Spatial-Topological Similarity Measurement (STSM) method based on the topological structure and spatial characteristics of complex trajectories. The STSM method was inspired by the graph isomorphism algorithm VF2. Firstly, each complex trajectory was represented by a graph structure with nodes and edges, which integrates the spatial coordinates of trajectory points into node attributes. By matching all maximal common substructures between the complex trajectories, one-to-one correspondence among the nodes in the matching structure was determined, The weighted Euclidean distance was then used to calculate the spatial similarity between points in the matched structure of the complex trajectories. Secondly, the average-linkage agglomerative hierarchical clustering analysis was carried out based on the proposed STSM algorithm, aiming at discovering any spatial clustering pattern of similar topological structures between complex trajectories. Finally, the effectiveness of the proposed method was verified by using the long-time series of the complex trajectories of cyclonic eddies in the South China Sea (SCS) from 1993 to 2016. The topological structure similarity algorithm CSM (Comprehensive Structure Matching) for complex trajectories was also compared and analyzed. Results show that clustering analysis based on the CSM algorithm can not fully mine spatial aggregation patterns of the cyclonic eddy complex trajectories, because complex trajectories with similar topological structures could exist in different regions. The STSM algorithm classified the complex trajectories of cyclonic eddies in the SCS into five clusters. Cluster 1 was in the north of the SCS, cluster 2 was in the central part of the SCS, and the other three clusters were interlaced in the south of the SCS. To a certain extent, this aggregation model not only reflected the differences of the formation and evolution of cyclonic eddies in the northern, central, and southern SCS, but also indicated that the movement of cyclonic eddies in the southern SCS had more complex heterogeneity than other regions of the SCS. Our findings suggest that the proposed method STSM can help discover effectively from the complex trajectory data the potential aggregation patterns of evolution processes, and provide a new method for revealing the spatiotemporal characteristics of such complex dynamic phenomena.
图5 基于复杂轨迹相似性算法CSM和STSM的层次聚类树状图注:红色虚线为轮廓系数的剪切线。 Fig. 5 Hierarchical clustering tree maps based on the two complex trajectory similarity algorithms and their shear line based silhouette coefficients (red dashed lines): CSM versus STSM |
表1 基于相似性算法STSM的聚类结果的拓扑结构模式和活动参数统计分析Tab. 1 Topological patterns and corresponding activity parameters of the clustering result based on the STSM similarity algorithm |
类别 | 所在区域及轨迹颜色(图5(b)) | 复杂轨迹数目 | 主要演化结构模式及占比 | 活动范围/km | 活动天数/d | 轨迹复杂度 |
---|---|---|---|---|---|---|
第一类 | 南海北部(蓝色) | 149 | vms(20.13%) | 280.59 | 59 | 0.45 |
第二类 | 南海中部(绿色) | 358 | smsv(25.91%) | 318.86 | 75 | 0.47 |
第三类 | 南海南部(紫色) | 159 | vmsv(33.33%) | 286.93 | 67 | 0.48 |
第四类 | 南海南部(橘色) | 47 | vmsv(52.08%) | 192.04 | 40 | 0.31 |
第五类 | 南海南部(青色) | 17 | vmsmv(64.71%) | 255.18 | 90 | 0.51 |
[1] |
李婷, 裴稻, 袁烨城 , 等. 人类活动轨迹的分类、模式和应用研究综述[J]. 地理科学进展, 2014,33(7):938-948.
[
|
[2] |
刘瑜, 康朝贵, 王法辉 . 大数据驱动的人类移动模式和模型研究[J]. 武汉大学学报·信息科学版, 2014,39(6):660-666.
[
|
[3] |
宋辞, 裴韬 . 北京市多尺度中心特征识别与群聚模式发现[J]. 地球信息科学学报, 2019,21(3):384-397.
[
|
[4] |
龚玺, 裴韬, 孙嘉 , 等. 时空轨迹聚类方法研究进展[J]. 地理科学进展, 2011,30(5):522-534.
[
|
[5] |
郑宇 . 城市计算概述[J]. 武汉大学学报·信息科学版, 2015,40(1):1-12.
[
|
[6] |
|
[7] |
|
[8] |
牟乃夏, 徐玉静, 张恒才 , 等. 移动轨迹聚类方法研究综述[J].测绘通报, 2018(1):1-7.
[
|
[9] |
|
[10] |
|
[11] |
|
[12] |
|
[13] |
|
[14] |
|
[15] |
郭文月, 刘海砚, 孙群 , 等. 面向区域增量更新的等高线群混合相似性度量模型[J]. 地球信息科学学报, 2019,21(2):147-156.
[
|
[16] |
|
[17] |
|
[18] |
|
[19] |
|
[20] |
|
[21] |
|
[22] |
|
[23] |
|
[24] |
|
[25] |
|
[26] |
|
[27] |
|
[28] |
|
[29] |
|
[30] |
吴笛, 杜云艳, 易嘉伟 , 等. 基于密度的轨迹时空聚类分析[J]. 地球信息科学学报, 2015,17(10):1162-1171.
[
|
[31] |
杜云艳, 莫洋, 王会蒙 , 等. 基于复杂网络的海洋涡旋移动特征研究[J]. 海洋学报, 2017,39(7):110-123.
[
|
/
〈 | 〉 |