地球信息科学学报 ›› 2022, Vol. 24 ›› Issue (1): 74-86.doi: 10.12082/dqxxkx.2022.210457
亢扬箫1,2(), 桂志鹏1,3,*(
), 丁劲宸1, 吴京航1, 吴华意3
收稿日期:
2021-08-04
修回日期:
2021-08-27
出版日期:
2022-01-25
发布日期:
2022-03-25
通讯作者:
* 桂志鹏(1982— ),男,宁夏吴忠人,博士,副教授,主要从事高性能地理计算与时空大数据分析相关研究。E-mail: zhipeng.gui@whu.edu.cn作者简介:
亢扬箫(1995— ),男,河南焦作人,硕士,主要从事时空大数据计算与应用相关研究。E-mail: kangyangxiao@foxmail.com
基金资助:
KANG Yangxiao1,2(), GUI Zhipeng1,3,*(
), DING Jinchen1, WU Jinghang1, WU Huayi3
Received:
2021-08-04
Revised:
2021-08-27
Online:
2022-01-25
Published:
2022-03-25
Supported by:
摘要:
作为二阶点模式分析方法,Ripley's K函数(简称K函数)以距离为自变量探测不同尺度下点事件的分布模式及演变规律,在生态学、经济学、地理学等诸多领域得到广泛应用。然而,随着点规模的增加,估计与模拟阶段点对距离遍历计算时间开销激增,严重制约了K函数的应用,算法流程优化与并行加速成为应对海量点数据下K函数性能瓶颈及可计算性问题的关键技术手段。针对默认数据分区未考虑点事件空间邻近性导致跨节点通讯成本高昂且K函数距离阈值较大时索引优化失效的现象,本文提出一种基于空间填充曲线的K函数优化加速方法。该方法采用Hilbert曲线构建空间分区,在顾及数据空间邻近性的前提下减少分区间数据倾斜和通讯开销;在分区基础上,利用Geohash编码改进各分区内本地空间索引策略加速点对距离计算。本文以湖北省工商企业注册数据为例,通过对比实验分析了默认分区无索引、KDB分区组合R树索引、本文Hilbert分区组合Geohash索引算法在不同数据规模、距离阈值、集群规模下的计算耗时。结果表明,300 000点数据规模下本文方法的时间开销约为默认分区无索引方法的1/4,9台节点下加速比超过3.6倍。因此,该方法能有效提升分布式环境下K函数计算性能并具有良好的可伸缩性,可为其他点模式分析方法的优化提供参考。
亢扬箫, 桂志鹏, 丁劲宸, 吴京航, 吴华意. 基于Hilbert空间分区和Geohash索引的并行Ripley's K函数[J]. 地球信息科学学报, 2022, 24(1): 74-86.DOI:10.12082/dqxxkx.2022.210457
KANG Yangxiao, GUI Zhipeng, DING Jinchen, WU Jinghang, WU Huayi. Parallel Ripley's K Function based on Hilbert Spatial Partitioning and Geohash Indexing[J]. Journal of Geo-information Science, 2022, 24(1): 74-86.DOI:10.12082/dqxxkx.2022.210457
[1] | 周成虎. 点模式分析[J]. 地理科学进展, 1989, 8(2):8-11. |
[ Zhou C H. Analysis of point patterns[J]. Progress in Geography, 1989, 8(2):8-11. ] DOI: 10.11820/dlkxjz.1989.02.003
doi: 10.11820/dlkxjz.1989.02.003 |
|
[2] |
Yamada I, Rogerson P. An empirical comparison of edge effect correction methods applied to K-function analysis[J]. Geographical Analysis, 2003, 35(2):97-109. DOI: 10.1111/j.1538-4632.2003.tb01103.x
doi: 10.1111/j.1538-4632.2003.tb01103.x |
[3] |
Ripley B D. Modelling spatial patterns[J]. Journal of the Royal Statistical Society: Series B (Methodological), 1977, 39(2):172-212. DOI: 10.1111/j.2517-6161.1977.tb01615.x
doi: 10.1111/j.2517-6161.1977.tb01615.x |
[4] |
李智, 李卫红. 点模式条件下的犯罪嫌疑人时空同现模式挖掘与分析[J]. 地球信息科学学报, 2018, 20(6):827-836.
doi: 10.12082/dqxxkx.2018.180009 |
[ Li Z, Li W H. Mining and analyzing spatiotemporal co-occurrence patterns among criminal suspects under point pattern[J]. Journal of Geo-information Science, 2018, 20(6):827-836. ] DOI: 10.12082/dqxxkx.2018.180009
doi: 10.12082/dqxxkx.2018.180009 |
|
[5] |
曾璇, 崔海山, 刘毅华. 基于网络空间点模式的餐饮店空间格局分析[J]. 地球信息科学学报, 2018, 20(6):837-843.
doi: 10.12082/dqxxkx.2018.170596 |
[ Zeng X, Cui H S, Liu Y H. Analysis on spatial distribution characteristics of restaurant based on network spatial point model[J]. Journal of Geo-information Science, 2018, 20(6):837-843. ] DOI: 10.12082/dqxxkx.2018.170596
doi: 10.12082/dqxxkx.2018.170596 |
|
[6] | 李清泉, 李德仁. 大数据GIS[J]. 武汉大学学报·信息科学版, 2014, 39(6):641-644. |
[ Li Q Q, Li D R. Big data GIS[J]. Geomatics and Information Science of Wuhan University, 2014, 39(6):641-644. ] DOI: 10.13203/j.whugis20140150
doi: 10.13203/j.whugis20140150 |
|
[7] | 崔邹森. 基于交叉Ripley’s K函数的点模式Web可视化分析系统设计与实现[D]. 武汉:武汉大学, 2020. |
[ Cui Z S. Design and implementation of Web-based point pattern visual analytics system upon cross Ripley's K function[D]. Wuhan: Wuhan University, 2020. ] | |
[8] |
Tang W W, Feng W P, Jia M J. Massively parallel spatial point pattern analysis: Ripley's K function accelerated using graphics processing units[J]. International Journal of Geographical Information Science, 2015, 29(3):412-439. DOI: 10.1080/13658816.2014.976569
doi: 10.1080/13658816.2014.976569 |
[9] |
Zhang G M, Huang Q Y, Zhu A, et al. Enabling point pattern analysis on spatial big data using cloud computing: Optimizing and accelerating Ripley's K function[J]. International Journal of Geographical Information Science, 2016, 30(11):2230-2252. DOI: 10.1080/13658816.2016.1170836
doi: 10.1080/13658816.2016.1170836 |
[10] | 王源. 面向海量点模式分析的时空Ripley's K函数优化与加速[D]. 武汉:武汉大学, 2019. |
[ Wang Y. Optimization and acceleration of spatiotemporal Ripley's K function for enabling massive point pattern analysis[D]. Wuhan: Wuhan University, 2019. ] | |
[11] | 向隆刚, 王德浩, 龚健雅. 大规模轨迹数据的Geohash编码组织及高效范围查询[J]. 武汉大学学报·信息科学版, 2017, 42(1):21-27. |
[ Xiang L G, Wang D H, Gong J Y. Organization and efficient range query of large trajectory data based on geohash[J]. Geomatics and Information Science of Wuhan University, 2017, 42(1):21-27. ] DOI: 10.13203/j.whugis20150175
doi: 10.13203/j.whugis20150175 |
|
[12] |
Wang Y, Gui Z, Wu H, et al. Optimizing and accelerating space-time Ripley’s K function based on Apache Spark for distributed spatiotemporal point pattern analysis[J]. Future Generation Computer Systems, 2020, 105:96-118. DOI: 10.1016/j.future.2019.11.036
doi: 10.1016/j.future.2019.11.036 |
[13] |
Gui Z, Wang Y, Cui Z, et al. Developing apache spark based Ripley’s K functions for accelerating spatiotemporal point pattern analysis[J]. The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2020, 43:545-552. DOI: 10.5194/isprs-archives-XLIII-B4-2020-545-2020
doi: 10.5194/isprs-archives-XLIII-B4-2020-545-2020 |
[14] |
左尧, 王少华, 钟耳顺, 等. 高性能GIS研究进展及评述[J]. 地球信息科学学报, 2017, 19(4):437-446.
doi: 10.3724/SP.J.1047.2017.00437 |
[ Zuo Y, Wang S H, Zhong E S, et al. Research progress and review of high-performance GIS[J]. Journal of Geo-information Science, 2017, 19(4):437-446. ] DOI: 10.3969/j.issn.1560-8999.2017.04.001
doi: 10.3969/j.issn.1560-8999.2017.04.001 |
|
[15] |
邱强, 秦承志, 朱效民, 等. 全空间下并行矢量空间分析研究综述与展望[J]. 地球信息科学学报, 2017, 19(9):1217-1227.
doi: 10.3724/SP.J.1047.2017.01217 |
[ Qiu Q A, Qin C Z, Zhu X M, et al. Overview and prospect on spatial analysis of parallel vectors in pan-spatial concept[J]. Journal of Geo-Information Science, 2017, 19(9):1217-1227. ] DOI: 10.3724/SP.J.1047.2017.01217
doi: 10.3724/SP.J.1047.2017.01217 |
|
[16] |
Whitman R T, Park M B, Marsh B G, et al. Spatio-temporal join on Apache Spark[C]. Proceedings of the 25thACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2017:1-10. DOI: 10.1145/3139958.3139963
doi: 10.1145/3139958.3139963 |
[17] | 高旭, 桂志鹏, 隆玺, 等. KDSG-DBSCAN:一种基于K-D Tree和Spark GraphX的高性能DBSCAN算法[J]. 地理与地理信息科学, 2017, 33(6):1-7,封3. |
[ Gao X, Gui Z P, Long X, et al. KDSG-DBSCAN: A high performance DBSCAN algorithm based on K-D tree and spark GraphX[J]. Geography and Geo-Information Science, 2017, 33(6):1-7,封3. ] DOI: 10.3969/j.issn.1672-0504.2017.06.001
doi: 10.3969/j.issn.1672-0504.2017.06.001 |
|
[18] |
Gui Z, Peng D, Wu H, et al. MSGC: Multi-scale grid clustering by fusing analytical granularity and visual cognition for detecting hierarchical spatial patterns[J]. Future Generation Computer Systems, 2020, 112:1038-1056. DOI: 10.1016/j.future.2020.06.053
doi: 10.1016/j.future.2020.06.053 |
[19] |
Vo H, Aji A, Wang F S. SATO: A spatial data partitioning framework for scalable query processing[J]. GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems, 2014, 04/5/6/7:545-548. DOI: 10.1145/2666310.2666365
doi: 10.1145/2666310.2666365 |
[20] |
李绍俊, 钟耳顺, 王少华, 等. 基于状态转移矩阵的Hilbert码快速生成算法[J]. 地球信息科学学报, 2014, 16(6):846-851.
doi: 10.3724/SP.J.1047.2014.00846 |
[ Li J S, Zhong E S, Wang S H, et al. An algorithm for Hilbert ordering code based on state-transition matrix[J]. Journal of Geo-information Science, 2014, 16(6):846-851.] DOI: 10.3724/SP.J.1047.2014.00846
doi: 10.3724/SP.J.1047.2014.00846 |
|
[21] | 崔登吉. 空间分布模式驱动的空间数据组织与索引研究[D]. 南京:南京师范大学, 2016. |
[ Cui D J. Spatial data organization and indexing research driven by spatial distribution pattern[D]. Nanjing: Nanjing Normal University, 2016. ] | |
[22] |
Liu Y Y, Cho W K T. A spatially explicit evolutionary algorithm for the spatial partitioning problem[J]. Applied Soft Computing, 2020, 90:106-129. DOI: 10.1016/j.asoc.2020.106129
doi: 10.1016/j.asoc.2020.106129 |
[23] | 向隆刚, 高萌, 王德浩, 等. Geohash-Trees:一种用于组织大规模轨迹的自适应索引[J]. 武汉大学学报·信息科学版, 2019, 44(3):436-442. |
[ Xiang L G, Gao M, Wang D H, et al. Geohash-trees: An adaptive index which can organize large-scale trajectories[J]. Geomatics and Information Science of Wuhan University, 2019, 44(3):436-442. ] DOI: 10.13203/j.whugis20160523
doi: 10.13203/j.whugis20160523 |
|
[24] |
Zhang L, Ren L, Hao X, et al. Query method for nearest region of spatial line segment based on Hilbert curve grid[J]. International Journal of Innovative Computing Information and Control, 2019, 15(4):1287-1307. DOI: 10.24507/ijicic.15.04.1287
doi: 10.24507/ijicic.15.04.1287 |
[25] |
Yu J, Zhang Z S, Sarwat M. Spatial data management in apache spark: The GeoSpark perspective and beyond[J]. GeoInformatica, 2019, 23(1):37-78. DOI: 10.1007/s10707-018-0330-9
doi: 10.1007/s10707-018-0330-9 |
[26] |
Li F, Gui Z, Wu H, et al. Big enterprise registration data imputation: Supporting spatiotemporal analysis of industries in China[J]. Computers, Environment and Urban Systems, 2018, 70:9-23. DOI: 10.1016/j.compenvurbsys.2018.01.010
doi: 10.1016/j.compenvurbsys.2018.01.010 |
[27] | 湛东升, 张文忠, 党云晓, 等. 北京市公共服务设施空间集聚特征分析[J]. 经济地理, 2018, 38(12):76-82. |
[ Zhan D S, Zhang W Z, Dang Y X, et al. Spatial clustering analysis of public service facilities in Beijing[J]. Economic Geography, 2018, 38(12):76-82. ] DOI: 10.15957/j.cnki.jjdl.2018.12.010
doi: 10.15957/j.cnki.jjdl.2018.12.010 |
|
[28] |
Eldawy A, Alarabi L, Mokbel M F. Spatial partitioning techniques in SpatialHadoop[J]. Proceedings of the VLDB Endowment, 2015, 8(12):1602-1605. DOI: 10.14778/2824032.2824057
doi: 10.14778/2824032.2824057 |
[29] |
Kui Y, Chang Y Q. Prediction and optimization of sharing bikes queuing model in grid of Geohash coding[J]. Measurement and Control, 2020, 53(7):1250-1266. DOI: 10.1177/0020294019877521
doi: 10.1177/0020294019877521 |
[30] | Google. S2 Geometry[EB/OL]. http://s2geometry.io/, 2019-12-18. |
[31] |
Li R, He H, Wang R, et al. JUST: JD urban spatio-temporal data engine[C]. 2020 IEEE 36th International Conference on Data Engineering, 2020:1558-1569. DOI: 10.1109/ICDE48307.2020.00138
doi: 10.1109/ICDE48307.2020.00138 |
[1] | 李绍俊, 钟耳顺, 王少华, 张珣. 基于状态转移矩阵的Hilbert码快速生成算法[J]. 地球信息科学学报, 2014, 16(6): 846-851. |
[2] | 杨海平, 沈占锋, 骆剑承, 吴炜. 海量遥感数据的高性能地学计算应用与发展分析[J]. 地球信息科学学报, 2013, 15(1): 128-136. |
[3] | 宋亚超, 闾国年, 张宏. 基于WebService的Internet GIS集成与应用[J]. 地球信息科学学报, 2004, 6(1): 44-48. |
[4] | 刘纯波, 李琦, 承继成. 基于XML-RPC的分布式地理信息系统计算模型[J]. 地球信息科学学报, 2003, 5(1): 34-38. |
|