Journal of Geo-information Science >
A Spatial Point Data Grouping Algorithm Based on the N-KD Tree
Received date: 2014-02-14
Request revised date: 2014-07-19
Online published: 2015-01-05
Copyright
The large amount of calculations with a continuously high-speed growth required in current development of computational technology presents a great challenge to the research of spatial data processing in geography. To accelerate the processing speed of constantly updated real-time data that distributed unevenly, this study developed a dynamic grouping method based on the Number-K Dimension (N-KD) Tree technique. The method employs a data parallel algorithm to preserve spatial proximity and balance the data loads after grouping the real-time data that vary both in quantity and spatial domain. Based on the KD Tree creation, the algorithm measures the data sparsity according to the mathematic variance, addresses the point data near boundaries according to the nearest searching approach, and achieve an unequal data partition with respect to space while having balanced data loads. Experiments reveal that the method is capable of efficiently grouping the point data that have various spatial distributions and balancing the data amount among these groups. The algorithm also supports parallel computing and distributed collaborative working model, which highlights the practical values in its applications.
Key words: point data; discreteness; N-KD Tree; K-D Tree; dynamic grouping; data parallel
WEI Haitao , DU Yunyan , REN Haowei , LIU Zhang , YI Jiawei , XU Kaihui . A Spatial Point Data Grouping Algorithm Based on the N-KD Tree[J]. Journal of Geo-information Science, 2015 , 17(1) : 1 -7 . DOI: 10.3724/SP.J.1047.2015.00001
Fig. 1 Flow chart of spatial data partition图1 空间数据分割算法流程图 |
Fig. 2 Multi-source ocean buoy data visualization of the South China Sea图2 中国南海多源海洋浮标数据可视化图 |
Tab. 1 Test sample plots表 1 测试实例图组表 |
Tab. 2 Time consuming and number of data points in each group regarding to different data quantity and different partition number表 2 不同数据量、不同分组的用时及数据均衡情况 |
点集数据量(个) | 不同数据量在各分组情况下的耗时(s)及数据量(个) | ||||||
---|---|---|---|---|---|---|---|
4 | 5 | 6 | 7 | 8 | 9 | 10 | |
212 280 | 27.97 (53 074) | 28.07 (42 447) | 28.49 (35 389) | 29.84 (30 319) | 30.29 (26 530) | 31.01 (23 586) | 31.54 (21 229) |
118 417 | 14.58 (29 595) | 17.84 (23 677) | 20.57 (19 728) | 21.06 (16 908) | 21.06 (14 795) | 21.72 (13 136) | 23.04 (11 839) |
53 074 | 5.94 (13 257) | 6.94 (10 607) | 7.74 (8 841) | 8.42 (7 570) | 8.88 (6 625) | 9.42 (5 891) | 9.54 (5 298) |
Fig. 3 Time consuming in different data quantity and different partition number图3 不同数据量、不同分组的用时折线图 |
Fig. 4 Test sample comparison图4 测试对比图 |
Tab. 3 The number of data in each group表3 3种方法在不同的分组数下各个分组对应的数据量 |
组号 | 分为4组 | 分为6组 | ||||
---|---|---|---|---|---|---|
空间位置均分法 | 传统K-D树分组方法 | N-KD树空间分组方法 | 空间位置均分法 | 传统K-D树分组方法 | N-KD树空间分组方法 | |
1 | 30 058 | 53 070 | 53 069 | 25 028 | 35 380 | 35 382 |
2 | 21 370 | 53 071 | 53 067 | 114 207 | 35 379 | 35 376 |
3 | 36 046 | 53 070 | 53 074 | 21 976 | 35 382 | 35 374 |
4 | 124 805 | 53 073 | 53 074 | 17 052 | 35 380 | 35 388 |
5 | 19 103 | 35 383 | 35 373 | |||
6 | 14 914 | 35 383 | 35 393 |
Fig. 5 The comparison figure of data load balance图5 数据负载均衡对比图 |
Fig. 6 Time consuming comparisons among the three algorithms图6 3种方法的耗时对比图 |
The authors have declared that no competing interests exist.
[1] |
|
[2] |
|
[3] |
|
[4] |
|
[5] |
|
[6] |
|
[7] |
|
[8] |
|
[9] |
|
[10] |
|
[11] |
|
[12] |
|
[13] |
|
[14] |
|
[15] |
|
[16] |
|
[17] |
|
[18] |
|
/
〈 | 〉 |