A Line Grouping Algorithm Based on Density

  • 1. Shandong University of Science and Technology, Qingdao 266510, China;
    2. LREIS, Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China

Received date: 2014-12-26

  Revised date: 2015-02-03

  Online published: 2015-05-10


Parallel computing provides a promising solution to accelerate complicated spatial data processing, which is becoming increasingly computational intense. Partitioning large datasets into workload-balanced subgroups remains a challenge, particularly for unevenly distributed spatial data. In this study, a density-based data grouping algorithm was developed to tackle the partition problem for large line data. The algorithm includes three procedures: (1) extracting representative segment samples based on data density distribution; (2) generating a distance matrix between segment samples and the rest of the data by using three line distance measurements into calculations; (3) grouping line segments with data load balanced. Experiments show that the algorithm is able to partition large line data efficiently and evenly into equally sized sub-groups. The speed-up ratios of parallel interpolation save up to 65% of the execution time in comparison with consequential interpolation. A high efficiency of parallel computing was achieved when the datasets were divided into an optimal number of child data groups.

Cite this article

WEI Haitao, DU Yunyan, XU Kaihui, WU Di, YI Jiawei, MO Yang, LIU Zhang . A Line Grouping Algorithm Based on Density[J]. Journal of Geo-information Science, 2015 , 17(5) : 538 -546 . DOI: 10.3724/SP.J.1047.2015.00538


[1] Guan Q, Clarke K C. A general-purpose parallel raster processing programming library test application using a geographic cellular automata model[J]. International Jour-nal of Geographical Information Science, 2010,24(5):695-722.
[2] Farber R. CUDA application design and development[M]. Waltham, MA: Elsevier Inc, 2011:18-19.
[3] Kim Y, Shim K, Kim M S, et al. DBCURE-MR: An efficient density-based clustering algorithm for large data using MapReduce[J]. Information Systems, 2014,42:15-35.
[4] Li L, Xi Y. Research on clustering algorithm and its parallelization strategy[C]. 2011 International Conference on Computational and Information Sciences (ICCIS), 2011: 325-328.
[5] He Y, Tan H, Luo W, et al. An efficient parallel densitybased clustering algorithm using MapReduce[C]. 2011 IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS), 2011:473-480.
[6] Abugov D. Oracle spatial partitioning: Best practices (an Oracle white paper)[M]. Redwood, CA: Oracle Inc, 2004.
[7] Meng L, Huang C, Zhao C, et al. An improved Hilbert curve for parallel spatial data partitioning[J]. Geo-spatial Information Science, 2007,10(4):282-286.
[8] Florida-San Román L. Proposition of two layered ionic structures, with xy disorder but z-ordered, in a quasi-liquid system[J]. Revista mexicana de física, 2006,52:208-210.
[9] Sarwat M, Mokbel M F, Zhou X, et al. Generic and efficient framework for search trees on flash memory storage systems[J]. GeoInformatica, 2013,17(3):417-448.
[10] Lee J G, Han J, Whang K Y. Trajectory clustering: A partition-and-group framework[C]. Proceedings of the 2007 ACM SIGMOD international conference on Management of data, 2007:593-694.
[11] Jonk A, Smeulders AW. An axiomatic approach to clustering line-segments[C]. Proceedings of the Third International Conference on Document Analysis and Recognition, 1995:386-389.
[12] Kelly A R, Hancock E R. Grouping-line segments using eigenclustering[C]. Proceedings of the British Machine Vision Conference, 2000:1-10.
[13] Thomas J C R. A new clustering algorithm based on kmeans using a line segment as prototype[C]. Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, 2011:638-645.
[14] Gaffney S J, Robertson A W, Smyth P, et al. Probabilistic clustering of extratropical cyclones using regression mixture models[J]. Climate dynamics, 2007,29(4):423-440.
[15] Vinh N N, Le B. Simple spatial clustering algorithm based on R-tree[J]. Multi-disciplinary Trends in Artificial Intelligence,2012:236-245.
[16] Wang S, Armstrong M P. A quadtree approach to domain decomposition for spatial interpolation in grid computing environments[J]. Parallel Computing, 2003,29(10):1481-1504.
[17] 赵春宇, 孟令奎, 林志勇. 一种面向并行空间数据库的数 据划分算法研究[J]. 武汉大学学报:信息科学版,2006,31 (11):962-965.
[18] Guttman A. R-trees: A dynamic index structure for spatial searching[J]. ACM, 1984(14):47-57.
[19] Jacob C, Maheswaran G, Santosh Kumar I. Clustering in Hilbert R trees: A study on spatial indexing in R trees[J]. Spatial Indexing using R Trees, 2013:1-3.
[20] Marshall P L, Region V F. Using line intersect sampling for coarse woody debris[R]. Nanaimo, Canada: Vancouver Forest Region, 2000:112-121.
[21] Thompson S K. Line-intercept sampling[A]. In: Sampling
[M]. Hoboken, NJ: John Wiley & Sons Inc, 2012:275-282.
[22] Fischer F, Kleinn C, Fehrmann L, et al. A national level forest resource assessment for Burkina Faso -a field based forest inventory in a semiarid environment combining small sample size with large observation plots[J]. Forest Ecol. Manage, 2011,262(8):1532-1540.
[23] Fehrmann L, Seidel D, Krause B, et al. Sampling for landscape elements—A case study from Lower Saxony, Germany[J]. Environmental monitoring and assessment, 2014,186(3):1421-1430.
[24] 全国统计专业技术资格考试用书编写委员会, 统计业务 知识[M]. 北京:中国统计出版社,2011:245-272.
[25] Whittaker E T. On the functions which are represented by the expansions of the interpolation-theory[R]. Edinburgh, Scotland: Edinburgh University, 1915:181-194.
[26] 柳盛,吉根林,李文俊.一种基于连接度的空间线对象聚 类算法[J].计算机科学,2011,38(8):179-181.
[27] Merkle R, Hellman M E. Hiding information and signatures in trapdoor knapsacks[C]. IEEE Transactions on Information Theory,1978,24(5):525-530.
[28] Nievergelt J H H, Sevcik K C. The grid file: An adaptable, symmetric multikey file structure ACM Trans[J]. Database Syst, 1984,9:38-71.
[29] Dai B R, Lin I C. Efficient map/reduce-based DBSCAN algorithm with optimized data partition[C]. 2012 IEEE 5th International Conference on Cloud Computing (CLOUD), 2012:59-66.
[30] Xu X, Jäger J, Kriegel H-P. A fast parallel clustering algorithm for large spatial databases[M]. High Performance Data Mining. Springer, 2002:263-290.