地球信息科学学报 ›› 2015, Vol. 17 ›› Issue (1): 1-7.doi: 10.3724/SP.J.1047.2015.00001

• •    下一篇

基于N-KD树的空间点数据分组算法

魏海涛1,2(), 杜云艳2,*(), 任浩玮1,2, 刘张2, 易嘉伟2, 许开辉1,2   

  1. 1. 山东科技大学 测绘科学与工程学院,青岛 266510
    2. 中国科学院地理科学与资源研究所 资源与环境信息系统国家重点实验室,北京 100101
  • 收稿日期:2014-02-14 修回日期:2014-07-19 出版日期:2015-01-10 发布日期:2015-01-05
  • 通讯作者: 杜云艳 E-mail:Weiht@lresi.ac.cn;duyy@lreis.ac.cn
  • 作者简介:

    作者简介:魏海涛(1979-),女,博士生,研究方向为海洋数据并行计算。E-mail:Weiht@lresi.ac.cn

  • 基金资助:
    海洋公益性专项“海洋环境信息云计算与云服务体系框架应用研究(201105033);海洋预报综合信息系统(MiFSIS)研究应用(201105017)

A Spatial Point Data Grouping Algorithm Based on the N-KD Tree

WEI Haitao1,2(), DU Yunyan2,*(), REN Haowei1,2, LIU Zhang2, YI Jiawei2, XU Kaihui1,2   

  1. 1. Shandong University of Science and Technology, Qingdao 266510, China
    2. LREIS, Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China
  • Received:2014-02-14 Revised:2014-07-19 Online:2015-01-10 Published:2015-01-05
  • Contact: DU Yunyan E-mail:Weiht@lresi.ac.cn;duyy@lreis.ac.cn
  • About author:

    *The author: CHEN Nan, E-mail:fjcn99@163.com

摘要:

随着科学技术的进步,地理空间数据的分析处理面临着数据量膨胀和计算量高速增长的双重挑战,为了解决海量数据处理速度慢的问题,本文针对空间分布不均匀的点数据,从数据并行的角度,以保持数据的空间邻近性及保证数据分组后各组数据量负载均衡为目标,提出基于N-KD树(Number-K Dimension Tree)数据动态分组的方法,其是一种面向实时变化(数据量和数据空间范围变化)的空间数据动态分组方法。该方法借鉴K-D树的创建和最临近点搜索的思想,通过方差判断数据分布稀疏程度,利用最临近点搜索方法处理边界点,实现空间范围的不均等切分,保证数据分组后各组数据量基本均衡。试验表明,该方法具有较好的动态分组效果与较高的计算效率;支持各种分布状态的空间点数据的分组;分组后各组数据量负载均衡;分组算法本身有支持并行、支持分布式协同工作模式的特点。

关键词: 点数据, 离散性, N-KD树, K-D树, 动态分组, 数据并行

Abstract:

The large amount of calculations with a continuously high-speed growth required in current development of computational technology presents a great challenge to the research of spatial data processing in geography. To accelerate the processing speed of constantly updated real-time data that distributed unevenly, this study developed a dynamic grouping method based on the Number-K Dimension (N-KD) Tree technique. The method employs a data parallel algorithm to preserve spatial proximity and balance the data loads after grouping the real-time data that vary both in quantity and spatial domain. Based on the KD Tree creation, the algorithm measures the data sparsity according to the mathematic variance, addresses the point data near boundaries according to the nearest searching approach, and achieve an unequal data partition with respect to space while having balanced data loads. Experiments reveal that the method is capable of efficiently grouping the point data that have various spatial distributions and balancing the data amount among these groups. The algorithm also supports parallel computing and distributed collaborative working model, which highlights the practical values in its applications.

Key words: point data, discreteness, N-KD Tree, K-D Tree, dynamic grouping, data parallel