Research on the Distributed Spatial Topological Query Optimization Algorithm

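Key Laboratory of 3D Information Acquisition and Application, Ministry of Education, Capital Normal University, Beijing 100048, China
YANG Dianhua (1977-), male, PhD candidate; research interest: high-performance parallel GIS. E-mail: yangdh@lreis.ac.cn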

Received: 2013-05-15

Revised: 2013-08-24

Published online: 2013-09-29

Funding

National Natural Science Foundation of China (40971232); National High-tech R&D Program of China ("863" Program) (2012AA12A403).

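Chinese Abstract

Applying the query methods of traditional distributed databases to distributed spatial databases incurs high transmission and processing costs. To address this problem, and building on existing optimization methods for distributed cross-border fragment joins, this paper studies the processing of distributed spatial topological join queries in depth and proposes a spatial query optimization algorithm based on cross-border join optimization, which extends the relational-algebra equivalence transformation rules used in traditional distributed query processing. In addition, global optimization strategies for distributed spatial queries are developed for the different types of fragment joins, implementing distributed spatial query decomposition and data localization and thereby reducing the high cost of data transmission in distributed queries. Finally, the paper proposes distributed query optimization methods based on node merging, join merge trees, execution nodes, and execution plan trees. The corresponding merging and optimization algorithms transform a global spatial query into concrete execution plans for the local spatial database at each site, eliminating redundant computation and optimizing the query computation strategy, which addresses the high processing cost of distributed spatial queries. Distributed spatial query experiments show that the proposed algorithm effectively improves the performance of distributed spatial queries.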
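
Query decomposition and data localization rest on the standard relational-algebra identity that a join distributes over the union of fragments, (R1 ∪ R2) ⋈ S = (R1 ⋈ S) ∪ (R2 ⋈ S), combined with pruning fragment pairs whose extents cannot interact. The following Python code is a minimal sketch under stated assumptions, not the paper's algorithm. The `Fragment` class, the site names, the bounding-rectangle (MBR) pruning test, and the rule that a cross-site pair runs at the site of its R fragment are all hypothetical. The pruning is only valid for intersect-type topological predicates (intersects, touches, overlaps, contains, within). It must not be used for `disjoint`.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Fragment:
    """Hypothetical model of a horizontal fragment stored at one site."""
    relation: str
    name: str
    site: str
    mbr: tuple  # (xmin, ymin, xmax, ymax) covering every geometry in the fragment


def mbrs_intersect(a, b):
    """True if two closed axis-aligned rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def localize_join(r_frags, s_frags):
    """Rewrite R JOIN S as the union of fragment joins Ri JOIN Sj, dropping
    pairs whose extents are disjoint (they cannot satisfy an intersect-type
    topological predicate)."""
    return [(r, s) for r, s in product(r_frags, s_frags)
            if mbrs_intersect(r.mbr, s.mbr)]


def merge_by_site(pairs):
    """Merge surviving fragment joins into one execution node per site.
    Assumption (hypothetical): a cross-site pair runs at the site of its
    R fragment, so the S fragment must be shipped there."""
    plan = defaultdict(list)
    shipments = []
    for r, s in pairs:
        plan[r.site].append((r.name, s.name))
        if r.site != s.site:
            shipments.append((s.name, s.site, r.site))
    return dict(plan), shipments


roads = [Fragment("roads", "R1", "site_A", (0, 0, 10, 10)),
         Fragment("roads", "R2", "site_B", (10, 0, 20, 10))]
rivers = [Fragment("rivers", "S1", "site_A", (0, 0, 9, 10)),
          Fragment("rivers", "S2", "site_B", (12, 0, 20, 10)),
          Fragment("rivers", "S3", "site_B", (8, 0, 12, 10))]  # crosses the x = 10 border

pairs = localize_join(roads, rivers)
plan, shipments = merge_by_site(pairs)
print(plan)       # {'site_A': [('R1', 'S1'), ('R1', 'S3')], 'site_B': [('R2', 'S2'), ('R2', 'S3')]}
print(shipments)  # [('S3', 'site_B', 'site_A')]
```

Four of the six candidate fragment pairs survive pruning. Only R1 ⋈ S3 crosses sites and needs S3 shipped from site_B to site_A. The other three joins run where their data already reside.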

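Cite this article

YANG Dianhua. Research on the distributed spatial topological query optimization algorithm[J]. Journal of Geo-information Science, 2013, 15(5): 643-648, 679. DOI: 10.3724/SP.J.1047.2013.00643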

Abstract

Because of complex data structures, complicated spatial relationships, and massive data volumes, distributed spatial queries are time-consuming and incur high transmission and processing costs. Query processing methods from traditional distributed databases cannot meet the query demands of distributed geospatial databases, so new query methods for distributed geospatial databases need to be studied. Building on existing optimization methods for conventional query processing in traditional distributed databases, this paper studies distributed spatial join query processing in depth and proposes a series of relational-algebra transformation rules based on cross-border topological join optimization. After data localization, the query tree is optimized by equivalence transformation. A global optimization method for distributed spatial join queries over different fragment types is also studied, which transforms a global spatial query into a set of local fragment joins. Spatial join queries are processed locally, avoiding the transmission of spatial data among data nodes during query processing and thereby improving query performance. To further improve efficiency, new concepts are introduced, including the query merge tree and the execution plan tree, which optimize the execution path of a query plan. For example, by adjusting the execution order, low-cost operations execute first, and time-consuming operations then run on the result sets produced by the earlier ones. This reduces the amount of time-consuming work and lowers the overall cost of query processing, improving the performance of distributed spatial queries.
Experiments on vector data of China show that the proposed methods reduce the cost of spatial joins and of data transmission among nodes and improve performance by 28.5%, indicating that they outperform the traditional methods in both algorithmic complexity and running time.
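
The reordering idea, running cheap operations first and letting expensive ones work only on their results, follows the same principle as the classic filter-and-refine strategy for spatial joins. The following code uses that well-known technique as an illustration. It is not claimed to be the paper's execution-plan algorithm, and the function names and point/polygon data are hypothetical. A bounding-box test acts as the cheap filter, and a ray-casting point-in-polygon test stands in for the expensive exact topological predicate.

```python
def mbr(poly):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a polygon."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(pt, poly):
    """Ray casting: toggle on each edge crossed by a horizontal ray to the
    right of pt. Points exactly on the boundary may be classified either way."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y, so y2 != y1
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def points_in_polygons(points, polygons, use_filter=True):
    """Join points with the polygons containing them.
    Returns (matches, exact_tests); matches are (point_idx, polygon_idx)."""
    boxes = [mbr(p) for p in polygons]
    matches, exact_tests = [], 0
    for i, (px, py) in enumerate(points):
        for j, poly in enumerate(polygons):
            if use_filter:
                xmin, ymin, xmax, ymax = boxes[j]
                if not (xmin <= px <= xmax and ymin <= py <= ymax):
                    continue  # cheap rejection: the exact test is skipped
            exact_tests += 1
            if point_in_polygon((px, py), poly):
                matches.append((i, j))
    return matches, exact_tests


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
triangle = [(10, 0), (14, 0), (12, 4)]
points = [(1, 1), (3, 3), (12, 1), (20, 20), (13, 3.5)]

print(points_in_polygons(points, [square, triangle]))         # ([(0, 0), (1, 0), (2, 1)], 4)
print(points_in_polygons(points, [square, triangle], False))  # ([(0, 0), (1, 0), (2, 1)], 10)
```

Both runs return the same matches, but the filtered run performs 4 exact tests instead of 10. The point (13, 3.5) passes the triangle's bounding box but is rejected by the exact test, which is why the refine step cannot be dropped. With complex geometries the exact predicate usually dominates the cost, so evaluating it only on filtered candidates pays off.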
