Original Article

Research Progress and Review of High-Performance GIS

  • ZUO Yao 1, 2
  • WANG Shaohua 1, 2, 3, *
  • ZHONG Ershun 1, 3
  • CAI Wenwen 1, 2
  • 1. SuperMap Software Co. Ltd., Beijing 100015, China
  • 2. SuperMap GIS Technology Institute, Beijing 100015, China
  • 3. Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China
*Corresponding author: WANG Shaohua, E-mail:

Received date: 2016-08-24

  Request revised date: 2017-01-20

  Online published: 2017-04-20

Copyright

All rights reserved by the Editorial Office of Journal of Geo-information Science

Abstract

The development of Internet technology has led to explosive growth of all kinds of information in recent years. Traditional data processing methods cannot keep pace with the rapidly improving performance of computer hardware, and more efficient methods are needed to process such large volumes of data. High-performance computing technologies, including parallel cluster computing and distributed network technology, offer ways to solve these problems. In practice, there are three major distributed computing systems: Hadoop, Spark and Storm. Hadoop improves computational performance through the MapReduce distributed computing framework, while Spark makes full use of memory to hold data as Resilient Distributed Datasets (RDDs), which gives it faster data reading and writing. Storm does not collect data directly; it transmits and processes data through network nodes. How to exploit the performance gains of new hardware architectures to solve long-standing data-intensive, computation-intensive and communication-intensive problems has become a topical issue in GIS research. Reviewing current research progress in high-performance GIS, this paper examines high-performance GIS algorithms, parallel GIS computing, in-memory computing and many-core computing, and offers prospects for the future development of high-performance GIS, providing a reference for the development of high-performance GIS systems. In addition, the development of Internet technology and cloud computing continues to boost the popularity of GIS cloud computing and big data technology.
In this context, domestic and foreign GIS platform vendors have launched their own cloud GIS platforms, such as ArcGIS 10.4 from ESRI and SuperMap 8C from SuperMap, which support cross-platform operation, parallel computing, 64-bit computing, distributed systems and other technologies.

Cite this article

ZUO Yao , WANG Shaohua , ZHONG Ershun , CAI Wenwen . Research Progress and Review of High-Performance GIS[J]. Journal of Geo-information Science, 2017 , 19(4) : 437 -446 . DOI: 10.3724/SP.J.1047.2017.00437

1 Introduction

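The arrival of the Internet era has brought unprecedented application, promotion and development of geographic information technology, and geographic information computing now shows fast computing speed, high operating efficiency and diversified applications. With the development of computer technology, high-performance computing technologies represented by distributed and parallel computing are gradually being integrated into the geographic information field. How to use the performance gains brought by the new hardware architectures of high-performance computing to solve existing spatio-temporal data-intensive, computation-intensive and communication-intensive problems has become a hot topic in GIS [1-4]. It is therefore important to develop a new generation of high-performance GIS systems based on high-performance computing such as parallel cluster computing [5-9]. Such systems can provide new solutions for the storage, visualization, spatial analysis and data services of large spatio-temporal datasets.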
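High-performance computing is a technology in which one or more groups of computer systems form a cluster, connected through a network into a supercomputing system, to strengthen data processing and analytical computing performance [10-12]. High-performance GIS uses the theoretical framework, technical architecture and data models of high-performance computing to extend and enhance the existing performance of GIS, so that massive spatial data can be read and written quickly and conveniently, allowing GIS systems to support more efficiently the solution of computation-, data- and communication-intensive scientific problems in geospatial information science [13-17]. Its high performance shows in three ways: computation over larger and more complex geospatial data, support for more complex and diverse GIS models and algorithms, and shorter processing times [18-21].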
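At present, the three mainstream distributed computing systems are Hadoop, Spark and Storm [22-25]. Hadoop is based on the MapReduce distributed computing framework, and its core technique is to improve performance through a distributed architecture. GIS spatial analysis often requires partitioning the study area into smaller computing units with more homogeneous landform characteristics; by exploiting Hadoop's distributed storage together with MapReduce processing, the performance of GIS spatial analysis can be greatly improved [26-27]. However, because of the limits of its storage hardware, Hadoop is somewhat inadequate for GIS models whose data are updated rapidly. Spark is another important distributed computing system. It is based on the concept of Resilient Distributed Datasets (RDDs) and keeps data in memory, and therefore reads and writes data faster [28]. Compared with Hadoop, Spark's advantage is that data need to be loaded only once to run many iterations, which makes it more efficient; its drawback is that it is unsuitable for data that must be retained for a long time, because memory-resident data are lost if the computing environment suffers a power failure. Storm does not collect data itself; instead it transmits and processes data through network nodes. Its advantage is that streaming data can be analyzed directly, without a prior data collection step or job scheduling, which makes it better suited to online, real-time GIS big data processing [29-30].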
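To make the batch-versus-stream contrast concrete, the following is a minimal sketch, in plain Python rather than any Hadoop, Spark or Storm API, of the two styles applied to counting points per grid cell. The batch function needs the whole dataset before it runs; the streaming class updates a running count as each tuple arrives, which is the property attributed to Storm above. The cell size and the (x, y) tuple format are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple

Point = Tuple[float, float]

def cell_of(p: Point, cell_size: float) -> Tuple[int, int]:
    """Map a point to the integer index of the grid cell containing it."""
    return (int(p[0] // cell_size), int(p[1] // cell_size))

def batch_counts(points: Iterable[Point], cell_size: float) -> Counter:
    """Batch style: the full dataset is collected first, then counted in one job."""
    collected = list(points)  # the whole input must be materialized
    return Counter(cell_of(p, cell_size) for p in collected)

class StreamingCounter:
    """Stream style: state is updated incrementally as each tuple arrives."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.counts: Counter = Counter()

    def on_tuple(self, p: Point) -> Tuple[int, int]:
        key = cell_of(p, self.cell_size)
        self.counts[key] += 1
        return key  # an up-to-date result is available after every tuple

points = [(0.5, 0.5), (1.5, 0.2), (0.1, 0.9), (2.2, 2.8)]
stream = StreamingCounter(cell_size=1.0)
for p in points:
    stream.on_tuple(p)
print(stream.counts == batch_counts(points, 1.0))  # True
```

Both styles produce the same counts; the difference is when results become available and whether the input has to be stored first.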

2 Research status

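Important trends in current GIS development are service orientation and cloud-client integration, and efficient key technologies for high-performance GIS urgently need to be developed [31-32]. The application of distributed parallel technology has significantly improved the efficiency of GIS spatial analysis, and continuing advances in in-memory computing, high-performance algorithms and other technologies have greatly accelerated the development of the high-performance GIS technology system [33-34]. In addition, 2D/3D integration is another major trend. As hardware costs fall and graphics card performance improves, many-core technology has accelerated the development of 3D GIS [35-36]. This study therefore summarizes the development of high-performance GIS from four aspects: high-performance GIS algorithms, parallel GIS computing, in-memory computing and many-core computing (Fig. 1).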
Fig. 1 The architecture diagram (research content) of high-performance GIS

2.1 High-performance GIS algorithms

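High-performance computing began to be applied to geographic information soon after it emerged [37-39]. As an important component of high-performance parallel GIS systems, high-performance GIS algorithms are spatial algorithms that process massive geospatial data in real time on high-performance computing systems built from vector processors and parallel computing technology, making it possible to analyze and simulate geospatial phenomena at global scales and over long time spans that were previously intractable. Many researchers have studied related techniques. For example, Turton et al. [40] analyzed journey-to-work (commuting) data: a doubly constrained geographic computation model that took 91 h on a workstation needed only 3 min when run in parallel under a shared-memory model, a dramatic gain in efficiency.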
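For readers unfamiliar with the model class mentioned above, a doubly constrained spatial interaction (journey-to-work) model estimates flows T_ij = A_i * O_i * B_j * D_j * f(c_ij), where origin totals O_i and destination totals D_j are both fixed and the balancing factors A_i and B_j are found iteratively. The sketch below assumes an exponential cost function f(c) = exp(-beta * c) and a fixed iteration count; both are illustrative choices, not the configuration used by Turton et al. Each A_i depends only on row i and each B_j only on column j, which is what makes the model amenable to parallel execution.

```python
import math
from typing import List

def doubly_constrained(O: List[float], D: List[float],
                       cost: List[List[float]], beta: float,
                       iterations: int = 100) -> List[List[float]]:
    """Estimate flows T[i][j] = A_i*O_i*B_j*D_j*exp(-beta*c_ij).

    Assumes sum(O) == sum(D). The balancing factors A and B are solved
    by alternating (Furness-style) updates.
    """
    n, m = len(O), len(D)
    f = [[math.exp(-beta * cost[i][j]) for j in range(m)] for i in range(n)]
    A = [1.0] * n
    B = [1.0] * m
    for _ in range(iterations):
        # Each A_i uses only row i: rows could be computed in parallel.
        A = [1.0 / sum(B[j] * D[j] * f[i][j] for j in range(m)) for i in range(n)]
        # Each B_j uses only column j: columns could be computed in parallel.
        B = [1.0 / sum(A[i] * O[i] * f[i][j] for i in range(n)) for j in range(m)]
    return [[A[i] * O[i] * B[j] * D[j] * f[i][j] for j in range(m)]
            for i in range(n)]

O = [100.0, 50.0]               # workers living in each origin zone
D = [80.0, 70.0]                # jobs in each destination zone
cost = [[1.0, 3.0], [2.0, 1.0]]  # travel cost between zones
T = doubly_constrained(O, D, cost, beta=0.5)
print([round(sum(row), 3) for row in T])                             # [100.0, 50.0]
print([round(sum(T[i][j] for i in range(2)), 3) for j in range(2)])  # [80.0, 70.0]
```

Both sets of constraints hold on the estimated flow matrix, which is what "doubly constrained" means.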
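With continuing advances in IT, research on high-performance GIS algorithms falls into two areas: (1) parallelizing existing compute-intensive computations, using high-performance GIS technology to perform geoscientific analysis and inference on global, massive spatio-temporal data and to explore new spatial models; (2) exploring new spatial analysis methods and continually extending their scope. Representative spatial analysis algorithms include neural network models, genetic algorithm models and cellular automata models [41-45]. The two areas have different emphases: the former optimizes computation at the technical level to improve running efficiency; the latter seeks more efficient and convenient spatial analysis methods through new models and algorithms, improving analysis efficiency through domain-specific geographic models.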
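Cellular automata illustrate why such models parallelize well: each cell's next state depends only on its own state and its neighbours' current states. The sketch below applies one synchronous update of a toy urban-growth rule in which a non-urban cell becomes urban when at least "threshold" of its eight Moore neighbours are urban. The rule and threshold are hypothetical and far simpler than published urban-growth models; the point is that every output cell can be computed independently.

```python
from typing import List

Grid = List[List[int]]  # 1 = urban, 0 = non-urban

def urban_neighbours(grid: Grid, r: int, c: int) -> int:
    """Count urban cells in the 8-cell Moore neighbourhood (edges are not wrapped)."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                total += grid[rr][cc]
    return total

def ca_step(grid: Grid, threshold: int = 3) -> Grid:
    """One synchronous CA update: reads only the old grid and writes a new one,
    so the loop over cells could be split across processors without locking."""
    return [
        [1 if grid[r][c] == 1 or urban_neighbours(grid, r, c) >= threshold else 0
         for c in range(len(grid[0]))]
        for r in range(len(grid))
    ]

g = [[1, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]
print(ca_step(g))  # [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
```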
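In the past, high-performance computing usually required a single machine with a powerful CPU and large capacity to carry out heavy computations, which was cumbersome. In recent years, researchers have begun to use high-performance clusters made up of one or more groups of networked computers for parallel processing and high-performance computing on massive GIS data, which makes deployment more flexible. Hadoop, built on the MapReduce architecture, offers a concise parallel computing model that allows serial algorithms to be quickly adapted to parallel computing and other high-performance algorithms [46-47]. Current research in this area concentrates on GIS spatial analysis algorithms. For example, Cary et al. [48] implemented several GIS algorithms on MapReduce and built R-trees with it; Chen et al. [49] designed and implemented a high-performance geocomputation framework on Hadoop. These studies provide references for further research on high-performance algorithms.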
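The MapReduce pattern referred to above can be illustrated without a cluster. The sketch below emulates the three phases (map, shuffle, reduce) in plain Python to compute a point-density grid, a typical spatial-analysis job. It does not use the Hadoop API; the function names are hypothetical, and on a real cluster each phase would run on many nodes with the shuffle handled by the framework.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[int, int]

def map_phase(record: Tuple[float, float], cell_size: float) -> Iterator[Tuple[Key, int]]:
    """Mapper: emit (grid-cell key, 1) for each input point."""
    x, y = record
    yield (int(x // cell_size), int(y // cell_size)), 1

def shuffle(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Shuffle: group intermediate values by key (performed by the framework in Hadoop)."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(key: Key, values: List[int]) -> Tuple[Key, int]:
    """Reducer: sum the counts for one cell."""
    return key, sum(values)

def density_grid(points: Iterable[Tuple[float, float]], cell_size: float) -> Dict[Key, int]:
    intermediate = (kv for p in points for kv in map_phase(p, cell_size))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())

pts = [(0.2, 0.3), (0.8, 0.1), (1.5, 1.5), (1.9, 1.1), (1.2, 1.7)]
print(density_grid(pts, cell_size=1.0))  # {(0, 0): 2, (1, 1): 3}
```

Because the mapper is stateless and each reducer call sees only one key, the serial map_phase and reduce_phase functions carry over unchanged to a parallel runtime, which is the quick adaptation of serial algorithms described above.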
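In organizing parallel spatial data structures, efficient spatial data partitioning strategies support sound organization and storage of spatial data and can greatly improve spatial analysis performance.
(1) Parallel computing on raster data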
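Raster data have a distinctive block-like structure, and raster operations are usually matrix operations with strong local independence, which favours parallel processing in high-performance GIS [50]. Partitioning raster data according to their spatial characteristics and processing the parts in parallel can significantly improve the efficiency of raster operations [51-52].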
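The locality argument can be made concrete with a focal (3x3 mean) filter. The sketch below splits a raster into horizontal strips, pads each strip with a one-row halo from its neighbours so that edge cells see their full neighbourhood, processes the strips concurrently and stitches the results. It assumes a small in-memory list-of-lists raster; ThreadPoolExecutor is used only to show that the strips are independent (pure-Python code does not run faster under CPython's global interpreter lock), and a real system would dispatch strips to processes or cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Tuple

Raster = List[List[float]]

def focal_mean_rows(block: Raster, first: int, last: int) -> Raster:
    """3x3 mean for rows first..last-1 of `block`.

    Neighbours outside `block` are ignored, so cells on the raster edge
    average fewer values.
    """
    rows, cols = len(block), len(block[0])
    out = []
    for r in range(first, last):
        out_row = []
        for c in range(cols):
            vals = [block[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))]
            out_row.append(sum(vals) / len(vals))
        out.append(out_row)
    return out

def split_with_halo(raster: Raster, n_strips: int) -> Iterator[Tuple[Raster, int, int]]:
    """Yield (block, first, last): a strip of rows padded with a one-row halo.

    `first`/`last` locate the strip's own rows inside the padded block.
    """
    rows = len(raster)
    step = -(-rows // n_strips)  # ceiling division
    for start in range(0, rows, step):
        end = min(rows, start + step)
        lo, hi = max(0, start - 1), min(rows, end + 1)
        yield raster[lo:hi], start - lo, end - lo

def parallel_focal_mean(raster: Raster, n_strips: int = 2) -> Raster:
    """Process halo-padded strips concurrently and stitch the rows back together."""
    with ThreadPoolExecutor(max_workers=n_strips) as pool:
        parts = pool.map(lambda args: focal_mean_rows(*args),
                         split_with_halo(raster, n_strips))
        return [row for part in parts for row in part]

dem = [[float(r * 4 + c) for c in range(4)] for r in range(5)]
serial = focal_mean_rows(dem, 0, len(dem))
print(parallel_focal_mean(dem, n_strips=3) == serial)  # True
```

The halo is the price of decomposition: each worker reads slightly more than it writes, but no worker ever needs data held by another during the computation.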
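Most current research focuses on spatial-extent partitioning and non-uniform partitioning algorithms [53-55]. Broadly, there are two strategies, static and dynamic, both of which divide GIS data spatially according to the locations of the data. However, when the raster data to be processed have complex and varied spatial characteristics, simple partitioning cannot meet the needs of fine-grained processing. Some studies have addressed the partitioning of such complex terrain: Cheng et al. [56] grouped complex raster data for computation through a stepwise dynamic analysis algorithm and built a dynamic progressive partitioning algorithm; Ouyang [57] studied a data partitioning method based on space-filling curves, which achieves good load balancing while taking the characteristics of the spatial data into account. Although these methods do not achieve ideal load balancing, they provide useful references for a range of raster partitioning strategies.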
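One common way to combine spatial locality with load balancing, in the spirit of the space-filling-curve partitioning cited above, is to order raster tiles along a curve and cut the sequence into contiguous runs of roughly equal workload. The sketch below uses the Z-order (Morton) curve because its key is a simple bit interleave; this is a swapped-in choice, and the cited work may use a different curve (for example Hilbert) and cost model. The per-tile cost values are hypothetical workload estimates.

```python
from typing import Dict, List, Tuple

Tile = Tuple[int, int]  # (col, row) tile index

def morton_key(col: int, row: int, bits: int = 16) -> int:
    """Interleave the bits of col and row (col in even bits, row in odd bits)."""
    key = 0
    for i in range(bits):
        key |= ((col >> i) & 1) << (2 * i)
        key |= ((row >> i) & 1) << (2 * i + 1)
    return key

def partition_by_curve(cost: Dict[Tile, float], n_parts: int) -> List[List[Tile]]:
    """Cut tiles, ordered along the Z-curve, into contiguous groups whose
    total cost is close to total/n_parts (a greedy prefix-sum split)."""
    ordered = sorted(cost, key=lambda t: morton_key(*t))
    target = sum(cost.values()) / n_parts
    parts: List[List[Tile]] = [[]]
    acc = 0.0  # cumulative cost of all tiles assigned so far
    for tile in ordered:
        # start a new part once the parts so far have absorbed their share of the work
        if acc >= target * len(parts) and len(parts) < n_parts:
            parts.append([])
        parts[-1].append(tile)
        acc += cost[tile]
    return parts

cost = {(0, 0): 4.0, (1, 0): 1.0, (0, 1): 1.0, (1, 1): 2.0}
parts = partition_by_curve(cost, n_parts=2)
print(parts)                                         # [[(0, 0)], [(1, 0), (0, 1), (1, 1)]]
print([sum(cost[t] for t in p) for p in parts])      # [4.0, 4.0]
```

A count-balanced split would give two tiles to each part (costs 5.0 and 3.0); weighting by cost balances the work while each part remains a contiguous run along the curve and therefore stays spatially compact.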
(2) Parallel computing on vector data
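Vector data, the other major storage format for geographic objects, represent spatial data explicitly while attribute data remain implicit. Because of this particular data structure and the strip-like and topological characteristics of geographic entities, not all vector data are suitable for parallel processing [58]. Moreover, vector data are tightly coupled and continuous, which further increases the difficulty of processing them in parallel [59].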
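Likewise, vector data partitioning strategies are either static or dynamic. Static partitioning divides the data according to predefined attribute rules, while dynamic partitioning achieves load balancing by assigning work to idle processes at run time [60]. Taking spatial location, proximity and related factors into account further improves the balance of storage. Traditional static methods are still the most common; for example, Tian [60], starting from the requirements of parallel spatial operations, abandoned the traditional balancing by feature count and studied partitioning strategies based on spatial aggregation and on statistical clustering.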
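The difference between the two strategies shows up when features have very different processing costs. In the sketch below, the cost of each polygon is approximated by its vertex count, which is an assumption; real cost models may also consider spatial proximity. The static strategy gives each worker an equal number of features in input order. The dynamic strategy is approximated by the greedy rule of handing each next-largest feature to whichever worker is currently least loaded (the longest-processing-time heuristic), which stands in for idle workers pulling tasks from a cost-sorted queue.

```python
import heapq
from typing import List

def static_partition(costs: List[int], n_workers: int) -> List[int]:
    """Equal feature counts per worker, in input order; returns per-worker load."""
    size = -(-len(costs) // n_workers)  # ceiling division
    return [sum(costs[i:i + size]) for i in range(0, len(costs), size)]

def dynamic_partition(costs: List[int], n_workers: int) -> List[int]:
    """Greedy least-loaded assignment of features sorted by descending cost."""
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    loads = [0] * n_workers
    for c in sorted(costs, reverse=True):
        load, w = heapq.heappop(heap)  # the currently least-loaded worker
        loads[w] = load + c
        heapq.heappush(heap, (loads[w], w))
    return loads

vertex_counts = [900, 850, 40, 30, 20, 10]  # a few complex polygons, several simple ones
print(static_partition(vertex_counts, 2))   # [1790, 60]
print(dynamic_partition(vertex_counts, 2))  # [930, 920]
```

With equal feature counts the slowest worker carries almost all the vertices, while cost-aware assignment keeps the maximum load close to half the total.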
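For spatial indexing, most early studies used serial index structures such as quadtrees [61-62] and grid indexes [63]. As the types and volume of spatio-temporal data have grown, parallel spatial indexing has developed, mainly in two forms: indexes based on the shared-memory model and indexes based on the message-passing model. Both improve processing efficiency through suitable optimization strategies, and each targets spatial data processing on particular hardware, including stream-processor architectures [64]. For example, Papadias et al. [65] studied parallel spatial join algorithms for shared virtual memory, and Deng et al. [66] implemented optimized database query techniques on multi-core processors. A related key technique is map matching, which associates trajectory data with map data so that the trajectories acquire real geographic meaning; relatively mature algorithms include ST-Matching.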
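A uniform grid index, one of the serial indexes mentioned above, fits in a few lines: points are bucketed by cell, and a range query inspects only the cells that overlap the query window before applying an exact filter. The class and method names below are hypothetical. The structure also hints at why grids parallelize easily, since buckets can be distributed across workers by cell.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]

class GridIndex:
    """Uniform grid index over 2D points."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Tuple[int, int], List[Point]] = defaultdict(list)

    def _cell(self, x: float, y: float) -> Tuple[int, int]:
        return int(x // self.cell_size), int(y // self.cell_size)

    def insert(self, p: Point) -> None:
        self.cells[self._cell(*p)].append(p)

    def range_query(self, xmin: float, ymin: float, xmax: float, ymax: float) -> List[Point]:
        """Filter step: visit only overlapping cells; refine step: exact bounding-box test."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for (x, y) in self.cells.get((cx, cy), []):  # .get avoids creating empty cells
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append((x, y))
        return hits

idx = GridIndex(cell_size=10.0)
for p in [(1, 1), (12, 3), (25, 25), (9, 9), (15, 18)]:
    idx.insert(p)
print(sorted(idx.range_query(0, 0, 15, 15)))  # [(1, 1), (9, 9), (12, 3)]
```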
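In addition, location big data are an important component of big data, including various geographic data, vehicle trajectory data and spatio-temporal multimedia data; their processing pipeline covers data acquisition, analysis and computation. Dimensionality reduction condenses large volumes of road traffic data to extract more meaningful information. It has a spatial and a temporal form: spatial reduction extracts the most important nodes and line features, while temporal reduction discretizes time and extracts data for key periods. Running these methods also requires high-performance computing frameworks such as Hadoop and Spark, efficient spatio-temporal index models and distributed computing for trajectory data, and efficient data storage. Location big data are not limited to traffic analysis; they also cover patterns of human activity, national geographic conditions monitoring and smart cities, so analysis should be approached broadly to extract more valuable information.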
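The two kinds of dimensionality reduction can be sketched as follows. For spatial reduction, the Douglas-Peucker algorithm is used here as one well-known way to keep the key vertices of a trajectory; the text does not name a specific method, so this is a swapped-in example. For temporal reduction, timestamps are discretized into fixed-length bins. The tolerance, bin length and sample data are illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]

def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to segment ab (clamped to the segment's endpoints)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(line: List[Point], tolerance: float) -> List[Point]:
    """Spatial reduction: keep only vertices that deviate more than `tolerance`."""
    if len(line) < 3:
        return list(line)
    dists = [point_segment_distance(p, line[0], line[-1]) for p in line[1:-1]]
    i_max = max(range(len(dists)), key=dists.__getitem__) + 1  # index in `line`
    if dists[i_max - 1] <= tolerance:
        return [line[0], line[-1]]
    left = douglas_peucker(line[: i_max + 1], tolerance)
    right = douglas_peucker(line[i_max:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex

def time_bins(timestamps_s: List[int], bin_s: int = 900) -> Counter:
    """Temporal reduction: count records per fixed-length bin (default 15 min)."""
    return Counter(t // bin_s for t in timestamps_s)

track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(douglas_peucker(track, tolerance=1.0))  # [(0, 0), (2, -0.1), (3, 5), (7, 9)]
print(time_bins([0, 300, 899, 900, 2000]))    # Counter({0: 3, 1: 1, 2: 1})
```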

2.2 Parallel GIS computing

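Parallel computing is a form of supercomputing that runs on high-performance parallel computers. Its computing nodes are connected through a network, enabling data transfer and parallel acceleration of computation [67]. Parallel GIS applies parallel computing to the storage, query, retrieval and processing of massive spatial data, giving software systems the capacity to process massive geospatial data with fast response and high efficiency. A new generation of multi-core parallel high-performance computing that combines parallel cluster computing with suitable algorithms has become a research hotspot [68-70].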
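Parallel gains are usually expressed as speedup S_p = T_1 / T_p and efficiency E_p = S_p / p for p processors, and Amdahl's law bounds the speedup when a fraction s of the work stays serial: S_p <= 1 / (s + (1 - s) / p). The small helpers below are a sketch for checking such figures. For instance, the 91 h to 3 min reduction cited in Section 2.1 corresponds to a ratio of 91 x 60 / 3 = 1820, although that comparison spans different machines and is not a pure parallel speedup.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """S_p = T_1 / T_p."""
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """E_p = S_p / p; 1.0 means perfect linear scaling."""
    return speedup(t_serial, t_parallel) / p

def amdahl_limit(serial_fraction: float, p: int) -> float:
    """Upper bound on speedup with p processors when `serial_fraction` cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

print(speedup(91 * 60, 3))                  # 1820.0
print(round(amdahl_limit(0.05, 64), 2))     # 15.42
print(round(amdahl_limit(0.05, 10**9), 2))  # 20.0
```

The last line shows why serial portions such as data loading and result merging matter: with 5% serial work, no number of processors can exceed a 20x speedup.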
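Geocomputation models and frameworks serve as runtime environments for analysing geographic phenomena and simulating geographic processes; they use unified data interfaces, model standards and common tools to improve computational efficiency, interoperability between models and simulation performance. Popular parallel computing models include MPI for high-performance computing, MapReduce and Dryad. In China there is not yet a mature geocomputation system or framework, and these parallel models have seldom been applied to geocomputation. By hardware architecture, they fall roughly into three typical models: the shared-memory model (e.g., OpenMP, Pthreads, X3H5), the message-passing model (e.g., MPI and its implementation Open MPI) and the stream-processor model (e.g., CUDA, Brook+, OpenCL).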
(1) Shared-memory model
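In the shared-memory model, threads read and write data through a common address space, and subtasks distributed across multiple threads are computed simultaneously [32]. Typical programming tools include OpenMP, TBB and Cilk. OpenMP, for example, lets a parallel algorithm be expressed by adding #pragma compiler directives to the source code, and is well suited to lightweight parallel tasks.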
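In Python, the closest everyday analogue to this model is a pool of threads that all write into one shared output list, each into a disjoint range so that no locking is needed. The sketch below computes per-row means of a raster that way. It is an analogy rather than OpenMP: in C or C++ the same loop would typically be parallelized with a single "#pragma omp parallel for" directive, whereas here the work split is written by hand, and CPython's global interpreter lock means pure-Python threads illustrate the memory model rather than deliver speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def row_means_shared(raster: List[List[float]], n_threads: int = 4) -> List[float]:
    """Compute each row's mean, with all threads writing into one shared list."""
    if not raster:
        return []
    out = [0.0] * len(raster)  # shared output, visible to every thread

    def work(start: int, stop: int) -> None:
        # Each thread writes only out[start:stop], so no two threads touch the same slot.
        for r in range(start, stop):
            out[r] = sum(raster[r]) / len(raster[r])

    chunk = -(-len(raster) // n_threads)  # ceiling division
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(work, s, min(s + chunk, len(raster)))
                   for s in range(0, len(raster), chunk)]
        for f in futures:
            f.result()  # re-raise any exception from a worker thread
    return out

grid = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(row_means_shared(grid, n_threads=2))  # [2.0, 5.0, 8.0]
```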
(2) Message-passing model
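Unlike the shared-memory model, the message-passing model divides a parallel task among multiple independent processes, which exchange data by passing messages [71]. MPI (Message Passing Interface), with implementations such as Open MPI, is the typical representative. The message-passing model suits parallel tasks that involve massive data and heavy computation [61].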
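The defining trait of this model is that workers share nothing and exchange data only through explicit send and receive operations. The sketch below emulates that pattern with threads connected by queue.Queue channels. It is not MPI, but its structure mirrors a typical MPI scatter/compute/gather program in which a coordinator sends each worker its chunk and then collects the partial results. The function names and the strided chunking scheme are illustrative.

```python
import queue
import threading
from typing import List

def worker(rank: int, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Receive a chunk, compute a partial sum, send (rank, result) back."""
    chunk = inbox.get()              # analogous to a blocking receive
    outbox.put((rank, sum(chunk)))   # analogous to a send to the coordinator

def scatter_gather_sum(data: List[float], n_workers: int) -> float:
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results: queue.Queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(r, inboxes[r], results))
               for r in range(n_workers)]
    for t in threads:
        t.start()
    for r in range(n_workers):       # "scatter": each rank gets a strided slice
        inboxes[r].put(data[r::n_workers])
    partials = dict(results.get() for _ in range(n_workers))  # "gather"
    for t in threads:
        t.join()
    return sum(partials[r] for r in range(n_workers))

print(scatter_gather_sum([float(i) for i in range(1, 101)], n_workers=4))  # 5050.0
```

Because the workers never touch shared state, the same structure maps onto separate processes or cluster nodes, which is why the model scales to large data and heavy computation.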
(3) Stream-processor model
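The stream-processor model is an abstraction designed around the way modern graphics cards work. Although each individual processor has relatively little computing power, GPUs achieve high processing efficiency by using very large numbers of processors and optimizing computation across them [72]. The most popular implementation is CUDA, developed by NVIDIA, which can be regarded as a parallel programming model for computing directly on the GPU.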
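What characterizes the stream-processor model is that the programmer writes a kernel for a single element, and the hardware launches it over a grid of thread blocks; each thread derives its element index from its block and thread coordinates, conventionally blockIdx.x * blockDim.x + threadIdx.x in CUDA. The sketch below emulates such a launch sequentially in Python to show the indexing and the bounds check that guards the last, partially filled block. It runs on the CPU and illustrates the programming model only; the kernel body (a linear rescale of pixel values) is a hypothetical example.

```python
from typing import List

def rescale_kernel(block_idx: int, block_dim: int, thread_idx: int,
                   src: List[float], dst: List[float], gain: float, offset: float) -> None:
    """Body executed by one 'thread': handle exactly one pixel."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(src):                        # guard threads past the end of the data
        dst[i] = src[i] * gain + offset

def launch(n: int, block_dim: int, src: List[float], gain: float, offset: float) -> List[float]:
    """Emulate a kernel launch over grid_dim blocks of block_dim threads."""
    dst = [0.0] * n
    grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n pixels
    for b in range(grid_dim):
        for t in range(block_dim):  # on a GPU these iterations run concurrently
            rescale_kernel(b, block_dim, t, src, dst, gain, offset)
    return dst

pixels = [10.0, 20.0, 30.0, 40.0, 50.0]
print(launch(len(pixels), block_dim=2, src=pixels, gain=0.5, offset=1.0))
# [6.0, 11.0, 16.0, 21.0, 26.0]
```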
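In summary, each of the three models has its strengths and weaknesses: the shared-memory model parallelizes multiple tasks through a common address space; the message-passing model achieves parallelism through communication between processes; and the stream-processor model gains performance by using large numbers of processors.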

2.3 High-performance in-memory computing

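In recent years, in-memory computing has offered solutions to high-performance geocomputation problems and has become a research hotspot in big data analysis and data mining. High-performance in-memory computing combines the memory of the individual computing nodes into one distributed memory space, caches frequently accessed files there and serves them through standard file interfaces, thereby reducing the read/write overhead and latency of computing tasks, improving load balancing and raising efficiency [5]. Compared with traditional data warehousing, in-memory computing offers greater flexibility and stronger computing performance for ad hoc analysis [14]. When massive data are processed, efficient in-memory computing can greatly increase a system's data processing capacity and computational efficiency. Moreover, 64-bit high-performance GIS computing offers greater bandwidth and removes the memory capacity limit of 32-bit systems (a 32-bit address space is limited to 4 GiB), bringing further performance gains. The SuperMap software series in China, for example, adopts 64-bit high-performance computing to make full use of the high-specification servers in cloud computing centers, and in one polygon topology test it ran twice as fast as the 32-bit version [73].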
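High-performance in-memory technologies developed to date fall into three categories: distributed caching, compute grids and distributed in-memory database systems. Distributed caching keeps frequently accessed data in memory to speed up access. Most such systems are built on in-memory key-value stores supporting get and set operations. Distributed caches are also highly scalable: performance can be tuned by adding or removing memory nodes [52,54]. Common distributed caching systems include Memcached and Redis, and newer features include ACID transaction processing and configurable eviction policies [68,70]. Compute grids send data to be executed locally and are therefore ill-suited to continuously growing massive data. Distributed in-memory database systems have greatly strengthened MapReduce-based parallel processing of massive data, and with tools such as distributed SQL they handle relatively complex data well.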
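The get/set interface, node scaling and eviction described above can be shown together in one small sketch. Keys are spread over cache nodes with a consistent-hash ring, so adding a node remaps only a fraction of the keys, and each node evicts its least-recently-used entry when full. This is a generic illustration, not how Memcached or Redis implement these features; the node names, capacities, number of virtual nodes and the use of MD5 for hashing are assumptions.

```python
import bisect
import hashlib
from collections import OrderedDict
from typing import Dict, List, Optional

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class LRUNode:
    """One cache node: a bounded key-value store with LRU eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key: str, value: bytes) -> None:
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

class ConsistentHashCache:
    def __init__(self, capacity_per_node: int, vnodes: int = 64) -> None:
        self.capacity = capacity_per_node
        self.vnodes = vnodes  # several ring points per node smooth the key distribution
        self.ring: List[int] = []
        self.owner: Dict[int, str] = {}
        self.nodes: Dict[str, LRUNode] = {}

    def add_node(self, name: str) -> None:
        self.nodes[name] = LRUNode(self.capacity)
        for v in range(self.vnodes):
            h = _hash(f"{name}#{v}")
            bisect.insort(self.ring, h)
            self.owner[h] = name

    def node_for(self, key: str) -> str:
        """The first ring point clockwise from the key's hash owns the key."""
        i = bisect.bisect(self.ring, _hash(key)) % len(self.ring)
        return self.owner[self.ring[i]]

    def get(self, key: str) -> Optional[bytes]:
        return self.nodes[self.node_for(key)].get(key)

    def set(self, key: str, value: bytes) -> None:
        self.nodes[self.node_for(key)].set(key, value)

cache = ConsistentHashCache(capacity_per_node=1000)
for n in ("node-a", "node-b", "node-c"):
    cache.add_node(n)
keys = [f"tile:{i}" for i in range(1000)]
before = {k: cache.node_for(k) for k in keys}
cache.add_node("node-d")
moved = sum(before[k] != cache.node_for(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved")  # typically about a quarter, never all
```

Only keys claimed by the new node move; with a naive "hash modulo node count" scheme, most keys would change nodes whenever the cluster is resized.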

2.4 Many-core computing

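Many-core computing integrates hundreds or thousands of computing cores into one processor; the cores execute instructions independently and carry out multiple tasks in parallel, greatly increasing performance. The emergence of CUDA (Compute Unified Device Architecture) made GPUs usable for general-purpose computing. Because GPU memory offers high bandwidth and high access frequency, a GPU reads its video memory considerably faster than a CPU reads main memory. For remote sensing imagery, where pixels are accessed repeatedly, GPU technology enables fast visualization and analysis.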
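In memory organization, the GPU uses a multi-level memory hierarchy. When CUDA is used to process massive spatial data, computation can be organized at either the global or the local level: for complex computations, shared memory may be bypassed and the analysis performed directly in global memory; for simple computations, working locally is more convenient, loading a block of data into shared memory first and then accessing it there.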
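The benefit of staging a block of data in shared memory can be estimated with simple arithmetic. For a 3x3 stencil, a naive kernel reads 9 values from global memory per output pixel, while a tiled kernel has each thread block load its B x B tile plus a one-pixel halo, (B + 2)^2 values, into shared memory once and then serve all neighbourhood reads from there. The helpers below compute both counts. They ignore hardware caching and memory coalescing, so they are a rough model only, and the 16 x 16 tile size is an assumption.

```python
def global_reads_naive(width: int, height: int, k: int = 3) -> int:
    """Every output pixel reads its full k x k neighbourhood from global memory."""
    return width * height * k * k

def global_reads_tiled(width: int, height: int, tile: int = 16, k: int = 3) -> int:
    """Each tile x tile block loads its tile plus a halo of k // 2 pixels once."""
    halo = k // 2
    blocks_x = (width + tile - 1) // tile
    blocks_y = (height + tile - 1) // tile
    return blocks_x * blocks_y * (tile + 2 * halo) ** 2

w, h = 4096, 4096
naive, tiled = global_reads_naive(w, h), global_reads_tiled(w, h)
print(naive, tiled, round(naive / tiled, 2))  # 150994944 21233664 7.11
```

The ratio 9 x 256 / 324 = 7.11 grows with larger tiles and larger stencils, which is why shared memory pays off for neighbourhood operations with heavy data reuse.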
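GPU computing can also be used to build spatial indexes, and researchers in China and abroad have done much work on GPU-accelerated indexing in recent years. Zhang et al. [68] processed large-scale point data in parallel on GPUs and proposed the CSPT-P tree index structure. Luo et al. [74] used GPU acceleration to implement R-tree bulk loading. In addition, to address the low degree of search parallelism near the root of an R-tree, Kim [75] optimized queries by searching subtrees in parallel with an SMP algorithm.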
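Bulk loading builds an R-tree from a complete dataset at once instead of inserting objects one by one. The sketch below shows one level of Sort-Tile-Recursive (STR) packing, a widely used serial bulk-loading method, running on the CPU; it is a swapped-in illustration of what bulk loading does, not the GPU algorithm of Luo et al. The node capacity is an assumption, and upper levels would be built by applying the same packing to the resulting node bounding rectangles.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def mbr(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a group of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def str_pack_leaves(rects: List[Rect], capacity: int) -> List[Tuple[Rect, List[Rect]]]:
    """Sort-Tile-Recursive packing of one level: returns (node MBR, entries) pairs."""
    if not rects:
        return []
    n_leaves = math.ceil(len(rects) / capacity)
    n_slices = math.ceil(math.sqrt(n_leaves))
    slice_size = n_slices * capacity
    by_x = sorted(rects, key=lambda r: center(r)[0])  # 1) sort by x
    nodes = []
    for s in range(0, len(by_x), slice_size):
        # 2) within each vertical slice, sort by y and cut into full nodes
        vertical_slice = sorted(by_x[s:s + slice_size], key=lambda r: center(r)[1])
        for i in range(0, len(vertical_slice), capacity):
            entries = vertical_slice[i:i + capacity]
            nodes.append((mbr(entries), entries))
    return nodes

points = [(float(x), float(y), float(x), float(y)) for x in range(4) for y in range(4)]
leaves = str_pack_leaves(points, capacity=4)
print(len(leaves), [leaf[0] for leaf in leaves])
# 4 [(0.0, 0.0, 1.0, 1.0), (0.0, 2.0, 1.0, 3.0), (2.0, 0.0, 3.0, 1.0), (2.0, 2.0, 3.0, 3.0)]
```

Because the sorts and the per-slice packing are independent operations over the whole dataset, this style of bulk loading lends itself to data-parallel hardware better than repeated single insertions do.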

3 Review and discussion

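High-performance GIS systems built on high-performance theories and technologies such as parallel computing, distributed storage and in-memory computing have extended and strengthened existing GIS systems, achieving high-performance reading and writing of massive spatial data and supporting the analysis of geospatial phenomena and geoscience applications [64].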
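In high-performance algorithms, parallel processing frameworks and high-performance computing for massive data have been implemented on ordinary computer systems, lowering the cost of high-performance computing, and concise parallel computing models have been proposed that greatly simplify the workflow. However, most applied research on scheduling spatial computing tasks concentrates on parallel vector processing. In the future, when computational intensity estimation equations are built for geospatial raster processing algorithms, they should take into account the matrix structure of raster data and the traversal patterns of raster algorithms, so as to reduce the time cost caused by computational intensity.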
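For indexing, more efficient distributed index methods should be developed as distributed computing systems evolve. Current mainstream distributed indexes are parallel algorithms derived from traditional indexing algorithms, such as the SR-tree index construction algorithm, the Fat-B tree based on index-node replication, and the DPB+Tree [17-20]. Most of these algorithms focus on data safety, for example copying data between nodes to prevent data loss [21]. Future work should pay more attention to the overhead, redundancy control and platform independence of distributed index algorithms, and should focus them primarily on improving indexing performance.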
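In parallel computing, parallel GIS technology has been widely applied to the parallel storage, query, retrieval and processing of massive spatial data. Server clusters connect nodes through a network to form distributed systems for cluster computing. The popular Hadoop distributed system is built on the MapReduce computing framework, the HDFS distributed file system and the HBase data store. Spark improves on this system architecture by keeping data in memory, which provides faster computation. Storm processes high-velocity, large-scale data streams; it complements Hadoop's batch processing with real-time processing and is a distributed real-time computing system that reads and writes data in real time directly through network nodes. The distributed computing system should be chosen according to the requirements of each GIS application.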
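In in-memory computing, current technologies include single-node in-memory computing and distributed in-memory computing, with Spark as a representative of the latter. Most of these technologies are based on in-memory key-value stores and are highly scalable, obtaining the best performance by changing the number of memory nodes. Compute grids are at a disadvantage when handling continuously growing massive data, whereas distributed in-memory database systems continue to perform well as data processing complexity increases [10-12,19].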
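For spatial data storage, spatial data are currently stored mainly in traditional relational databases, non-relational databases and distributed file systems. Faced with new kinds of massive data, traditional relational databases often perform poorly and are hard to scale [67], while new-generation storage systems (such as Ceph, Swift and MongoDB) offer new approaches to organizing and managing spatial data. Both researchers and industry have explored this area extensively: Chen Chongcheng, Lin Jianfeng et al. [41] introduced a distributed graph database and a parallel graph computing framework and, on the basis of an integrated vector-raster system, achieved distributed management of and access to massive spatial data. Google proposed a flexible, highly available, loosely structured distributed database management system that, together with GFS and MapReduce, provides cloud storage and management of massive raster data [5,67].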

4 Outlook

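A survey of research status and progress in high-performance GIS at home and abroad shows that parallel GIS computing, high-performance computing models and distributed storage remain important directions in GIS technology. Facing massive spatio-temporal GIS big data, high-performance GIS, represented by high-performance GIS algorithms, parallel GIS computing, high-performance in-memory computing and many-core computing, offers solutions to data-intensive, computation-intensive and network-communication-intensive spatio-temporal problems and improves the efficiency of geographic analysis in GIS.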
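In spatial information science, research on parallel computing techniques and methods currently covers the parallel processing of vector and raster data as well as high-performance and high-availability GIS. Research still concentrates on the parallel processing of imagery, and results on parallel access to and processing of vector data are relatively scarce. In computing models, newer approaches such as GIScript [73] have emerged, supporting distributed computing systems such as Hadoop, Spark and Storm.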
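In data storage, introducing parallel spatial databases will overcome the limitations of file systems and give parallel GIS a more powerful data management platform. Spatial data storage is the foundation of any GIS, yet most existing GIS systems rely on file-based spatial data storage. Future work should integrate the key algorithms of parallel GIS closely with the design of parallel spatial databases and, targeting new application domains (networked distributed spatial information services) and new computing frameworks (CyberGIS), build a more complete research system for high-performance parallel GIS [31,76-78] that can support solutions to new problems.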
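In hardware, computers have progressed from single-core to multi-core and from single-processor to multi-processor designs at ever lower cost, providing powerful computing capacity, and multi-core CPU computing will be an important future trend. GPU technology has also advanced high-performance computing [74-75,79], but software has not kept pace with hardware. As GPU technology develops further, greater investment in software development environments will be needed to achieve high-performance computing that integrates software and hardware.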
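With the rapid development of the Internet, cloud computing, mobile technology and the Internet of Things in recent years, GIS cloud computing and big data technologies have become increasingly popular. In cloud computing, GIS platform vendors in China and abroad have launched their own cloud GIS platforms, such as ESRI's ArcGIS 10.4 with its cloud-plus-client approach and SuperMap 8C developed in China, which supports virtualized GIS. Cloud GIS is not simply a matter of porting existing GIS platforms to the cloud; it must also support cross-platform operation, parallel computing, 64-bit computing, distributed systems and related technologies [80-82]. Data center virtualization, including network, server and storage virtualization, has also become a research hotspot [83]. For spatio-temporal big data processing, traditional techniques are increasingly stretched by ever-growing data volumes, and new technologies have brought progress, such as distributed caching, MPP-based distributed databases, distributed file systems and various NoSQL distributed storage solutions [84-89].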
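Because open-source GIS developers need not be overly concerned with data compatibility and ease of use, they can concentrate on developing functionality; open-source GIS therefore often offers strong performance and features, and a large amount of open-source GIS software of many types has emerged across platforms. Examples include open-source desktop GIS such as QGIS, GRASS GIS and SuperMap iDesktop Cross, and open-source technologies and tools such as GIS Tools for Hadoop, SpatialHadoop, PySAL, GeoWave and GeoSpark. Openness and interoperability are among the important directions for high-performance GIS, and open-source GIS is expected to combine openness, standards and interoperability to provide high-performance GIS software services.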

The authors have declared that no competing interests exist.

[1]
Brady D. Designing GIS for high availability and high performance[C]. International Conference/Exhibition on High Performance Computing in the Asia-Pacific Region, 2000. Proceedings, 2000:423-431.

[2]
Aji A, Wang F, Vo H, et al. Hadoop-GIS: A high performance spatial data warehousing system over MapReduce[J]. Proceedings of the VLDB Endowment, 2013,6(11):1009-1020.

[3]
Wang F, Aji A, Vo H. High performance spatial queries for spatial big data: From medical imaging to GIS[J]. SIGSPATIAL Special, 2015,6(3):11-18.

[4]
Yang C W, Huang Q Y, Li Z, et al. Big Data and cloud computing: innovation opportunities and challenges[J]. International Journal of Digital Earth, 2017,10(1):13-53.

[5]
Corbett J C, Dean J, Epstein M, et al. Spanner: Google's globally-distributed database[J]. International Conference on Data Engineering ICDE, 2013,31(3):251-264.

[6]
Kindratenko V, Trancoso P. Trends in high-performance computing[J]. Computing in Science & Engineering, 2011,13(3):92-95.

[7]
Guan Q, Clarke K C. A general-purpose parallel raster processing programming library test application using a geographic cellular automata model[J]. International Journal of Geographical Information Science, 2010,24(5):695-722.

[8]
Li S J, Wang E Q. SuperMap high performance massive spatial data management strategy[C]. 2009 China Geographic Information Industry Forum, 2010.

[9]
Wang J C, Wang B, Hu W, et al. Review on parallel spatial analysis algorithms[J]. Geography and Geo-Information Science, 2011,27(6):1-5.

[10]
Chen L, Agrawal G. Optimizing MapReduce for GPUs with effective shared memory usage[C]// hgpu.org, 2012:199-210.

[11]
Beckmann N, Kriegel H P, Schneider R, et al. The R*-tree: an efficient and robust access method for points and rectangles[J]. ACM SIGMOD Record, 1990,19(2):322-331.

[12]
Rizzo S, Vantini G. GOAL: The challenge of high-performance in GIS[C]// Sistemi Evoluti per Basi di Dati. 1995.

[13]
Sansrimahachai W, Chalermwat P. An implementation of high performance web-based GIS on parallel cluster using MPI[C]// International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA 2005, Las Vegas, Nevada, USA, June 27-30. DBLP, 2005:284-289.

[14]
Shi X. High performance computing: fundamental research challenges in service oriented GIS[C]// Proceedings of the ACM SIGSPATIAL International Workshop on High Performance and Distributed Geographic Information Systems. ACM, 2010:31-34.

[15]
Sorokine A. Implementation of a parallel high-performance visualization technique in GRASS GIS[J]. Computers & Geosciences, 2007,33(5):685-695.

[16]
Stojanovic N, Stojanovic D. High-performance computing in GIS: techniques and applications[J]. International Journal of Reasoning, 2013,5(1):42-49.

[17]
Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters[J]. Communications of the ACM, 2008,51(1):107-113.

[18]
Wang S H. SuperMap platform software innovation: Introduction to SuperMap GIS high-performance GIS technology[J]. Journal of Geo-Information Science, 2016,18(5):718-718.

[19]
Kalantari M. Spatial cloud computing: A practical approach[J]. Spatial Science, 2015,60(1):197-198.

[20]
Dai C, Yang J. Research on orthorectification of remote sensing images using GPU-CPU cooperative processing[C]// International Symposium on Image and Data Fusion. IEEE, 2011:1-4.

[21]
Ding Y M, Densham P J. Spatial strategies for parallel spatial modelling[J]. International Journal of Geographical Information Science, 1996,10(6):669-698.

[22]
Eldawy A, Mokbel M F. SpatialHadoop: A MapReduce framework for spatial data[C]// IEEE International Conference on Data Engineering. IEEE, 2016:1352-1363.

[23]
Eldawy A, Mokbel M F. A demonstration of SpatialHadoop: An efficient MapReduce framework for spatial data[J]. Proceedings of the VLDB Endowment, 2013,6(12):1230-1233.

[24]
Eldawy A, Mokbel M F. HadoopViz: A MapReduce framework for extensible visualization of big spatial data[C]// IEEE International Conference on Data Engineering. IEEE, 2016:601-612.

[25]
Chang S P, Ma Y W, Cai L J, et al. A kind of high resolution remote sensing image processing method based on Hadoop[J]. Computer Engineering and Applications, 2015,51(11):167-171.

[26]
Li B. The management of massive images data based on Hadoop[D]. Shanghai: East China Normal University, 2011.

[27]
Lin B Y, Wang Y P. Data management based on Hadoop for power geographic information system[J]. Journal of Computer Applications, 2014,34(10):2806-2811.

[28]
Yu J, Wu J, Sarwat M. GeoSpark: A cluster computing framework for processing large-scale spatial data[C]// The SIGSPATIAL International Conference. 2015:1-4.

[29]
You S, Zhang J, Le G. Large-scale spatial join query processing in Cloud[C]// IEEE International Conference on Data Engineering Workshops. IEEE, 2015:34-41.

[30]
Liu Y, Chen L, Jing N, et al. Parallel batch-building remote sensing images tile pyramid with MapReduce[J]. Geomatics and Information Science of Wuhan University, 2013,38(3):278-282.

[31]
Wang S. A CyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis[J]. Annals of the Association of American Geographers, 2010,100(3):535-557.

[32]
Wang E Q, Wang S H. Prospects for the future development of GIS technology[J]. Bulletin of Surveying and Mapping, 2015(S2):66-69.

[33]
Aji A, Sun X, Vo H, et al. Demonstration of Hadoop-GIS: A spatial data warehousing system over MapReduce[C]// ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 2013:528-531.

[34]
Zhao C Y. Research on the key technologies of vector spatial data access and processing in high performance parallel GIS[D]. Wuhan: Wuhan University, 2006.

[35]
Zhang Y, Mueller F. Auto-generation and auto-tuning of 3D stencil codes on GPU clusters[C]// Proceedings of the Tenth International Symposium on Code Generation and Optimization. ACM, 2012:155-164.

[36]
Yang K. Research on parallelization and fault tolerance of the viewshed analysis algorithm[D]. Nanjing: Nanjing Normal University, 2014.

[37]
Zhou Z B, Wang Q, Liang G U, et al. Using ArcGIS REST construct high-performance WebGIS services[J]. Manufacturing Automation, 2010.

[38]
Fan X Y, Ren Y C, Yang C J, et al. Research on scalable cluster-based cloud GIS platform[J]. Application Research of Computers, 2012,29(10):3736-3739.

[39]
Guo J, Guo W, Hu Z Y. QR-tree: An efficient spatial indexing structure for GIS with very large spatial database[J]. Geomatics and Information Science of Wuhan University, 2003,28(3):306-310.

[40]
Turton I, Openshaw S. High-performance computing and geography: Developments, issues, and case studies[J]. Environment & Planning A, 1998,30(10):1839-1856.

[41]
Chen C C, Lin J F, Wu X Z, et al. Massive geo-spatial data cloud storage and services based on NoSQL database technique[J]. Journal of Geo-Information Science, 2013,15(2):166-174.

[42]
Chen X Y. Research of satellite remote sensing image processing system based on OPENGL and GDAL[D]. Guangzhou: South China University of Technology, 2013.

[43]
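Xiao W Q, Feng Y C, Miu Y W. Grid index mechanism of spatial object database[J]. Chinese Journal of Computers, 1994(10): 736-742. This paper proposes a new indexing mechanism for spatial objects, called the grid index (Ldex). Based on the location and distribution of spatial objects, Ldex is an efficient and practical spatial indexing method. The paper discusses its search, insertion, deletion and update algorithms and their implementation in full.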

[44]
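Zhou H F, Zhao J. Parallel programming design and storage optimization of remote sensing image registration based on GPU[J]. Journal of Computer Research and Development, 2012, 49(S1): 281-286. Remote sensing image registration is an important processing step in remote sensing applications. As the volume of remote sensing image data and the computational complexity of registration algorithms grow, registration faces a processing-speed challenge. In recent years GPU computing power has increased greatly, and general-purpose GPU computing has developed rapidly. Combining the strengths of general-purpose GPU computing with the speed problem of image registration, this work studies GPU-accelerated registration. The mutual-information wavelet-decomposition registration algorithm, which is computationally heavy and highly accurate, was chosen for GPU parallel design, and a GPU parallel design model was proposed; common memory-level optimization strategies for GPU programs were also applied to the registration program, and experiments were run in CUDA (compute unified device architecture) on an NVIDIA Tesla M2050 GPU. The results show that the proposed parallel design model and memory-level optimization strategies suit remote sensing image registration well, achieving a maximum speedup of 19.9×. The study shows that general-purpose GPU computing has broad prospects in remote sensing image processing.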

[45]
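Hu S J, Guan Q F, Gong J F, et al. pGTIOL: A parallel GeoTIFF I/O library[J]. Journal of Geo-Information Science, 2015, 17(5): 575-582. In parallel processing of geographic raster data, data I/O has become one of the main bottlenecks limiting computational performance. To address this, the paper first analyzes the GeoTIFF format widely used for GIS raster storage, focusing on its two storage modes (strip storage and tile storage), and for each builds a model that maps raster data from its logical structure to its physical storage structure. Then, to meet the needs of geospatial parallel computing, it proposes a parallel read/write framework for raster data and, using the file-view method of MPI parallel I/O, implements the GeoTIFF parallel I/O library pGTIOL. Results show that, compared with the master-slave I/O mode of the open-source raster translation library GDAL, pGTIOL reads and writes data correctly with higher performance. The library hides the details of low-level parallel I/O, provides a simple interface for reading and writing GeoTIFF raster data in parallel, supports multiple data types and multiple spatial decompositions, and implements asynchronous parallel reading and writing of both strip-stored and tile-stored data, thereby meeting the requirements of dynamic load balancing.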

[46]
Liu Y, Li M, Alham N K, et al. Load balancing in MapReduce environments for data intensive applications[C]// Eighth International Conference on Fuzzy Systems and Knowledge Discovery. IEEE, 2011: 2675-2678.

[47]
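Liu X L, Xu P D, Zhu G B, et al. Parallel and distributed retrieval of remote sensing image using HBase and MapReduce[J]. Geography and Geo-Information Science, 2014, 30(5): 26-28. A scalable, multi-dimensional parallel query model for remote sensing images is proposed: MapReduce is used to build pyramids of massive image data in parallel, and HBase is used for distributed image retrieval. A method for building the pyramid of a single remote sensing image in parallel and an image indexing system were designed and implemented. Experimental results show that as the Hadoop and HBase clusters grow, image import and retrieval speeds improve markedly.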

[48]
Cary A, Sun Z, Hristidis V, et al. Experiences on processing spatial data with MapReduce[C]// Scientific and Statistical Database Management, International Conference, SSDBM 2009, New Orleans, LA, USA, June 2-4, 2009, Proceedings. Springer, 2009: 302-319.

[49]
Chen Q, Wang L, Shang Z. MRGIS: A MapReduce-enabled high performance workflow system for GIS[C]// Fourth IEEE International Conference on eScience. IEEE Computer Society, 2008: 646-651.

[50]
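Zhou J X, Chen L, Xiong W, et al. Study and implementation of parallel I/O for geospatial raster data[J]. Geomatics World, 2013(6): 62-65. With advances in remote sensing and surveying technology, the conflict between ever-growing data volumes and the need for efficient data I/O has become increasingly prominent. To address this problem, this paper proposes a parallel I/O mode for geographic raster data and implements it in code. Experiments designed to test its correctness and efficiency show that, compared with the traditional raster I/O library GDAL, the I/O library built on the proposed parallel I/O mode delivers both correct results and high efficiency.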

[51]
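Liu L, Yin F, Feng M, et al. Distributed computation of raster data using open source Hadoop[J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2013, 41(7): 103-108. To achieve high-performance distributed processing of large-scale raster data, a MapReduce-based distributed computing system for raster data was designed and developed on the open-source Hadoop project, focusing on key techniques including distributed HDFS storage of raster data, efficient parallel Map processing, and Reduce merging of results. On this basis a prototype distributed computing system for large-scale raster data was implemented; its components are described in detail, and it was applied to process nationwide 90 m terrain data. The results show that, by linking multiple computing nodes, the distributed method effectively raises raster computation efficiency and the capacity to analyze geoscience raster data, thereby removing performance bottlenecks in related geoscience simulations.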

[52]
Yang C, Yu M, Hu F, et al. Utilizing cloud computing to address big geospatial data challenges[J]. Computers Environment & Urban Systems, 2016. Big Data has emerged with new opportunities for research, development, innovation, and business. It is characterized by the so-called four Vs: volume, velocity, veracity, and variety, and it may bring significant value through the processing of a large amount of data. The transformation of Big Data's four Vs into the fifth V (value) is a grand challenge for processing capacity. Cloud computing has emerged as a new paradigm to provide computing as a utility service for addressing different processing needs with (a) on-demand services, (b) pooled resources, (c) elasticity, (d) broadband access, and (e) measured services. The utility of delivering computing capability fosters a potential solution for the transformation of Big Data's four Vs into the fifth V. This paper investigates how cloud computing can be utilized to address Big Data challenges to enable such transformation. We introduce and review four geospatial scientific examples, including climate studies, geospatial knowledge mining, land-cover simulation, and dust storm modeling. The method is presented in a tabular framework as a guidance to leverage cloud computing for Big Data solutions. It is illustrated that the framework method supports the life cycle of Big Data processing, including management, access, mining analytics, simulation, and forecasting. This tabular framework can also be referred as a guidance to develop potential solutions for other big geospatial data challenges and initiatives such as smart cities.


[53]
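Wang Y Z, Liu X G, Zhang W. Raster river networks extraction based on parallel multiple flow direction algorithms[J]. Geomatics and Information Science of Wuhan University, 2015, 40(12): 1646-1652. Extracting raster river networks for watersheds is an important application of digital terrain analysis. To reduce the spurious and parallel channels produced by preprocessing the digital elevation model (DEM), a raster river network extraction algorithm based on a parallelized multiple-flow-direction strategy is proposed. It simulates the natural flow of water through a flow-transfer matrix and can be applied directly to the original DEM. Compared with the serial MFD algorithm, the R&N algorithm and the D8 algorithm in terms of network morphology and running efficiency, the network obtained with the multiple-flow-direction strategy agrees better with the actual terrain, and with the parallel strategy the algorithm also runs markedly more efficiently than the other algorithms.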

[54]
Wang S, Armstrong M P. A quadtree approach to domain decomposition for spatial interpolation in grid computing environments[J]. Parallel Computing, 2003, 29(10): 1481-1504. Spatial interpolation is widely used in geographical information systems to create continuous surfaces from discrete data points. The creation of such surfaces, however, can involve considerable computation, especially when large problems are addressed, because of the need to search for neighbors on which to base interpolation calculations. Computational Grids provide the computing resources to tackle spatial interpolation in a timely way. The objective of this paper is to investigate the use of domain decomposition for a distributed inverse-distance-weighted spatial interpolation algorithm; the algorithm runs using the Globus Toolkit (GT) in a heterogeneous Grid computing environment. The interpolation algorithm is modified for implementation in the Grid by using a quadtree to spatially index and adaptively decompose the interpolation problem to balance processing loads. In addition, the GT allows the distributed algorithm to couple multiple machines, potentially of different architectures, to dynamically schedule the decomposed sub-problems through Globus services and protocols (e.g., resource management, data transfer). Experiments are conducted to test how well this distributed IDW interpolation algorithm scales to heterogeneous grid computing environments using irregularly distributed geographical data sets.


[55]
Vecchiola C, Pandey S, Buyya R. High-performance cloud computing: A view of scientific applications[C]// International Symposium on Pervasive Systems, Algorithms, and Networks. IEEE Computer Society, 2009: 4-16.

[56]
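Cheng G, Chen L, Wu Q Y, et al. A task scheduling method for parallelization of complicated geospatial raster data processing algorithms[J]. Journal of National University of Defense Technology, 2012, 34(6): 61-65. As parallel computing technology matures, parallelizing geospatial raster data processing algorithms has become a new research focus. Focusing on complex geospatial raster processing algorithms whose workflows contain multiple computation steps, and building on spatial computational domain theory, this paper proposes a task scheduling method that changes dynamically as the algorithm's workflow proceeds. Experiments show that the method adjusts the task grouping at every computation step of the workflow; compared with traditional task scheduling methods it therefore achieves better load balance, and the parallel program runs in less time.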

[57]
Ouyang L. Research and implementation of parallel access technology for geographic raster data[D]. Changsha: National University of Defense Technology, 2012.

[58]
Papadopoulos A, Manolopoulos Y. Parallel bulk-loading of spatial data[J]. Parallel Computing, 2003, 29(10): 1419-1444. Spatial database systems have been introduced in order to support non-traditional data types and more complex queries. Although bulk-loading techniques for access methods have been studied in the spatial database literature, parallel bulk-loading has not been addressed in a parallel spatial database context. Therefore, we study the problem of parallel bulk-loading, assuming that an R-tree like access method needs to be constructed, from a spatial relation that is distributed to a number of processors. Analytical cost models and experimental evaluation based on real-life and synthetic datasets demonstrate that the index construction time can be reduced considerably by exploiting parallelism. I/O costs, CPU time and communication costs are taken into consideration in order to investigate the efficiency of the proposed algorithm.


[59]
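Yang W G, Li W. Implementation of parallel I/O using MPI and its performance analysis[J]. Computer Engineering and Applications, 2006, 42(17): 96-98. This paper discusses the two basic I/O methods in parallel environments, serial I/O and parallel I/O, implements both with MPI-1 and MPI-2, and analyzes how the different implementations affect I/O bandwidth. Theoretical analysis and experiments show that the MPI-2-based parallel I/O implementation achieves higher I/O bandwidth than the other implementations and is an effective way to solve I/O performance problems.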

[60]
Tian G. Research and implementation of partition strategy of vector spatial data in parallel computing environment[D]. Wuhan: China University of Geosciences (Wuhan), 2011.

[61]
Finkel R A, Bentley J L. Quad trees: A data structure for retrieval on composite keys[J]. Acta Informatica, 1974, 4(1): 1-9.

[62]
Jiang Y H, Lai J, Wang T C. Module placement with pre-placed modules using the B*-tree representation[C]// IEEE International Symposium on Circuits and Systems. IEEE, 2001: 347-350.

[63]
Gargantini I. An effective way to represent quadtrees[J]. Communications of the ACM, 1982, 25(12): 905-910. A quadtree may be represented without pointers by encoding each black node with a quaternary integer whose digits reflect successive quadrant subdivisions. We refer to the sorted array of black nodes as the “linear quadtree” and show that it introduces a saving of at least 66 percent of the computer storage required by regular quadtrees. Some algorithms using linear quadtrees are presented, namely, (i) encoding a pixel from a 2^n × 2^n array (or screen) into its quaternary code; (ii) finding adjacent nodes; (iii) determining the color of a node; (iv) superposing two images. It is shown that algorithms (i)-(iii) can be executed in logarithmic time, while superposition can be carried out in linear time with respect to the total number of black nodes. The paper also shows that the dynamic capability of a quadtree can be effectively simulated.


[64]
Luo Y, Guo K, Wang D, et al. Hyperspectral remote sensing classification processing parallel computing research based on GPU[C]// International Conference on Computer Science and Electronics Engineering. IEEE, 2012: 258-261.

[65]
Papadias D, Mamoulis N, Theodoridis Y. Processing and optimization of multiway spatial joins using R-trees[C]// ACM PODS. 1999: 44-55.

[66]
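Deng Y D, Jing N, Xiong W. State of the art and future challenge on database algorithm optimization based on modern processor[J]. Computer Science, 2009, 36(8): 17-20. With the continuous development of hardware technology, computer performance keeps improving and database performance has risen with it. This has, however, created new problems, such as growing cache latency and cache access conflicts. Addressing these problems, and organized by category of optimization technique, the paper analyzes in depth the past ten years of research on optimizing database algorithms for modern processors and looks ahead to future trends in database optimization on new hardware.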

[67]
Ghemawat S, Gobioff H, Leung S. The Google file system[C]// ACM Symposium on Operating Systems Principles. Bolton Landing, NY: ACM, 2003: 29-43. We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points. The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.


[68]
Zhang J, You S. Speeding up large-scale point-in-polygon test based spatial join on GPUs[C]// ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data. ACM, 2012: 23-32.

[69]
Wang S W, Liu Y. TeraGrid GIScience gateway: Bridging cyberinfrastructure and GIScience[J]. International Journal of Geographical Information Science, 2009, 23(5): 631-656. Cyberinfrastructure (CI) represents the integrated information and communication technologies for distributed information processing and coordinated knowledge discovery, and is promising to revolutionize how science and engineering are conducted in the twenty-first century. The value of bridging CI and GIScience is significant to advance CI and benefit GIScience research and education, particularly in distributed geographic information processing (DGIP). This article presents a holistic framework that bridges CI and GIScience by integrating CI capabilities to empower GIScience research and education and establish generic DGIP services supported by CI. The framework, the TeraGrid GIScience Gateway, is based on a CI science gateway approach developed on the National Science Foundation (NSF) TeraGrid - a key element of US and world CI. This gateway develops a unifying service-oriented framework with respect to its architecture, design, and implementation as well as its integration with the TeraGrid. The functions of the gateway focus on enabling parallel and distributed processing for geographical analysis, managing the complexity of TeraGrid software environment, and establishing a Web-based GIS for the GIScience community to gain shared and collaborative access to TeraGrid-based geospatial processing services. The gateway implementation uses Web 2.0 technologies to create a highly configurable and interactive multiuser environment. Two case studies, Bayesian geostatistical modeling and a spatial statistic for detecting local clustering, are used to demonstrate the gateway functions and user environment. The service transformation for these analyses is applied to create a shared, decentralized, and collaborative geographical analysis environment in which GIScience community users can contribute new analysis services and reuse existing gateway services.


[70]
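Wu L X, Yang Y Z, Qin C Z, et al. On basic geographic parallel algorithms of new generation GIS for new hardware architectures[J]. Geography and Geo-Information Science, 2013, 29(4): 1-8. With the continual emergence of large geoscience problems such as disaster reduction and emergency response, watershed simulation, intelligent transportation, macro-level planning and regional development, the data volumes and computational scale handled by geographic information systems (GIS) keep growing. Mainstream GIS, however, is still built on a serial computing framework that cannot fully exploit the computing resources of current new hardware architectures (multi-core single machines, multi-core multi-machine systems, clusters, etc.) and therefore struggles to meet the scale and efficiency demands of real applications. After analyzing the state of research on basic geographic algorithms, the paper classifies their computational characteristics by the dependence among the data being computed into local, neighborhood, zonal and global computation, and by resource consumption during computation into data-intensive, computation-intensive and I/O-intensive types. It then proposes corresponding parallel computing strategies, including parallel transformation of serial algorithms, performance improvement of parallel algorithms, and original design of parallel algorithms. Building on this, a library and middleware of basic geographic parallel algorithms for new-generation GIS on new hardware architectures were developed and integrated into the domestic high-performance GIS platform HiGIS, which is expected to drive leapfrog development of GIS research, technology, systems and applications in China.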

[71]
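Liu W H, Xiong W, Wu Y, et al. Research on parallel bulk-loading algorithm for spatial index[J]. Modern Electronics Technique, 2011, 34(22): 90-94. Spatial indexing is a key technique for improving the query performance of spatial databases. Spatial data are massive, spatial objects are irregular, and their structures and relationships are complex; when the index must be maintained dynamically, building a traditional R-tree by successive insertion is very costly. Based on an in-depth analysis of bulk-loading algorithms for spatial indexes, and targeting the new hardware architecture of multi-core processors, a parallel bulk-loading algorithm for the Hilbert R-tree index is implemented with the OpenMP parallel programming model. Experimental results show that, relative to the classic serial algorithm, its parallel efficiency approaches 50%, and query experiments confirm that the parallel loading algorithm preserves the good query performance of indexes built by the serial algorithm.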

[72]
Jiang L. Research on key technologies of parallel algorithm for watershed terrain analysis based on DEM[D]. Nanjing: Nanjing Normal University, 2014.

[73]
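Huang Q. Research on key technologies of open script engine for spatio-temporal big data[J]. Information Technology & Standardization, 2015(9): 7-11. Using open-source software development methods under the "Innovation 2.0" model, this work focuses on developing a general processing framework for spatio-temporal big data, studies key technologies such as an open spatio-temporal script engine and distributed scheduling, and builds a compatible, general-purpose, internationalized and agile open-source data processing toolkit, which is published and maintained on GitHub.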

[74]
Luo L, Wong M D F, Leong L. Parallel implementation of R-trees on the GPU[C]// Asia and South Pacific Design Automation Conference. IEEE, 2012: 353-358.

[75]
Kim J, Kim S G, Nam B. Parallel multi-dimensional range query processing with R-trees on GPU[J]. Journal of Parallel & Distributed Computing, 2013, 73(8): 1195-1207. The general purpose computing on graphics processing unit (GP-GPU) has emerged as a new cost effective parallel computing paradigm in high performance computing research that enables large amount of data to be processed in parallel. Large scale scientific data intensive applications have been playing an important role in modern high performance computing research. A common access pattern into such scientific data analysis applications is multi-dimensional range query, but not much research has been conducted on multi-dimensional range query on the GPU. Inherently multi-dimensional indexing trees such as R-Trees are not well suited for GPU environment because of its irregular tree traversal. Traversing irregular tree search path makes it hard to maximize the utilization of massively parallel architectures. In this paper, we propose a novel MPTS (Massively Parallel Three-phase Scanning) R-tree traversal algorithm for multi-dimensional range query, that converts recursive access to tree nodes into sequential access. Our extensive experimental study shows that MPTS R-tree traversal algorithm on NVIDIA Tesla M2090 GPU consistently outperforms traditional recursive R-trees search algorithm on Intel Xeon E5506 processors.


[76]
Wang S W, Armstrong M P. A theoretical approach to the use of cyberinfrastructure in geographical analysis[J]. International Journal of Geographical Information Science, 2009, 23(2): 169-193. This paper presents a theoretical approach that has been developed to capture the computational intensity and computing resource requirements of geographical data and analysis methods. These requirements are then transformed into a common framework, a grid-based representation of a spatial computational domain, which supports the efficient use of emerging cyberinfrastructure environments. Two key types of transformational functions (data-centric and operation-centric) are identified and their relationships are explained. The application of the approach is illustrated using two geographical analysis methods: inverse distance weighted interpolation and the spatial statistic. We describe the underpinnings of these two methods, present their conventional sequential algorithms, and then address their latent parallelism based on a spatial computational domain representation. Through the application of this theoretical approach, the development of domain decomposition methods is decoupled from specific high-performance computer architectures and task scheduling implementations, which makes the design of generic parallel processing solutions feasible for geographical analyses.


[77]
Wright D J, Wang S. The emergence of spatial cyberinfrastructure[J]. Proceedings of the National Academy of Sciences of the United States of America, 2011, 108(14): 5488. Cyberinfrastructure integrates advanced computer, information, and communication technologies to empower computation-based and data-driven scientific practice and improve the synthesis and analysis of scientific data in a collaborative and shared fashion. As such, it now represents a paradigm shift in scientific research that has facilitated easy access to computational utilities and streamlined collaboration across distance and disciplines, thereby enabling scientific breakthroughs to be reached more quickly and efficiently. Spatial cyberinfrastructure seeks to resolve longstanding complex problems of handling and analyzing massive and heterogeneous spatial datasets as well as the necessity and benefits of sharing spatial data flexibly and securely. This article provides an overview and potential future directions of spatial cyberinfrastructure. The remaining four articles of the special feature are introduced and situated in the context of providing empirical examples of how spatial cyberinfrastructure is extending and enhancing scientific practice for improved synthesis and analysis of both physical and social science data. The primary focus of the articles is spatial analyses using distributed and high-performance computing, sensor networks, and other advanced information technology capabilities to transform massive spatial datasets into insights and knowledge.


[78]
Yang C, Raskin R, Goodchild M, et al. Geospatial cyberinfrastructure: Past, present and future[J]. Computers Environment & Urban Systems, 2010, 34(4): 264-277. A Cyberinfrastructure (CI) is a combination of data resources, network protocols, computing platforms, and computational services that brings people, information, and computational tools together to perform science or other data-rich applications in this information-driven world. Most science domains adopt intrinsic geospatial principles (such as spatial constraints in phenomena evolution) for large amounts of geospatial data processing (such as geospatial analysis, feature relationship calculations, geospatial modeling, geovisualization, and geospatial decision support). Geospatial CI (GCI) refers to CI that utilizes geospatial principles and geospatial information to transform how research, development, and education are conducted within and across science domains (such as the environmental and Earth sciences). GCI is based on recent advancements in geographic information science, information technology, computer networks, sensor networks, Web computing, CI, and e-research/e-science. This paper reviews the research, development, education, and other efforts that have contributed to building GCI in terms of its history, objectives, architecture, supporting technologies, functions, application communities, and future research directions. Similar to how GIS transformed the procedures for geospatial sciences, GCI provides significant improvements to how the sciences that need geospatial information will advance. The evolution of GCI will produce platforms for geospatial science domains and communities to better conduct research and development and to better collect data, access data, analyze data, model and simulate phenomena, visualize data and information, and produce knowledge. To achieve these transformative objectives, collaborative research and federated developments are needed for the following reasons: (1) to address social heterogeneity to identify geospatial problems encountered by relevant sciences and applications, (2) to analyze data for information flows and processing needed to solve the identified problems, (3) to utilize Semantic Web to support building knowledge and semantics into future GCI tools, (4) to develop geospatial middleware to provide functional and intermediate services and support service evolution for stakeholders, (5) to advance citizen-based sciences to reflect the fact that cyberspace is open to the public and citizen participation will be essential, (6) to advance GCI to geospatial cloud computing to implement the transparent and opaque platforms required for addressing fundamental science questions and application problems, and (7) to develop a research and development agenda that addresses these needs with good federation and collaboration across GCI communities, such as government agencies, non-government organizations, industries, academia, and the public.


[79]
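Kang J F, Du Z H, Liu R Y, et al. Parallel image resample algorithm based on GPU for land remote sensing data management[J]. Journal of Zhejiang University (Science Edition), 2011, 38(6): 695-700. To speed up image pyramid creation on a single computing node, this study first uses GPU parallelism to accelerate image resampling, the core step of pyramid creation. Because the data volume changes continually during pyramid creation, and data volume directly affects the efficiency of GPU resampling, a threshold-based pyramid creation algorithm is proposed that combines GPU-parallel and CPU-serial resampling, dynamically selecting the resampling algorithm according to a threshold while the pyramid is built; the algorithm was applied to pyramid management of land remote sensing imagery. Tests on a 10371×7945 24-bit remote sensing image show that: (1) GPU-based parallel resampling is the fastest, 10 times faster than CPU-based serial resampling; (2) pyramid creation with the proposed algorithm is more than three times faster than with ArcGIS 9.3.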

[80]
Park S J, Choi K H, Park J, et al. A study on spatial analysis using R-based deep learning[J]. International Journal of Software Engineering & Its Applications, 2016, 10(5): 87-94.

[81]
Cai L. Study on parallelized geographic computing technology and performance evaluation models[D]. Changsha: National University of Defense Technology, 2011.

[82]
Huo S M. Research on key technologies of massive image data management based on Hadoop[D]. Changsha: National University of Defense Technology, 2010.

[83]
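Zhong E S. Geocontrol and live geography: Some thoughts on the direction of GIS[J]. Journal of Geo-Information Science, 2013, 15(6): 783-792. Based on an analysis of trends in information technology and of development patterns in GIS, this paper proposes the concept of geographic control (GeoControl). GeoControl results from integrating and fusing GIS with many other technologies. Supported by geospatial information, it is the process of influencing a subject or object according to the dynamic characteristics of the geographic environment, regulating and controlling that subject's or object's motion and changes of state; it is a complex system-control technology that targets the geographic system and fully accounts for the factors of the geospatial dimension. GeoControl has already been applied successfully in the flight control of unmanned aerial vehicles and in autonomous driving, and it matters for the development of smart cities, intelligent machines, intelligent transportation and the Internet of Things. GeoControl is an important stage in the development of geographic information technology and raises its social function and role once again. The paper also introduces the Live Geography approach proposed by Resch; Live Geography is the product of integrating and fusing sensor networks with GIS. It enables real-time collection, processing, analysis and application of geographic environment data and is also an important means of acquiring dynamic data for GeoControl. Live Geography and GeoControl change many traditional application patterns in geography and will have a broad impact on GIS.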

[84]
Yin B. Research on distributed remote sensing image processing based on Hadoop[D]. Shanghai: East China Normal University, 2015.

[85]
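Yin F, Feng M, Zhu Y Q, et al. Research on vector spatial data distributed computing using Hadoop projects[J]. Computer Engineering and Applications, 2013, 49(16): 25-29. To achieve high-performance processing of large-scale vector data, a MapReduce-based distributed computing system for vector data was designed and developed on the open-source Hadoop project. Based on the characteristics of vector spatial data and an analysis of the Key/Value data model and the GeoJSON geographic encoding format, a Key/Value text file format for vector data that can be stored in Hadoop HDFS was constructed. The MapReduce computation process for vector data is discussed, and key steps such as Map data splitting, parallel processing and Reduce merging of results are described in detail. On this basis a prototype distributed computing system for vector data was built; its components are described in detail, and it was applied with good results to 1:100 000 land-use vector data of the Guanzhong region.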

[86]
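Zhang C M, Pan M. A study on topological reconstruction of GIS vector data based on grid index[J]. Geography and Geo-Information Science, 2006, 22(4): 20-24. In GIS, topological analysis and reconstruction of raw vector data are prerequisites for storing and using them. By introducing index structures including regular grids and quadtree grids, global vector topological analysis is reduced to intersection tests among a small enough number of line segments within each individual grid cell, which lowers computational complexity; a reorganization algorithm then converts the raw vector data into vector data that satisfy the "break at every intersection" rule. Experiments show that the algorithm is suitable for massive and highly scattered vector data.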

[87]
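Zhang M B, Lu F, Shen P W, et al. The evolvement and progress of R-tree family[J]. Chinese Journal of Computers, 2005, 28(3): 289-300. In recent years, research on spatial database indexing has attracted growing interest and attention. To process the massive spatial data stored in spatial databases quickly and effectively, researchers have proposed a large number of disk-based spatial indexing methods. Among them, the R-tree proposed by Guttman in 1984 is currently the most popular dynamic spatial index structure and is widely used in research prototypes and commercial applications. Since then, various improvements targeting different spatial operations have been built on it, and after 20 years of development the steady stream of R-tree variants has grown into a flourishing R-tree family of spatial indexes. This paper reviews the R-tree and its major variants; describes R-tree-based bulk operations, spatial query processing algorithms, query cost models and query optimization; introduces progress in R-tree-based parallel processing, concurrency control and locking strategies; and analyzes future research directions for the R-tree.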

[88]
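Zhang K, Qin B, Liu Q C. Study of parallel computing framework based on GPU-Hadoop[J]. Application Research of Computers, 2014, 31(8): 2548-2550. To address the low efficiency of the native Hadoop cloud platform in visualizing marine environmental information, a parallel computing framework that embeds GPUs into the Hadoop cloud platform is proposed. Built on native Hadoop, the framework combines GPU parallel computing with MapReduce to achieve efficient visualization of ocean flow fields and ocean features. Experimental results show that the proposed framework processes data-intensive and computation-intensive ocean data more efficiently than the native Hadoop cloud platform, reaching speedups of 6 to 8 times. The proposed framework can therefore effectively improve the computational efficiency of ocean information visualization and supports the development of information visualization in China's marine sector.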

[89]
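Zhao Y C, Li C M, Zhao C Y. Research on the distributed parallel spatial indexing schema based on R-tree[J]. Geography and Geo-Information Science, 2007, 23(6): 38-41. To improve the efficiency of massive spatial data management and parallel processing in distributed parallel computing environments, a multi-level parallel R-tree spatial index structure was designed on the basis of research into parallel spatial indexing mechanisms. The structure rests on an efficient parallel spatial data partitioning strategy and follows classic parallel computing methodology, so that, while ensuring good load balance, it is better suited to parallel processing of massive spatial data. Using the system response time of parallel spatial range queries as the performance metric, experiments show that the parallel spatial index structure is well designed and efficient.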
