Journal of Geo-information Science ›› 2015, Vol. 17 ›› Issue (5): 575-582. doi: 10.3724/SP.J.1047.2015.00575

• Parallelization of Geocomputation •

pGTIOL: A Parallel GeoTIFF I/O Library

HU Shujian, GUAN Qingfeng, GONG Junfang, LIU Yang, FAN Tianheng, YUN Shuo

  1. National Engineering Research Center of Geographic Information System, China University of Geosciences, Wuhan 430074, China;
  2. Faculty of Information Engineering, China University of Geosciences, Wuhan 430074, China
  • Received: 2014-12-29 Revised: 2015-02-11 Online: 2015-05-10 Published: 2015-05-10
  • Corresponding author: GUAN Qingfeng (born 1977), male, from Mianyang, Sichuan; professor; research interests: high-performance geocomputation and geocomputational intelligence. E-mail: guanqf@cug.edu.cn
  • About the author: HU Shujian (born 1992), male, from Zhaoqing, Guangdong; master's student; research interest: high-performance geocomputation. E-mail: cughushujian@163.com
  • Supported by: Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (20130145120013).

Abstract:

Data I/O has become one of the main bottlenecks in parallel geospatial raster computing. In this study, we first examine the data structure of GeoTIFF, a widely used GIS raster data format, focusing on its two storage modes (strip storage and tile storage). For each mode, we construct a transfer function that maps the logical structure of the raster data to its physical storage structure. We then design a framework for parallel raster I/O and implement a parallel GeoTIFF I/O library (pGTIOL) using the file-view technique of MPI-IO. Experimental results show that pGTIOL reads and writes data correctly and delivers higher I/O performance than the master-worker I/O mode based on the Geospatial Data Abstraction Library (GDAL). pGTIOL encapsulates the underlying parallel I/O routines and provides easy-to-use interfaces for the parallel reading and writing of GeoTIFF data. Compared with other parallel raster I/O software packages, pGTIOL supports a wide range of data types, both the strip and tile storage modes, and various domain decomposition methods. Most importantly, pGTIOL supports asynchronous parallel I/O, which allows multiple processes to read and write sub-domains of data on demand. Hence, it can facilitate dynamic load balancing in applications.

Key words: dynamic load-balancing, MPI, parallel I/O, asynchronous, GeoTIFF
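
The abstract does not give the transfer functions themselves. As a rough illustration of what a logical-to-physical mapping looks like for the two storage modes, the sketch below follows the general TIFF layout rules rather than pGTIOL's actual code. It assumes a single-band, uncompressed image, and the function and parameter names are hypothetical.

```python
def strip_location(row, col, image_width, rows_per_strip, bytes_per_sample):
    """Locate pixel (row, col) in strip storage (assumed single band, uncompressed).

    Returns (strip_index, byte_offset_within_strip).
    """
    strip_index = row // rows_per_strip      # each strip holds whole rows
    row_in_strip = row % rows_per_strip
    offset = (row_in_strip * image_width + col) * bytes_per_sample
    return strip_index, offset


def tile_location(row, col, image_width, tile_width, tile_length, bytes_per_sample):
    """Locate pixel (row, col) in tile storage (assumed single band, uncompressed).

    Returns (tile_index, byte_offset_within_tile). Tiles are numbered
    left to right, then top to bottom; edge tiles are padded to full size,
    so the in-tile row stride is always tile_width.
    """
    tiles_across = (image_width + tile_width - 1) // tile_width  # ceiling division
    tile_index = (row // tile_length) * tiles_across + (col // tile_width)
    offset = ((row % tile_length) * tile_width + (col % tile_width)) * bytes_per_sample
    return tile_index, offset
```

Under these assumptions, the absolute file position of a pixel would be the returned offset added to the matching entry of the file's StripOffsets or TileOffsets tag. A parallel reader could use the same arithmetic to work out which strips or tiles a process's sub-domain touches.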