Journal of Geo-information Science ›› 2020, Vol. 22 ›› Issue (7): 1487-1496. doi: 10.12082/dqxxkx.2020.190255

• Theory and Methods of Geo-information Science •

  • About the author: NIE Pei (born 1994), male, from Hengyang, Hunan Province; PhD candidate; main research interest: spatial big data. E-mail: 15546012870@163.com

Parallel Construction and Distributed Storage for Vector Tile

NIE Pei1,2, CHEN Guangsheng1,2,*, JING Weipeng1,2

  1. College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China;
  2. Heilongjiang Province Engineering Technology Research Center for Forestry Ecological Big Data Storage and High Performance (Cloud) Computing, Harbin 150040, China
  • Received: 2019-05-23 Revised: 2019-11-12 Online: 2020-07-25 Published: 2020-09-25
  • Contact: CHEN Guangsheng
  • Supported by:
    National Natural Science Foundation of China (31770768); Natural Science Foundation of Heilongjiang Province of China (F2017001); Heilongjiang Province Applied Technology Research and Development Program Major Project (GA18B301)

Summary:

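Vector tiles are small, efficient to generate, and support dynamic interaction; these advantages over traditional raster tiles make them a focus of research on next-generation Internet map services. To address the slow processing and poor scalability of current vector tile approaches, this paper uses the parallel computing framework Spark to build vector tiles quickly: custom transformation functions convert the original GeoJSON vector data into a set of MVT tiles. For the generated tile set, the paper designs a tile storage model, VectorTileStore, on top of the distributed in-memory file system Alluxio. The model stores data as key-value pairs: tile metadata occupies the first eight pairs and each tile occupies one pair. As data is written, a hash index is built on the keys for fast access. The model accommodates the organization and storage of massive tile sets and is highly scalable. Experimental results show that the proposed parallel construction algorithm reduces running time by 49.6% on average compared with a single-machine construction algorithm, and that the distributed storage model VectorTileStore is better suited than traditional schemes to storing massive vector tile sets, with more efficient reads and writes.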
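
The parallel construction step can be pictured as a map stage that assigns each feature to the tiles containing it, followed by a group-by-key stage that collects the features of each tile. The plain-Python sketch below mimics those two stages for point features only. It assumes the common Web Mercator XYZ tiling scheme, which this page does not specify; the function names `lonlat_to_tile` and `assign_to_tiles` are hypothetical, and actual MVT encoding (geometry clipping, protobuf serialization) is left out.

```python
import math
from collections import defaultdict


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a WGS84 point.

    Assumption: Web Mercator XYZ tiling; the paper's scheme may differ.
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def assign_to_tiles(features, zooms):
    """Emit ((z, x, y), feature) pairs and group them by tile key."""
    tiles = defaultdict(list)
    for feature in features:
        lon, lat = feature["geometry"]["coordinates"]
        for z in zooms:
            x, y = lonlat_to_tile(lon, lat, z)
            tiles[(z, x, y)].append(feature)
    return dict(tiles)


# Two illustrative GeoJSON point features (made-up sample data).
features = [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [126.63, 45.75]},
     "properties": {"name": "Harbin"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
     "properties": {"name": "origin"}},
]
tiles = assign_to_tiles(features, zooms=[0, 1])
```

On a Spark RDD of features, the inner loop would roughly correspond to a `flatMap` that emits keyed pairs and the grouping to `groupByKey`; whether the authors structure their custom transformation functions this way is not stated here.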


Abstract:

With the spread of information technology, Internet maps that integrate multi-source geospatial information are widely used in forestry, marine, land, transportation, and military applications. At the same time, advances in Earth observation, surveying, and mapping have caused high-precision, wide-coverage spatial data to grow rapidly, leading to an era of geospatial big data. Against this background, how to build Internet map services quickly and efficiently has become a research priority and challenge. Raster tiles were used to build Internet maps at first and played an important role in their rapid popularization. However, as maps move to mobile devices and applications deepen, the large size and low efficiency of raster tiles have become increasingly apparent, making it difficult for them to meet application needs. Vector tiles have many advantages over traditional raster tiles: they are small, efficient to generate, and support dynamic interaction, and they are becoming the focus of research on next-generation Internet map services. To accelerate processing and improve scalability in current vector tile applications, this study applies big data technology to vector tile processing. First, we use the parallel computing framework Spark to build the vector tile pyramid model. Specifically, by customizing Spark transformation functions, the steps of tile generation are parallelized, and the original GeoJSON vector data is converted into a set of Mapbox Vector Tiles (MVT). Then we design a tile storage model, VectorTileStore, to store the generated MVT tiles on the distributed in-memory file system Alluxio. The VectorTileStore model stores data as key-value pairs, with the tile metadata occupying the first eight key-value pairs and each tile occupying one key-value pair. As the data is written, a hash index is built on the keys for fast access. This model efficiently stores massive tile sets and is highly scalable. The experimental results show that the parallel vector tile construction algorithm and the distributed storage model proposed in this paper are more efficient than traditional schemes and better suited to processing massive vector tile data.
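
The storage layout described above (eight metadata pairs first, then one pair per tile, with a hash index built at write time) can be illustrated with a small in-memory sketch. Everything named here is an assumption for illustration: the class `TileStoreSketch`, the `z/x/y` key format, and the eight metadata field names are placeholders, since the abstract does not list them, and an `io.BytesIO` buffer stands in for a file on Alluxio.

```python
import io

# Hypothetical placeholder names; the abstract does not list the eight fields.
META_KEYS = ("name", "format", "bounds", "center",
             "minzoom", "maxzoom", "version", "description")


class TileStoreSketch:
    """Append-only value buffer plus a hash index from key to (offset, length)."""

    def __init__(self, metadata):
        missing = [k for k in META_KEYS if k not in metadata]
        if missing:
            raise ValueError(f"missing metadata keys: {missing}")
        self._data = io.BytesIO()  # stands in for a file on the file system
        self._index = {}           # Python dict, i.e. a hash table
        for key in META_KEYS:      # metadata occupies the first eight pairs
            self._put(key, str(metadata[key]).encode("utf-8"))

    def _put(self, key, value):
        # Append the value and record where it lives; the index is
        # updated in the same step as the write.
        offset = self._data.seek(0, io.SEEK_END)
        self._data.write(value)
        self._index[key] = (offset, len(value))

    def put_tile(self, z, x, y, payload):
        """Store one tile as one key-value pair under the key 'z/x/y'."""
        self._put(f"{z}/{x}/{y}", payload)

    def get(self, key):
        """Look up a value through the hash index, then read it back."""
        offset, length = self._index[key]
        self._data.seek(offset)
        return self._data.read(length)

    def __len__(self):
        return len(self._index)


metadata = dict.fromkeys(META_KEYS, "n/a")
metadata["name"] = "demo"
store = TileStoreSketch(metadata)
store.put_tile(1, 1, 0, b"\x1a\x02")  # arbitrary bytes, not a real MVT payload
```

A lookup such as `store.get("1/1/0")` touches only the index entry and the bytes of that one tile, which is the property a key-based hash index is meant to provide; the real model's on-disk format and distribution across Alluxio workers are not covered by this sketch.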

Key words: vector tile, web map service, parallel processing, Spark, distributed storage, Alluxio