Journal of Geo-information Science, 2018, Vol. 20, Issue (5): 647-655. doi: 10.12082/dqxxkx.2018.170374

• Theory and Methods of Geo-information Science •

A MapReduce-Based Parallel Method for Estimating the OD of Massive Numbers of Bus Passengers

WU Qunyong, SU Keyun, ZOU Zhijie

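  1. National & Local Joint Engineering Research Center of Geo-spatial Information Technology, Fuzhou University, Fuzhou 350002; 2. Key Laboratory of Spatial Data Mining and Information Sharing of the Ministry of Education, Fuzhou 350002
  • Received: 2017-08-10 Revised: 2018-03-07 Online: 2018-05-29 Published: 2018-05-20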
  • About the author:

    ZHENG Hailin (1987-), male, PhD candidate and lecturer, mainly engaged in research on maritime information processing. E-mail: hlzhzjou@126.com

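  • Supported by:
    National Natural Science Foundation of China (41471333); Central Government Guided Local Science and Technology Development Special Project (2017L3012)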

A MapReduce-based Method for Parallel Calculation of Bus Passenger Origin and Destination from Massive Transit Data

WU Qunyong*, SU Keyun, ZOU Zhijie

  1. National & Local Joint Engineering Research Center of Geo-spatial Information Technology, Fuzhou University, Fuzhou 350002, China; 2. Key Laboratory of Spatial Data Mining & Information Sharing of MOE, Fuzhou 350002, China
  • Received: 2017-08-10 Revised: 2018-03-07 Online: 2018-05-29 Published: 2018-05-20
  • Contact: WU Qunyong, E-mail: qywu@fzu.edu.cn
  • Supported by:
    National Natural Science Foundation of China, No.41471333;The Central Guided Local Development of Science and Technology Project, No.2017L3012.

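Abstract:

Bus passenger trip OD (origin-destination) data reflect residents' travel characteristics and travel demand. They are essential basic data for evaluating a bus system, scheduling it, and optimizing its routes, and they are of practical value for urban planning. Existing bus OD estimation methods are mostly suited to small amounts of bus data and cannot directly and quickly estimate trip OD for massive numbers of bus passengers, so this paper proposes a MapReduce-based parallel method for estimating the OD of massive numbers of bus passengers. First, the bus data are migrated from a relational database to an HBase database. Next, using the MapReduce parallel computing framework, the work is split into multiple map tasks according to the number of Regions of the IC card data in HBase; in each map task the Map function computes the boarding stop, and the Reduce function merges the boarding stops by user and outputs them to HDFS. Then, on the basis of the boarding record data, the work is split into multiple map tasks according to the number of blocks stored in HDFS, and for each passenger's trip records the trip chain method and historically similar travel behavior patterns are considered together to estimate alighting stops with reasonable accuracy. Finally, a case study is carried out with IC card data and bus GPS data from Xiamen for June 13 to 26, 2015: 295 bus lines, 16,879,661 boarding records, and 14,410,058 complete OD records are computed, the latter accounting for 78.9% of the IC card data, and computational efficiency improves considerably over the traditional method. The results show that the method not only estimates passengers' boarding and alighting stops fairly accurately but also computes efficiently.

Keywords: massive bus data, bus OD, MapReduce, bus trip chain, travel patterns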
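The first stage described in the abstract (a Map function that computes each boarding stop, then a Reduce function that groups boarding stops by user) can be illustrated with a small single-process sketch. This is not the authors' implementation: the record layouts, the function names, the `max_gap` tolerance, and the matching rule (the last stop the bus reached at or before the card swipe) are all assumptions made for illustration, since the abstract does not specify them. HBase, HDFS, and the Hadoop runtime are replaced by plain Python lists.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical record layouts (not given in the abstract):
#   IC card swipe: (card_id, bus_id, swipe_time_seconds)
#   GPS arrival:   (bus_id, stop_id, arrival_time_seconds)


def build_arrival_index(arrivals):
    """Group GPS arrivals per bus as parallel, time-sorted lists."""
    grouped = defaultdict(list)
    for bus_id, stop_id, t in arrivals:
        grouped[bus_id].append((t, stop_id))
    index = {}
    for bus_id, events in grouped.items():
        events.sort()
        index[bus_id] = ([t for t, _ in events], [s for _, s in events])
    return index


def map_boarding(swipe, index, max_gap=300):
    """Map step: emit (card_id, (swipe_time, bus_id, stop_id)) or nothing.

    Assumed rule: the boarding stop is the last stop the bus reached at or
    before the swipe, provided the gap is at most max_gap seconds.
    """
    card_id, bus_id, swipe_time = swipe
    times, stops = index.get(bus_id, ([], []))
    pos = bisect_right(times, swipe_time) - 1
    if pos < 0 or swipe_time - times[pos] > max_gap:
        return []  # no plausible stop; the swipe is dropped
    return [(card_id, (swipe_time, bus_id, stops[pos]))]


def reduce_by_user(pairs):
    """Reduce step: collect each card's boardings, sorted by swipe time."""
    by_user = defaultdict(list)
    for card_id, boarding in pairs:
        by_user[card_id].append(boarding)
    return {card: sorted(trips) for card, trips in by_user.items()}


def run_stage_one(swipes, arrivals):
    """Run the map and reduce steps sequentially over in-memory data."""
    index = build_arrival_index(arrivals)
    pairs = [kv for swipe in swipes for kv in map_boarding(swipe, index)]
    return reduce_by_user(pairs)
```

In a real Hadoop job the list comprehension in `run_stage_one` would be replaced by map tasks running in parallel (one per HBase Region, according to the abstract), and the framework's shuffle would deliver all pairs sharing a `card_id` to the same reducer.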

Abstract:

Bus passengers' origins and destinations (OD) reflect the travel characteristics and demands of residents. They are important basic data for bus system evaluation, scheduling, and route optimization, and have significant practical value for urban planning. Most existing OD estimation methods are designed for small volumes of bus data and cannot directly and rapidly estimate OD for massive numbers of transit passengers. To address this problem, this paper proposes a MapReduce-based parallel method for estimating the OD of massive numbers of bus passengers. First, a database migration tool is used to transfer the bus data from a relational database to HBase. Second, under the MapReduce parallel computing framework, the IC card data are divided into multiple map tasks according to the number of regions in HBase; in each task the Map function calculates the boarding stop (origin), and the Reduce function groups the boarding stops by user and writes them to HDFS. Third, the boarding records are divided into multiple map tasks according to the number of blocks stored in HDFS, and for each passenger's travel records the alighting stop (destination) is estimated fairly accurately by combining the public transit trip chain method with historically similar travel behavior. Finally, a case study uses IC card data and bus GPS data from Xiamen for June 13 to 26, 2015. The computation covers 295 bus lines and yields 16,879,661 boarding records and 14,410,058 complete OD records, the latter accounting for 78.9% of the IC card data. Computational efficiency improves substantially compared with the traditional method. The results show that the parallel method estimates bus passengers' boarding and alighting stops fairly accurately and with high computational efficiency.

Key words: massive transit data, public transit origin and destination, MapReduce, public transit trip chain, travel rule
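The destination step, which the abstract describes as combining the trip chain method with historically similar travel behavior, could look roughly like the sketch below for one passenger on one day. It is only an illustration under stated assumptions: the abstract does not give the actual rules, so the walking threshold `max_walk`, the planar stop coordinates, the "last trip returns toward the first boarding stop" closure, the frequency-based history fallback, and all function names are hypothetical.

```python
from math import hypot, inf


def nearest_downstream_stop(route_stops, board_stop, target_stop, coords, max_walk):
    """Return the stop after board_stop on the route that is closest to
    target_stop, or None if no such stop lies within max_walk metres."""
    start = route_stops.index(board_stop) + 1
    tx, ty = coords[target_stop]
    best, best_d = None, inf
    for stop in route_stops[start:]:
        x, y = coords[stop]
        d = hypot(x - tx, y - ty)
        if d <= max_walk and d < best_d:
            best, best_d = stop, d
    return best


def most_frequent_alighting(history, route_id, board_stop):
    """Fallback: the alighting stop this passenger used most often before
    for the same route and boarding stop (history maps
    (route_id, board_stop, alight_stop) -> count)."""
    counts = {a: c for (r, b, a), c in history.items()
              if r == route_id and b == board_stop}
    return max(counts, key=counts.get) if counts else None


def estimate_destinations(day_trips, routes, coords, history, max_walk=500.0):
    """day_trips: one passenger's (route_id, board_stop) pairs for one day,
    sorted by time. Returns (route_id, board_stop, alight_stop or None)."""
    results = []
    n = len(day_trips)
    for i, (route_id, board_stop) in enumerate(day_trips):
        alight = None
        if n > 1:
            # Trip chain assumption: a trip ends near the next boarding stop;
            # the last trip of the day ends near the first boarding stop.
            target = day_trips[(i + 1) % n][1]
            alight = nearest_downstream_stop(
                routes[route_id], board_stop, target, coords, max_walk)
        if alight is None:
            # Chain broken or single trip: fall back on historical behavior.
            alight = most_frequent_alighting(history, route_id, board_stop)
        results.append((route_id, board_stop, alight))
    return results
```

Because each passenger's records are processed independently, a function like `estimate_destinations` fits naturally inside a map task over the per-user boarding records; trips for which neither rule yields a stop remain `None`, which is consistent with only part of the IC card data ending up as complete OD records.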