Journal of Geo-information Science (地球信息科学学报) ›› 2020, Vol. 22 ›› Issue (9): 1753-1765. doi: 10.12082/dqxxkx.2020.200134

• Theory and Methods of Geo-information Science •

A Method for Identifying Metro Passengers' Trip Purposes Based on Multi-source Geographic Big Data and Machine Learning

赵鹏军 (ZHAO Pengjun)*, 曹毓书 (CAO Yushu)

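  1. Centre for Urban Planning and Transport Studies, College of Urban and Environmental Sciences, Peking University, Beijing 100871
  • Received: 2019-03-22 Revised: 2020-07-04 Online: 2020-09-25 Published: 2020-10-21
  • Corresponding author: ZHAO Pengjun E-mail: pengjun.zhao@pku.edu.cn
  • About the author: ZHAO Pengjun (born 1975), male, from Yan'an, Shaanxi Province, professor and doctoral supervisor. His research focuses on transport and spatial planning. E-mail: pengjun.zhao@pku.edu.cn
  • Funding:
    National Natural Science Foundation of China (41925003); Research Councils UK Global Challenges Research Fund (R48843)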

Identifying Metro Trip Purpose using Multi-source Geographic Big Data and Machine Learning Approach

ZHAO Pengjun*, CAO Yushu

  1. The Centre for Urban Planning and Transport Studies, College of Urban and Environmental Sciences, Peking University, Beijing 100871, China
  • Received: 2019-03-22 Revised: 2020-07-04 Online: 2020-09-25 Published: 2020-10-21
  • Contact: ZHAO Pengjun E-mail:pengjun.zhao@pku.edu.cn
  • Supported by:
    National Natural Science Foundation of China (41925003); Research Councils UK Global Challenges Research Fund (R48843)

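Abstract (Chinese version, translated):

Exploring methods for identifying metro passengers' trip purposes helps overcome the limitations of Smart Card Data (SCD) in specific application scenarios and increases the value of SCD in travel research and transport development planning. This paper fuses multi-source geographic big data and builds on the theory of spatio-temporal interaction between urban transport and land use. Taking metro travel by Beijing residents as a case, we extract 5565 metro trip samples from travel survey data, together with their trip purposes and trip-characteristic variables. Land use variables for the origin and destination stations of each sample are derived from Point of Interest (POI) data, yielding a metro trip dataset that records the trip purpose, trip characteristics, and land use characteristics of every metro trip. A classifier trained on this dataset with the Random Forest (RF) algorithm is then used to classify each metro trip recorded in the SCD, giving the purpose of each trip and the spatio-temporal distribution of metro trips by purpose. The results show that the method can effectively predict metro passengers' trip purposes: prediction accuracy exceeds 90% for both the "go to work" and "go home" purposes. Including land use variables significantly improves the prediction accuracy of the RF classifier, which supports the theory of spatio-temporal interaction between urban transport and land use. Given the steadily improving availability of SCD, the technique is highly transferable to practice in monitoring and forecasting residents' metro travel, metro network layout, and land use planning around metro stations, and it contributes to a fuller understanding of metro travel behavior among residents of large cities.

Key words: metro trips, trip purpose identification, travel survey data, smart card data, point of interest data, random forest, land use, spatio-temporal interaction, Beijing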

Abstract:

Identifying metro trip purposes from Smart Card Data (SCD) is important for expanding the application of SCD in transport research and transport planning. This paper integrates several types of big data and draws on theories of the interaction between transport and land use. Taking Beijing as a case, we first analyze the metro trip purposes of individual passengers using 5565 metro trip samples drawn from travel survey data. Second, we derive the land use features of each trip's origin and destination stations from Point of Interest (POI) data. Third, we build a metro trip dataset that includes trip purpose, trip duration, and the spatial distribution of trip origins and destinations. Fourth, we train a Random Forest (RF) classifier on this dataset. Finally, the trained classifier is applied to each metro trip recorded in the SCD to identify its purpose and the spatial distribution of metro trips for different purposes. The results show that the RF classifier trained in this study can effectively identify metro trip purposes from SCD. For trips with "go to work" and "go home" purposes, identification accuracy exceeds 90%. One reason for the high accuracy is that land use information is included in the RF classifier. Our results support the theory of spatial-temporal interactions between transport and land use. Multi-source geographic big data and resident travel survey data are increasingly available in large cities, so the method developed in this study should be valuable for metro trip prediction and monitoring, transport planning, and land use policy-making around metro stations. Our results also improve understanding of metro travel behavior in megacities.

Key words: Metro trips, trip purpose, travel survey data, smart card data, point of interest data, Random Forest algorithm, land use, spatial-temporal interactions, Beijing
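
As an illustration of the feature-construction step described in the abstract, the following Python sketch derives land use variables for a trip's origin and destination stations from POI records and combines them with trip characteristics into one feature vector. It is a minimal sketch under stated assumptions, not the authors' implementation: the 800 m buffer radius, the three POI categories, the coordinates, and the function names `haversine_m`, `land_use_shares`, and `trip_features` are all hypothetical, since the abstract does not specify how the land use variables were defined.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def land_use_shares(station, pois, categories, radius_m=800.0):
    """Share of each POI category among POIs within radius_m of a station.

    radius_m=800 is an assumed buffer size, not a value taken from the paper.
    """
    counts = Counter(
        cat
        for lat, lon, cat in pois
        if cat in categories
        and haversine_m(station[0], station[1], lat, lon) <= radius_m
    )
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in categories]


def trip_features(entry_hour, duration_min, origin, destination, pois, categories):
    """Trip characteristics followed by origin and destination land use shares."""
    return (
        [entry_hour, duration_min]
        + land_use_shares(origin, pois, categories)
        + land_use_shares(destination, pois, categories)
    )


# Toy data for illustration only (hypothetical categories and coordinates).
CATEGORIES = ["residential", "office", "retail"]
POIS = [
    (39.9000, 116.4000, "residential"),
    (39.9010, 116.4005, "residential"),
    (39.9005, 116.4010, "retail"),
    (39.9500, 116.4500, "office"),
    (39.9505, 116.4495, "office"),
    (39.9495, 116.4505, "office"),
    (39.9502, 116.4502, "retail"),
]
home_station = (39.9002, 116.4002)
work_station = (39.9500, 116.4500)

# A morning trip from a mostly residential station to a mostly office station.
features = trip_features(8, 35, home_station, work_station, POIS, CATEGORIES)
print(features)
```

Feature vectors built this way for the labelled survey trips could then be used to train a random forest (for example with scikit-learn's `RandomForestClassifier`), and the same function applied to SCD records would produce the inputs for prediction.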