Journal of Geo-information Science, 2016, Vol. 18, Issue (12): 1597-1607. doi: 10.3724/SP.J.1047.2016.01597

• Theory and Methods of Geo-information Science •

Classification of Urban Rail Transit Stations Based on SAX

ZHANG Liying1,2, MENG Bin3,*, YIN Qin4

  1. College of Geoscience and Surveying Engineering, China University of Mining & Technology (Beijing), Beijing 100083
  2. College of Geophysics and Information Engineering, China University of Petroleum (Beijing), Beijing 102249
  3. College of Applied Arts & Sciences, Beijing Union University, Beijing 100191
  4. College of Resource Environment and Tourism, Capital Normal University, Beijing 100048
  • Received: 2016-08-04 Revised: 2016-10-18 Online: 2016-12-27 Published: 2016-12-20
  • Contact: MENG Bin E-mail: lyzhang1980@cup.edu.cn; mengbin@buu.edu.cn
  • About the author: ZHANG Liying (1980-), female, from Sheqi, Henan Province, PhD candidate; her research interest is spatio-temporal data mining. E-mail: lyzhang1980@cup.edu.cn
  • Supported by: Beijing Philosophy and Social Science Foundation Project (14CSA002); National Natural Science Foundation of China (41171136)

Abstract:

Urban rail transit stations are key nodes of the urban rail transit network. A sound classification of these stations helps in understanding urban functional zoning and in evaluating the construction of rail transit infrastructure. Station time series objectively record important information about each observed station at every time point, and they contain different patterns that reflect different underlying causes. Clustering these time series is therefore an important means of understanding how they are formed, and a major method for mining the valuable regularities and knowledge they imply. In this paper, we use smart card (IC card) data from urban rail transit stations in Beijing and divide it into four data sets that describe each station's daily passenger volume: the weekday boarding data set (WB), the weekday alighting data set (WA), the rest-day (weekend) boarding data set (RB), and the rest-day alighting data set (RA). Symbolic Aggregate approXimation (SAX), a time series analysis method, is introduced for the first time to cluster the four data sets; it effectively reduces the dimensionality of the high-dimensional data and provides a similarity measure between stations. Using hierarchical clustering, and according to the DB cluster validity index, we find that dividing the 195 stations into 8 types is the more reasonable choice. The 8 types are: residential stations; work stations; mixed residential-work stations that are predominantly residential; dislocation stations; tourist attraction and commercial stations; mixed residential-work stations that are predominantly work-oriented; integrated stations; and other stations. SAX is also compared with a Euclidean distance similarity measure, and the results indicate that SAX outperforms Euclidean distance in terms of accuracy and efficiency.
The paper then analyzes the daily boarding and alighting characteristics of each station type on the four data sets, together with the spatial distribution of each type. Residential stations and dislocation stations are mostly located at the outer ends of the subway lines, whereas work stations, tourist attraction and commercial stations, predominantly work-oriented mixed residential-work stations, and integrated stations are concentrated in the urban area. Predominantly residential mixed residential-work stations are scattered around the city center. The results help to interpret the city's functional zoning and residents' travel behavior, provide a basis for understanding the urban spatial pattern and its evolution, and offer an objective reference for the planning, design, and management of rail transit stations.
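To make the pipeline described in the abstract more concrete, the following is a minimal sketch of SAX followed by hierarchical clustering. It is not the authors' implementation. The abstract does not state the word length, alphabet size, distance function, or linkage, so the values used here (`W = 6`, `A = 4`, the MINDIST distance from the original SAX formulation, and average linkage) are assumptions. The four hourly boarding profiles (18 counts, 06:00 to 23:00) are invented for illustration and do not come from the Beijing smart card data. Selecting the number of clusters with the DB index is not shown.

```python
import math
from bisect import bisect_right
from itertools import combinations
from statistics import NormalDist, mean, pstdev


def breakpoints(a):
    """Cut points that split the standard normal into `a` equiprobable bins."""
    return [NormalDist().inv_cdf(k / a) for k in range(1, a)]


def sax_word(series, w, a):
    """Z-normalize, average over `w` equal frames (PAA), map to symbols 0..a-1."""
    if len(series) % w != 0:
        raise ValueError("series length must be a multiple of w in this sketch")
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma else 0.0 for x in series]
    step = len(z) // w
    frames = [mean(z[i * step:(i + 1) * step]) for i in range(w)]
    cuts = breakpoints(a)
    return [bisect_right(cuts, v) for v in frames]


def mindist(word1, word2, n, a):
    """MINDIST between two SAX words built from series of length `n`."""
    cuts = breakpoints(a)
    total = 0.0
    for r, c in zip(word1, word2):
        if abs(r - c) > 1:  # equal or adjacent symbols contribute zero
            total += (cuts[max(r, c) - 1] - cuts[min(r, c)]) ** 2
    return math.sqrt(n / len(word1)) * math.sqrt(total)


def average_linkage(dist, k):
    """Naive agglomerative clustering: merge the closest pair until k remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            d = mean(dist[i][j] for i in clusters[x] for j in clusters[y])
            if best is None or d < best[0]:
                best = (d, x, y)
        _, x, y = best
        clusters[x] += clusters[y]
        del clusters[y]
    return clusters


# Hypothetical hourly boarding counts (06:00 to 23:00), not real data.
profiles = {
    "morning_peak_1": [40, 60, 50, 12, 10, 8, 10, 10, 10, 8, 10, 12, 16, 14, 12, 4, 2, 0],
    "evening_peak_1": [12, 16, 14, 12, 10, 8, 10, 10, 10, 8, 10, 12, 50, 60, 40, 4, 2, 0],
    "morning_peak_2": [45, 70, 50, 14, 10, 9, 9, 10, 11, 9, 11, 13, 18, 15, 12, 5, 3, 1],
    "evening_peak_2": [13, 17, 15, 14, 10, 9, 9, 10, 11, 9, 11, 13, 50, 70, 45, 5, 3, 1],
}

W, A, N = 6, 4, 18  # assumed word length, alphabet size, series length
names = list(profiles)
words = [sax_word(profiles[name], W, A) for name in names]
dist = [[mindist(p, q, N, A) for q in words] for p in words]
clusters = average_linkage(dist, k=2)

for name, word in zip(names, words):
    print(name, word)
print([[names[i] for i in group] for group in clusters])
```

With these toy profiles, each 18-value series is reduced to a 6-symbol word. The two morning-peak profiles map to the same word and the two evening-peak profiles to another, so cutting the hierarchy at two clusters separates them. Because MINDIST treats adjacent symbols as identical, a small alphabet can hide modest differences between stations; the choice of `W` and `A` is a tuning decision that the abstract does not specify.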

Key words: rail transit stations, time series, Symbolic Aggregate approXimation (SAX), hierarchical clustering, spatio-temporal characteristics