Journal of Geo-information Science ›› 2015, Vol. 17 ›› Issue (12): 1474-1482. doi: 10.3724/SP.J.1047.2015.01474

• Theory and Methods of Geo-information Science •

A Stacked Generalization Model for Predicting Urban Traffic Time-series Spatial Data

LIU Xiliang, LU Feng

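  1. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
  • Received: 2015-10-10 Revised: 2015-10-29 Published: 2015-12-20 Online: 2015-12-20
  • About the author:

    LIU Xiliang (born 1983), male, from Hengshui, Hebei Province, postdoctoral researcher. Research interests: spatio-temporal intelligent computing and trajectory data mining. E-mail: liuxl@lreis.ac.cn

  • Funding:
    National Natural Science Foundation of China (Nos. 41271408, 41401460)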

A Stacking-based Model for Urban Traffic Time-series Prediction

LIU Xiliang*, LU Feng

  1. State Key Lab of Resources and Environmental Information System, IGSNRR, CAS, Beijing 100101, China
  • Received: 2015-10-10 Revised: 2015-10-29 Online: 2015-12-20 Published: 2015-12-20
  • Contact: LIU Xiliang, E-mail: liuxl@lreis.ac.cn
  • About author:

    *Corresponding author: LIU Xiliang, E-mail: liuxl@lreis.ac.cn

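Abstract (translated from the Chinese):

Because geographic processes are dynamic and stochastic, modeling time-series spatial data commonly suffers from a lack of prior knowledge and from difficulties in setting model parameters, so a single model can hardly reflect the overall operating state of a geographic system. This paper proposes a general-purpose ensemble learning framework and designs a stacked generalization model based on heterogeneous ensemble learning. Following the principle that the combined model should reduce the original prediction error as much as possible, it improves the combination strategy that the stacked generalization model uses to average the base-model outputs, and it gives a mathematical proof of the model's effectiveness based on the error-ambiguity decomposition. Experiments on traffic-state data from Beijing's road network show that the stacked generalization model has a lower root mean square error and a lower mean absolute error than any single model, and a lower variance of the mean absolute error than the combination models based on mathematical statistics. These results verify the advantage of the stacked generalization model in predicting time-series spatial data.

Key words: time-series data analysis, urban traffic, ensemble learning, stacked generalization, robustness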
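The abstract invokes the error-ambiguity decomposition without stating it. For reference, the standard form of that decomposition for a convex combination of regressors is sketched below. The symbols (M base models f_i, weights w_i, target y) are notation assumed here for illustration, and the proof in the paper itself may be set up differently.

```latex
% Weighted ensemble of M base predictors (assumed notation)
\[
  \bar{f}(x) = \sum_{i=1}^{M} w_i f_i(x), \qquad w_i \ge 0, \qquad \sum_{i=1}^{M} w_i = 1
\]
% Error-ambiguity decomposition at a point x with target y
\[
  \bigl(\bar{f}(x) - y\bigr)^2
  = \underbrace{\sum_{i=1}^{M} w_i \bigl(f_i(x) - y\bigr)^2}_{\text{weighted average error } \bar{E}}
  - \underbrace{\sum_{i=1}^{M} w_i \bigl(f_i(x) - \bar{f}(x)\bigr)^2}_{\text{ambiguity } \bar{A} \,\ge\, 0}
\]
```

Since the ambiguity term is non-negative, the squared error of the combination never exceeds the weighted average squared error of its members. The identity alone does not guarantee that the ensemble beats the best single member.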

Abstract:

The prediction of urban traffic time-series data generally lacks prior knowledge and runs into many parameter-setting problems because traffic is dynamic. Given the complexity of traffic phenomena, a single model still rarely yields a satisfactory result. In view of these limitations of traditional approaches, we propose a general, scalable ensemble learning framework, based on stacked generalization (also known as stacking), for predicting urban traffic time series from floating car data. First, we analyzed the optimal linear combination of different models and redesigned the learning strategy for the Level-1 model of the stacking framework. To prove the effectiveness of the proposed stacking ensemble method, we gave a mathematical justification based on the error-ambiguity decomposition. Second, we integrated six classical approaches into the stacking framework: linear least squares regression (LLSR), autoregressive moving average (ARMA), historical mean (HM), artificial neural network (ANN), radial basis function neural network (RBF-NN), and support vector machine (SVM). We conducted experiments on a real urban traffic time-series dataset obtained from 400 main intersections in Beijing’s road network. We then compared the proposed model with four traditional combination models: the equal weights method (EW), the optimal weights method (OW), the minimum error method (ME), and the minimum variance method (MV). Judged by the variance and bias of the different models, the results show that the proposed stacking ensemble is more robust than any single model. The stacking ensemble also outperforms the traditional model combination strategies.
These findings demonstrate the competitive performance of the stacking model in predicting urban traffic time-series data. We also offer possible explanations through mathematical analysis and outline directions for future research.

Key words: urban traffic, time-series prediction, ensemble learning, stacking, robustness
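The two-level idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three level-0 forecasters are simple stand-ins rather than the six models used in the paper, the level-1 learner is a plain least-squares combiner rather than the authors' redesigned strategy, the series is synthetic, and all function names are hypothetical. It compares a stacked combination with an equal weights (EW) baseline using RMSE and MAE.

```python
import math
import random

# Level-0 models (illustrative stand-ins): one-step-ahead forecasts from past values only.
def moving_mean(history, window=4):
    recent = history[-window:]
    return sum(recent) / len(recent)

def last_value(history):
    return history[-1]

def linear_trend(history, window=4):
    # Fit a least-squares line to the last `window` points and extrapolate one step.
    y = history[-window:]
    n = len(y)
    x_mean, y_mean = (n - 1) / 2, sum(y) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    return y_mean + (sxy / sxx) * (n - x_mean)

LEVEL0 = [moving_mean, last_value, linear_trend]

def level0_predictions(series, start):
    # Each row holds the level-0 forecasts for time t, made from series[:t] only.
    rows = [[model(series[:t]) for model in LEVEL0] for t in range(start, len(series))]
    return rows, series[start:]

def solve(a, b):
    # Gaussian elimination with partial pivoting for the small system a @ x = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_level1(rows, targets):
    # Level-1 learner: least-squares weights via the normal equations (P^T P) w = P^T y.
    k = len(rows[0])
    ptp = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    pty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(ptp, pty)

def combine(rows, weights):
    return [sum(w * p for w, p in zip(weights, row)) for row in rows]

def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

# Synthetic series with a 24-step cycle plus noise (not the Beijing dataset).
rng = random.Random(0)
series = [40 + 10 * math.sin(2 * math.pi * t / 24) + rng.gauss(0, 2) for t in range(240)]

rows, targets = level0_predictions(series, start=4)
split = 160  # earlier rows train the level-1 combiner, later rows evaluate it
stack_w = fit_level1(rows[:split], targets[:split])
equal_w = [1 / len(LEVEL0)] * len(LEVEL0)

for name, w in (("equal weights", equal_w), ("stacking", stack_w)):
    pred = combine(rows[split:], w)
    print(f"{name}: RMSE={rmse(pred, targets[split:]):.3f} MAE={mae(pred, targets[split:]):.3f}")
```

Because equal weights are one of the linear combinations the level-1 learner can choose, the stacked weights can never fit the level-1 training rows worse than EW does. Whether that advantage carries over to the held-out rows depends on the data, which is why the comparison is printed rather than assumed.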