A Stacking-based Model for Urban Traffic Time-series Prediction

LIU Xiliang; LU Feng

doi:10.3724/SP.J.1047.2015.01474

Journal of Geo-information Science

2015, Vol. 17, Issue 12: 1474-1482


Original Article


State Key Lab of Resources and Environmental Information System, IGSNRR, CAS, Beijing 100101, China

*Corresponding author: LIU Xiliang, E-mail: liuxl@lreis.ac.cn

Received date: 2015-10-10

Revised date: 2015-10-29

Online published: 2015-12-20

Copyright: Editorial Office of the Journal of Geo-information Science

Abstract

In general, the prediction of urban traffic time-series data often lacks a priori knowledge and runs into many problems in parameter setting because of the dynamics of traffic. Given the complexity of traffic phenomena, it is still hard to obtain a satisfactory result from a single model. In view of the limitations of traditional approaches, this paper proposes a general, scalable ensemble learning framework for predicting urban traffic time series from floating car data, based on stacked generalization (also known as stacking). First, we analyzed the optimal linear combination of different models and redesigned the learning strategy for Level-1 modeling in the stacking framework. To demonstrate the effectiveness of the proposed stacking ensemble learning method, we gave a mathematical justification based on error-ambiguity decomposition. Second, we integrated six classical approaches into this stacking framework: linear least squares regression (LLSR), autoregressive moving average (ARMA), historical mean (HM), artificial neural network (ANN), radial basis function neural network (RBF-NN), and support vector machine (SVM). We conducted experiments with a real urban traffic time-series data set obtained from 400 main intersections in Beijing's road network, and compared the proposed model with four traditional combination models: the equal weights method (EW), the optimal weights method (OW), the minimum error method (ME), and the minimum variance method (MV). According to the variance and bias values of the different models, the results show that the proposed stacking ensemble approach behaves more robustly than any single model. Moreover, the stacking approach outperforms the other traditional model combination strategies. These findings demonstrate the competitive properties of the stacking model in predicting urban traffic time-series data. We also present possible explanations with mathematical analysis and outline future research directions.

Keywords: urban traffic; time-series prediction; ensemble learning; stacking; robustness

Cite this article

LIU Xiliang, LU Feng. A Stacking-based Model for Urban Traffic Time-series Prediction[J]. Journal of Geo-information Science, 2015, 17(12): 1474-1482. DOI: 10.3724/SP.J.1047.2015.01474

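1 Introduction

With the development of information and communication technology (ICT), more and more sensors are being deployed to collect data on urban traffic conditions, such as floating cars, mobile phone signaling, infrared detectors, buses, and wearable devices. These greatly enrich the data sources for intelligent transport system (ITS) and urban GIS research. Urban traffic data obtained from these sensors can all be converted into time-stamped traffic time-series data that characterize the operating state of the urban road network, such as traffic volume, link travel time, and intersection turn delay. In predicting urban traffic time-series data, an urgent task for intelligent transportation research, urban computing, and spatio-temporal data mining is to account for the dynamic randomness peculiar to traffic phenomena, to overcome the lack of prior knowledge and the parameter-setting difficulties that are widespread in current modeling, and to build a general model that can learn adaptively under different traffic states.

Traditionally, urban traffic time-series prediction has relied on a single model^[1-6]. A single model reflects only one aspect of the operating state of the urban traffic network and can hardly capture the dynamic randomness of the whole system. Such modeling is also unavoidably affected by noisy data, and the parameter settings of particular models (e.g., ANN and STARMA) and overfitting also need consideration, which limits how widely the models can be applied. Because traffic phenomena are complex as a whole, a single model can hardly summarize the original problem comprehensively, which may cause prediction to fail. Some researchers have tried hybrid modeling based on mathematical statistics to predict urban traffic time-series data^[7-10]. These hybrid approaches either take a simple average of the original models^[8,10] or combine the predictions of several models linearly with a weighted average^[9]. The combination requires all data to take part in the computation, and the weight of each model is fixed once the computation is finished, so online application is difficult for large, complex, real-world urban traffic time-series data. Moreover, because spatial data are spatio-temporally correlated, the assumptions of independent and identically distributed samples and of a random process often do not hold in spatial knowledge discovery. Multi-model fusion strategies based on traditional mathematical statistics are therefore usually suited to data that are stable over a short period without abrupt changes, which limits their generalization.

To overcome these shortcomings, this paper proposes a general ensemble learning framework and designs a stacked generalization (stacking) model based on heterogeneous ensemble learning. Following the principle that the combined model should reduce the original prediction error as much as possible, we redesign the iteration method, improve on the averaged-output combination strategy of the original stacked generalization algorithm, and give a mathematical justification of the model's effectiveness. On this basis, we analyze the model experimentally using turn delay data from the main intersections of Beijing's road network. The results show that the proposed stacking model clearly improves the prediction of urban traffic time-series data compared with single models and with hybrid models based on mathematical statistics.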

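2 Stacked generalization model

Ensemble learning can be divided into heterogeneous and homogeneous ensemble learning according to the types of the base classifiers^[11]. The taxonomy and representative algorithms are shown in Fig. 1.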

Fig. 1 The ensemble learning hierarchy

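The improvement achievable by existing homogeneous ensemble learning is quite limited^[12]. Because heterogeneous ensemble learning can accommodate more models of different kinds and therefore cover the original problem space better, heterogeneous ensemble algorithms generally outperform homogeneous ones^[13-14].

The traffic system is a complex giant system. The operation of an urban traffic system is affected by many factors and is hard to capture with a single model. Stacked generalization offers a new modeling approach; backed by ensemble learning theory, it is not prone to overfitting^[11]. According to the "weak learning" theory of ensemble learning^[15], although a highly accurate model for urban traffic time-series prediction is currently hard to obtain, the literature shows that many models achieve moderate performance and suit different traffic states. This provides a good precondition for designing a stacked generalization model.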

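2.1 Learning framework of the stacked generalization model

The stacked generalization model generally adopts a two-layer framework, described as follows.

A data set $D = \{(x_i, y_i),\ i = 1, 2, \ldots, m\}$ on $\mathbb{R}^n$ is given as the input of the Level-0 layer, where $x_i$ is the $i$-th $n$-dimensional input vector and $y_i$ is the corresponding label. The original data set $D$ is resampled $K$ times. Each resampling splits the original data set into a training set $D_k$ and the corresponding test set $D_{-k} = D - D_k$ ($k = 1, 2, \ldots, K$). At Level-0, $N$ different learning models $L = \{L_1, \ldots, L_N\}$ are built; they are trained on $D_k$ and validated on $D_{-k}$. Each resampling round yields the test results on $D_{-k}$, $T_k = \{(x_j^k, y_j^k),\ j = 1, 2, \ldots, |D_{-k}|\}$, $k = 1, 2, \ldots, K$. The test results $T_k$ of each round, together with the original labels of $D_{-k}$, form the new training instances $M_k = D_{-k} + T_k$. After the $K$ resampling rounds, the data set for the next layer (Level-1) is $M = \bigcup_{k} M_k$, $k = 1, 2, \ldots, K$. At Level-1, a corresponding learning rule is designed to learn from $M$, and the Level-1 test result is taken as the final output. The logical structure of the whole stacked generalization model is shown in Fig. 2.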

Fig. 2 The logic structure of stacking
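The construction of the Level-1 data set M can be illustrated with a minimal Python sketch. It assumes that the K resampling rounds are realized as a K-fold split (setting `k_folds` to the number of samples gives a leave-one-out split), and it uses two toy base learners, `mean_model` and `last_value_model`, which are hypothetical stand-ins rather than the six models of the paper.

```python
import random

def mean_model(train):
    """Hypothetical base learner: always predicts the mean training label."""
    avg = sum(y for _, y in train) / len(train)
    return lambda x: avg

def last_value_model(train):
    """Hypothetical base learner: predicts the last element of the input vector."""
    return lambda x: x[-1]

def build_level1_dataset(data, learners, k_folds, seed=0):
    """Return M: one (Level-0 predictions, true label) pair per sample of D."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k_folds] for i in range(k_folds)]
    level1 = []
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in indices if i not in held]  # D_k
        models = [fit(train) for fit in learners]            # L_1..L_N trained on D_k
        for i in held_out:                                   # samples of D_{-k}
            x, y = data[i]
            level1.append(([m(x) for m in models], y))       # one entry of M_k
    return level1

# Toy series: three consecutive values as input, the next value as label.
data = [((t, t + 1, t + 2), t + 3) for t in range(12)]
M = build_level1_dataset(data, [mean_model, last_value_model], k_folds=4)
```

Each entry of `M` pairs the out-of-sample predictions of all base learners with the true label, which is the form of input a Level-1 learner needs.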

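2.2 Level-1 learning rule of the stacked generalization model

The Level-1 meta-learning model of stacked generalization seeks the best way to combine the models. Its training algorithm can be chosen from the learning models above or designed anew. To achieve a better combination, the Level-1 meta-learner is designed as follows.

Let $L = \{L_1, \ldots, L_N\}$ be the learning models in Level-0 and $M = \bigcup_k M_k$ the input data set for Level-1. First, the model with the smallest mean absolute error (MAE) among the $N$ learners is selected as the first component of the Level-1 meta-learner (Eq. (1)).

$$L_1^c = \arg\min_{l \in L} E(l) \tag{1}$$

Then the next model is selected according to the principle that the combined model should reduce the original prediction error as much as possible (Eqs. (2)-(3)).

$$L_2^c = p_2 L_1^c + (1 - p_2)\,\arg\min E\big(p_1 L_1^c + (1 - p_1)\,l\big) \tag{2}$$

$$L_N^c = p_N L_{N-1}^c + (1 - p_N)\,\arg\min E\big(p_{N-1} L_{N-1}^c + (1 - p_{N-1})\,l\big) \tag{3}$$

where $l \in L$, $p_i \in [0, 1]$, and $i = 1, \ldots, N$. Each coefficient $p_i$ is computed by the rule in Eq. (4)^[16].

$$p_i = \frac{E_{i+1} - E_i}{2\Delta} + 0.5 \tag{4}$$

where $E_{i+1}$ and $E_i$ are the normalized errors of $L_{i+1}^c$ and $L_i^c$, respectively, and $\Delta$ is the prediction variance between the two successive learning models (Eq. (5)).

$$\Delta = E\big[(L_{i+1}^c - L_i^c)^2\big] \tag{5}$$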
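The combination rule of Eqs. (1)-(5) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: errors are measured as mean squared error (for which the weight in Eq. (4) is the exact minimizer of the blended error), the weight is clipped to [0, 1], and the function names `pair_weight`, `blend`, and `greedy_combine` are hypothetical.

```python
def mse(pred, truth):
    """Mean squared difference between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)

def pair_weight(pred_a, pred_b, truth):
    """Weight on pred_a as in Eq. (4): (E_b - E_a) / (2 * Delta) + 0.5, clipped to [0, 1]."""
    delta = mse(pred_a, pred_b)  # Eq. (5): mean squared difference of the two models
    if delta == 0.0:
        return 1.0               # identical predictions: keep pred_a unchanged
    p = (mse(pred_b, truth) - mse(pred_a, truth)) / (2.0 * delta) + 0.5
    return min(1.0, max(0.0, p))

def blend(pred_a, pred_b, p):
    """Convex combination p * pred_a + (1 - p) * pred_b."""
    return [p * a + (1.0 - p) * b for a, b in zip(pred_a, pred_b)]

def greedy_combine(predictions, truth):
    """Start from the best single model, then blend in the candidate that lowers the error most."""
    remaining = list(predictions)
    current = min(remaining, key=lambda pr: mse(pr, truth))  # Eq. (1)
    remaining.remove(current)
    while remaining:
        best = min(
            remaining,
            key=lambda pr: mse(blend(current, pr, pair_weight(current, pr, truth)), truth),
        )
        current = blend(current, best, pair_weight(current, best, truth))  # Eqs. (2)-(3)
        remaining.remove(best)
    return current
```

Because a weight of 1 simply keeps the current combination, each blending step in this sketch cannot increase the squared error measured on the data used to compute the weights.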

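2.3 Justification of the stacked generalization model based on error-ambiguity decomposition

Error-ambiguity decomposition is a well-established tool in machine learning and artificial intelligence for analyzing differences in algorithm performance^[11], and it has been widely used in many fields. This paper uses it to analyze the effectiveness of the proposed stacked generalization model.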

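For a mapping $\mathbb{R}^d \to \mathbb{R}$ on a $d$-dimensional space, the stacked generalization model seeks a mapping $f: y = f(x)$, where the training data are $\{(x_i, y_i) \mid i = 1, 2, \ldots, n\}$. Let $\alpha$ denote a base input model of the stacked generalization model, and let $V_\alpha(x)$ be the output of learner $\alpha$ for a given input vector $x$. Let $\bar{V}(x)$ be the output of the stacked generalization model for the input vector $x$ (Eq. (6)):

$$\bar{V}(x) = \sum_{\alpha} w_\alpha V_\alpha(x) \tag{6}$$

where $w$ is the weight vector of the base learners, with $w_\alpha \ge 0$ and $\sum_\alpha w_\alpha = 1$. Equation (6) states that the output of the stacked generalization model is a weighted average of the original base learning models. The ambiguity of learner $\alpha$ on the original problem is defined by Eq. (7).

$$A_\alpha(x) = \big(V_\alpha(x) - \bar{V}(x)\big)^2 \tag{7}$$

The corresponding ambiguity of the stacked generalization model is given by Eq. (8).

$$\bar{A}(x) = \sum_{\alpha} w_\alpha A_\alpha(x) = \sum_{\alpha} w_\alpha \big(V_\alpha(x) - \bar{V}(x)\big)^2 \tag{8}$$

For a given input vector $x$, the squared error of a single learner of the stacked generalization model on the original problem is defined by Eq. (9).

$$E_\alpha(x) = \big(f(x) - V_\alpha(x)\big)^2 \tag{9}$$

The squared error of the stacked generalization model as a whole is given by Eq. (10).

$$E(x) = \big(f(x) - \bar{V}(x)\big)^2 \tag{10}$$

Expanding $\bar{A}(x)$ and using Eq. (6) together with $\sum_\alpha w_\alpha = 1$ gives Eq. (11):

$$\bar{A}(x) = \sum_{\alpha} w_\alpha V_\alpha(x)^2 - \bar{V}(x)^2 \tag{11}$$

Similarly, expanding $\sum_\alpha w_\alpha E_\alpha(x)$ gives Eq. (12):

$$\sum_{\alpha} w_\alpha E_\alpha(x) = f(x)^2 - 2 f(x)\,\bar{V}(x) + \sum_{\alpha} w_\alpha V_\alpha(x)^2 \tag{12}$$

Subtracting $E(x) = f(x)^2 - 2 f(x)\,\bar{V}(x) + \bar{V}(x)^2$ from Eq. (12) and comparing with Eq. (11) yields Eq. (13).

$$\bar{A}(x) = \sum_{\alpha} w_\alpha E_\alpha(x) - E(x) \tag{13}$$

Let $\bar{E}(x) = \sum_\alpha w_\alpha E_\alpha(x)$; Eq. (13) then becomes Eq. (14).

$$E(x) = \bar{E}(x) - \bar{A}(x) \tag{14}$$

Extending this result to the whole training set $S = \{(x_i, y_i) \mid i = 1, 2, \ldots, N\}$ with the corresponding distribution $P$ gives Eq. (15):

$$\begin{aligned}
E_\alpha &= \int P(x)\,E_\alpha(x)\,dx = \sum_{i=1}^{N} P(x_i)\,E_\alpha(x_i) \\
A_\alpha &= \int P(x)\,A_\alpha(x)\,dx = \sum_{i=1}^{N} P(x_i)\,A_\alpha(x_i) \\
E &= \int P(x)\,E(x)\,dx = \sum_{i=1}^{N} P(x_i)\,E(x_i)
\end{aligned} \tag{15}$$

where $E_\alpha$ is the generalization error of a single learner, $A_\alpha$ expresses the diversity among the base learners of the stacked generalization model and reflects the ambiguity with respect to the whole model, and $E$ is the overall generalization error of the stacked generalization model. According to Eqs. (14)-(15), the relationship among $E_\alpha$, $A_\alpha$, and $E$ can be written as Eq. (16).

$$E = \sum_{\alpha} w_\alpha E_\alpha - \sum_{\alpha} w_\alpha A_\alpha \tag{16}$$

Because the ambiguity term is non-negative, Eq. (16) justifies the effectiveness of the stacked generalization model: its overall generalization error $E$ is less than or equal to the weighted average generalization error of the individual learners. Eq. (16) also shows that increasing the diversity among the learners reduces the overall generalization error $E$. This supports the requirement that the base learners differ from one another as much as possible. A stacked generalization model based on heterogeneous ensemble learning is therefore theoretically more effective for predicting urban traffic time-series data.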
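The decomposition in Eq. (14) can be checked numerically at a single input point. The short Python sketch below assumes non-negative weights that sum to one; the target value, the learner outputs, and the weights are made-up numbers used only for illustration.

```python
def decomposition(f_x, outputs, weights):
    """Return (E, E_bar, A_bar) at one input x for weights that sum to one."""
    v_bar = sum(w * v for w, v in zip(weights, outputs))                 # Eq. (6)
    a_bar = sum(w * (v - v_bar) ** 2 for w, v in zip(weights, outputs))  # Eq. (8)
    e_bar = sum(w * (f_x - v) ** 2 for w, v in zip(weights, outputs))    # weighted Eq. (9)
    e_ens = (f_x - v_bar) ** 2                                           # Eq. (10)
    return e_ens, e_bar, a_bar

# Made-up example: target 10.0, three learner outputs, three weights.
e_ens, e_bar, a_bar = decomposition(10.0, [8.0, 11.0, 13.0], [0.5, 0.3, 0.2])
print(e_ens, e_bar - a_bar)  # the two values agree up to floating-point rounding
```

In this example the weighted average error is far larger than the ensemble error because the learners disagree strongly, which is the effect the ambiguity term captures.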

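3 Experimental analysis and comparison

3.1 Study area and experimental data

To test how well the stacked generalization model predicts urban traffic time-series data, this paper analyzes turn delay data at intersections of Beijing's urban road network. The study area is the road network within Beijing's Fifth Ring Road, comprising 18 857 network nodes and 26 621 road segments, with 14 614 intersections of all kinds. Without loss of generality, 400 main intersections (intersections formed among expressways, arterial roads, and secondary arterial roads) are selected to represent the original network (Fig. 3).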

Fig. 3 Illustration of the study area

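The turn delay data for Beijing's urban road network come from reference [17] and are typical urban traffic time-series data. They span 1 March 2011 to 30 June 2011. For each intersection, records are stored by day of the week (Monday to Sunday), turn type (left turn, right turn, straight), and time interval index. The time window is 15 min, giving 96 intervals per day. Table 1 lists part of the data: ID is the intersection number (1-400); FID is the upstream road segment number; TID is the downstream road segment number; TTP is the turn type ("1" for left turn, "2" for right turn, "3" for straight); TIID is the time interval index (1-96); and TD is the turn delay at the intersection (s).

Tab. 1 Turn delay dataset in Beijing (part of the original data)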

ID	FID	TID	WID	TTP	TIID	TD
174	704	705	1	2	95	43.59
174	704	705	1	2	96	50.65
174	704	706	2	3	1	66.41
174	704	706	2	3	2	54.28
174	704	706	2	3	3	58.83
174	704	706	2	3	4	62.10
174	704	706	2	3	5	72.60
174	704	706	2	3	6	56.20
174	704	706	2	3	7	59.45
174	704	706	2	3	8	65.54
…	…	…	…	…	…	…

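3.2 Selection of comparison models and parameter settings

3.2.1 Single models

Six models commonly used in urban traffic time-series prediction are selected as the base models of the stacked generalization model: historical mean (MEAN), linear least squares regression (LLSR), autoregressive moving average (ARMA), artificial neural network (ANN), radial basis function neural network (RBF-NN), and support vector machine (SVM). The first three are parametric models that summarize the problem as a whole; the last three are nonparametric and mainly reflect local details of the original data. The single models are configured as follows.

First, based on the turn delay data of the 400 main intersections of Beijing's road network, the turn delay records over the four months are extracted for a given intersection, day of the week, turn type, and time interval. Every three consecutive turn delay records form one input and the fourth record forms the corresponding output, and so on, which yields the overall data set for training and testing. This data set is split randomly in a 7:3 ratio: 70% is used to train the single models and the remaining 30% to test their performance. The training data generated here also serve as the input to Level-0 of the stacked generalization model.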
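The sample construction described above (three consecutive records as input, the fourth as output, followed by a random 7:3 split) can be sketched in Python as follows. The function names `make_samples` and `split_70_30` are hypothetical, and the short series reuses a few TD values from Table 1 purely as illustrative numbers; it is not one of the per-interval series used in the experiments.

```python
import random

def make_samples(series, window=3):
    """Slide a window over the series: `window` records in, the next record out."""
    return [
        (tuple(series[i:i + window]), series[i + window])
        for i in range(len(series) - window)
    ]

def split_70_30(samples, seed=0):
    """Shuffle a copy of the samples and cut it at 70% for training."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Illustrative numbers only (TD values taken from Table 1).
delays = [66.41, 54.28, 58.83, 62.10, 72.60, 56.20, 59.45, 65.54]
samples = make_samples(delays)
train, test = split_70_30(samples)
```

Fixing the `seed` argument makes a split reproducible; using a different seed for each run produces independent random splits.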

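Second, to simplify parameter setting and make modeling easier, only the four models that require parameters (ARMA, ANN, RBF-NN, and SVM) are given the necessary settings. A setting is accepted when, during training, the prediction error is within 20 s for more than 90% of the turn delay records. The historical mean and linear least squares regression methods infer the output linearly from the input data directly.

For ARMA(p,q), the orders (p,q) must be determined. Using the Bayesian information criterion (BIC) as the evaluation measure, ARMA(p,q) is trained on the single-model training data, with both p and q searched over [1, 10] by grid search. The setting p=1, q=1 already satisfies the acceptance criterion.

For the ANN model, the usual settings include the network topology and the learning rate^[18]. Choosing the topology is an indispensable step in building an ANN. This paper uses a multilayer perceptron (MLP) with a single hidden layer and determines the number of hidden neurons by grid search over [1, 10]. The initial weights of each layer and the learning rate are assigned randomly. The ANN is trained on the single-model training data with the back propagation (BP) rule, and the topology that satisfies the acceptance criterion is 3-5-1, as shown in Fig. 4.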

Fig. 4 The topology of ANN network

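For RBF-NN, the spread parameter of the network generally has to be set. To simplify the setup, this paper assigns the RBF-NN spread randomly. Likewise, the penalty coefficient and kernel parameter of the SVM are assigned randomly, which greatly reduces the difficulty of parameter setting.

3.2.2 Statistical hybrid models

To further validate the stacked generalization model, four common statistical hybrid models are applied to the original turn delay data: the equal weights method (EW)^[19], the optimal weights method (OW)^[20], the minimum error method (ME)^[21], and the minimum variance method (MV)^[22]. Their training and test data are the same as for the single models.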

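Let $Y = \{Y_i \mid i = 1, 2, \ldots, M\}$ denote the outputs of the original $M$ base models, and let $w = \{w_i\}$ denote the weights of the models in the statistical hybrid model. The weights of the equal weights method (EW) are given by Eq. (17).

$$w_i = 1/M, \quad i = 1, \ldots, M \tag{17}$$

The weights of the optimal weights method (OW) are given by Eq. (18).

$$w = M_v^{-1} I_m \left(I_m' M_v^{-1} I_m\right)^{-1} \tag{18}$$

where $M_v$ is the covariance matrix of the prediction errors of the base models and $I_m$ is the column vector of ones whose length equals the number of base models.

The weights of the minimum error method (ME) are obtained from Eq. (19):

$$\begin{aligned}
\min\ & f = \sum_{i=1}^{M} w_i\,|\hat{y}_i - y_i| \\
\text{s.t.}\ & \sum_{i=1}^{M} w_i\,(\hat{y}_i - y_i) = \sum_{i=1}^{M} (\hat{y}_i - y_i), \\
& \sum_{i=1}^{M} w_i = 1, \quad w_i \ge 0
\end{aligned} \tag{19}$$

Eq. (19) can be solved by linear programming.

The weights of the minimum variance method (MV) are obtained from Eq. (20).

$$\min f = w M_v w^{T} \quad \text{s.t.}\ \sum_{i=1}^{M} w_i = 1,\ w_i \ge 0 \tag{20}$$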
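The EW and OW weights of Eqs. (17)-(18) can be computed with a few lines of Python. The sketch below reads $I_m$ in Eq. (18) as a vector of ones and $M_v$ as the covariance matrix of the base models' prediction errors, which is an interpretation rather than something stated explicitly in the text. It assumes $M_v$ is non-singular, and the helper `solve` is a small hand-written Gaussian elimination used only to keep the example free of external libraries.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def equal_weights(n_models):
    """Eq. (17): every model gets weight 1 / M."""
    return [1.0 / n_models] * n_models

def optimal_weights(error_cov):
    """Eq. (18): M_v^{-1} 1 divided by 1' M_v^{-1} 1 (no sign constraint on the weights)."""
    z = solve(error_cov, [1.0] * len(error_cov))
    total = sum(z)
    return [v / total for v in z]
```

For uncorrelated errors the OW weights reduce to inverse-variance weighting: with error variances 1 and 4, the two models receive weights 0.8 and 0.2.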

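3.3 Overall workflow of the stacked generalization model

The experiments are run on Matlab 2011a. The MEAN and LLSR models are hand-coded; ARMA, ANN, and RBF-NN use Matlab's built-in toolboxes; and SVM uses the open-source libSVM package^[23].

In the training stage of the stacked generalization model, Level-0 resampling uses leave-one-out cross validation (LOO-CV). Its advantage is that almost all samples are used to train the model in each round, so the training data are closest to the distribution of the original sample and the resulting evaluation is more reliable. The overall experimental workflow of the stacked generalization model is shown in Fig. 5.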

Fig. 5 The flow chart of stacking

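3.4 Evaluation criteria

The root mean square error (RMSE) and mean absolute error (MAE) on the final validation set are used as the evaluation criteria for all models. They are computed by Eqs. (21)-(22).

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (\mathrm{EstimateValue}_i - \mathrm{RealValue}_i)^2}{N}} \tag{21}$$

$$\mathrm{MAE} = \frac{\sum_{i=1}^{N} |\mathrm{EstimateValue}_i - \mathrm{RealValue}_i|}{N} \tag{22}$$

RMSE reflects the generalization ability of a model, whereas MAE reflects the deviation of the model output from the actual values.

In addition, to make the results stable, the overall data set is randomly sampled 10 times, each time splitting it randomly into single-model training and test sets in a 7:3 ratio. The mean RMSE and MAE over the 10 random samples are taken as the final performance of each model.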
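The two criteria of Eqs. (21)-(22) and the averaging over repeated random splits can be written as a small Python sketch. The function names are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
import math

def rmse(estimates, actuals):
    """Eq. (21): root of the mean squared difference."""
    return math.sqrt(
        sum((e - a) ** 2 for e, a in zip(estimates, actuals)) / len(actuals)
    )

def mae(estimates, actuals):
    """Eq. (22): mean absolute difference."""
    return sum(abs(e - a) for e, a in zip(estimates, actuals)) / len(actuals)

def mean_over_runs(scores):
    """Average a criterion over repeated random splits (10 runs in the paper)."""
    return sum(scores) / len(scores)

# Made-up example values.
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

RMSE penalizes large individual errors more heavily than MAE, so the two criteria can rank models differently when a model makes a few large mistakes.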

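3.5 Results and analysis

3.5.1 Single models versus the stacked generalization model

The proposed stacked generalization model and the six single models are used to predict the turn delay time series of Beijing's urban road network; the RMSE results are shown in Fig. 6. RBF-NN has the largest RMSE. LLSR has the second-largest RMSE, so its prediction of the turn delay time series is also unsatisfactory. In the great majority of cases, the RMSE of the proposed model is better than or equal to the best single-model result. This agrees with the theoretical justification above and also shows that the proposed model has a clear advantage over single models for predicting urban traffic time-series data.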

Fig. 6 The effect comparison between single models and stacking (RMSE)

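3.5.2 Statistical hybrid models versus the stacked generalization model

The proposed stacked generalization model and the four statistical hybrid models are used to predict the turn delay time series of Beijing's urban road network; the results are shown in Fig. 7. With RMSE as the evaluation criterion, ME performs worst: in most cases its RMSE is far larger than that of the other four combination models. In the great majority of cases, the RMSE of the proposed model is smaller than that of the four comparison models, which indicates that it is more robust than the statistical hybrid models.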

Fig. 7 The effect comparison between statistic-based models and stacking (RMSE)

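3.5.3 Comparison of all models

To show the prediction performance of each model on the turn delay time series more clearly, the RMSE and MAE of all models are presented together (Fig. 8).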

Fig. 8 The effect comparison between all models

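Fig. 8 shows that the proposed stacked generalization model outperforms the six single models in both RMSE and MAE. Compared with the four model-combination strategies, OW is close to the proposed model in MAE, but the MAE of the proposed model is still smaller. This further demonstrates the advantage of the proposed model for urban traffic time-series prediction.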

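4 Discussion

This paper analyzed the strengths and weaknesses of single models and of statistical hybrid models in modeling urban traffic time-series data. Guided by ensemble learning theory, which is relatively mature in artificial intelligence and machine learning, it proposed a stacked generalization ensemble learning model, redesigned the iterative fusion strategy following the principle that the combined model should reduce the original prediction error as much as possible, and justified the effectiveness of the model mathematically with error-ambiguity decomposition. The model was tested on turn delay prediction at the main intersections of Beijing's road network. The results (Figs. 6-8) demonstrate its advantage in predicting spatial time-series data.

In addition, except for RBF-NN and LLSR, all models predict best between 2:00 and 5:00 (Figs. 6-7). During this period, traffic volume on the urban road network is at its lowest, so turn delays at intersections change very little within the given time window (15 min), and the turn delay data set is almost constant over this period. The experiments also show that some methods from the literature (such as RBF-NN, LLSR, and ME) are not suitable for predicting urban traffic time-series data.

The following points remain open for discussion:

(1) Regarding the study objects, the different spatio-temporal distribution characteristics of the 400 main intersections in Beijing were not considered; the intersections were simply selected at random from those formed among expressways, arterial roads, and secondary arterial roads. Although the proposed model performed well in the experiments on these 400 intersections, taking account of intersection grade and the spatio-temporally heterogeneous distribution of turn delays should further improve the model. Whether the model also suits turn delay prediction at low-grade intersections needs further experimental verification.

(2) Regarding model selection, only six single models and four statistical hybrid models were used in the experiments, but this does not prevent other models from being added in the future. The proposed model is an open heterogeneous ensemble learning framework that adaptively adjusts the weight of each model under different traffic states, which makes it easy to incorporate other models. The justification based on error-ambiguity decomposition also supports, in theory, the advantage of the model for urban traffic time-series prediction.

(3) Regarding generalization, the experiments used turn delay data from Beijing's main intersections, but this does not prevent the proposed algorithm from being applied to other kinds of spatial time-series prediction, such as link travel time prediction and traffic volume prediction. These urban traffic time series share certain characteristics (e.g., spatio-temporally heterogeneous distribution and the traffic "tidal" phenomenon), so they can all be modeled with the proposed model, which indicates that it generalizes well.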

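5 Conclusions and outlook

To address the lack of prior knowledge and the parameter-setting problem that are widespread in predicting spatial time-series data of geographic processes, this paper proposed a general ensemble learning framework and designed a stacked generalization model based on heterogeneous ensemble learning. At Level-1, the iterative fusion strategy was redesigned following the principle that the combined model should reduce the original prediction error as much as possible, which avoids overfitting of the stacked generalization model, and the effectiveness of the model was justified mathematically with error-ambiguity decomposition. The results show that both the RMSE and the MAE of the stacked generalization model are smaller than those of every single base model. Compared with the linearly weighted combination models, OW is close to the proposed model in MAE, but the variance of the MAE of the stacked generalization model is smaller, which demonstrates the effectiveness of the model in time-series modeling of geographic systems. The model also simplifies model selection and parameter setting and provides an open learning framework into which other models are easily incorporated, laying a foundation for further research.

Future research will focus on fusing heterogeneous and homogeneous ensemble learning, and on considering the spatio-temporal distribution characteristics of the data during modeling rather than relying on urban traffic time-series data alone.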

The authors have declared that no competing interests exist.

References

[1] Zhao, Li. Deriving average delay of traffic flow around intersections from vehicle trajectory data[J]. Frontiers of Earth Science, 2013,7(1):28-33.

[2] Rohini B. Predicting speeds on urban streets using real-time GPS data[D]. Arlington TX: University of Texas at Arlington, 2000.

[3] Ding K L, Wang H T. A New algorithms for dynamic traffic information collection[C]. Proceeding of the Second International Conference on Intelligent Computation Technology and Automation, IEEE Computer Society, 2009:437-439.

[4] 杨兆升,于悦,杨薇. 基于固定型检测器和浮动车的路段行程时间获取技术[J]. 吉林大学学报(工学版), 2009,39(9):168-171.

[5] 吕宏义. 基于支持向量回归机的路段平均速度短时预测方法研究[D]. 北京:北京交通大学, 2007.

[6] 常刚,张毅,姚丹亚. 基于时空依赖性的区域路网短时交通流预测模型[J]. 清华大学学报(自然科学版), 2013,53(2):215-22.

[7] Hibon, Evgeniou. To combine or not to combine: Selecting among forecasts and their combinations[J]. International Journal of Forecasting, 2005,21(1):15-24.

[8] 于滨,杨忠振,林剑艺. 应用支持向量机预测公交车运行时间[J]. 系统工程理论与实践, 2007(4):160-164.

[9] Zhang, Liu Y. Analysis of peak and non-peak traffic forecasts using combined models[J]. Journal of Advanced Transportation, 2011,45(1):21-37.

[10] 李颖宏,刘乐敏,王玉全. 基于组合预测模型的短时交通流预测[J]. 交通运输系统工程与信息, 2013,13(2):34-41.

[11] Zhou Z H. Ensemble Methods: Foundations and Algorithms[M]. Boca Raton, FL: Chapman & Hall/CRC, 2012.

[12] 王清. 集成学习中若干关键问题的研究[D]. 上海:复旦大学, 2011.

[13] 俞扬. 演化计算理论分析与学习算法的研究[D]. 南京:南京大学, 2011.

[14] Wolpert D. Stacked generalization[J]. Neural Networks, 1992,5:241-259.

[15] Helmbold, Warmuth. On weak learning[J]. Journal of Computer and System Sciences, 1995,50(3):551-573.

[16] Yu Y, Zhou Z H, Ting K M. Cocktail ensemble for regression[C]. Proceeding of Seventh IEEE International Conference on Data Mining, 2007,721(726):28-31.

[17] Liu, Lu, Zhang, et al. Intersection delay estimation from floating car data via principal curves: A case study on Beijing's road network[J]. Frontiers of Earth Science, 2013,7(2):206-216.

[18] 刘希亮. 基于GA-BP 神经网络抛掷爆破效果预测与分析[D]. 北京:中国矿业大学(北京), 2011.

[19] Jose, Winkler. Simple robust averages of forecasts: some empirical results[J]. International Journal of Forecasting, 2008,24(1):163-169.

[20] Bates J, Granger C. The combination of forecasts[J]. Operations Research Quarterly, 1969,20(4):451-468.

[21] ..., Wang, Lai. A novel nonlinear ensemble forecasting model incorporating GLAR and ANN for foreign exchange rates[J]. Computers & Operations Research, 2005,32(10):2523-2541.

[22] Lilian M de Menezes, Bunn D W, Taylor J W. Review of guidelines for the use of combined forecasts[J]. European Journal of Operational Research, 2000,120:190-204.

[23] Chang C, Lin C. LIBSVM: A library for support vector machines[J]. ACM Transactions on Intelligent Systems and Technology, 2011,2(27):1-27.
