The Application of Multiple Movement Parameters in Trajectory Classification for Moving Objects

  • ZHU Jin 1,
  • JIANG Nan 2, 3, *,
  • HU Bin 2, 3
  • 1. School of Environmental Science and Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
  • 2. Key Laboratory for Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing 210023, China
  • 3. Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
*Corresponding author: JIANG Nan (1957-), male, professor; research interests: geographic information systems and virtual geographic environments. E-mail:

About the first author: ZHU Jin (1983-), male, from Nanjing, Jiangsu; lecturer; research interest: trajectory data mining. E-mail:

Received date: 2015-04-13

Request revised date: 2015-10-10

Online published: 2016-02-04

Funding

Special Fund for Environmental Protection Research in the Public Interest (201309037)

Copyright

Editorial Office of the Journal of Geo-information Science. All rights reserved.

Abstract

Trajectory classification predicts the class labels of unknown trajectories from the features of training trajectories. It supports important applications such as suspicious vehicle identification, illegal fishing vessel detection and transportation mode detection. Most current trajectory classification methods consider only two movement parameters, speed and acceleration, and use only simple statistics such as the mean, median and maximum. They therefore cannot fully exploit the characteristics of trajectories, which leads to relatively low classification accuracy. To address this problem, and based on a review of the literature on movement parameters and their statistics, this paper proposes a trajectory classification method based on the movement characteristics of moving objects. For five movement parameters (speed, acceleration, sinuosity, direction and turning angle), the method uses statistics such as skewness, kurtosis, the coefficient of variation and the autocorrelation coefficient from time series analysis to construct discriminative global features, and it extracts local features from the sub-trajectories obtained by trajectory segmentation. For direction and turning angle, directional statistics is used to compute the features accurately. Experiments on three real trajectory datasets (vessels, wild animals and hurricanes) give classification accuracies of 100%, 80% and 71.43% respectively, which indicates that the movement features constructed in this paper are discriminative and effective across different datasets.

Cite this article

ZHU Jin, JIANG Nan, HU Bin. The Application of Multiple Movement Parameters in Trajectory Classification for Moving Objects[J]. Journal of Geo-information Science, 2016, 18(2): 143-150. DOI: 10.3724/SP.J.1047.2016.00143

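1 Introduction

Classification is one of the core problems in data mining, machine learning and pattern recognition[1]. Trajectory classification predicts the class labels of unknown trajectories from the features of training trajectories. It has many important applications, such as suspicious vehicle identification[2], fishery control (detecting illegal fishing vessels), maritime security (terrorism, smuggling, trafficking in pirated goods)[3] and transportation mode detection (walking, driving, motorcycle, bicycle, etc.)[4-11].
Trajectory classification methods fall into two categories. The first category uses data mining and machine learning to model trajectories, or to construct complex features, and then classifies them. Ref. [12] first models sub-trajectories with Gaussian mixture models (GMM) and then classifies whole trajectories with hidden Markov models (HMM). Ref. [13] constructs region-based and trajectory-clustering features on sub-trajectories: a region-based feature is a region that contains almost exclusively trajectories of one class, and a trajectory-clustering feature is a cluster that contains almost exclusively trajectories of one class. The limitation of this method is that it considers only the spatial information of trajectories and ignores their movement characteristics. Consider two situations. First, as shown in Fig. 1, if the spatial activity ranges of the moving objects almost coincide, it is nearly impossible to construct region-based or trajectory-clustering features. Second, as shown in Fig. 2, if a trajectory neither falls within a region (the blue rectangle is the region-based feature of scale 2 hurricanes) nor lies close to any trajectory cluster, as with the orange hurricane in the figure, it is an outlier trajectory and has no features at all. Ref. [14] introduces duration information into trajectory classification and generates two kinds of features, duration-based regions and paths with speed differences, but its feature extraction is overly complicated. Ref. [15] studies trajectory classification on road networks and uses frequent sequential patterns as discriminative features; this method requires a road network to construct the features.
The second category classifies traffic trajectories and is applied to transportation mode detection. Zheng et al.[7] extracted features such as trajectory length, speed and acceleration, and classified them with machine learning methods (decision trees, support vector machines, Bayesian networks and conditional random fields). In subsequent work, Zheng et al.[6] adopted three new features, heading change rate, stop rate and velocity change rate, to improve classification accuracy. Ref. [16] uses fuzzy logic with speed and acceleration as classification features. Ref. [10] also uses speed and acceleration as features and applies analysis of variance (ANOVA) to select the most discriminative ones. Sun and Ban classified trucks and passenger cars and found that variations in acceleration and deceleration are the most discriminative features[11]. Ref. [8] uses GIS information (OpenStreetMap data) to assist GPS trajectory classification, with speed and proximity indices to transport networks (bus lines, metro lines and the road network) as features. Ref. [9] extracts global and local movement features (speed, acceleration) from trajectories for classification; global features are extracted from the whole trajectory and local features from sub-trajectories. A review of these transportation-mode-oriented methods shows that they usually use only two movement parameters, speed and acceleration, and mostly build features from simple statistics of these parameters such as the mean, median and maximum, so they do not fully exploit the characteristics of trajectories.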
Fig. 1 Animal dataset including three types of animals

Fig. 2 Hurricane dataset, including scale 2 and scale 3 hurricanes

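Building on the global and local movement feature classification method of Ref. [9], and on a comprehensive review of the literature on movement features and their statistics, this paper constructs more discriminative and more stable global features and classifies trajectories with a support vector machine (SVM)[17]. The aim is a simple, flexible and accurate trajectory classification method whose features can also be used in applications such as transportation mode detection. For the movement parameters speed, acceleration, sinuosity, direction and turning angle, higher-order statistics such as skewness, kurtosis, the coefficient of variation and the autocorrelation coefficient yield discriminative global movement features; for direction and turning angle, directional statistics is introduced to compute the features. The method is evaluated on three real trajectory datasets, and the results show that the constructed movement features are effective across different datasets.

2 Trajectory classification based on movement features

As shown in Fig. 3, the method consists of three steps: (1) trajectory preprocessing: outlier removal, resampling and noise smoothing; (2) extraction of global and local features; (3) dimensionality reduction by principal component analysis (PCA)[18] and SVM classification. Steps (1) and (3) follow Ref. [9]; this paper details step (2).
Fig. 3 Trajectory classification according to global and local movement characteristics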

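2.1 Global movement features

Movement features of a moving object, such as speed, direction, sinuosity and displacement[19], reveal its behavior. Ref. [9] divides movement features into two levels: global features and local features.
To extract global features, the movement parameters are first computed at every sampling point or for every trajectory segment, and statistics of these parameters, such as the mean, median and standard deviation, are then used as global features. Ref. [9] uses six global features: the mean and standard deviation of speed, acceleration and sinuosity. A trajectory can be written as a sequence of points $\{P_i\} = \{P_1, P_2, \ldots, P_n\}$, where each point $P_i$ consists of a position $(x_i, y_i)$ and a time $t_i$, i.e. $P_i = (x_i, y_i, t_i)$. Speed ($v_i$), acceleration ($a_i$) and sinuosity ($s_i$) are computed with Eqs. (1)-(5), where $\mathrm{dist}(P_i, P_{i+1})$ is the Euclidean distance between $P_i$ and $P_{i+1}$.
$$d_i = \mathrm{dist}(P_i, P_{i+1}) \tag{1}$$
$$\Delta t_i = t_{i+1} - t_i \tag{2}$$
$$v_i = d_i / \Delta t_i \tag{3}$$
$$a_i = (v_{i+1} - v_i) / \Delta t_i \tag{4}$$
$$s_i = \frac{\mathrm{dist}(P_{i-1}, P_i) + \mathrm{dist}(P_i, P_{i+1})}{\mathrm{dist}(P_{i-1}, P_{i+1})} \tag{5}$$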
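Eqs. (1)-(5) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' C++ implementation: the function names `dist` and `movement_parameters` are hypothetical, points are assumed to be `(x, y, t)` tuples in planar coordinates with strictly increasing timestamps, and returning infinity when the chord in Eq. (5) has zero length is an assumption made here to avoid division by zero.

```python
import math


def dist(p, q):
    """Euclidean distance between two (x, y, t) points, as used in Eq. (1)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def movement_parameters(points):
    """Return (speeds, accelerations, sinuosities) for a list of (x, y, t) points.

    Assumes at least three points and strictly increasing timestamps.
    """
    n = len(points)
    d = [dist(points[i], points[i + 1]) for i in range(n - 1)]    # Eq. (1)
    dt = [points[i + 1][2] - points[i][2] for i in range(n - 1)]  # Eq. (2)
    v = [d[i] / dt[i] for i in range(n - 1)]                      # Eq. (3)
    a = [(v[i + 1] - v[i]) / dt[i] for i in range(n - 2)]         # Eq. (4)
    s = []
    for i in range(1, n - 1):                                     # Eq. (5)
        chord = dist(points[i - 1], points[i + 1])
        # Assumption: a zero-length chord (object returned to its start) maps to infinity.
        s.append((d[i - 1] + d[i]) / chord if chord > 0 else math.inf)
    return v, a, s


# Example track: two straight steps followed by a turn.
track = [(0, 0, 0), (3, 4, 10), (6, 8, 20), (6, 18, 30)]
speeds, accelerations, sinuosities = movement_parameters(track)
```

For the straight part of the example track the sinuosity is 1, and it rises above 1 at the turn, which is the behavior the definition of sinuosity describes.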
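This study uses five global movement parameters: speed, acceleration, sinuosity, direction and turning angle. Sinuosity is the ratio of the distance travelled between two points to the straight-line distance between them, and it reveals how winding the path is. Direction and turning angle are illustrated in Fig. 4. Direction is the direction of movement between consecutive sampling points, expressed as the angle between that direction and a reference direction (e.g. north); the turning angle is obtained as the difference between consecutive directions. Ref. [6] computes the turning angle with Eq. (6) (Ref. [6] calls direction and turning angle heading and heading change):
$$p_i.\mathrm{turnAng} = |p_{i+1}.\mathrm{direct} - p_i.\mathrm{direct}| \tag{6}$$
where direct is the direction and turnAng is the turning angle.
Fig. 4 Direction and turning angle

This study computes the features of direction and turning angle with directional statistics[20], and uses the following statistics to build features from the movement parameters above: mean, median, standard deviation, coefficient of variation, the three largest values, the three smallest values, autocorrelation coefficient, skewness and kurtosis. The coefficient of variation is the standard deviation divided by the mean, a standardized measure of dispersion. The three largest values follow Ref. [7]; they are used instead of the single maximum mainly to allow for GPS positioning accuracy and error. For the same reason, the three smallest values are also included. Skewness and kurtosis describe the shape of a distribution: skewness measures its asymmetry and kurtosis its peakedness. The autocorrelation coefficient is a concept from time series analysis[21] that measures the correlation between observations at different lags. A time series can be written as $\{x_t\} = \{x_1, x_2, \ldots, x_N\}$. To compute the autocorrelation coefficient, the autocovariance coefficient $c_k$ is first computed with Eq. (7):
$$c_k = \frac{1}{N} \sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \tag{7}$$
where $k$ is the lag, $N$ is the length of the series and $\bar{x}$ is the mean of the series.
The lag-1 autocorrelation coefficient $r_1$ is then computed with Eq. (8), where $c_0$ and $c_1$ are obtained from Eq. (7).
$$r_1 = \frac{c_1}{c_0} \tag{8}$$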
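The 13 statistics listed above can be sketched for a linear parameter (speed, acceleration or sinuosity) as follows. This is a hedged sketch rather than the authors' implementation: the function names are hypothetical, the dictionary keys reuse the prefixes the paper defines later in Table 2, and the paper does not say which estimators it uses, so the population standard deviation and the moment-based skewness $m_3 / m_2^{3/2}$ and kurtosis $m_4 / m_2^2$ (not excess kurtosis) are assumptions. The input is assumed to have at least three values, a non-zero mean and non-zero variance.

```python
import math
import statistics


def autocorr_lag1(x):
    """Lag-1 autocorrelation r_1 = c_1 / c_0, following Eqs. (7) and (8)."""
    n = len(x)
    mean = sum(x) / n

    def c(k):
        # Autocovariance coefficient at lag k, Eq. (7).
        return sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n

    return c(1) / c(0)


def global_features(x):
    """Return the 13 global statistics of one linear movement parameter."""
    n = len(x)
    mean = sum(x) / n
    # Central moments (assumed population form, dividing by n).
    m2 = sum((xi - mean) ** 2 for xi in x) / n
    m3 = sum((xi - mean) ** 3 for xi in x) / n
    m4 = sum((xi - mean) ** 4 for xi in x) / n
    std = math.sqrt(m2)
    ordered = sorted(x)
    return {
        "M1": mean,
        "M2": statistics.median(x),
        "STD": std,
        "CV": std / mean,            # coefficient of variation
        "T1": ordered[-1], "T2": ordered[-2], "T3": ordered[-3],  # three largest
        "B1": ordered[0], "B2": ordered[1], "B3": ordered[2],     # three smallest
        "AUTO": autocorr_lag1(x),
        "S": m3 / m2 ** 1.5,         # skewness (assumed moment form)
        "K": m4 / m2 ** 2,           # kurtosis (assumed non-excess form)
    }
```

Applying `global_features` to each of the three linear parameters gives 39 of the global features; direction and turning angle need the circular counterparts described next.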
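The statistics of speed, acceleration and sinuosity can be computed directly from their formulas. For direction and turning angle the computation is not straightforward (for example, the mean of several directions is not their sum divided by their number) and requires special treatment. Directional statistics[20] is the statistics of unit vectors in the plane, whose sample space is a circle. Its basic concepts and formulas are as follows.
Fig. 5 The direction representation of X using angle θ

A direction can be represented as a unit vector $X$ or as a point on the unit circle. Once a reference direction of the unit circle (e.g. east) is fixed, the point $X$ can be represented by an angle $\theta$ through Eq. (9), as shown in Fig. 5:
$$X = (\cos\theta, \sin\theta)^{T} \tag{9}$$
(1) Mean direction
Given unit vectors $X_1, \ldots, X_n$ with corresponding angles $\theta_i$, $i = 1, \ldots, n$, the mean direction $\bar{\theta}$ of $\theta_1, \ldots, \theta_n$ is the direction of the resultant $X_1 + \cdots + X_n$, which is also the direction of the mean vector $\bar{X}$ (whose length is the mean resultant length $\bar{R}$). $X_j$ has coordinates $(\cos\theta_j, \sin\theta_j)$, $j = 1, \ldots, n$, and $\bar{X}$ has coordinates $(\bar{C}, \bar{S})$:
$$\bar{C} = \frac{1}{n} \sum_{j=1}^{n} \cos\theta_j, \quad \bar{S} = \frac{1}{n} \sum_{j=1}^{n} \sin\theta_j \tag{10}$$
The mean resultant length $\bar{R} \ge 0$ is computed with Eq. (11):
$$\bar{R} = (\bar{C}^2 + \bar{S}^2)^{1/2} \tag{11}$$
When $\bar{R} = 0$, $\bar{\theta}$ is undefined; when $\bar{R} > 0$, $\bar{\theta}$ is computed with Eq. (12):
$$\bar{\theta} = \begin{cases} \arctan(\bar{S}/\bar{C}), & \bar{C} \ge 0 \\ \arctan(\bar{S}/\bar{C}) + \pi, & \bar{C} < 0 \end{cases} \tag{12}$$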
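A small Python sketch of Eqs. (10)-(12) shows why the arithmetic mean fails for directions. The function name `mean_direction` is hypothetical. One deliberate deviation is labeled in the code: `math.atan2` is used in place of the two-branch arctangent of Eq. (12); it gives the same direction modulo $2\pi$ and also handles $\bar{C} = 0$. The tolerance used to treat $\bar{R}$ as zero is likewise an assumption.

```python
import math


def mean_direction(angles):
    """Return (theta_bar, r_bar) for angles in radians.

    theta_bar lies in [0, 2*pi) and is None when the mean direction is undefined.
    """
    n = len(angles)
    c_bar = sum(math.cos(t) for t in angles) / n   # Eq. (10)
    s_bar = sum(math.sin(t) for t in angles) / n   # Eq. (10)
    r_bar = math.hypot(c_bar, s_bar)               # Eq. (11)
    if r_bar < 1e-12:                              # assumed tolerance for R_bar = 0
        return None, r_bar
    # atan2 replaces the case split of Eq. (12); same direction modulo 2*pi.
    theta_bar = math.atan2(s_bar, c_bar) % (2 * math.pi)
    return theta_bar, r_bar


# Two headings either side of the reference direction: 350 and 10 degrees.
theta, r = mean_direction([math.radians(350), math.radians(10)])
```

For headings of 350° and 10° the arithmetic mean is 180°, which points the opposite way, whereas the mean direction computed here coincides with the reference direction (0°, up to floating-point rounding).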
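(2) Median direction
The median direction $\tilde{\theta}$ of $\theta_1, \ldots, \theta_n$ is an angle $\phi$ that satisfies two conditions: (1) half of the data points lie in the arc $[\phi, \phi + \pi)$; (2) the majority of the data points are nearer to $\phi$ than to $\phi + \pi$. When the sample size $n$ is odd, the median direction is one of the angles $\theta_1, \ldots, \theta_n$; when $n$ is even, it is the midpoint of two adjacent angles.
(3) Circular variance
The dispersion of directions is measured by the circular variance of Eq. (13) ($0 \le \bar{R} \le 1$):
$$V = 1 - \bar{R} \tag{13}$$
(4) Circular standard deviation
The circular standard deviation is computed with Eq. (14), where $v$ takes values in $[0, \infty)$ whereas $V$ takes values in $[0, 1]$.
$$v = \{-2 \log(1 - V)\}^{1/2} = \{-2 \log \bar{R}\}^{1/2} \tag{14}$$
The circular skewness and kurtosis are more complicated to compute and are omitted here for brevity.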
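Eqs. (13) and (14) can be sketched in the same way. The function name `circular_dispersion` is hypothetical. Two choices are assumptions made here and are labeled in the code: $\bar{R}$ is clamped to at most 1 to guard against floating-point rounding, and an infinite standard deviation is returned when $\bar{R} = 0$, where the logarithm in Eq. (14) is undefined.

```python
import math


def circular_dispersion(angles):
    """Return (circular variance V, circular standard deviation v), angles in radians."""
    n = len(angles)
    c_bar = sum(math.cos(t) for t in angles) / n
    s_bar = sum(math.sin(t) for t in angles) / n
    # Mean resultant length, Eq. (11); clamped because rounding can push it just above 1.
    r_bar = min(math.hypot(c_bar, s_bar), 1.0)
    variance = 1.0 - r_bar                                   # Eq. (13)
    if r_bar == 0.0:
        return variance, math.inf                            # assumed convention for R_bar = 0
    return variance, math.sqrt(-2.0 * math.log(r_bar))       # Eq. (14)
```

Identical headings give a variance and a standard deviation of (numerically) zero, while headings spread evenly around the circle drive the variance towards 1 and the standard deviation towards infinity, matching the value ranges stated for $V$ and $v$.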
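(5) Distance between two angles
Eq. (6) cannot be used directly to compute the turning angle. As shown in Fig. 6, $P_1$, $P_2$ and $P_3$ are three consecutive points of a trajectory and the reference direction is east. Suppose $\alpha = 20°$ and $\beta = 330°$. Computing the turning angle $\gamma$ with Eq. (6) gives $\gamma = |20° - 330°| = 310°$, whereas $\gamma$ should be $20° + (360° - 330°) = 50°$. The turning angle represents the distance between two angles, so the turning angle between $\alpha$ and $\beta$ can be computed with Eq. (15):
$$1 - \cos(\alpha - \beta) \tag{15}$$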
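The worked example with $\alpha = 20°$ and $\beta = 330°$ can be checked with a short sketch. The function names are hypothetical. `angular_distance` is Eq. (15) as written; `turning_angle_deg` is an added convenience, not a formula from the paper, that converts that distance back to an angle in $[0°, 180°]$ through $\arccos(1 - d)$.

```python
import math


def naive_turn(alpha_deg, beta_deg):
    """Eq. (6): plain difference of two directions, which ignores wrap-around."""
    return abs(alpha_deg - beta_deg)


def angular_distance(alpha_deg, beta_deg):
    """Eq. (15): 1 - cos(alpha - beta), a distance in [0, 2]."""
    return 1.0 - math.cos(math.radians(alpha_deg - beta_deg))


def turning_angle_deg(alpha_deg, beta_deg):
    """Assumed helper: recover the angle in [0, 180] degrees from Eq. (15)."""
    d = angular_distance(alpha_deg, beta_deg)
    cosine = max(-1.0, min(1.0, 1.0 - d))   # clamp against rounding before acos
    return math.degrees(math.acos(cosine))


wrong = naive_turn(20, 330)          # 310, the value Eq. (6) produces
right = turning_angle_deg(20, 330)   # approximately 50, the expected turning angle
```

Because the cosine is periodic and even, Eq. (15) gives the same value for a difference of 310° and of 50°, which is what removes the wrap-around problem.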
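In summary, 13 statistics are computed for each of the 5 global movement parameters, giving 65 global features in total.
Fig. 6 Calculation of the turning angle

2.2 Local movement features

To extract local features, a trajectory must be partitioned into segments with similar movement characteristics, and the key step is the segmentation itself. This paper follows Ref. [9] to segment trajectories and compute local features. That method represents a movement parameter as a time series that reflects how the amplitude and frequency of the parameter change over time. Amplitude is the relative magnitude of the parameter, and frequency is how quickly the parameter changes. According to whether amplitude and frequency are high or low, the points of the time series fall into four classes: high amplitude and high frequency, high amplitude and low frequency, low amplitude and high frequency, and low amplitude and low frequency. Once a trajectory has been partitioned into segments of these four classes, statistics are computed for the segments. For four movement parameters (speed, acceleration, direction and sinuosity) and the four segment classes, 52 local features are computed per trajectory: (1) the mean and standard deviation of segment length for each segment class (4) and each movement parameter (4), 32 features in total; (2) the number of segments for each movement parameter, 4 features; (3) the percentage taken up by each segment class (4) for each movement parameter (4), 16 features in total.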
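The feature count above (32 + 4 + 16 = 52) can be made concrete with a sketch that starts from segments that are already labeled. The segmentation algorithm of Ref. [9] itself is not reproduced here. Everything named in the code is hypothetical: the class labels `HH`, `HL`, `LH` and `LL`, the function `local_features`, and the input format of `(class_label, length)` pairs. Two further assumptions are labeled in the code: the population standard deviation is used, and the percentage of a class is taken as its share of the total segment length (the text does not say whether the share is by length or by segment count).

```python
import statistics

# Hypothetical labels: amplitude/frequency high (H) or low (L).
CLASSES = ("HH", "HL", "LH", "LL")


def local_features(segments):
    """Return the 13 local features of ONE movement parameter.

    `segments` is a list of (class_label, length) pairs for that parameter.
    """
    total = sum(length for _, length in segments)
    feats = {}
    for c in CLASSES:
        lengths = [length for label, length in segments if label == c]
        # (1) mean and standard deviation of segment length per class
        feats[f"mean_len_{c}"] = statistics.fmean(lengths) if lengths else 0.0
        feats[f"std_len_{c}"] = statistics.pstdev(lengths) if lengths else 0.0  # assumed population form
        # (3) share of each class (assumed: share of total length)
        feats[f"share_{c}"] = sum(lengths) / total if total else 0.0
    # (2) number of segments for this parameter
    feats["n_segments"] = len(segments)
    return feats


# Example: labeled segments of one parameter's time series.
speed_segments = [("HH", 10), ("LL", 30), ("HH", 20), ("LH", 40)]
speed_features = local_features(speed_segments)
```

Each parameter contributes 4 × 2 + 1 + 4 = 13 values, so running this for speed, acceleration, direction and sinuosity yields the 52 local features.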

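3 Experiments and analysis

The experiments ran on an Intel® Core™ i5-2400 3.10 GHz quad-core CPU with 4 GB of memory under 64-bit Windows® 8.1. Features of direction and turning angle were computed with the MATLAB directional statistics toolbox CircStat[22]; the other features were computed in C++. Classification was done in R, with PCA and SVM provided by the psych (http://cran.r-project.org/web/packages/psych/index.html) and e1071 (http://cran.r-project.org/web/packages/e1071/index.html) packages respectively; e1071 uses the LIBSVM[23-24] library as its SVM implementation. To test classification accuracy, the experiments use the trajectory datasets of Ref. [13]: the vessel (http://www.mbari.org/MUSE/platforms/ships.htm), wild animal (http://www.fs.fed.us/pnw/starkey/data/tables/index.shtml) and hurricane (http://weather.unisys.com/hurricane/atlantic/) datasets.

3.1 Vessel dataset

The trajectories in the vessel dataset come from two vessels, R/V Point Lobos (vessel 1) and ROV Ventana (vessel 2), which form the two trajectory classes. Both vessels have a GPS sampling interval of 10 s, and they contribute 12 and 15 trajectories respectively. For every dataset, 20% of the trajectories are randomly selected as test trajectories.
The method is compared with that of Ref. [13], denoted RB-TB, which uses an SVM with a linear kernel. The experiments also analyze the classification performance of global features, local features and their combination, as well as the influence of the SVM linear kernel (L) and RBF kernel (R). G1 and L1 denote the global and local features of Ref. [9], and G2 denotes the global features of this paper. The kernel type is given in parentheses: for example, L1(L) denotes the local features of Ref. [9] classified with the linear kernel, and G2+L1(R) denotes the global features of this paper combined with the local features of Ref. [9] and classified with the RBF kernel.
Computing local features requires a threshold on the sinuosity of the movement parameter time series. Thresholds between 0.3 and 0.9 were tested on the three datasets and 0.6 was found to be a reasonable choice, so the experiments use 0.6. This differs from the threshold of 0.95 in Ref. [9], possibly because the GPS sampling intervals differ: the datasets of Ref. [9] were sampled every 1 s.
Table 1 lists the classification accuracy of the SVM linear kernel under different feature combinations. Ref. [23] points out that when the number of features is much larger than the number of instances, the data need not be mapped to a higher-dimensional space and the linear kernel is as accurate as the RBF kernel. For the linear kernel, therefore, no PCA dimensionality reduction is applied and the features are fed directly into the SVM.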
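Tab. 1 Classification accuracy for vessel dataset

| Features | RB-TB | G1(L) | L1(L) | G1+L1(L) | G2(L) |
| --- | --- | --- | --- | --- | --- |
| Accuracy (%) | 40 | 80 | 100 | 100 | 100 |

Table 1 shows that RB-TB has the lowest accuracy (40%) while G1(L) reaches 80%, which indicates that classification from spatial information alone is unreliable. L1(L) and G1+L1(L) both reach 100%, demonstrating the effectiveness of local features. G2(L) also reaches 100%, showing that the global features of this paper are more discriminative than those of Ref. [9]. Because L1(L), G1+L1(L) and G2(L) already reach 100%, no comparison with the RBF kernel was run.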
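In what follows, a global movement feature is denoted by a prefix and a suffix: the prefix identifies the statistic and the suffix the movement parameter. The prefixes are listed in Table 2. The suffixes V, A, D, TA and S denote speed, acceleration, direction, turning angle and sinuosity respectively; for example, AUTOS denotes the autocorrelation coefficient of sinuosity.
Tab. 2 The meanings of prefixes in statistics

| Prefix | Meaning | Prefix | Meaning | Prefix | Meaning |
| --- | --- | --- | --- | --- | --- |
| M1 | Mean | B1 | Smallest value | AUTO | Autocorrelation coefficient |
| M2 | Median | B2 | Second smallest value | S | Skewness |
| T1 | Largest value | B3 | Third smallest value | K | Kurtosis |
| T2 | Second largest value | STD | Standard deviation | | |
| T3 | Third largest value | CV | Coefficient of variation | | |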
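The prefix and suffix notation can be enumerated mechanically, which also confirms the count of 65 global features. The sketch below takes the prefixes from Table 2 and the suffixes from the text; the variable names are hypothetical.

```python
# Statistic prefixes from Table 2.
PREFIXES = ["M1", "M2", "STD", "CV", "T1", "T2", "T3",
            "B1", "B2", "B3", "AUTO", "S", "K"]

# Movement-parameter suffixes from the text.
SUFFIXES = {
    "V": "speed",
    "A": "acceleration",
    "D": "direction",
    "TA": "turning angle",
    "S": "sinuosity",
}

# Every global feature name is a statistic prefix followed by a parameter suffix.
FEATURE_NAMES = [prefix + suffix for suffix in SUFFIXES for prefix in PREFIXES]
```

`FEATURE_NAMES` then holds 13 × 5 = 65 distinct names, including `AUTOS` (autocorrelation coefficient of sinuosity) and `T1V` (largest speed).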
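These statistics reveal different characteristics of the vessels. Fig. 7 shows boxplots of six features: (a) maximum speed, (b) autocorrelation coefficient of direction, (c) autocorrelation coefficient of turning angle, (d) standard deviation of sinuosity, (e) coefficient of variation of sinuosity and (f) autocorrelation coefficient of sinuosity. These features separate the two vessels completely, and the newly introduced statistics, such as the autocorrelation coefficient and the coefficient of variation, yield discriminative features.
Fig. 7 The boxplots of statistics for vessels

3.2 Wild animal dataset

The wild animal dataset contains three species: elk, deer and cattle. The sampling interval ranges from tens of minutes to one or two hours, and there are 60 trajectories for each species.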
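Tab. 3 Classification accuracy for wild animal dataset

| Features | Accuracy (%) | Features | Accuracy (%) |
| --- | --- | --- | --- |
| RB-TB | 36.67 | G1(L) | 40 |
| L1(L) | 36.67 | L1(R) | 56.67(21) |
| G1+L1(L) | 40 | G1+L1(R) | 53.33(9) |
| G2(L) | 80 | G2(R) | 43.33(18) |
| G2+L1(L) | 80 | G2+L1(R) | 73.33(18) |

Table 3 lists the classification accuracy of the SVM linear and RBF kernels under different feature combinations. RB-TB and L1(L) have the lowest accuracy, only 36.67%. The dataset has three classes with equal numbers of trajectories, so this is only slightly better than the 33.33% (1/3) of random guessing, which indicates that classification from spatial information is unreliable. G1(L), L1(L), L1(R), G1+L1(L) and G1+L1(R) all stay below 60%, showing that the global and local features of Ref. [9] have limited discriminative power with either kernel. G2(L) and G2+L1(L) both reach 80%, far above the method of Ref. [9]. Note, however, that for G2 and G2+L1 the RBF kernel is less accurate than the linear kernel; the information lost through dimensionality reduction appears to affect accuracy considerably. This is nevertheless consistent with Ref. [23]: "when the number of features is much larger than the number of instances, the data need not be mapped to a higher-dimensional space, and the linear kernel is as accurate as the RBF kernel".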
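Fig. 8 shows boxplots of four global features with relatively high discriminative power: (a) kurtosis of acceleration, (b) standard deviation of turning angle, (c) autocorrelation coefficient of sinuosity and (d) kurtosis of sinuosity. Although these features reveal some characteristics of the three species, no single feature separates the classes completely as in the vessel dataset; the classifier has to combine the features to distinguish the three species better. The figure also shows that kurtosis can yield discriminative features.
Fig. 8 The boxplots of global features for animals

3.3 Hurricane dataset

The hurricane dataset uses Atlantic hurricane data from 1950 to 2012, sampled every 6 h. The experiments select hurricanes of Saffir-Simpson scale[25] 2 and 3, with 67 and 77 trajectories respectively.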
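Tab. 4 Classification accuracy for hurricane dataset

| Features | Accuracy (%) | Features | Accuracy (%) |
| --- | --- | --- | --- |
| RB-TB | 50 | G1(L) | 46.43 |
| L1(L) | 46.43 | L1(R) | 71.43 |
| G1+L1(L) | 39.29 | G1+L1(R) | 64.29 |
| G2(L) | 53.57 | G2(R) | 67.86 |
| G2+L1(L) | 57.14 | G2+L1(R) | 71.43 |

Table 4 lists the classification accuracy of the SVM linear and RBF kernels under different feature combinations. RB-TB reaches 50%, which matches the 50% (1/2) expected from random guessing when the two classes are about equally represented, and indicates that classification relying on spatial information alone is unreliable. For the L1, G1+L1, G2 and G2+L1 features, the linear kernel stays below 60% whereas the RBF kernel exceeds 60%: with the RBF kernel, L1 and G1+L1 gain 25 percentage points over the linear kernel, and G2 and G2+L1 gain about 14 percentage points. L1 and G2+L1 achieve the highest accuracy, 71.43%, followed by G2 with 67.86%. Adding the G1 features to L1 lowers accuracy by about 7 percentage points, whereas adding the G2 features to L1 leaves it unchanged, so G2 is a discriminative and stable feature set. As with the animal dataset, these features reveal some characteristics of the two hurricane classes, but no single feature separates the classes completely.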

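3.4 Analysis of the classification results

For the vessel, animal and hurricane datasets the method attains classification accuracies of 100%, 80% and 71.43% respectively, which is satisfactory, whereas the RB-TB method, based on spatial information alone, attains 40%, 36.67% and 50%. For these three datasets, Ref. [13] reports accuracies of 98.2%, 83.3% and 73.1%. Note that although this paper uses the same datasets as Lee et al., the trajectories included in each dataset differ; Ref. [13] appears to have used a subset of each dataset.
The accuracies for the vessel, animal and hurricane datasets are in descending order. We attribute this to the sampling interval of the trajectories, which is 10 s, tens of minutes to one or two hours, and 6 h for the three datasets respectively. As the sampling frequency increases, the computed statistics become more accurate, especially for physical quantities such as speed and acceleration. Take the hurricane dataset as an example: the Saffir-Simpson scale is defined by the maximum 1-min sustained (average) wind speed[25]. If hurricanes were sampled every 1 min, they could be classified accurately. Hence, the shorter the sampling interval, the higher the classification accuracy.
The global and local features are numerous, and their importance varies with the type of trajectory to be classified (pedestrians and cars, for instance, can generally be distinguished by speed alone), so feature selection has a major effect on classification accuracy. Because there are many features and many of them may be correlated, this study uses PCA for dimensionality reduction. For feature selection, if domain expert knowledge is available (for example, that the Saffir-Simpson scale classifies hurricanes by the maximum 1-min sustained wind speed), it can be used to guide the choice of features; otherwise, methods such as exploratory data analysis[26] and analysis of variance can be used. The boxplots in the experiments illustrate the effectiveness of the selected features, and in practice boxplots can serve as an exploratory data analysis tool to support feature selection. The best classification results are obtained only when, for the specific trajectory type, features are selected sensibly from the many features proposed here, using domain expert knowledge or other feature selection methods.

4 Conclusions

Most current trajectory classification methods consider only two movement parameters, speed and acceleration, and use only simple statistics such as the mean, median and maximum, so they cannot fully exploit the characteristics of trajectories. To address this problem, this paper proposes a simple, flexible and accurate trajectory classification method based on global and local movement features. For five movement parameters (speed, acceleration, sinuosity, direction and turning angle), statistics such as skewness, kurtosis, the coefficient of variation and the autocorrelation coefficient from time series analysis yield discriminative global movement features; for direction and turning angle, directional statistics is introduced to compute the features accurately. The method was evaluated on three real trajectory datasets and reached accuracies of 100%, 80% and 71.43% on the vessel, animal and hurricane datasets, showing that the constructed movement features are effective across different datasets. The results also show that combining global and local features does not always produce the most discriminative features: depending on the dataset, global features alone or local features alone may give better accuracy than their combination.
Future work will apply the proposed features to transportation mode detection and study how to select, from the many features, those that distinguish transportation modes, in order to improve classification accuracy.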

The authors have declared that no competing interests exist.

[1] Han J, Kamber M, Pei J. Data mining: concepts and techniques[M]. Fan M, Meng X F, trans. 3rd ed. Beijing: China Machine Press, 2012. [ Han J, Kamber M, Pei J. Data mining: concepts and techniques (3rd ed.)[M]. Burlington: Morgan Kaufmann, 2011. ]

[2] Li X, Han J, Kim S, et al. ROAM: Rule- and motif-based anomaly detection in massive moving object data sets[C]. Proceedings of the 2007 SIAM International Conference on Data Mining, Minneapolis, Minnesota, USA, April 26-27, 2007:273-284.

[3] Greidanus H, Kourti N. Findings of the DECLIMS project: Detection and classification of marine traffic from space[C]. Proceedings of Advances in SAR Oceanography from Envisat and ERS Missions (SEASAR 2006), Frascati, Italy, January, 2006:23-26.

[4] Wang D G, Sun B X, Song J L. Methods for detecting activity-travel behavior information from passive GPS data: State-of-the-art[J]. Geomatics and Information Science of Wuhan University, 2014,39(6):671-681.

[5] Zhang Z H. Deriving trip information from GPS trajectories[D]. Shanghai: East China Normal University, 2010.

[6] Zheng Y, Li Q, Chen Y, et al. Understanding mobility based on GPS data[C]. Proceedings of the 10th international conference on Ubiquitous computing (UbiComp '08), Seoul, Korea, Sep 21-24, 2008:312-321.

[7] Zheng Y, Liu L, Wang L, et al. Learning transportation mode from raw GPS data for geographic applications on the web[C]. Proceedings of the 17th international conference on World Wide Web (WWW '08), Beijing, China, April 21-25, 2008:247-256.

[8] Biljecki F, Ledoux H, Oosterom P. Transportation mode-based segmentation and classification of movement trajectories[J]. International Journal of Geographical Information Science, 2013,27(2):385-407.

[9] Dodge S, Weibel R, Forootan E. Revealing the physics of movement: Comparing the similarity of movement characteristics of different types of moving objects[J]. Computers, Environment and Urban Systems, 2009,33(6):419-434.

[10] Bolbol A, Cheng T, Tsapakis I, et al. Inferring hybrid transportation modes from sparse GPS data using a moving window SVM classification[J]. Computers, Environment and Urban Systems, 2012,36(6):526-537.

[11] Sun Z, Ban X J. Vehicle classification using GPS data[J]. Transportation Research Part C: Emerging Technologies, 2013,37:102-117.

[12] Bashir F I, Khokhar A A, Schonfeld D. Object trajectory-based activity classification and recognition using hidden Markov models[J]. IEEE Transactions on Image Processing, 2007,16(7):1912-1919.

[13] Lee J, Han J, Li X, et al. TraClass: trajectory classification using hierarchical region-based and trajectory-based clustering[J]. Proceedings of the VLDB Endowment, 2008,1(1):1081-1094.

[14] Patel D, Sheng C, Hsu W, et al. Incorporating duration information for trajectory classification[C]. Proceedings of IEEE 28th International Conference on Data Engineering (ICDE '12), Washington, DC, USA, April 1-5, 2012:1132-1143.

[15] Lee J, Han J, Li X, et al. Mining discriminative patterns for classifying trajectories on road networks[J]. IEEE Transactions on Knowledge and Data Engineering, 2011,23(5):713-726.

[16] Schuessler N, Axhausen K W. Processing GPS raw data without additional information [R/OL]. [2008-08-27]. .

[17] Vapnik V N. The nature of statistical learning theory (2nd ed.)[M]. New York: Springer, 2000.

[18] Jolliffe I T. Principal component analysis (2nd ed.)[M]. New York: Springer, 2002.

[19] Laube P, Dennis T, Forer P, et al. Movement beyond the snapshot - dynamic analysis of geospatial lifelines[J]. Computers, Environment and Urban Systems, 2007,31(5):481-501.

[20] Mardia K V, Jupp P E. Directional statistics[M]. Chichester UK: John Wiley & Sons, 2000:13-23.

[21] Chatfield C. The analysis of time series: an introduction (5th ed.)[M]. London: Chapman & Hall/CRC, 1996:18-20.

[22] Berens P. CircStat: A MATLAB toolbox for circular statistics[J]. Journal of Statistical Software, 2009,31(10):1-21.

[23] Hsu C, Chang C, Lin C, et al. A practical guide to support vector classification[R/OL]. [2010-04-15]. .

[24] Chang C, Lin C. LIBSVM: A library for support vector machines[J]. ACM Transactions on Intelligent Systems and Technology (TIST), 2011,2(3):1-39.

[25] Elsner J B, Kara A B. Hurricanes of the North Atlantic: climate and society[M]. New York: Oxford University Press, 1999:21-24.

[26] Tukey J W. Exploratory data analysis[M]. Boston: Addison-Wesley, 1977:39-43.
