Detecting Individual Stay Areas from Mobile Phone Location Data Based on Moving Windows

LIN Nan; YIN Ling; ZHAO Zhiyuan

doi:10.12082/dqxxkx.2018.180087

Journal of Geo-information Science, 2018, Vol. 20, Issue 6: 762-771



LIN Nan¹,², YIN Ling¹,*, ZHAO Zhiyuan¹,³


1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
3. State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing of Wuhan University, Wuhan 430079, China

*Corresponding author: YIN Ling, E-mail: yinling@siat.ac.cn

Received date: 2018-01-31

Revised date: 2018-03-09

Online published: 2018-06-20

Supported by

National Natural Science Foundation of China, No.41771441

Basic Research Project of Shenzhen City, No.JCYJ20170307164104491

Natural Science Foundation of Guangdong Province, No.2016A050503035

Copyright

Editorial Office of Journal of Geo-information Science. All rights reserved.


Abstract

With the development and popularization of mobile phones, mobile phone location data have become an important data source for analyzing individual mobility. These data support studies at a fine spatiotemporal scale in fields such as population management, urban planning, transportation analysis and health intervention. Detecting individual stay areas is a basic and important step in many of these studies. However, the low spatial and temporal resolution of raw mobile phone location data, together with noise caused by location oscillation and location drift, makes it difficult to detect individual stay areas effectively. Considering the spatiotemporal continuity of individual behavior, this study proposes an incremental clustering algorithm based on a moving window to improve the accuracy of detecting individual stay areas from mobile phone location data. Specifically, the algorithm first sorts the raw records in chronological order. It then examines adjacent records consecutively against a given distance threshold, and records that satisfy this rule are added to the current cluster. For each record that fails the rule (an unqualified record), the algorithm extracts the records within a moving window and uses the spatial distances among them as the clustering criterion. The time interval between the unqualified record and each selected record must be less than a given time threshold, which is also the width of the moving window. In this step, unqualified records that fit the detection rules are treated as location drift or location oscillation records and aggregated into the current cluster; unqualified records that fit neither rule are excluded from the current cluster and start a new cluster. Finally, the algorithm calculates the location and temporal information of each valid cluster as the parameters of the corresponding stay area and constructs a stay area sequence for each mobile user. We compared the proposed algorithm with the ST-DBSCAN and SMoT algorithms on a mobile phone location dataset from Shenzhen consisting of Call Detail Records. The results show that, compared with the other two algorithms, the proposed algorithm improves the accuracy of detecting individual stay areas from sparse mobile phone location data by up to 35%. Because of privacy constraints involving governments and telecom operators, the temporal resolution of the large-scale mobile phone location data used in recent research is usually low; the proposed algorithm can therefore improve the effectiveness of detecting individual stay areas and provide reliable results for many studies based on mobile phone location data.

Keywords: mobile phone location data; data noise; trajectory analysis; incremental clustering; stay area detection

Cite this article

LIN Nan, YIN Ling, ZHAO Zhiyuan. Detecting Individual Stay Areas from Mobile Phone Location Data Based on Moving Windows[J]. Journal of Geo-information Science, 2018, 20(6): 762-771. DOI: 10.12082/dqxxkx.2018.180087

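1 Introduction

In recent years, the popularity of mobile phones has laid the foundation for accumulating and applying large-scale mobile phone location data^[1]. As trajectory data that record individual movement, mobile phone location data offer large sample sizes, real-time updates, long sampling periods and low collection costs^[2,3]. Using such data to sense the characteristics and changes of individuals, crowds, society, cities and regions has been widely explored in academia and industry and put into practice^[4,5,6], with increasingly deep and broad applications in population management^[7,8], urban planning^[9,10], transportation analysis^[11,12] and epidemic prevention and control^[13,14,15].

Moving and staying are the two basic states of a trajectory record. At different spatiotemporal scales, a stay in a trajectory usually means that the individual engages in some activity within a local space for a sustained period^[16], which makes stays a particularly valuable part of a trajectory. Detecting individual stay areas from mobile phone location data is therefore an important basic step in many studies based on such data.

Compared with trajectory data sources of higher spatiotemporal resolution such as GPS data^[17,18,19,20], the widely used mobile phone location data usually have low spatiotemporal resolution: sampling intervals mostly exceed 30 min, and positioning accuracy varies from several hundred meters to over a kilometer^[21,22]. In addition, owing to objective factors such as the "ping-pong effect", mobile phone location data contain two common kinds of noise, oscillation points and drift points^[23,24]. As a result, raw records are hard to map directly to the stay areas an individual occupies in real space, which makes detecting individual stay areas from mobile phone location data challenging. Reducing the influence of this noise and improving the accuracy of stay area detection is both a basic problem for this emerging data source and an important foundational problem for the many studies and applications built on it.

Many studies have addressed this problem in recent years. From the clustering perspective, existing stay area detection algorithms fall into two groups: density-based clustering and incremental clustering. Density-based clustering algorithms identify high-density regions in noisy spatial data through features such as density adjacency, and thereby detect stay areas. DBSCAN is the representative algorithm^[25]; Birant et al.^[26] and Palma et al.^[27] improved it to cluster spatiotemporal data. However, because base stations are unevenly distributed in cities^[28], the spatial density of mobile phone location records is not uniform, which limits the performance of such algorithms. Incremental clustering algorithms alleviate the problem of uneven spatial distribution to some extent. They form stay areas by grouping consecutive trajectory records that are spatially close within a given time threshold. SMoT is the representative algorithm^[29]; it was subsequently refined by Horn et al.^[30], Kang^[31] and Widhalm et al.^[32] and applied to GPS and mobile phone location datasets, respectively. However, SMoT generally cannot use the context of a record when detecting stay areas, so noise such as oscillation points easily causes it to split one complete stay into several discontinuous stays, which inflates the number of stay areas. Overcoming noise interference and improving the accuracy of stay area detection is therefore a difficulty that SMoT must address when applied to mobile phone location data.

Drawing on the spatiotemporal continuity of individual activities, this study proposes an incremental clustering algorithm based on a moving window. It aims to extract individual stay areas from mobile phone location data more accurately and to provide key technical support for the many studies based on such data.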

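2 Stay Area Detection Algorithm Based on a Moving Window

Based on the spatiotemporal continuity of individual activities, this study proposes an incremental clustering algorithm based on a moving window. The moving window selects mobile phone location records that are adjacent in time, and clustering decisions are made from the spatial proximity of these records, which yields the individual stay areas.

2.1 Basic definitions

This study calls a raw mobile phone location record an observation point p=<userID, TS, Lng, Lat>. The 4-tuple records the anonymized user ID userID, the timestamp TS at which the user was located, and the longitude Lng and latitude Lat of the connected base station. The set of observation points that satisfy the clustering rules is defined as an observation cluster C={p₁, p₂, …, p_m, …}, ∀m∈Z⁺, 1≤m≤M, where M is the number of observation points in the cluster and p_m is the m-th observation point in the cluster (Fig. 1).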

Fig. 1 Sketch of detecting individual stay areas from raw mobile phone location data

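For ease of understanding, a cluster C that meets certain conditions, for example a cumulative recorded duration above a given threshold, is further abstracted into a stay area S=<userID, Lng, Lat, arrT, levT>. The longitude and latitude of the stay area are defined as the means of the longitudes and latitudes of all observation points in cluster C:

$$S.Lng = \frac{\sum_{m=1}^{M} p_m.Lng}{M}, \qquad S.Lat = \frac{\sum_{m=1}^{M} p_m.Lat}{M}, \qquad p_m \in C$$

The arrival time S.arrT of the stay area is the earliest timestamp in cluster C, S.arrT=Min(p_m.TS), p_m∈C, and the leaving time S.levT is the latest timestamp in cluster C, S.levT=Max(p_m.TS), p_m∈C. Linking an individual's stay areas in chronological order gives the stay area sequence T={S₁, S₂, …, S_n, …}, ∀n∈Z⁺, 1≤n≤N, where N is the number of stay areas and S_n is the n-th stay area.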
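The definitions above map naturally onto two small record types. The following is a minimal sketch rather than the authors' implementation: the class and field names are illustrative, and timestamps are assumed to be plain numbers (for example seconds) so that minimum and maximum are well defined.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Observation:
    """One raw record p = <userID, TS, Lng, Lat>; TS is assumed numeric."""
    user_id: str
    ts: float
    lng: float
    lat: float


@dataclass(frozen=True)
class StayArea:
    """Stay area S = <userID, Lng, Lat, arrT, levT>."""
    user_id: str
    lng: float
    lat: float
    arr_t: float
    lev_t: float


def to_stay_area(cluster: list[Observation]) -> StayArea:
    """Collapse a non-empty observation cluster C into its stay area S."""
    if not cluster:
        raise ValueError("cluster must contain at least one observation")
    return StayArea(
        user_id=cluster[0].user_id,
        lng=mean(p.lng for p in cluster),   # S.Lng: mean longitude over C
        lat=mean(p.lat for p in cluster),   # S.Lat: mean latitude over C
        arr_t=min(p.ts for p in cluster),   # S.arrT: earliest timestamp in C
        lev_t=max(p.ts for p in cluster),   # S.levT: latest timestamp in C
    )
```

Whether a cluster qualifies as a stay area at all (for example by a minimum duration) is left to the caller in this sketch.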

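2.2 Clustering rules

This study adopts incremental clustering. After an individual's observation points are sorted by recording time, they are clustered by applying three rules in turn: the basic spatial constraint, oscillation point detection and drift point detection.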

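2.2.1 Basic spatial constraint

The basic spatial constraint checks whether the distance between observation point p_m and a point in the current observation cluster C_n is within the distance threshold ε. If so, the two points are assigned to the same observation cluster C_n, where n denotes the n-th observation cluster. The rule is formalized in Eq. (1).

$$\mathrm{Distance}(p_i, p_m) \le \varepsilon, \qquad p_i \in C_n \tag{1}$$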
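As an illustration of Eq. (1), the sketch below measures distance with the haversine formula on longitude and latitude in degrees. Two points are assumptions, not statements from the paper: the choice of the haversine formula as the distance function, and the reading that p_m is compared with the most recent point of the current cluster (the overall flow speaks of distances between consecutive observation points). The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lng1, lat1, lng2, lat2):
    """Great-circle distance in meters between two (lng, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def meets_spatial_constraint(cluster, point, eps_m=500.0):
    """Eq. (1), assuming the comparison is against the latest point of the cluster.

    cluster: non-empty list of (lng, lat) tuples; point: a (lng, lat) tuple.
    """
    last_lng, last_lat = cluster[-1]
    return haversine_m(last_lng, last_lat, point[0], point[1]) <= eps_m
```

A difference of 0.001° in latitude is roughly 111 m, so with ε = 500 m such a point stays in the cluster while a 0.01° jump does not.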

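2.2.2 Oscillation point detection

In space and time, oscillation points typically appear as observation points that move back and forth among several fixed, nearby locations over a period. The distance matrix of the observation points in that period is therefore usually sparse. The oscillation point detection rule defines a moving window centered on observation point p_m to extract these points; the time difference between any point in the window and p_m is within δ. The distances between any two points p_i and p_j in the window form a distance matrix. If the mean of the distance matrix is within the distance threshold ε, p_m is treated as an oscillation point and assigned to the current active observation cluster C_n. The rule is formalized in Eq. (2).

$$\mathrm{Mean}\big(\mathrm{Distance}(p_i, p_j)\big) \le \varepsilon \tag{2}$$

where $i \ne j$ and $p_i.TS,\ p_j.TS \in [p_m.TS - \delta,\ p_m.TS + \delta]$.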
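Eq. (2) can be sketched as a mean over all unordered pairs in the window, which equals the mean over ordered pairs with i ≠ j. The record layout `(ts, x, y)`, the `dist` callable and the behaviour for a window with fewer than two points are assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean


def is_oscillation(points, m, delta, eps, dist):
    """Eq. (2): mean pairwise distance in the window around points[m] is <= eps.

    points: list of (ts, x, y) records sorted by ts.
    dist:   callable returning the distance between two such records.
    """
    t_m = points[m][0]
    # Moving window: every record within delta of p_m in time, p_m included.
    window = [p for p in points if abs(p[0] - t_m) <= delta]
    if len(window) < 2:
        return False  # no pair to average; treated as "not oscillation" here
    return mean(dist(a, b) for a, b in combinations(window, 2)) <= eps
```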

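2.2.3 Drift point detection

A drift point lies far from the other observation points in the window. The drift point detection rule examines the distances from the other observation points in the moving window to the current observation point p_m. If the share of window points whose distance exceeds the distance threshold ε is greater than the voting threshold ξ, p_m differs markedly in space from the other observation points and the algorithm treats it as a drift point. To preserve the temporal continuity of the observation points, its location is replaced by the mean location of the current cluster and the point is assigned to the current cluster. The rule is formalized in Eq. (3).

$$\frac{\#\big(\mathrm{Distance}(p_m, p_i) > \varepsilon\big)}{\#(p_j)} > \xi \tag{3}$$

where $i \ne m$; $p_i.TS,\ p_j.TS \in [p_m.TS - \delta,\ p_m.TS + \delta]$; $\xi \in [0, 1]$.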
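The voting rule of Eq. (3) can be sketched as follows. The record layout `(ts, x, y)` and the `dist` callable are illustrative, and the denominator is assumed to count the window points other than p_m; the text does not state whether p_m itself is included.

```python
def is_drift(points, m, delta, eps, xi, dist):
    """Eq. (3): share of window points farther than eps from points[m] exceeds xi.

    points: list of (ts, x, y) records sorted by ts.
    dist:   callable returning the distance between two such records.
    """
    t_m = points[m][0]
    others = [p for i, p in enumerate(points)
              if i != m and abs(p[0] - t_m) <= delta]
    if not others:
        return False  # empty window: nothing to vote with
    far = sum(1 for p in others if dist(points[m], p) > eps)
    return far / len(others) > xi
```

Because the comparison is strict, a share of exactly ξ does not count as drift in this sketch.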

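2.3 Overall algorithm flow

As shown in Fig. 2, the proposed stay area detection algorithm proceeds as follows. ① Sort the observation points by recording time and initialize the observation cluster C₁ and an empty activity sequence T. ② Compute the distance between consecutive observation points in turn and check the basic spatial constraint; points that satisfy it are added to the current observation cluster. ③ For a point that fails the basic spatial constraint, extract the observation points within its moving window and apply the oscillation point detection rule; points that satisfy it are added to the current observation cluster. ④ For a point that fails the oscillation point detection rule, apply the drift point detection rule; points that satisfy it are added to the current observation cluster, and points that do not are added to a new cluster. A cluster that contains two or more consecutive observation points is taken to correspond to a stay area, whose attributes are computed from the observation points in the cluster. The algorithm traverses each individual's observation points in this way, detects the stay areas and builds the individual's stay area sequence.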

Fig. 2 Flow chart of detecting individual stay areas from raw mobile phone location data
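Steps ① to ④ can be put together in a compact sketch. It is one possible reading of the flow, not the authors' code, and rests on several assumptions: records are `(ts, x, y)` tuples with planar coordinates in meters, the basic spatial constraint compares a point with the latest point of the current cluster, the window looks both backward and forward in time, and the drift vote excludes p_m from the denominator. The default `xi=0.5` is illustrative only.

```python
import math
from itertools import combinations
from statistics import mean


def planar_dist(a, b):
    """Euclidean distance between two (ts, x, y) records (x, y in meters)."""
    return math.hypot(a[1] - b[1], a[2] - b[2])


def detect_stay_areas(points, eps=500.0, delta=3600.0, xi=0.5, dist=planar_dist):
    """Return one user's stay areas as (x, y, arr_t, lev_t) tuples."""
    points = sorted(points)                        # step 1: chronological order
    clusters, current = [], []
    for m, p in enumerate(points):
        if not current or dist(current[-1], p) <= eps:
            current.append(p)                      # step 2: basic spatial constraint
            continue
        # Moving window around p (p itself excluded from `others`).
        others = [q for i, q in enumerate(points)
                  if i != m and abs(q[0] - p[0]) <= delta]
        pairs = list(combinations(others + [p], 2))
        far_share = (sum(dist(p, q) > eps for q in others) / len(others)
                     if others else 0.0)
        if pairs and mean(dist(a, b) for a, b in pairs) <= eps:
            current.append(p)                      # step 3: oscillation point
        elif far_share > xi:
            # step 4: drift point, relocated to the mean of the current cluster
            current.append((p[0],
                            mean(q[1] for q in current),
                            mean(q[2] for q in current)))
        else:
            clusters.append(current)               # neither rule fits: new cluster
            current = [p]
    clusters.append(current)
    return [(mean(q[1] for q in c), mean(q[2] for q in c), c[0][0], c[-1][0])
            for c in clusters
            if len(c) >= 2]                        # two or more consecutive records
```

For ten records taken every 600 s, six at the origin with one jump to 5000 m followed by four at 3000 m, a call with `delta=1200.0` absorbs the jump as a drift point and returns two stay areas.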

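3 Case Study

3.1 Data source

The study area is Shenzhen. The mobile phone location data are the communication location records of about 1.4 million anonymous users of one operator in Shenzhen on a workday in 2013, with an average sampling interval of about 30 min. The dataset records the user ID, the timestamp and the location of the connected base station each time a user communicates. It contains 3675 base station locations, and the median distance between base stations is about 550 m. The data format is shown in Tab. 1.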

Tab. 1 Format of the mobile phone location dataset

userID	TS	Longitude/°	Latitude/°
460********9251	2013--T 0:01:23.000Z	114.****	22.****
460********2565	2013--T 07:07:55.000Z	114.****	22.****
460********3757	2013--T 10:14:11.000Z	114.****	22.****

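3.2 Validation framework

3.2.1 Validation data

Validating the accuracy of the algorithm requires labeled data on individual stay areas. Because of privacy protection, however, such information is hard to obtain for mobile phone location data, and many existing studies based on these data cannot be validated against "ground-truth" labels either. To make validation possible, the stay areas in the experimental dataset were labeled manually. High-frequency sampled data reflect a user's spatiotemporal activity fairly completely, and noise in mobile phone location data is more evident at high sampling frequency, so manual identification of stay areas in such data is comparatively reliable. This study first selected the data of 329 high-frequency sampled users from the raw mobile phone location data as the experimental dataset. The selection rules were: ① the time difference between the latest and the earliest record is at least 16 h; ② the average sampling interval is 5 min; ③ the maximum time difference between adjacent records does not exceed 30 min. Manual labeling followed two principles to keep the labels as correct as possible: ① if consecutive observation points switch back and forth among two or more fixed, nearby locations, they are treated as one stay area; ② if an observation point differs markedly in space from its neighboring observation points, it is assigned to the stay area of those neighbors.

Fig. 3 gives an example from the manual labeling. Given the spatiotemporal distribution of the observation points, the points in the yellow circle switch back and forth together with their neighbors, so they are included in stay area A as oscillation points; the point in the red circle differs markedly from its neighbors, so it is treated as a drift point and included in stay area B. Points inside a stay area are labeled as stays; points outside stay areas are left unlabeled.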

Fig. 3 Manual annotation of individual stay areas based on mobile phone location data
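The three selection rules used to build the experimental dataset can be expressed as a simple filter over one user's timestamps. This is a hedged sketch: the function name is hypothetical, timestamps are assumed to be in seconds, and rule ② is read as "the average sampling interval is at most 5 min", which is an interpretation of the text.

```python
def is_high_frequency_user(timestamps, min_span_s=16 * 3600,
                           max_mean_gap_s=5 * 60, max_gap_s=30 * 60):
    """Apply the three selection rules to one user's timestamps (seconds)."""
    ts = sorted(timestamps)
    if len(ts) < 2:
        return False
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    span = ts[-1] - ts[0]
    return (span >= min_span_s                       # rule 1: at least 16 h covered
            and span / len(gaps) <= max_mean_gap_s   # rule 2: mean interval <= 5 min
            and max(gaps) <= max_gap_s)              # rule 3: no gap above 30 min
```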

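3.2.2 Evaluation metrics

This study uses accuracy and recall to evaluate the algorithm. Whether a detected stay area is correct depends on three aspects: ① whether the numbers of detected and labeled stay areas agree; ② whether the detected stay areas agree with the labeled data in space; ③ whether the start and end times of the detected stay areas agree with the labeled data. The metrics are computed accordingly:

$$\mathrm{Accuracy} = \frac{SC_{correct}}{SC_{algorithm}} \tag{4}$$

$$\mathrm{Recall} = \frac{SC_{correct}}{SC_{label}} \tag{5}$$

where $SC_{label}$ is the number of labeled stay areas, $SC_{algorithm}$ is the number of stay areas detected by the algorithm, and $SC_{correct}$ is the number of stay areas the algorithm detects correctly:

$$SC_{correct} = \sum\limits_{i=1}\sum\limits_{j=1} \mathrm{Correct}(S_{li}, S_{rj}) \tag{6}$$

$$\mathrm{Correct}(S_{li}, S_{rj}) = \begin{cases} 1, & SO(S_{li}, S_{rj}) = 1 \text{ and } \mathrm{Match}(S_{li}, S_{rj}) = 1 \\ 0, & SO(S_{li}, S_{rj}) = 0 \text{ and } \mathrm{Match}(S_{li}, S_{rj}) = 1 \end{cases} \tag{7}$$

where $S_{li}$ is the i-th stay area in the labeled data, $S_{rj}$ is the j-th stay area in the detection result, and SO is the function that decides whether a detected and a labeled stay area are spatially close.

$$SO(S_{li}, S_{rj}) = \begin{cases} 1, & \mathrm{Distance}(S_{li}, S_{rj}) \le D_0 \\ 0, & \mathrm{Distance}(S_{li}, S_{rj}) > D_0 \end{cases} \tag{8}$$

$D_0$ is the distance threshold. Considering the spatial accuracy of base station positioning and the median distance between base stations, this study sets the threshold to 500 m. Match is the function that decides whether two stay areas match.

$$\mathrm{Match}(S_{li}, S_{rj}) = \begin{cases} 1, & \sum\limits_{i=1} TO(S_{li}, S_{rj}) = 1 \text{ and } \sum\limits_{j=1} TO(S_{li}, S_{rj}) = 1 \\ 0, & \text{otherwise} \end{cases} \tag{9}$$

$$TO(S_{li}, S_{rj}) = \begin{cases} 1, & \min(S_{li}.levT,\ S_{rj}.levT) \ge \max(S_{li}.arrT,\ S_{rj}.arrT) \\ 0, & \min(S_{li}.levT,\ S_{rj}.levT) < \max(S_{li}.arrT,\ S_{rj}.arrT) \end{cases} \tag{10}$$

TO is the function that decides whether the time spans of two stay areas overlap. During metric computation, the stay areas of each user are traversed, the functions above decide whether the labeled and detected stay areas match in space and time, and accuracy and recall are computed.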
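Eqs. (4) to (10) can be sketched for a single user as below. The dictionary keys, the planar distance and the function names are assumptions for illustration. Match is read as a one-to-one temporal correspondence, and the sketch additionally requires the pair itself to overlap in time, which Eq. (9) leaves implicit. Note that "accuracy" here is the share of detected stay areas that are correct, the quantity often called precision.

```python
import math


def time_overlap(a, b):
    """TO, Eq. (10): 1 if the [arr_t, lev_t] spans of two stay areas overlap."""
    return int(min(a["lev_t"], b["lev_t"]) >= max(a["arr_t"], b["arr_t"]))


def spatially_close(a, b, d0=500.0):
    """SO, Eq. (8): 1 if the two stay areas lie within d0 meters (planar x, y)."""
    return int(math.hypot(a["x"] - b["x"], a["y"] - b["y"]) <= d0)


def evaluate(labeled, detected, d0=500.0):
    """Return (accuracy, recall) per Eqs. (4)-(9) for one user's stay areas."""
    correct = 0
    for s_l in labeled:
        for s_r in detected:
            # Match, Eq. (9): s_r overlaps exactly one labeled stay area and
            # s_l overlaps exactly one detected stay area.
            one_label = sum(time_overlap(x, s_r) for x in labeled) == 1
            one_detect = sum(time_overlap(s_l, y) for y in detected) == 1
            match = time_overlap(s_l, s_r) and one_label and one_detect
            if match and spatially_close(s_l, s_r, d0):
                correct += 1                      # Correct, Eq. (7)
    accuracy = correct / len(detected) if detected else 0.0
    recall = correct / len(labeled) if labeled else 0.0
    return accuracy, recall
```

In this reading, a labeled stay that the algorithm splits into two detected stays contributes no correct detection, which lowers both metrics.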

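3.2.3 Validation experiments

The experimental dataset was resampled in time to form datasets with 15, 30 and 60 min intervals, in order to examine systematically how the algorithm performs at different sampling intervals. Specifically, the 24 h day was divided evenly at the specified interval (e.g. 15 min), and from each slice the record whose recording time lies in the middle was extracted from the original experimental dataset, so that the temporal resolution of the resampled dataset approximates the set value while the temporal characteristics of the original data are preserved as far as possible. For ease of comparison, the 5 min and 15 min datasets are defined as high-frequency sampled datasets and the 30 min and 60 min datasets as low-frequency sampled datasets.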
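The resampling step can be sketched as follows. The phrase "the record whose recording time lies in the middle" is read here as the record closest to the midpoint of each slice; taking the median record of each slice would be another plausible reading. Timestamps are assumed to be seconds since midnight, and the function name is hypothetical.

```python
def resample(records, interval_s, day_s=24 * 3600):
    """Keep, for each time slice, the record closest to the slice midpoint.

    records: iterable of tuples whose first element is ts in seconds since midnight.
    """
    best = {}
    for rec in records:
        ts = rec[0]
        if not 0 <= ts < day_s:
            continue                      # ignore records outside the 24 h day
        slot = int(ts // interval_s)
        mid = (slot + 0.5) * interval_s
        # On ties, the earlier record encountered is kept.
        if slot not in best or abs(ts - mid) < abs(best[slot][0] - mid):
            best[slot] = rec
    return [best[s] for s in sorted(best)]
```

Slices with no record simply yield nothing, so the resampled interval only approximates the nominal value.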

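3.3 Parameter settings

This study first examines the settings of the moving window width δ and the voting threshold ξ. Considering the spatial resolution of base station positioning and the median distance between base stations, the distance threshold ε is set to 500 m. The moving window width is varied from 1 h to 5 h (in steps of 1 h), and the voting threshold is varied from 0 to 1 (with δ/sampling interval as the unit). The results are shown in Figs. 4 and 5.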

Fig. 4 Accuracy of the TW-cluster algorithm with different δ and ξ

Fig. 5 Recall of the TW-cluster algorithm with different δ and ξ

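As Figs. 4 and 5 show, when the moving window width δ increases from 1 h to 5 h, accuracy and recall for a given voting threshold ξ decrease slightly as δ grows, somewhat more noticeably at low sampling frequency. The moving window captures the temporal context of an observation point; observation points that are too far away in time may belong to another stay area and can work against the decision. Figs. 4 and 5 show that the algorithm performs well when the window width is set to 1 h.

Fig. 4 shows that, for a given window width δ, the effect of the voting threshold ξ on accuracy differs across sampling intervals: at high sampling frequency accuracy first rises and then falls as ξ increases, whereas at low sampling frequency accuracy rises with ξ. Fig. 5 shows that recall rises with ξ at all sampling intervals. These results reflect how the number of drift points inside a window of fixed width affects accuracy at different sampling intervals. At low sampling frequency, a window of fixed width contains relatively few drift points, and a higher threshold ensures that an observation point differs in space from enough neighboring observation points, which improves accuracy. At high sampling frequency, a window of fixed width contains more drift records, and too high a threshold tends to make the algorithm misjudge observation points of other stay areas as drift points, which lowers accuracy.

Taken together, the parameter experiments show that the algorithm performs well with a 1 h window. Raising the voting threshold improves accuracy and recall, but at high sampling frequency the threshold should not be set too high.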

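3.4 Algorithm comparison

3.4.1 Algorithm selection and parameter settings

This study compares the proposed algorithm with a widely used representative of density-based clustering (the ST-DBSCAN algorithm^[26]) and a representative of incremental clustering (the SMoT algorithm^[29]), and examines how the three algorithms perform under different distance thresholds. The distance threshold is increased from 100 m to 2000 m in steps of 100 m. For ST-DBSCAN, the temporal neighborhood is set to the sampling interval of the dataset and the neighbor parameter to MinPts = ln(K), where K is the number of an individual's observation points^[25]. For SMoT, the stay duration threshold is set to 1 h. For the moving window clustering algorithm proposed in this study, the window width is δ = 1 h and the voting threshold is ξ = 1. The results are shown in Figs. 6 and 7.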

Fig. 6 Accuracy of different clustering algorithms with different sampling intervals

Fig. 7 Recall of different clustering algorithms with different sampling intervals

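3.4.2 General patterns across algorithms

The three algorithms show broadly similar overall trends. Figs. 6 and 7 show that, at all sampling intervals, accuracy mainly rises quickly and then rises or falls slightly as the distance threshold increases, while recall mainly falls as the distance threshold increases. At low sampling frequency, the rise and subsequent fall in the accuracy of the moving window clustering algorithm is more pronounced. This reflects the effect of the distance threshold on noise detection. Because base station positioning is spatially coarse, a larger distance threshold lets the algorithms recognize noise points in the data and, to a degree, improves the accuracy of stay area detection. Once the distance threshold exceeds a certain value, short trips tend to be misjudged as oscillation points, which lowers recall and accuracy.

In addition, because of the number of noise points, all three algorithms are more accurate at low sampling frequency than at high sampling frequency for the same distance threshold. Trajectories sampled at high frequency contain correspondingly more noise points, so the algorithms more readily split a complete stay into several discontinuous segments, which lowers accuracy. At low sampling frequency there are fewer noise points, the detected stays are more complete and accuracy improves.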

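3.4.3 Strengths and weaknesses of the algorithms

Across sampling intervals, the moving window clustering algorithm proposed in this study achieves better accuracy at low sampling frequency. Taking the 60 min sampling interval as an example, the accuracy gain is largest at a distance threshold of 700 m: about 35% over SMoT and 26% over ST-DBSCAN, while recall is only about 12% lower than that of SMoT and 12% higher than that of ST-DBSCAN. At high sampling frequency, the accuracy of the algorithm is slightly below that of ST-DBSCAN at larger distance thresholds and its recall is below that of SMoT. Comparison with the parameter setting results in Section 3.3, however, shows that the low accuracy results from an overly strict voting threshold, and accuracy can be improved further by adjusting the window width and the voting threshold. For example, at the 5 min sampling interval, setting the voting threshold to 0.67 raises accuracy from 18% to 50%, while recall falls only from 98% to 83%. Because of privacy concerns, the large-scale mobile phone location datasets used in current research and applications usually have low temporal resolution^[22], so the proposed moving window clustering algorithm has a fairly wide range of application scenarios.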

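4 Conclusions and Outlook

To extract individual stay areas more accurately from mobile phone location data, which have low spatiotemporal resolution and frequent noise, this study proposed an incremental clustering algorithm based on a moving window. The algorithm considers both the spatial and the temporal proximity of observation points when making clustering decisions, which improves the accuracy of stay area detection.

With mobile phone location data from Shenzhen as experimental data, a series of experiments shows that the proposed stay area detection algorithm achieves high accuracy and recall. At low sampling frequency (e.g. 30 min and 60 min intervals), its recall is about 10% lower than that of the commonly used SMoT algorithm, but its accuracy is markedly higher than that of both ST-DBSCAN and SMoT, by up to 35%. At high sampling frequency (e.g. 5 min and 15 min intervals), adjusting the window width and the voting threshold lets the algorithm reach an accuracy no worse than that of the two common algorithms. Because of privacy concerns, large-scale mobile phone location datasets for urban areas usually have low spatiotemporal resolution. The proposed algorithm performs well on such datasets, has wider application scenarios, and can strengthen the reliability of the many studies that build on the stay areas of mobile phone users. Examples include using mobile phone location data to assist or supplement travel surveys, to analyze the spatiotemporal behavior of specific groups in support of sound public policy, and to support precise health interventions.

Because base station density, sampling intervals and other characteristics differ between cities, applications need to adjust the thresholds of the algorithm, such as the distance threshold and the window width, to local conditions in order to obtain good results. Future work will apply the algorithm to mobile phone location datasets from other cities and explore rules for adjusting its thresholds. In addition, further research will build on the detected stay areas, for example on human mobility patterns, individual activity chain patterns, travel demand analysis and customized health services.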

The authors have declared that no competing interests exist.

References


[1]	Ministry of Industry and Information Technology of the People's Republic of China. Statistical bulletin of communications operations in 2017[EB/OL]. 2018-02-02. (in Chinese)

[2]	Zheng Y. Trajectory data mining: An overview[J]. ACM Transactions on Intelligent Systems and Technology, 2015,6(3):29.

[3]	Yue, Lan, Yeh A G O, et al. Zooming into individuals to understand the collective: A review of trajectory-based travel behaviour studies[J]. Travel Behaviour & Society, 2014,1(2):69-78.

[4]	Liu Y. Revisiting several basic geographical concepts: A social sensing perspective[J]. Acta Geographica Sinica, 2016,71(4):564-575. (in Chinese)

[5]	Zheng Y. Introduction to urban computing[J]. Geomatics and Information Science of Wuhan University, 2015,40(1):1-13. (in Chinese)

[6]	Lu F, Liu K, Chen J. Research on human mobility in big data era[J]. Journal of Geo-information Science, 2014,16(5):665-672. (in Chinese)

[7]	Schneider C M, Belik V, Couronné T, et al. Unravelling daily human mobility motifs[J]. Journal of the Royal Society Interface, 2013,10(84):20130246.

[8]	Phithakkitnukoon S, Horanont T, Lorenzo G D, et al. Activity-aware map: Identifying human daily activity pattern using mobile phone data[C]. Human Behavior Understanding, First International Workshop, HBU 2010, Istanbul, Turkey, August 22, 2010. Proceedings. DBLP, 2010:14-25.

[9]	Pei, Sobolevsky, Ratti, et al. A new insight into land use classification based on aggregated mobile phone data[J]. International Journal of Geographical Information Science, 2014,28(9):1988-2007.

[10]	Yin L, Jiang R R, Zhao Z Y, et al. Exploring the bias of estimating 24-hour population distributions using call detail records[J]. Journal of Geo-information Science, 2017,19(6):763-771. (in Chinese)

[11]	Calabrese, Lorenzo G, Liu, et al. Estimating origin-destination flows using mobile phone location data[J]. IEEE Pervasive Computing, 2011,10(4):36-44.

[12]	Fang Z, Yang X, Xu, et al. Spatiotemporal model for assessing the stability of urban human convergence and divergence patterns[J]. International Journal of Geographical Information Science, 2017,31(11):2119-2141.

[13]	Brdar, Gavrić, Ćulibrk, et al. Unveiling spatial epidemiology of HIV with mobile phone data[J]. Scientific Reports, 2016,6:19342.

[14]	Isdory, Mureithi E, Sumpter D. The impact of human mobility on HIV transmission in Kenya[J]. PLoS ONE, 2015,10(11):e0142805.

[15]	Mao, Yin, Song X, et al. Mapping intra-urban transmission risk of dengue fever with big hourly cellphone data[J]. Acta Tropica, 2016,162:188-195.

[16]	Spaccapietra, Parent, Damiani M, et al. A conceptual view on trajectories[J]. Data & Knowledge Engineering, 2008,65(1):126-146.

[17]	Zheng Y, Chen Y K, Xie X, et al. GeoLife2.0: A Location-Based Social Networking Service[C]. Tenth International Conference on Mobile Data Management: Systems, Services and Middleware. IEEE, 2009:357-358.

[18]	Zheng, Xie, Ma W. GeoLife: A collaborative social networking service among user, location and trajectory[J]. Bulletin of the Technical Committee on Data Engineering, 2011,33(2):32-39.

[19]	Bao, Zheng, Wilkie, et al. Recommendations in location-based social networks: A survey[J]. Geoinformatica, 2015,19(3):525-565.

[20]	Lian D F, Xie X. Mining check-in history for personalized location naming[J]. ACM Transactions on Intelligent Systems & Technology, 2014,5(2):1-25.

[21]	Ahas R, Laineste J, Aasa A, et al. The spatial accuracy of mobile positioning: Some experiences with geographical studies in Estonia[M]. Location based services and telecartography. Springer Berlin Heidelberg, 2007:445-460.

[22]	Ahas, Aasa, Silm, et al. Mobile positioning in space-time behaviour studies: Social positioning method experiments in Estonia[J]. American Cartographer, 2007,34(4):259-273.

[23]

Vajakas

, Vajakas

, Lillemets

.Trajectory reconstruction from mobile positioning data using cell-to-cell travel time information[J]. International Journal of Geographical Information Science, 2015,29(11):1941-1954.

This paper proposes a technique for improving the accuracy of mobile device movement trajectory reconstruction using passive mobile positioning data. The major sources of uncertainty in trajectory reconstruction are imprecise cell shape data and ‘ping-pong’ effects caused by cell handovers. We used a novel technique for improved ‘ping-pong’ effect suppression by compensating for some cell shape distortions based on temporal cell-to-cell transit statistics. The results were evaluated by estimating traffic flow using trajectory reconstruction. The proposed technique improved the accuracy of results compared to ‘ping-pong’ suppression algorithms found in the literature.

[24]	Iovan C, Olteanu-Raimond A M, Couronné T, et al. Moving and calling: Mobile phone data quality measurements and spatiotemporal uncertainty in human mobility studies[M]. Geographic Information Science at the Heart of Europe. Springer, Cham, 2013:247-265.

[25]	Ester M, Kriegel H P, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise[C]. International Conference on Knowledge Discovery and Data Mining. AAAI Press, 1996:226-231.

[26]	Birant, Kut. ST-DBSCAN: An algorithm for clustering spatial-temporal data[J]. Data & Knowledge Engineering, 2007,60(1):208-221.

This paper presents a new density-based clustering algorithm, ST-DBSCAN, which is based on DBSCAN. We propose three marginal extensions to DBSCAN related with the identification of (i) core objects, (ii) noise objects, and (iii) adjacent clusters. In contrast to the existing density-based clustering algorithms, our algorithm has the ability of discovering clusters according to non-spatial, spatial and temporal values of the objects. In this paper, we also present a spatial–temporal data warehouse system designed for storing and clustering a wide range of spatial–temporal data. We show an implementation of our algorithm by using this data warehouse and present the data mining results.

[27]	Palma A T, Bogorny V, Kuijpers B, et al. A clustering-based approach for discovering interesting places in trajectories[C]. ACM Symposium on Applied Computing. DBLP, 2008:863-868.

[28]	Cao J Z, Tu W, Li Q Q, et al. Spatio-temporal analysis of aggregated human activities based on massive mobile phone tracking data[J]. Journal of Geo-information Science, 2017,19(4):467-474. (in Chinese)

[29]	Alvares L O, Bogorny V, Kuijpers B, et al. A model for enriching trajectories with semantic geographical information[C]. ACM International Symposium on Advances in Geographic Information Systems. ACM, 2007:22.

[30]	Horn C, Klampfl S, Cik M, et al. Into digitization: Some concepts and methods of Chinese historical geographic information system[J]. Historical Geography, 2002(2405):49-56.

[31]	Kang J H. Extracting places from traces of locations[J]. ACM SIGMOBILE Mobile Computing & Communications Review, 2005,9(3):58-68.

[32]	Widhalm, Yang, Ulm, et al. Discovering urban activity patterns in cell phone data[J]. Transportation, 2015,42(4):597-623.

Massive and passive data such as cell phone traces provide samples of the whereabouts and movements of individuals. These are a potential source of information for models of daily activities in a city. The main challenge is that phone traces have low spatial precision and are sparsely sampled in time, which requires a precise set of techniques for mining hidden valuable information they contain. Here we propose a method to reveal activity patterns that emerge from cell phone data by analyzing relational signatures of activity time, duration, and land use. First, we present a method of how to detect stays and extract a robust set of geolocated time stamps that represent trip chains. Second, we show how to cluster activities by combining the detected trip chains with land use data. This is accomplished by modeling the dependencies between activity type, trip scheduling, and land use types via a Relational Markov Network. We apply the method to two different kinds of mobile phone datasets from the metropolitan areas of Vienna, Austria and Boston, USA. The former data includes information from mobility management signals, while the latter are usual Call Detail Records. The resulting trip sequence patterns and activity scheduling from both datasets agree well with their respective city surveys, and we show that the inferred activity clusters are stable across different days and both cities. This method to infer activity patterns from cell phone data allows us to use these as a novel and cheaper data source for activity-based modeling and travel behavior studies.

Outlines

Abstract

1 Introduction

2 Stay area detection algorithm based on a moving window

2.1 Basic definitions

Fig. 1 The sketch of detecting individual stay areas from raw mobile phone location data

2.2 Clustering rules

2.3 Overall algorithm workflow

Fig. 2 The flow chart of detecting individual stay areas from raw mobile phone location data

3 Case study

3.1 Data source

Tab. 1 Format of the mobile phone location dataset

3.2 Algorithm validation framework

Fig. 3 Manual annotation of individual stay areas based on mobile phone location data

3.3 Parameter settings

Fig. 4 The accuracy changes of TW-cluster algorithm with different δ and ξ

Fig. 5 The recall changes of TW-cluster algorithm with different δ and ξ

3.4 Algorithm comparison

Fig. 6 The accuracy of different clustering algorithms with different sampling intervals

Fig. 7 The recall of different clustering algorithms with different sampling intervals

4 Conclusions and outlook

References