Geo-Information Science Theory and Methods

A Mixed Markov Method to Predict the Surfing Time Period of Mobile Phone Users

  • FANG Zhixiang 1, 2, *
  • YU Chong 1
  • ZHANG Tao 3
  • FENG Mingxiang 1
  • NI Yaqian 1
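  • 1. State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
  • 2. Collaborative Innovation Center of Geospatial Technology, Wuhan 430079, China
  • 3. Business Support Center, Hubei Mobile, Wuhan 430040, China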

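About the author: FANG Zhixiang (1977-), male, professor, whose research focuses on spatio-temporal behavior modeling, navigation, and location-based services. E-mail: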

Received date: 2017-03-23

Request revised date: 2017-06-23

Online published: 2017-08-20

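Foundation items

National Natural Science Foundation of China (41231171, 41371420)

Hubei Province Young Talents Development Program

Wuhan University Independent Research Project, Top Innovative Talents category (2042015KF0167)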

*Corresponding author: FANG Zhixiang, E-mail:


Copyright

All rights reserved by the Editorial Office of the Journal of Geo-Information Science.


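Cite this article

FANG Zhixiang, YU Chong, ZHANG Tao, FENG Mingxiang, NI Yaqian. A mixed Markov method to predict the surfing time period of mobile phone users[J]. Journal of Geo-Information Science, 2017, 19(8): 1019-1025. DOI: 10.3724/SP.J.1047.2017.01019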

Abstract

In recent years, big data from mobile phones has become a major data source for research and applications and has been widely used to understand human behavior in cyberspace. Studying and forecasting the surfing time of mobile phone users is of great significance for analyzing their behaviors and patterns, designing network services, understanding the relationships among surfing behavior, website stickiness, and user psychology, and supporting mobile Internet business intelligence. We propose a mixed Markov method, the Lift-Markov (LM) method, which combines the traditional Markov model with an association rule model to predict the surfing time periods of mobile phone users. A dataset of surfing records of 4G mobile phone users collected by Hubei Mobile over twenty days is used to evaluate how well the method predicts users' web-surfing time periods, and the LM method is compared with the traditional Markov model and the Mostvalue model. There are two main findings. First, Fourier transformation and periodicity tests show obvious periodicity in the surfing time periods of 37.66% of mobile phone users in the study area, which helps to characterize users' surfing behavior. Second, the average accuracy of the proposed method is higher than that of the Markov and Mostvalue models at 10, 20, 30, 40, 50, and 60 min intervals. At a 60 min interval, the LM method achieves an average accuracy of 79.75%, compared with 74.64% for the Markov model and 64.44% for the Mostvalue model. Compared with the other two models, the accuracy distribution of the LM method is narrower, with a higher peak and a smaller standard deviation, which means that its prediction accuracy is more concentrated and stable.

1 Introduction

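Big mobile phone location data has become an important data source for research on human mobility pattern recognition and prediction [1-3], urban and social sensing [4-6], transportation and urban planning [7], business intelligence [8], and spatio-temporal behavior and epidemic prevention and control [9], and it plays an important role in mining spatial data and spatio-temporal behavior and in intelligent analysis for business applications. Besides behavior in physical space, behavior in cyberspace is another important aspect of human behavior. For example, analyzing and predicting mobile surfing behavior provides valuable references for analyzing the patterns and regularities of crowds' online behavior [10], mining preferences for mobile web content and willingness to use apps [11], analyzing website stickiness and online purchasing [12], assessing the resulting psychological effects of surfing [13], detecting and analyzing public opinion [14], and mining and designing mobile Internet business models [8,12]. It has therefore received close attention from fields such as geographic information science, human geography, computer science, and public security.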
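Big mobile phone location data offers good opportunities for analyzing and predicting the surfing behavior of mobile phone users [15-17]: it samples large populations and covers both physical space and cyberspace fairly comprehensively, but it still lacks information on the attributes and behavioral characteristics of individual users. Current research on mobile surfing behavior mainly covers surfing duration, preference patterns, and behavior prediction. On surfing duration, Chang et al. [18] analyzed the distribution of mobile surfing duration across phone models, modeled it with kernel density estimation and Gaussian mixture models, and found that it is bimodal. On preference patterns, some studies have summarized temporal surfing preferences into four patterns, namely midnight, working hours, evening, and nighttime rest [19], and users differ markedly in their surfing habits, ranging from 10 to 200 sessions per day and from 1 to 1000 MB of traffic [20]. On behavior prediction, Hong Cao et al. argued that time and location are key features for analyzing and predicting people's cyberspace behavior (such as app usage) from mobile phone data [21], helping to understand user preference patterns and to predict how users interact with their phones more accurately; Halvey et al. built a time-based Markov model from temporal information to analyze and predict click behavior on mobile terminals, reaching an accuracy of 60% [22]. As this review shows, research on predicting the surfing time periods of mobile phone users remains relatively scarce, even though such prediction underpins the mining and application of mobile surfing behavior. To meet this need, this paper studies a method for predicting surfing time periods. Based on probabilistic association rules between historical periods and adjacent periods, it builds a probability lift strategy for period prediction and combines it with Markov prediction to form a mixed Markov prediction method, which is then validated on mobile surfing data from a city in China and evaluated for prediction accuracy.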

2 Characteristics of the Surfing Time Periods of Mobile Phone Users

2.1 Differences in Mobile Surfing Activity across Time Periods

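This study uses 20 days of mobile surfing records (10 to 29 August 2015) from a city in China. The dataset contains 4G data-traffic billing records for about 123,000 users and captures surfing information such as the user ID, time, mobile base station, app or page URL, and traffic volume; the data were anonymized. This paper only determines and predicts whether a user surfs in a given time period and does not involve any private information.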
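Starting at 0:00, each day is divided into 24 one-hour periods. Each surfing record is assigned to the period in which it occurred, and the number of users with at least one surfing record is counted for every period of every day, giving the time series in Fig. 1(a). It shows that: ① the number of surfing users varies greatly across the periods of a day, with a difference of about 8000 users between peaks and troughs; ② the surfing periods of the user population show a clear daily periodicity, with occasional anomalies (e.g., 13 August); ③ the overall daily fluctuation is nearly identical from day to day, with a bimodal shape that peaks at 12:00 and 18:00.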
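The binning and counting behind Fig. 1(a) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: it assumes each record has already been reduced to a (user_id, datetime) pair, and the function name `users_per_hour` is hypothetical. A user with several records in the same hour is counted only once.

```python
from collections import defaultdict
from datetime import datetime


def users_per_hour(records):
    """Number of distinct users with at least one record in each (date, hour)
    period. records: iterable of (user_id, datetime) pairs (hypothetical layout)."""
    users = defaultdict(set)                        # (date, hour) -> set of user IDs
    for user_id, ts in records:
        users[(ts.date(), ts.hour)].add(user_id)    # 1-h bins starting at 0:00
    return {period: len(ids) for period, ids in sorted(users.items())}


# Usage: four toy records on 10 August 2015
records = [
    ("u1", datetime(2015, 8, 10, 12, 5)),
    ("u1", datetime(2015, 8, 10, 12, 40)),   # same user and hour: counted once
    ("u2", datetime(2015, 8, 10, 12, 59)),
    ("u2", datetime(2015, 8, 10, 18, 15)),
]
counts = users_per_hour(records)
```

Here `counts` maps the 12:00 period of 10 August to 2 users and the 18:00 period to 1 user.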
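To examine differences in surfing activity across periods, the per-period user counts over the 20 days were clustered hierarchically with Ward's minimum variance method [23]. In Fig. 1(b) the user counts are divided into 20 levels, with bluer cells indicating fewer surfing users and redder cells more. The clustering divides the 24 hours into three classes: ① 1:00-6:00 is the low-frequency activity period, when most users are resting at night; ② 9:00-21:00 is the high-frequency activity period, when many users surf, mostly while working or engaged in other activities; ③ 7:00-8:00 and 22:00-24:00 are transition periods, in which the population moves from low to high activity (7:00-8:00) or from high to low activity (22:00-24:00). This paper builds a separate surfing-state transition matrix for each of these three periods to improve prediction accuracy.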
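The text does not state the implementation or exactly which features were clustered. The sketch below assumes each hour is described by its vector of user counts on the 20 days and applies Ward's criterion directly: at every step it merges the two clusters whose union increases the total within-cluster sum of squares the least, n_a n_b / (n_a + n_b) · ||c_a − c_b||², until k clusters remain. `ward_clusters` is a hypothetical name; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(..., method="ward")` would normally be used.

```python
def ward_clusters(profiles, k):
    """Greedy agglomerative clustering with Ward's minimum-variance criterion.
    profiles: one equal-length numeric vector per item (e.g. per hour).
    Returns k clusters as sorted lists of item indices."""
    # each cluster is (member indices, centroid vector)
    clusters = [([i], [float(x) for x in v]) for i, v in enumerate(profiles)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                na, nb = len(ma), len(mb)
                # increase in within-cluster sum of squares if a and b merge
                cost = na * nb / (na + nb) * sum((x - y) ** 2 for x, y in zip(ca, cb))
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append((ma + mb, centroid))
    return sorted(sorted(members) for members, _ in clusters)


# Usage: eight one-dimensional toy profiles forming low, high and middle groups
groups = ward_clusters([[1], [2], [1.5], [100], [98], [101], [50], [52]], 3)
```

With 24 hourly profiles the exhaustive pair search per merge is negligible; library implementations use the Lance-Williams update to scale to larger inputs.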
Fig. 1 The temporal features of surfing by smart phone users


2.2 Periodicity Test of Individual Users' Surfing Time Periods

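Fig. 1(a) shows a periodic pattern at the population level; we now test whether individual users also show periodicity in their surfing periods. The surfing states (0 or 1) of a single user in successive periods form a sequence of length n, where state 0 means the user did not surf in that period, state 1 means the user did, and n is the number of periods in the dataset. Treating this state sequence as a discrete signal, we analyze its periodicity with the Fourier transform [24], expressing the finite sequence $x_1, x_2, x_3, \ldots, x_n$ as a linear combination of orthogonal trigonometric functions:

$$x_t=\sum_{k=0}^{\lfloor n/2\rfloor}\left[a_k\cos\frac{2\pi kt}{n}+b_k\sin\frac{2\pi kt}{n}\right],\quad t=1,2,\ldots,n$$ (1)

where

$$a_k=\begin{cases}\dfrac{1}{n}\displaystyle\sum_{t=1}^{n}x_t\cos\frac{2\pi kt}{n}, & k=0 \text{ or } k=\dfrac{n}{2}\ (n\text{ even})\\[2mm]\dfrac{2}{n}\displaystyle\sum_{t=1}^{n}x_t\cos\frac{2\pi kt}{n}, & k=1,2,\ldots,\left\lfloor\dfrac{n-1}{2}\right\rfloor\end{cases}$$ (2)

$$b_k=\frac{2}{n}\sum_{t=1}^{n}x_t\sin\frac{2\pi kt}{n},\quad k=1,2,\ldots,\left\lfloor\frac{n-1}{2}\right\rfloor$$ (3)

$$\omega_k=\frac{2\pi k}{n}$$ (4)

Here $a_k$ and $b_k$ are the Fourier coefficients and $\omega_k$ is the Fourier frequency. The periodogram of the user's surfing states [25] is computed with Eq. (5); if the original sequence is periodic, the periodogram shows large peaks at the corresponding frequencies.

$$I(\omega_k)=\begin{cases} n a_0^2, & k=0\\ \dfrac{n}{2}\left(a_k^2+b_k^2\right), & k=1,2,\ldots,\left\lfloor\dfrac{n-1}{2}\right\rfloor\\ n a_{n/2}^2, & k=\dfrac{n}{2}\ (n\text{ even})\end{cases}$$ (5)

The periodogram peak is tested with Fisher's g statistic [26] in Eq. (6):

$$g=\frac{\max_{1\le k\le\lfloor n/2\rfloor} I(\omega_k)}{\sum_{k=1}^{\lfloor n/2\rfloor} I(\omega_k)}$$ (6)

Following the standard procedure of this test, if g at some frequency exceeds the critical value $g_{0.05}$ given in Fisher's significance table at the 0.05 level, the sequence is considered to have a periodic component with period $T=n/k$; otherwise it is considered to have no periodic component at frequency $k/n$.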
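A direct, unoptimized Python sketch of Eqs. (2), (3), (5) and (6) for one user's 0/1 sequence follows. Two assumptions are not taken from the text: the critical value $g_{0.05}$ is approximated by the common first-term formula $g_\alpha \approx 1-(\alpha/m)^{1/(m-1)}$, where m is the number of tested frequencies, instead of being read from Fisher's table; and a "daily period" is declared when the peak frequency gives $T=n/k$ equal to the number of slots per day. The function names are hypothetical, and sequences are assumed to have at least four entries.

```python
import math


def periodogram(x):
    """I(w_k) for k = 0 .. n//2, following Eqs. (2), (3) and (5)."""
    n = len(x)
    I = []
    for k in range(n // 2 + 1):
        c = sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(x, start=1))
        s = sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(x, start=1))
        if k == 0 or 2 * k == n:
            I.append(n * (c / n) ** 2)                          # n * a_k^2
        else:
            I.append(n / 2 * ((2 * c / n) ** 2 + (2 * s / n) ** 2))
    return I


def fisher_g_test(x, alpha=0.05):
    """Fisher's g (Eq. 6) over k = 1 .. n//2. Returns (k_max, g, g_crit);
    g_crit uses the first-term approximation, not a printed table."""
    I = periodogram(x)[1:]                 # drop k = 0 (the sequence mean)
    m, total = len(I), sum(I)
    if total <= 1e-12:                     # constant sequence: nothing to test
        return None, 0.0, 1.0
    k_max = max(range(m), key=I.__getitem__) + 1
    g = I[k_max - 1] / total
    g_crit = 1 - (alpha / m) ** (1 / (m - 1))
    return k_max, g, g_crit


def has_daily_period(x, slots_per_day, alpha=0.05):
    """True if g is significant and the dominant period T = n / k_max is one day."""
    k_max, g, g_crit = fisher_g_test(x, alpha)
    return k_max is not None and g > g_crit and len(x) == k_max * slots_per_day
```

For the 20-day hourly data, n = 480 and a daily period corresponds to k = 20. A user who surfs in the same hours every day passes the test, while a single isolated session (whose periodogram is flat) does not.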
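Periodicity tests on the surfing states of the roughly 123,000 users in the dataset show that the user population has a daily periodicity: 53,367 users (43.23% of the total) pass the periodicity test, and 46,494 users (37.66% of the total and 87.12% of those passing the test) have a period of one day. These users with a daily period generated 85.68% of the surfing records.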

3 Mixed Markov Prediction Method for Mobile Surfing Time Periods

3.1 Basic Idea

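The prediction integrates probabilities along two dimensions, as shown in Fig. 2. The horizontal axis represents a user's surfing states in the 24 periods of a day, and the vertical axis represents the surfing state in the same period on successive days; black cells indicate that the user surfed in that period, and white cells that the user did not. We assume that the horizontal evolution of the surfing state is a Markov process, so the state of the next period is predicted by a Markov model from the state of the current period. Vertically, we compute the association between the surfing states on two adjacent days to obtain the probability lift from the previous day to the current day. Combining the two yields the Lift-Markov hybrid method for predicting the surfing state of the next period.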
Fig. 2 Schematic of users' surfing status


3.2 Lift-Markov Hybrid Prediction Method

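Following the analysis of activity differences in Section 2.1, each day is split into low-frequency, high-frequency, and transition periods, and a surfing-state transition probability matrix is built from historical data for each of the three. Association rules are then used to mine the relationship between a user's surfing states on two adjacent days, giving the increase in probability (lift) of the current state given the previous day's state. Combining the two probabilities yields the predicted surfing state; we call this the Lift-Markov (LM) hybrid prediction method. The steps are as follows: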
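(1) Build the surfing-state sequences of user $U_i$. At a chosen time interval (e.g., 10, 20, …, 60 min), generate the user's surfing-state sequence for each day, giving n daily sequences whose set is denoted E.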
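(2) For each of the low-frequency, high-frequency, and transition periods, compute the state transition probability matrix $P_t$ of user $U_i$ from period t to period t+1 (three matrices in total), and determine the state distribution $S_t$ from the surfing state in period t. Let the user's state set be $S=\{i,j,\ldots\}$ and assume that the surfing state satisfies the Markov property of Eq. (7), where $x_t\in S$, $t=1,2,3,\ldots$; the entries of the transition matrix from period t to period t+1 are then computed with Eqs. (8) and (9).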
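$$P\left(X_{t+1}=x_{t+1}\mid X_t=x_t,\ldots,X_1=x_1\right)=P\left(X_{t+1}=x_{t+1}\mid X_t=x_t\right)$$ (7)

$$P_t=\left(P_{ij}\right),\quad 0\le P_{ij}\le 1$$ (8)

where $P_{ij}=P\left(X_{t+1}=i\mid X_t=j\right)$, $i,j\in S$. By the definition of conditional probability:

$$P_{ij}=\frac{P(i,j)}{P(j)}$$ (9)

Here $P(j)$ is the probability, over the set E, that the user is in state j within the period class (low-frequency, high-frequency, or transition) containing period t, and $P(i,j)$ is the probability, over E and within the same period class, that the user is in state j in period t and in state i in period t+1.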
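(3) Use association rules to mine the relationship between the states in period t+1 on two adjacent days, and compute the support, confidence, and lift of each rule with Eqs. (10) and (11):

$$C\left(X_{t+1}^{r}=j\Rightarrow X_{t+1}^{r+1}=i\right)=\frac{\mathrm{Sup}\left(X_{t+1}^{r}=j,\;X_{t+1}^{r+1}=i\right)}{\mathrm{Sup}\left(X_{t+1}^{r}=j\right)}$$ (10)

$$L\left(X_{t+1}^{r}=j\Rightarrow X_{t+1}^{r+1}=i\right)=\frac{C\left(X_{t+1}^{r}=j\Rightarrow X_{t+1}^{r+1}=i\right)}{\mathrm{Sup}\left(X_{t+1}^{r+1}=i\right)}$$ (11)

where $r\in\{1,2,3,\ldots,n-1\}$ and $i,j\in S$; $\mathrm{Sup}(X_{t+1}^{r}=j)$ is the probability of being in state j in period t+1 on day r; $\mathrm{Sup}(X_{t+1}^{r+1}=i)$ is the probability of being in state i in period t+1 on day r+1; $\mathrm{Sup}(X_{t+1}^{r}=j,\;X_{t+1}^{r+1}=i)$ is the probability of being in state j in period t+1 on day r and in state i in period t+1 on day r+1; $C(X_{t+1}^{r}=j\Rightarrow X_{t+1}^{r+1}=i)$ is the confidence of the rule from state j in period t+1 on day r to state i in period t+1 on day r+1; and $L(X_{t+1}^{r}=j\Rightarrow X_{t+1}^{r+1}=i)$ is its lift, i.e., the probability of this transition relative to the unconditional probability of state i.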
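(4) Compute the state distribution $S_{t+1}$ of period t+1 and take the state with the highest probability in $S_{t+1}$ as the predicted state for period t+1:

$$S_{t+1}=\begin{cases} S_tP_t\,L\left(X_{t+1}^{r}=j\Rightarrow X_{t+1}^{r+1}=i\right), & \mathrm{Sup}\left(X_{t+1}^{r}=j\right)\ge\Delta s \text{ and } C\left(X_{t+1}^{r}=j\Rightarrow X_{t+1}^{r+1}=i\right)\ge\Delta c\\ S_tP_t, & \mathrm{Sup}\left(X_{t+1}^{r}=j\right)<\Delta s \text{ or } C\left(X_{t+1}^{r}=j\Rightarrow X_{t+1}^{r+1}=i\right)<\Delta c\end{cases}$$ (12)

where $\Delta s$ and $\Delta c$ are the thresholds for support and confidence, respectively.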
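Steps (1)-(4) can be sketched in Python as below. This is a minimal illustration under assumptions the text does not fix: two states (0/1); one user's records given as datetimes; transitions that cross midnight ignored; a uniform fallback column for a state never observed; and the lift in Eq. (12) applied to each candidate state i separately, with j set to the user's observed state in the same slot on the previous day. All function names are hypothetical, and the default thresholds are the values chosen in Section 4.2.

```python
import math
from datetime import date, datetime

STATES = (0, 1)  # 0 = no surfing in the slot, 1 = surfing


def state_matrix(timestamps, start_day, n_days, interval_min):
    """Step (1): one 0/1 sequence per day (the set E). The last slot is
    shorter when 1440 is not a multiple of interval_min (e.g. 50 min)."""
    slots = math.ceil(24 * 60 / interval_min)
    E = [[0] * slots for _ in range(n_days)]
    for ts in timestamps:
        r = (ts.date() - start_day).days
        if 0 <= r < n_days:
            E[r][(ts.hour * 60 + ts.minute) // interval_min] = 1
    return E


def transition_matrix(E, period_slots):
    """Step (2): P_ij = P(i, j) / P(j) (Eq. 9), pooling the transitions
    t -> t+1 of every slot t in one period class over all days of E."""
    joint = {(i, j): 0 for i in STATES for j in STATES}
    marg = {j: 0 for j in STATES}
    for day in E:
        for t in period_slots:
            if t + 1 < len(day):              # transitions across midnight ignored
                j, i = day[t], day[t + 1]
                joint[(i, j)] += 1
                marg[j] += 1
    # the common total cancels in the ratio, so raw counts suffice
    return {(i, j): joint[(i, j)] / marg[j] if marg[j] else 1 / len(STATES)
            for i in STATES for j in STATES}


def adjacent_day_rules(E, slot):
    """Step (3): rules 'day r in state j => day r+1 in state i' at one slot,
    returned as {(j, i): (Sup(j), confidence, lift)} per Eqs. (10)-(11)."""
    pairs = [(E[r][slot], E[r + 1][slot]) for r in range(len(E) - 1)]
    n = len(pairs)
    rules = {}
    for j in STATES:
        sup_j = sum(a == j for a, _ in pairs) / n
        for i in STATES:
            sup_i = sum(b == i for _, b in pairs) / n
            sup_ji = sum((a, b) == (j, i) for a, b in pairs) / n
            conf = sup_ji / sup_j if sup_j else 0.0
            lift = conf / sup_i if sup_i else 0.0
            rules[(j, i)] = (sup_j, conf, lift)
    return rules


def predict_next(state_t, yesterday_t1, P, rules, delta_s=0.5, delta_c=0.4):
    """Step (4), Eq. (12): Markov step from the observed state at t, then
    scale each candidate state by the lift of a rule passing both thresholds."""
    s_next = {i: P[(i, state_t)] for i in STATES}   # S_t P_t with one-hot S_t
    for i in STATES:
        sup_j, conf, lift = rules[(yesterday_t1, i)]
        if sup_j >= delta_s and conf >= delta_c:
            s_next[i] *= lift
    return max(STATES, key=s_next.get)              # most probable state


# Usage: one user's records on 10-11 August 2015 at a 50-min interval
E = state_matrix([datetime(2015, 8, 10, 0, 5), datetime(2015, 8, 11, 23, 55)],
                 start_day=date(2015, 8, 10), n_days=2, interval_min=50)
```

In a full run, one transition matrix would be estimated per period class (low-frequency, high-frequency, transition) and selected according to the slot being predicted. $S_{t+1}$ is not renormalized after the lift, which does not change its argmax.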

4 Experiments and Analysis

4.1 Experimental Dataset

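From the 4G data-traffic billing records of about 123,000 users, we took the records of the 46,494 users with a daily period, filtered them by surfing state, and selected the 2821 users who had surfing records on every day as the experimental dataset. Surfing-state sequences of 0s and 1s were generated at different time intervals (10, 20, 30, 40, 50, and 60 min). The first 10 days were used as the training set and the last 10 days as the test set. After each test day, that day's data were added to the training data before the next day was tested.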
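Prediction accuracy is used as the evaluation metric and is defined as:

$$PA_i=\frac{C_i(PR)}{C_i(P)}$$ (13)

where $C_i(PR)$ is the number of correct predictions the method makes for user $U_i$ and $C_i(P)$ is the total number of predictions it makes for $U_i$.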
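The accuracy of Eq. (13) and the expanding-window protocol of Section 4.1 can be sketched as follows. The callbacks `fit` and `predict_day` are hypothetical placeholders for any of the compared models, and the toy data and "same as yesterday" predictor in the usage example are purely illustrative.

```python
def accuracy(predicted, actual):
    """PA_i = C_i(PR) / C_i(P) (Eq. 13) for one user."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need one prediction per observed state")
    correct = sum(p == a for p, a in zip(predicted, actual))   # C_i(PR)
    return correct / len(actual)                                 # / C_i(P)


def walk_forward(E, n_train, fit, predict_day):
    """Train on days [0, d), predict day d, then add day d to the training
    data and continue. predict_day(model, history, day) returns one predicted
    state per slot of `day` (it may use observed states up to slot t to
    predict slot t+1, as a one-step-ahead model does)."""
    predicted, actual = [], []
    for d in range(n_train, len(E)):
        history = E[:d]
        model = fit(history)
        predicted.extend(predict_day(model, history, E[d]))
        actual.extend(E[d])
    return accuracy(predicted, actual)


# Usage: a naive "same as yesterday" model on four toy days, two for training
E = [[0, 1], [0, 1], [0, 1], [1, 1]]
acc = walk_forward(E, 2, fit=lambda history: None,
                   predict_day=lambda model, history, day: history[-1])
```

Here 3 of the 4 test slots are predicted correctly, so `acc` is 0.75.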

4.2 Parameter Selection

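The minimum confidence threshold Δc and minimum support threshold Δs of the LM method directly affect its predictions: if they are too low, unreliable rules may be used, and if they are too high, few rules are used and accuracy drops. Aiming at the highest prediction accuracy, we explored combinations of support and confidence thresholds from 0.3 to 0.7 in steps of 0.1 (Table 1). The highest accuracy, 79.71%, was obtained with a confidence threshold of 0.4 and a support threshold of 0.5. We therefore set Δc = 0.4 and Δs = 0.5.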
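The threshold search is a plain grid search. The sketch below assumes a hypothetical `evaluate(delta_s, delta_c)` callback that runs the LM method and returns its average accuracy; in the usage example it simply replays the accuracies reported in Table 1.

```python
def grid_search(evaluate, ds_values, dc_values):
    """Return (best_accuracy, delta_s, delta_c) over all threshold pairs."""
    best = None
    for ds in ds_values:
        for dc in dc_values:
            acc = evaluate(ds, dc)
            if best is None or acc > best[0]:   # first maximum wins on ties
                best = (acc, ds, dc)
    return best


# Usage: replay the accuracies (%) of Table 1 (rows: delta_s, columns: delta_c)
DC = (0.3, 0.4, 0.5, 0.6, 0.7)
TABLE1 = {
    0.3: (79.61, 79.62, 79.63, 79.60, 79.59),
    0.4: (79.67, 79.67, 79.67, 79.64, 79.63),
    0.5: (79.69, 79.71, 79.70, 79.68, 79.66),
    0.6: (79.70, 79.70, 79.70, 79.68, 79.67),
}
best = grid_search(lambda ds, dc: TABLE1[ds][DC.index(dc)], TABLE1.keys(), DC)
```

On these values the search returns (79.71, 0.5, 0.4), i.e. Δs = 0.5 and Δc = 0.4.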
Tab. 1 Accuracy (%) of the LM method under different threshold values


Rows: minimum support threshold Δs; columns: minimum confidence threshold Δc; values: accuracy (%)

Δs \ Δc   0.3     0.4     0.5     0.6     0.7
0.3       79.61   79.62   79.63   79.60   79.59
0.4       79.67   79.67   79.67   79.64   79.63
0.5       79.69   79.71   79.70   79.68   79.66
0.6       79.70   79.70   79.70   79.68   79.67

4.3 Comparison of Methods

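We compare the LM method with two common methods for predicting user states: the Markov model and the most-frequent-state (Mostvalue) model. The Markov model is widely used in behavior prediction [3,21-22] and performs well; the Mostvalue model (MostFrequent) takes the behavior that occurs most frequently at the next moment as its prediction and is often used as a baseline for comparing predictive ability [3]. All three methods use the same testing strategy: after each test day, that day's data are added to the training data before the next day is tested. Training and predicting at a 60 min interval gives the accuracies shown in Fig. 3. The LM method achieves the highest accuracy, averaging 79.75%. As the number of predicted days, and hence the amount of training data, grows, its accuracy rises from a minimum of 78.16% to a maximum of 82.67%. The Markov and Mostvalue models average 74.64% and 64.44%, respectively. The Markov model's accuracy improves by 7.22% as the training set grows, indicating that it depends more heavily on the amount of training data, while the Mostvalue model fluctuates between 59.25% and 68.18%, indicating poor stability.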
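The text describes the Mostvalue baseline only briefly. One plausible reading, sketched below as an assumption, predicts for each slot the state observed most often in that slot over the training days; `mostvalue_predict_day` is a hypothetical name. Ties go to the state encountered first, which is the documented ordering of `Counter.most_common` for equal counts.

```python
from collections import Counter


def mostvalue_predict_day(history, n_slots):
    """For every slot, predict the state seen most often in that slot
    across the training days in `history` (a list of 0/1 day sequences)."""
    return [Counter(day[s] for day in history).most_common(1)[0][0]
            for s in range(n_slots)]


# Usage: three training days with two slots each
prediction = mostvalue_predict_day([[1, 0], [1, 1], [0, 1]], 2)
```

Because the prediction ignores the state in the preceding slot, this baseline cannot react to within-day changes, which is consistent with its lower and less stable accuracy in Fig. 3.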
Fig. 3 Prediction accuracy of the three methods over the 10 test days


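Fig. 4 compares the density distributions of prediction accuracy for the LM method, the Markov model, and the Mostvalue model. Their standard deviations are 5.3, 6.2, and 10.0, and their median accuracies are 80%, 75%, and 64.58%, respectively, indicating that the LM method predicts better than the other two models. The curves in Fig. 4 show that the LM method has the narrowest accuracy distribution, the highest density peak, and the smallest standard deviation, meaning that its accuracy across the user population is more concentrated and stable than that of the other two models.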
Fig. 4 Accuracy distribution of three methods


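Fig. 5 shows the distribution of the accuracy differences between the LM method and the Markov and Mostvalue models in the high-frequency, low-frequency, and transition periods. In the high-frequency period, the LM method improves average accuracy by 7.23% and 18.72% over the two models, respectively, indicating that when users' surfing states are complex and changeable, the method better captures their surfing regularities and predicts their surfing periods effectively. In the low-frequency period, surfing states change little and all three methods are highly accurate, but the LM method is still 2.10% and 7.87% more accurate than the other two.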
Fig. 5 Accuracy improvement of the proposed method in the high-frequency, low-frequency, and transition periods


4.4 Prediction Accuracy at Different Time Intervals

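Fig. 6 compares the three methods at time intervals of 10, 20, 30, 40, 50, and 60 min. ① At 10 min, all three methods reach their highest average accuracy, with the LM method highest at 92.17%, ahead of the Markov model (89.31%) and the Mostvalue model (76.16%). ② As the interval increases, the accuracy of all three methods decreases. From 10 to 60 min, the average accuracy of the LM method falls by 12.39%, while those of the Markov and Mostvalue models fall by 14.67% and 11.72%, respectively; the decline of the LM method lies between the two, indicating good predictive performance.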
Fig. 6 Average prediction accuracy of the three methods at different time intervals


5 Conclusions

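This paper studied the prediction of the surfing time periods of mobile phone users, tested the periodicity of surfing periods, and proposed a Lift-Markov hybrid method for predicting them. Testing and prediction on the experimental data lead to the following conclusions: ① the surfing periods of 37.66% of mobile phone users follow a daily cycle; ② the LM method reaches an average accuracy of 92.17% at a 10 min interval and 79.75% at a 60 min interval, outperforming the Markov and Mostvalue models in both cases, and its accuracy fluctuates relatively little; ③ in the high-frequency activity period, the average accuracy of the LM method is 7.23% and 18.72% higher than that of the Markov and Mostvalue models, respectively, indicating good predictive performance.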
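Mobile surfing is a complex process, marked by complexity (e.g., using several apps at the same time) and variability (e.g., switching frequently between apps within a period). Studying the two states of surfing and not surfing is a foundation for understanding mobile surfing behavior in depth, but sensing and predicting surfing behavior also require further work on surfing frequency, duration, location, type, and pattern. This research aims to support the analysis of crowd surfing patterns and regularities, the mining of mobile content preferences and app usage intentions, dynamic infrastructure configuration, information recommendation and security, public opinion detection and analysis, and the mining and design of mobile Internet business models.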

The authors have declared that no competing interests exist.

[1]
Song C, Qu Z, Blumm N, et al. Limits of predictability in human mobility[J]. Science, 2010, 327(5968): 1018-1021.

[2]
Schneider C M, Belik V, Couronne T, et al. Unravelling daily human mobility motifs[J]. Journal of the Royal Society Interface, 2013, 10(84): 20130246.

[3]
Do T M T, Gatica-Perez D. Where and what: Using smartphones to predict next locations and applications in daily life[J]. Pervasive & Mobile Computing, 2014, 12(10): 79-91.

[4]
Yuan Y, Raubal M. Analyzing the distribution of human activity space from mobile phone usage: An individual and urban-oriented study[J]. International Journal of Geographical Information Science, 2016(8).

[5]
Yang X P, Fang Z X, Zhao Z Y, et al. Exploring urban human spatio-temporal convergence-dispersion patterns: A case study of Shenzhen City[J]. Journal of Geo-Information Science, 2016, 18(4): 486-492.

[6]
Xu J L, Fang Z X, Shaw S L, et al. The spatio-temporal heterogeneity analysis of massive urban mobile phone users' stay behavior: A case study of Shenzhen City[J]. Journal of Geo-Information Science, 2015, 17(2): 197-205.

[7]
Dong H, Wu M, Ding X, et al. Traffic zone division based on big data from mobile phone base stations[J]. Transportation Research Part C: Emerging Technologies, 2015, 58: 278-291.

[8]
Ma Q, Zhang S, Zhou W, et al. When will you have a new mobile phone? An empirical answer from big data[J]. IEEE Access, 2017, 4(99): 10147-10157.

[9]
Cinnamon J, Jones S K, Adger W N. Evidence and future potential of mobile phone data for disease disaster management[J]. Geoforum, 2016, 75: 253-264.

[10]
Church K, Oliver N. Understanding mobile web and mobile search use in today's dynamic mobile landscape[C]. Conference on Human-Computer Interaction with Mobile Devices and Services, MobileHCI 2011, Stockholm, Sweden, August 30 - September. DBLP, 2011: 67-76.

[11]
Chang Y J, Newman M W. Making local information more accessible: A diary study of information channel selection of mobile users[C]// International Symposium of Chinese CHI. ACM, 2015: 1-8.

[12]
Liu Y B, Yuan P. An empirical analysis of the relationship between website stickiness and purchase quantity: Based on users' clickstream data in mobile service[J]. Soft Science, 2010, 24(1): 131-134, 144.

[13]
Augner C, Hacker G W. Associations between problematic mobile phone use and psychological parameters in young adults[J]. International Journal of Public Health, 2012, 57(2): 437-441.

[14]
Tsai M. How the diffusion of smart phones will change public opinion surveys in Taiwan: The feasibility of using blended samples of landline and cell-phone numbers for telephone surveys[C]// Portland International Conference on Management of Engineering and Technology. IEEE, 2015: 2409-2416.

[15]
Lu F, Liu K, Chen J. Research on human mobility in big data era[J]. Journal of Geo-Information Science, 2014, 16(5): 665-672.

[16]
Shaw S L, Fang Z X. Rethinking human behavior research from the perspective of space-time GIS[J]. Geomatics and Information Science of Wuhan University, 2014, 39(6): 667-670.

[17]
Shaw S L, Tsou M H, Ye X. Editorial: Human dynamics in the mobile and big data era[J]. International Journal of Geographical Information Science, 2016, 30(9): 1687-1693.

[18]
Chang N, Zhang S G. Mobile network access time statistical analysis and its application based on kernel method and Gaussian mixture model[J]. Journal of University of Chinese Academy of Sciences, 2015, 32(1): 136-139.

[19]
Yan H, Dou Y, Liu F, et al. Time division based on analyses of network user time span preference[C]. Network Infrastructure and Digital Content, 2009. IC-NIDC 2009. IEEE International Conference on. IEEE, 2009: 177-181.

[20]
Falaki H, Mahajan R, Kandula S, et al. Diversity in smartphone usage[C]. International Conference on Mobile Systems, Applications, and Services. DBLP, 2010: 179-194.

[21]
Cao H, Lin M. Mining smartphone data for app usage prediction and recommendations: A survey[J]. Pervasive & Mobile Computing, 2017, 37: 1-22.

[22]
Halvey M, Keane M T, Smyth B. Time based patterns in mobile-internet surfing[C]. Conference on Human Factors in Computing Systems, CHI 2006, Montréal, Québec, Canada, April, 2006: 31-34.

[23]
Ward J H. Hierarchical grouping to optimize an objective function[J]. Journal of the American Statistical Association, 1963, 58(301): 236-244.

[24]
Gentleman W M, Sande G. Fast Fourier transforms: For fun and profit[C]. Proceedings of the November 7-10, 1966, Fall Joint Computer Conference. ACM, 1966: 563-578.

[25]
Schuster A. On the investigation of hidden periodicities with application to a supposed 26 day period of meteorological phenomena[J]. Terrestrial Magnetism, 2007, 3(1): 13-41.

[26]
Fisher R A. Tests of significance in harmonic analysis[J]. Proceedings of the Royal Society of London A: Mathematical, Physical & Engineering Sciences, 1929, 125(796): 54-59.
