A Multi-scale Visualization Method for the Trajectory Origin-Destination Data

JIN Cheng; CHEN Yuanyuan; YANG Min

doi:10.3724/SP.J.1047.2017.01011

Journal of Geo-information Science, 2017, Vol. 19, Issue 8: 1011-1018

Original Article

JIN Cheng¹,³, CHEN Yuanyuan², YANG Min⁴,⁵,*

1. Information Engineering University, Zhengzhou 450001, China
2. Institute of Remote Sensing & Geographical Information System, Peking University, Beijing 100871, China
3. Xi'an Research Institute of Surveying and Mapping, Xi'an 710054, China
4. Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Land and Resources, Shenzhen 518034, China
5. School of Resource and Environmental Sciences, Wuhan University, Wuhan 430072, China

*Corresponding author: YANG Min, E-mail: yangmin2003@whu.edu.cn

Received date: 2017-01-05

Request revised date: 2017-05-17

Online published: 2017-08-20

Copyright: Editorial Office of Journal of Geo-information Science

Abstract

Using taxi trajectory data from Beijing, this study proposes a multi-scale visualization approach for trajectory OD (Origin-Destination) data. First, OD points are extracted from the raw trajectory data and invalid points are eliminated. Then, the space over which the OD data are distributed is partitioned through density analysis and aggregation of administrative units. Finally, parameters are defined to summarize the inherent OD flow patterns, and symbols are designed for their multi-scale presentation. By setting different values for the minimum area of an aggregated region, three regionalization results are obtained, corresponding to the block level, the business district level and the district level, so that representations at three different scales can be produced. The experimental results show that the method effectively reduces the volume of trajectory data to be displayed and reveals mobility patterns, which can support future decision making.

Key words: trajectory data; multi-scale visualization; flow pattern; clustering and regionalization

Cite this article

JIN Cheng, CHEN Yuanyuan, YANG Min. A Multi-scale Visualization Method for the Trajectory Origin-Destination Data[J]. Journal of Geo-information Science, 2017, 19(8): 1011-1018. DOI: 10.3724/SP.J.1047.2017.01011

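1 Introduction

The spread of location-aware technologies has produced trajectory data that describe the spatio-temporal characteristics and evolution of people, objects and various geographic phenomena, such as operating records of vehicles, ships and aircraft, positioning data from portable communication terminals, and check-in data from social networks^[1]. Such movement trajectory data are usually integrated in cyberspace in the form of crowdsourced geographic information (e.g. OpenStreetMap, GeoLife^[2]). They conceal the patterns and regularities of various natural and human activities, and the results of analyzing them are valuable for fields such as intelligent location-based services^[3]. However, spatio-temporal trajectory data are characterized by large volume, lack of structure, and tightly coupled spatial and temporal properties^[4], which challenges traditional spatial data analysis methods and calls for new geographic information processing and visualization models.

OD (Origin-Destination) data are a special kind of movement trajectory data that record only the origin and destination of a moving object's motion within a given time range and ignore the details of the path in between. For taxi trajectory data, the origin (O point) is the location where a passenger is picked up, and the destination (D point) is the location where the passenger is dropped off. A single OD record describes one trip of a particular vehicle or passenger over a period of time, whereas the OD data of all vehicles over a given time period reflect the flows of vehicles and people across a region, which is informative for understanding traffic operation patterns and urban planning and layout^[5-8]. Trajectory data on ships, aircraft, wild animals, personal daily travel and historical migration contain similar OD information. OD data are usually analyzed and visualized with a flow map^[9], a special form of visualization in which straight or curved line symbols connect the O and D locations so that structured information can be extracted through visual cognition. Flow maps have been widely applied in population migration^[10-12], logistics and transportation^[13], public transport^[14] and other fields.

Conventional flow mapping, however, is not suited to large data volumes. Vehicle trajectory OD data in particular usually contain tens or even hundreds of thousands of records, which inevitably leads to crowding and conflicts among symbols and to uncertainty in how the result is perceived^[15]. Researchers have therefore studied dimensionality reduction for OD data in order to reduce the complexity of flow map content. The approaches include: ① partitioning the area over which OD points are distributed by spatial clustering or subdivision^[16-18], and using the resulting regions instead of individual OD points as the units for flow statistics, thereby reducing the number of flow relations to be displayed; ② grouping OD connecting lines according to spatial similarity^[19-20], and bundling the flows in each group into an abstracted representation.

Building on these results, this study takes taxi trajectory data from urban Beijing as an example and explores a multi-scale visualization method for trajectory OD data. The main contributions are: ① a complete technical framework for the multi-scale visualization of vehicle trajectory OD data, covering data extraction and cleaning, spatial clustering of OD data and partitioning of the distribution area, definition and computation of regional OD flow characteristics, and dedicated visualization symbol design; ② several measures that keep OD flow information legible at different scales, including regionalization models that correspond to geographic units at different levels (blocks, business districts and districts), selection of flow features to resolve graphic conflicts after symbolization, and the expression of hierarchy information in the design of flow symbols.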

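2 Definition of trajectory OD data

A trajectory consists of a series of time-stamped coordinate points p_i (x_i, y_i, t_i), where x_i and y_i are the position coordinates of the moving object at time t_i; high-precision trajectory points also carry information such as speed and heading. Trajectory OD data are obtained by extracting the start and end points of trajectory segments that carry complete semantic information. Taking the taxi trajectory data in Fig. 1(a) as an example, the trajectory fragment {p₂, p₃, p₄, p₅, p₆} describes one passenger trip: the pick-up point p₂ is defined as the origin (O point) and the drop-off point p₆ as the destination (D point). Let T={T_i} (i∈{1, 2,…,n}) denote a dataset of n OD records, where T_i= {o, t_o, d, t_d}; o and t_o are the position coordinates and time of the origin, and d and t_d are the position coordinates and time of the destination. Because OD data keep only the start and end points of a trajectory segment and ignore the process in between, they make trajectory data considerably cheaper to store and analyze while retaining the spatio-temporal flow characteristics of travel.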
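The record structure T_i = {o, t_o, d, t_d} can be illustrated with a minimal Python sketch. The type names `TrackPoint` and `ODRecord` and the function `fragment_to_od` are hypothetical names introduced only for illustration; the paper does not prescribe a data model, and timestamps are assumed here to be plain numbers in seconds.

```python
from typing import NamedTuple, Sequence, Tuple


class TrackPoint(NamedTuple):
    """One time-stamped trajectory point p_i = (x_i, y_i, t_i)."""
    x: float  # position x (e.g. longitude)
    y: float  # position y (e.g. latitude)
    t: float  # timestamp, assumed to be in seconds


class ODRecord(NamedTuple):
    """One OD record T_i = {o, t_o, d, t_d}."""
    o: Tuple[float, float]  # origin coordinates
    t_o: float              # time at the origin
    d: Tuple[float, float]  # destination coordinates
    t_d: float              # time at the destination


def fragment_to_od(fragment: Sequence[TrackPoint]) -> ODRecord:
    """Keep only the first and last point of one trip fragment."""
    if len(fragment) < 2:
        raise ValueError("a trip fragment needs at least two points")
    first, last = fragment[0], fragment[-1]
    return ODRecord((first.x, first.y), first.t, (last.x, last.y), last.t)
```

For a fragment such as {p₂, …, p₆}, the three intermediate points are discarded and only p₂ and p₆ survive in the record.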

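3 Multi-scale representation method for trajectory OD data

The most intuitive way to visualize OD data is to connect each O point to its D point with a directed line segment. With large data volumes and limited map space, however, the symbols overlap one another or become completely cluttered (Fig. 1(b)), making effective cognition and analysis difficult. OD data therefore usually need to be generalized and abstracted so that the dominant flow characteristics hidden in them can be extracted and visualized. This study takes Beijing taxi trajectory data as its object, adopts a clustering-based regionalization strategy to reduce the dimensionality of OD data for display, and builds a technical framework, in terms of both the regionalization model and the visualization symbol design, that suits the multi-scale representation of OD flow characteristics.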

Fig. 1 Samples of trajectory Origin-Destination data

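3.1 OD data extraction and cleaning

The information recorded for each taxi trajectory point is listed in Table 1. Because data collection is disturbed by many factors, the raw trajectory data may contain a small amount of invalid information, including: ① trajectory points whose GPSState attribute is 0 (the GPS signal is affected by the atmosphere or obstacles); ② trajectory points whose speed exceeds a reasonable upper limit; ③ occupancy status that changes too frequently because of driver misoperation; ④ travel distance that is inconsistent with the elapsed time. The first two types can be deleted in advance with SQL attribute queries; the last two have to be identified and cleaned while OD points are extracted and paired.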

Tab. 1 The original records of taxi trajectory data

Field	Description	Example
V_ID	Vehicle identifier	206400
Longitude	GPS longitude (°)	116.4243011
Latitude	GPS latitude (°)	40.0727348
Time	GPS time	20121101095636
Event	Trigger event (0 = became vacant, 1 = became occupied, 3 = other)	1
SerState	Service state (0 = vacant, 1 = occupied, 2 = parked, 3 = out of service, 4 = other)	1
Speed	GPS speed (km/h)	43
GPSState	GPS state (0 = invalid, 1 = valid)	1

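OD data are extracted as follows: ① points with "Event=1" and "SerState=1" are marked as candidate origins (O points), and points with "Event=0" and "SerState!=1" are marked as candidate destinations (D points); ② the extracted OD points are sorted by vehicle identifier V_ID and then by time in ascending order; within the OD point sequence of one V_ID, if consecutive O points occur only the last one is kept, and if consecutive D points occur only the first one is kept; ③ for each V_ID, starting from the first O point, the D point nearest to it in time is found and the two are organized into an OD pair, and so on until all trajectories have been searched; ④ for each OD pair, the time interval between the O and D points and the distance along the trajectory are computed, and pairs with an abnormal time (e.g. an interval ≤0.05 h or ≥2 h) or an abnormal distance (e.g. ≥80 km) are removed.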
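The four extraction steps can be sketched for a single vehicle as follows. This is only an illustrative reading of the procedure, not the authors' ArcGIS implementation. It assumes the points of one V_ID are already sorted by time, that the Time field has been converted to seconds, and that the along-track distance is approximated by summing great-circle (haversine) distances between consecutive points. The names `extract_od_pairs` and `haversine_km` and the dictionary keys `lon`, `lat` and `t` are hypothetical.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def extract_od_pairs(points, min_h=0.05, max_h=2.0, max_km=80.0):
    """points: time-sorted dicts of ONE vehicle with keys
    'lon', 'lat', 't' (seconds), 'Event', 'SerState'.
    Returns (o_index, d_index) pairs that pass the time and distance checks."""
    # Step 1: tag candidate O and D points, remembering their index.
    tagged = []
    for i, p in enumerate(points):
        if p["Event"] == 1 and p["SerState"] == 1:
            tagged.append(("O", i))
        elif p["Event"] == 0 and p["SerState"] != 1:
            tagged.append(("D", i))
    # Step 2: collapse runs, keeping the last O and the first D of each run.
    cleaned = []
    for kind, i in tagged:
        if cleaned and cleaned[-1][0] == kind:
            if kind == "O":
                cleaned[-1] = (kind, i)  # a later O replaces the earlier one
            # a repeated D is skipped, so the first D survives
        else:
            cleaned.append((kind, i))
    # Step 3: pair each O with the D that directly follows it.
    pairs = []
    for (k1, i), (k2, j) in zip(cleaned, cleaned[1:]):
        if k1 != "O" or k2 != "D":
            continue
        hours = (points[j]["t"] - points[i]["t"]) / 3600.0
        km = sum(
            haversine_km(points[a]["lon"], points[a]["lat"],
                         points[a + 1]["lon"], points[a + 1]["lat"])
            for a in range(i, j)
        )
        # Step 4: drop pairs with an abnormal duration or distance.
        if min_h < hours < max_h and km < max_km:
            pairs.append((i, j))
    return pairs
```

The default thresholds mirror the example values in the text (0.05 h, 2 h, 80 km); they are parameters so that other cities or vehicle types can use different limits.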

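3.2 Spatial clustering of OD data and partitioning of the distribution area

Spatial clustering is essentially spatial partitioning: the target area R is divided into a number of disjoint sub-regions R_i, i.e. R=R₁∪R₂∪⋯∪R_n (∀i≠j, 1≤i, j≤n, R_i∩R_j=∅). There are generally three types of method: ① partitioning by a regular grid, which is simple to carry out but tends to split the intrinsic distribution pattern; ② using administrative divisions as the partitioning standard, which respects the historical formation of the urban layout, but the strong administrative factor often biases the clustering; ③ clustering and partitioning according to the characteristics of the data themselves (e.g. distribution density), which can in theory give the best result but is relatively cumbersome and needs manual intervention in complex cases. This study therefore adopts an OD data clustering and partitioning method that combines the latter two strategies. Block-level administrative units {r₁,r₂,…,r_m} serve as the basic regional units, and their OD point distribution densities are computed as {ρ₁,ρ₂,…,ρ_m}. Neighboring regional units are then clustered and merged according to the density index, with the minimum region area parameter A_min controlling the granularity, so that partitioning results at different spatial granularities are obtained (Fig. 2). The specific steps are:

(1) A graph structure G dual to the regional units is built, with the basic regional units as nodes and topological adjacency relations as edges; the weight of an edge is defined as the normalized density difference between its two nodes.

(2) A minimum spanning tree T is generated from G. Each node of G is treated as a separate cluster and the edges are sorted by weight in ascending order. The edges e_i are traversed in turn; if e_i connects two different clusters, e_i and the nodes it connects are added to T and the two clusters are merged into a new one.

(3) The tree T is split iteratively to obtain cluster structures that satisfy the conditions, and each is merged, giving the final partitioning result {R₁, R₂,…,R_n} (n≤m).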
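Steps (1) and (2) amount to Kruskal's algorithm on the adjacency graph of the basic units. The sketch below is a minimal illustration under stated assumptions: the text only says the density difference is "normalized", so min-max normalization is assumed here, and the function names `normalize` and `minimum_spanning_tree` are hypothetical. Units are referred to by their list index.

```python
def normalize(values):
    """Min-max normalization to [0, 1] (an assumed choice of normalization)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def minimum_spanning_tree(densities, adjacency):
    """densities: OD point density of each basic unit, indexed 0..m-1.
    adjacency: iterable of (i, j) index pairs of topologically adjacent units.
    Returns the tree edges as (weight, i, j) tuples."""
    norm = normalize(densities)
    # Edge weight = absolute difference of normalized densities, ascending.
    edges = sorted((abs(norm[i] - norm[j]), i, j) for i, j in adjacency)

    parent = list(range(len(densities)))  # union-find: one cluster per node

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path compression
            u = parent[u]
        return u

    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:           # the edge joins two different clusters
            parent[ri] = rj    # merge them
            tree.append((w, i, j))
    return tree
```

Because edges between units of similar density are added first, the heavy edges that remain in the tree tend to mark the boundaries between dense and sparse areas, which is what the splitting in step (3) exploits.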

Fig. 2 Steps of regional units classification

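The splitting of the tree in step (3) takes the form of edge removal and is governed by two conditions: ① The objective function (Eq. (1)) determines the best edge to remove. Here H(R) denotes the homogeneity among the basic regional units contained in the current tree and is computed from the density index (\(\overline{\rho}\) is the mean density of the k basic regional units in the current tree), while H(R_a) and H(R_b) denote the homogeneity of the two subtrees after the split. The edge with the largest Score is taken as the best edge to remove from the current tree. ② The size parameter A_min controls the granularity of splitting: every region produced by a split must be no smaller than A_min, so that the regions used for flow statistics can be displayed clearly. Specifically, if the total area of the regional units in either subtree produced by the current split is less than A_min, the split is judged invalid and no further splitting is attempted along that branch. The method follows a bottom-up hierarchical clustering strategy and yields partition units of different granularities through a quantitative density index and a geometric minimum-area constraint. In multi-scale OD applications, however, the choice of scale range and of the partition granularity for each scale has to take user needs, the application environment and the characteristics of the data into account. This is discussed concretely for the Beijing taxi OD data in the experimental analysis.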

\[\left\{ \begin{array}{l} Score=H(R)-H(R_a)-H(R_b) \\ H(R)=\sum_{i=1}^{k}(\rho_i-\overline{\rho})^2 \end{array}\right. \ \ (1)\]
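One possible reading of the splitting rule is sketched below: for the current tree, every edge is tried, the edge with the largest Score from Eq. (1) is chosen, and the split is accepted only if both subtrees cover at least A_min in area; otherwise the branch is left as one region. The function names `h`, `component` and `split_tree` are hypothetical, edges are plain (i, j) index pairs, and the sketch favors clarity over efficiency.

```python
def h(nodes, density):
    """H(R) in Eq. (1): sum of squared deviations from the mean density."""
    mean = sum(density[n] for n in nodes) / len(nodes)
    return sum((density[n] - mean) ** 2 for n in nodes)


def component(start, edges):
    """Set of nodes reachable from start through the undirected edges."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            v = b if a == u else (a if b == u else None)
            if v is not None and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def split_tree(nodes, edges, density, area, a_min):
    """nodes: set of unit indices in one tree; edges: its (i, j) tree edges.
    Returns a list of node sets, one per resulting region."""
    best = None
    for e in edges:
        rest = [x for x in edges if x != e]
        ra = component(e[0], rest)  # subtree on one side of the removed edge
        rb = nodes - ra             # subtree on the other side
        score = h(nodes, density) - h(ra, density) - h(rb, density)
        if best is None or score > best[0]:
            best = (score, ra, rb)
    if best is None:                # a single node cannot be split
        return [nodes]
    _, ra, rb = best
    if min(sum(area[n] for n in ra), sum(area[n] for n in rb)) < a_min:
        return [nodes]              # invalid split: stop on this branch
    ea = [x for x in edges if x[0] in ra and x[1] in ra]
    eb = [x for x in edges if x[0] in rb and x[1] in rb]
    return (split_tree(ra, ea, density, area, a_min)
            + split_tree(rb, eb, density, area, a_min))
```

Running the same tree with several values of `a_min` gives partitions of different granularity, which is how the candidate partitions for different scales are described in the experimental section.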

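3.3 Definition and computation of inter-region OD flow parameters

After clustering and partitioning, each OD point is mapped to its region by spatial location, and the OD flow characteristics of each region and between regions are computed. With reference to [7] and the example in Fig. 3, the following statistical parameters are defined: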

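(1) Inflow: the moving vehicles that flow into region R_i within a given time period, i.e. the number of OD pairs whose pick-up point O is not in R_i but whose drop-off point D is in R_i. Formally:

\[inflow(R_i)=|\{T_k \mid T_k.o\notin R_i,\ T_k.d\in R_i,\ 1\leq k\leq n\}| \ \ (2)\]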

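(2) Outflow: the moving vehicles that flow out of region R_i within a given time period, i.e. the number of OD pairs whose pick-up point O is in R_i but whose drop-off point D is not in R_i.

\[outflow(R_i)=|\{T_k \mid T_k.o\in R_i,\ T_k.d\notin R_i,\ 1\leq k\leq n\}| \ \ (3)\]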

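(3) Net flow ratio: the ratio of the net flow of region R_i within a given time period to the sum of its inflow and outflow. A positive value indicates that the region is a net receiver of vehicle or passenger flow; a negative value indicates a net sender.

\[NetFlowRatio(R_i)=\frac{inflow(R_i)-outflow(R_i)}{inflow(R_i)+outflow(R_i)} \ \ (4)\]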

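Beyond these single-region characteristics, the vehicle flows between different regions can also be described to analyze cross-region movement patterns. For example, the flow from region R_i into region R_j within a given time period, inflow(R_i, R_j), is defined as:

\[inflow(R_i,R_j)=|\{T_k \mid T_k.o\in R_i,\ T_k.d\in R_j,\ 1\leq k\leq n\}| \ \ (5)\]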
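The parameters in Eqs. (2) to (5) reduce to simple counting once every OD pair has been mapped to an origin region and a destination region. The sketch below assumes that mapping has already been done (for instance by a point-in-polygon test, which is not shown). The names `flow_statistics` and `net_flow_ratio` are hypothetical, and returning 0.0 for a region with no cross-boundary trips is an assumed convention, since Eq. (4) is undefined in that case.

```python
from collections import Counter


def flow_statistics(od_regions):
    """od_regions: iterable of (origin_region, destination_region) ids,
    one entry per OD pair. Returns (inflow, outflow, between) counters."""
    inflow, outflow, between = Counter(), Counter(), Counter()
    for ro, rd in od_regions:
        between[(ro, rd)] += 1   # Eq. (5): flow from region ro to region rd
        if ro != rd:             # trips that stay inside one region do not
            outflow[ro] += 1     # count as inflow (Eq. 2) or outflow (Eq. 3)
            inflow[rd] += 1
    return inflow, outflow, between


def net_flow_ratio(region, inflow, outflow):
    """Eq. (4); 0.0 is returned when the region has no cross-boundary trips."""
    total = inflow[region] + outflow[region]
    if total == 0:
        return 0.0
    return (inflow[region] - outflow[region]) / total
```

For example, with the trips A→B, A→B, B→A, A→A and C→B, region B has an inflow of 3 and an outflow of 1, so its net flow ratio is (3 − 1)/(3 + 1) = 0.5.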

Fig. 3 Statistics of OD flow characteristics

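3.4 Symbol design and graphic conflict handling

Once the regional OD flow characteristics have been computed, suitable visualization symbols have to be designed to express them accurately in a multi-scale context. Cartographic generalization measures also have to be applied at the visualization level to keep the map legible. On the one hand, the symbol design for OD flow characteristics should reflect the hierarchical inheritance between representations at different scales. As the scale becomes larger, the partitioning becomes finer and the flows between regions weaker, so the corresponding symbols become smaller and structurally simpler. On the other hand, although clustering and partitioning greatly reduce the number of flow relations to display, the symbols may still crowd or even overlap once drawn. When symbol conflicts are detected in a local area, the flows are filtered by strength and some flows of small magnitude are discarded. The whole process can borrow the selection ideas that map generalization applies to linear objects such as roads and rivers, weighing both the "qualification" of individual features and the overall distribution pattern.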
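The strength-based filtering can be illustrated with a simple greedy selection. This is a deliberately reduced sketch, not the selection procedure used in the paper: it assumes the conflicting symbol pairs have already been detected by some geometric test (not shown), and it ranks flows by volume only, so the "overall distribution" criterion mentioned above is not modeled. The function name `select_flows` is hypothetical.

```python
def select_flows(flows, conflicts):
    """flows: dict mapping a flow id to its volume.
    conflicts: iterable of (id_a, id_b) pairs whose symbols clash on the map.
    Returns the ids of the flows that are kept, strongest first."""
    clash = {}
    for a, b in conflicts:
        clash.setdefault(a, set()).add(b)
        clash.setdefault(b, set()).add(a)
    kept = []
    # Visit flows from strongest to weakest; a flow is kept only if none
    # of the flows it clashes with has already been kept.
    for fid in sorted(flows, key=flows.get, reverse=True):
        if not any(other in kept for other in clash.get(fid, ())):
            kept.append(fid)
    return kept
```

With flows f1 = 100, f3 = 70, f2 = 40 and f4 = 5 and conflicts (f1, f2), (f2, f3) and (f3, f4), the two strongest flows f1 and f3 are kept and the weaker f2 and f4 are dropped.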

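4 Experimental analysis

The functions above were developed on the ArcGIS platform, including modules for extracting OD points from trajectory data, spatial clustering and regionalization, and symbol visualization. The method was validated with real trajectory data. The experimental data are GPS trajectory data of Beijing taxis covering the urban area within the Fifth Ring Road over the 24 hours of 1 November 2012 (a Thursday); subdistrict-level administrative division data of Beijing were used as auxiliary data for partitioning the OD data. The raw taxi trajectory data include the vehicle identifier, trigger event, service state and sampling time, together with the vehicle's position, speed, heading and GPS state at the time of sampling. OD data were extracted and cleaned following Section 3.1, and four representative periods were then selected: morning (7:00-9:00), midday (11:00-13:00), evening (17:00-19:00) and late night (22:00-24:00). Together they contain 173,198 valid OD records, which serve as the base data for the experiment. Fig. 4(a) and (b) show the raw trajectory data and the OD data obtained after cleaning, respectively.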

Fig. 4 The original trajectory data and the extracted Origin-Destination points

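For the specific case of taxi OD data in urban Beijing, the authors consider that users typically perceive OD flow characteristics on geographic units at three different levels (or resolutions): the district level (e.g. Haidian District), the business district level (e.g. the Zhongguancun business district) and the block level. Against this background, the scale ranges suited to these three kinds of geographic unit can be inferred by analyzing the resolution at which existing map products show geographic entities at different scales. Taking the zoom levels of Baidu Maps as a reference, this study selects 1:250 000 (level 12), 1:125 000 (level 13) and 1:62 500 (level 14) as the display scales for the district, business district and block levels, respectively. The partitioning result for each scale is then determined by combining the clustering model proposed above with manual supervision: a series of candidate partitions of different granularities is obtained by setting different minimum region area parameters, and the cartographer then judges the best partition for each scale, considering factors such as the level of the geographic unit and the region size that is discernible on the map. In the experiment, nearly 50 sets of OD clustering partitions of different spatial granularities were obtained by adjusting the minimum region area parameter, and three of them were selected by manual judgment as the partitions suited to the three scales. As shown in Fig. 5, the district level (1:250 000) contains 10 regions, the business district level (1:125 000) contains 50 regions, and the block level (1:62 500) contains 108 regions. On this basis, the OD flow characteristics between regions were computed, giving the inter-region OD flow relations for the three scales.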

Fig. 5 Results of regional units and flow patterns at different scales

Fig. 6 Visualization of OD flow across different sub-regions

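For the symbol design of OD flow characteristics, the net flow ratio of each region is rendered with colors from cold and warm color schemes: blue tones indicate a negative net flow (passenger outflow exceeds inflow), and red tones indicate a positive net flow (passenger inflow exceeds outflow). Band symbols with arrowheads describe the flows between different regions. The two ends of a band are anchored in two different regions, the arrow indicates the flow direction, and the width and color of the band correspond to the flow volume. Fig. 6 shows the symbolized OD flow characteristics at the business district level of Fig. 5(b), from which the vehicle or passenger movement patterns hidden in the trajectory data can be seen directly. For example, in the morning period the Zhongguancun business district, Financial Street and similar areas become marked passenger inflow regions, while the net flow ratio of suburban areas near the Fifth Ring Road is mostly negative; at midday passenger exchange is concentrated among the regions within the Third Ring Road, and business districts with offices and restaurants in particular show a marked gathering of people; in the evening a "hollow" begins to appear in the city center as it continuously sends people outward, although some business districts dominated by shopping and entertainment still attract passengers; late at night total taxi flow drops markedly and passengers mostly move toward the suburbs.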

Fig. 7 Visualization of the flow characteristics at different scales

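Fig. 7 shows the visualization of OD flow characteristics at different scale levels obtained with the proposed method. As the display scale is zoomed from 1:250 000 (district level) to 1:125 000 (business district level) and then to 1:62 500 (block level), OD flow information and distributions at different levels of detail are obtained, which supports the perception of OD flow characteristics at different spatial scales.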

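5 Conclusions

Taking Beijing taxi trajectory data as an example, this paper explored a multi-scale visualization method for trajectory origin and destination point data. A complete technical framework was built on the ArcGIS platform, comprising data extraction and cleaning, spatial clustering and regionalization, definition and computation of flow characteristics, and visualization symbol design. By combining a hierarchical spatial clustering model with manual supervision, multi-scale representations of OD data were obtained for geographic units at different levels: blocks, business districts and districts. The experimental results verify the feasibility of the method, and the analysis also indicates its value for the analysis and mining of OD trajectory data.

With the spread of sensors of all kinds, the objects analyzed in geographic information science are gradually shifting to spatio-temporal big data, of which trajectory data are representative. The mainstream approach to big data analysis emphasizes thinking in terms of "numbers": the location and attribute characteristics that describe spatial entities are abstracted into multidimensional features, theories such as statistical inference, data mining and machine learning are applied to mine them, and the results are finally presented as map visualizations. In this process, knowledge is discovered by mathematical and statistical methods, and the map serves only as a carrier for presenting the results. Another line of thought exploits the analytical strengths of map visualization in terms of "form", using map generalization, map projection and symbol visualization techniques to reduce the dimensionality of "big data" for display^[21] and guiding human spatial cognition toward the in-depth exploration of complex geographic phenomena. This study follows the latter approach: visualizing OD data shows the interactions among vehicles at different spatial scales and thereby assists the understanding of vehicle travel patterns. Future work will extend the study in both "depth" and "breadth". "Depth" means refining the method and model, in particular incorporating temporal resolution and semantic resolution into the model to build a complete multi-scale visualization system for trajectory data; "breadth" means extending the method to other data sources, such as public transport smart card data and social media check-in data.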

The authors have declared that no competing interests exist.

References

[1]	Lu F, Liu K, Chen J. Research on human mobility in big data era[J]. Journal of Geo-information Science, 2014,16(5):665-672. (in Chinese)

[2]	Zheng, Xie, Ma W. GeoLife: A collaborative social networking service among user, location and trajectory[J]. IEEE Data Engineering Bulletin, 2010,33(2):32-39.

[3]	Long Y, Zhang Y, Cui C Y. Identifying commuting pattern of Beijing using bus smart card data[J]. Acta Geographica Sinica, 2012,67(10):1339-1352. (in Chinese)

[4]	Andrienko G, Andrienko N, Fuchs G. Understanding movement data quality[J]. Journal of Location Based Services, 2016,10(1):31-46.

[5]	Yue Y, Zhuang Y, Li Q Q, et al. Mining time-dependent attractive areas and movement patterns from taxi trajectory data[C]. Proceedings of the 17th International Conference on Geoinformatics. 2009:1-6.

[6]	Zhang W S, Li S J, Pan G.Mining the semantics of origin-destination flows using taxi traces[C]. Proceedings of the 2012 ACM Conference on Ubiquitous Computing. ACM, 2012:943-949.

[7]	Guo D, Zhu, Jin, et al. Discovering spatial patterns in origin-destination mobility data[J]. Transactions in GIS, 2012,16(3):411-429.

[8]	Pan, Qi G, Wu Z, et al. Land-use classification using taxi GPS traces[J]. IEEE Transactions on Intelligent Transportation Systems, 2013,14(1):113-123.

[9]	Phan D, Xiao L, Yeh R, et al. Flow Map Layout[C]. IEEE Symposium on Information Visualization. 2005:219-224.

[10]	Tobler W R. Spatial Interaction Patterns[J]. Journal of Environmental Systems, 1976,6(4):271-301.

[11]	Tobler W. Experiments in Migration Mapping by Computer[J]. American Cartographer, 1987,14:155-163.

[12]	Guo, Zhu. Origin-destination flow data smoothing and mapping[J]. IEEE Transactions on Visualization and Computer Graphics, 2014,20(12):2043-2052.

[13]	Ullman E L, Dacey M F. The minimum requirements approach to the urban economic base[J]. Papers in Regional Science, 1960,6(1):175-94.

[14]	Corcoran, Chhetri, Stimson. Using circular statistics to explore the geography of the journey to work[J]. Papers in Regional Science, 2009,88(1):119-32.

[15]	Andrienko, Andrienko. Spatial generalization and aggregation of massive movement data[J]. IEEE Transactions on Visualization and Computer Graphics, 2011,17(2):205-219.

[16]	Guo. Flow mapping and multivariate visualization of large spatial interaction data[J]. IEEE Transactions on Visualization and Computer Graphics, 2009,15:1041-48.

[17]	Zhu, Guo. Mapping large spatial flow data with hierarchical clustering[J]. Transactions in GIS, 2014,18(3):421-435.

[18]	Wang L, Hu K Y, Ku T, et al. Mining urban moving trajectory patterns based on multi-scale space partition and road network modeling[J]. Acta Automatica Sinica, 2015,41(1):47-58. (in Chinese)

[19]	Cui, Zhou, Qu, et al. Geometry-based edge clustering for graph visualization[J]. IEEE Transactions on Visualization and Computer Graphics, 2008,14:1277-84.

[20]	Holten, van Wijk J. Force-directed edge bundling for graph visualization[J]. Computer Graphics Forum, 2009,28(3):983-90.

[21]	Ai T H. Development of cartography driven by big data[J]. Journal of Geomatics, 2016,41(2):1-7. (in Chinese)
