Journal of Geo-information Science
Approaches for Human Mobility Data Generation: Research Progress and Trends
Received date: 2023-08-21
Revised date: 2023-10-17
Online published: 2024-05-11
Supported by
National Key Research and Development Program of China (2022YFB3904203)
National Natural Science Foundation of China (42271474)
National Natural Science Foundation of China (41901391)
Human mobility data play a crucial role in many real-world applications, such as infectious disease control, transportation, and public safety. The development of modern Information and Communication Technologies (ICT) has made it easier to collect large-scale, individual-level human mobility data. However, the availability and usability of the raw data remain significantly limited by privacy concerns and by data redundancy, missing values, and noise. Generating synthetic human mobility data with models that statistically approximate the real data is a promising solution. From the data perspective, generated human mobility data can substitute for real data, mitigating concerns about personal privacy and data security, and can also be used to enhance low-quality real data. From the modeling perspective, the models built for human mobility data generation can be used for scenario simulation and mechanism exploration. Human mobility data generation tasks include individual trajectory data generation and collective mobility data generation, and the research methods consist primarily of mechanistic models and machine learning models. This article first provides a systematic review of research progress in human mobility data generation and then summarizes its development trends and challenges. Mechanistic-model-based methods are studied predominantly in statistical physics, while machine-learning-based methods are studied primarily in computer science. Although the two types of models have complementary advantages, they are still developing independently.
The article suggests that future research in human mobility data generation should focus on: 1) exploring and revealing the underlying mechanisms of human mobility behavior from a multidisciplinary perspective; 2) designing hybrid approaches that couple machine learning with mechanistic models; 3) leveraging cutting-edge generative Artificial Intelligence (AI) and Large Language Model (LLM) technologies; 4) improving the models' spatial generalization and transfer-learning capabilities; 5) controlling the costs of model training and implementation; and 6) designing reasonable evaluation metrics and balancing data utility against privacy-preserving effectiveness. The article asserts that human mobility processes are a typical phenomenon of human-environment interaction. On the one hand, research in the Geographic Information Science (GIS) field should integrate theories and technologies from other disciplines, such as computer science, statistical physics, complexity science, and transportation. On the other hand, it should harness the unique strengths of GIS by explicitly incorporating geographic spatial effects, including spatial dependency, distance decay, spatial heterogeneity, and scale, into the modeling process to enhance the rationality and performance of human mobility data generation models.
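Point 6 above concerns evaluation metrics. The article does not prescribe a specific metric here; as a minimal sketch of one possible utility check, the code below compares the distribution of a mobility statistic (for example, jump lengths) in real and generated data using the Jensen-Shannon divergence on a shared histogram. The function name `js_divergence`, the bin count, and the choice of statistic are assumptions made for illustration.

```python
import numpy as np

def js_divergence(real, synthetic, bins=20):
    """Jensen-Shannon divergence (base 2, in [0, 1]) between two samples.

    Both samples are binned on a shared grid (hypothetical default: 20 bins)
    so that the two histograms are directly comparable.
    """
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    lo = min(real.min(), synthetic.min())
    hi = max(real.max(), synthetic.max())
    edges = np.linspace(lo, hi, bins + 1)

    p, _ = np.histogram(real, bins=edges)
    q, _ = np.histogram(synthetic, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Kullback-Leibler divergence, skipping empty bins of `a`
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

A value near 0 indicates that the generated data reproduce the chosen statistic well. Such a check measures utility only and says nothing about privacy, which would need a separate measure.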
LIU Kang. Approaches for Human Mobility Data Generation: Research Progress and Trends[J]. Journal of Geo-information Science, 2024, 26(4): 831-847. DOI: 10.12082/dqxxkx.2024.230488
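Tab. 1 Individual trajectory generation methods based on mechanistic models
Type | Model | Year | Main modeling mechanisms | Experimental data | Reference | Code link |
---|---|---|---|---|---|---|
EPR model and its extensions | EPR | 2010 | Power-law waiting time, exploration with power-law jump length, frequency-based preferential return | CDR, GPS | Song et al.[25] | - |
EPR model and its extensions | r-EPR | 2015 | Power-law waiting time, exploration with power-law jump length, frequency-based preferential return, recency-based return | CDR, Brightkite | Barbosa et al.[26] | - |
EPR model and its extensions | m-EPR | 2018 | Exploration with power-law jump length, frequency-based preferential return, fixed-size set of return locations | Mobile Data Challenge (MDC) | Alessandretti et al.[27] | - |
EPR model and its extensions | GeoSim | 2015 | Power-law waiting time, social-preference exploration, random-location exploration, social-preference return, frequency-based preferential return | CDR | Toole et al.[30] | https://scikit-mobility.github.io/scikit-mobility/reference/models.html |
EPR model and its extensions | d-EPR | 2015 | Power-law waiting time, exploration based on distance and static collective visitation frequency, frequency-based preferential return | CDR, GPS | Pappalardo et al.[29] | https://scikit-mobility.github.io/scikit-mobility/reference/models.html |
EPR model and its extensions | STS-EPR | 2021 | Power-law waiting time, social-preference exploration, exploration based on distance and static collective visitation frequency, social-preference return, frequency-based preferential return | Foursquare | Cornacchia et al.[31] | https://scikit-mobility.github.io/scikit-mobility/reference/models.html |
EPR model and its extensions | DITRAS | 2018 | Temporal activity patterns, exploration based on distance and static collective visitation frequency, frequency-based preferential return | CDR, GPS | Pappalardo et al.[38] | https://scikit-mobility.github.io/scikit-mobility/reference/models.html |
EPR model and its extensions | TimeGeo | 2016 | Circadian rhythm and propensity for short-range activities, exploration based on a rank-based distance decay function, frequency-based preferential return | CDR, GPS | Jiang et al.[37] | https://github.com/tsinghua-fib-lab/TrajSynVAE |
EPR model and its extensions | w-EPR | 2019 | Power-law waiting time, fixed daily out-of-home duration, exploration based on a rank-based distance decay function, frequency-based preferential return | CDR, Taxi GPS, Travel survey | Wang et al.[39] | - |
EPR model and its extensions | p-EPR | 2021 | Power-law waiting time, exploration with turning angle and power-law jump length, frequency-based preferential return | CDR | Dong et al.[28] | https://github.com/leiii/VisitationLaw |
EPR model and its extensions | CMM | 2021 | Power-law waiting time, exploration based on distance and dynamic collective visitation frequency, frequency-based preferential return | - | Xu et al.[32] | https://github.com/tsinghua-fib-lab/Collective-Mobility-Model |
Other important models | Continuous-time random walk (CTRW) | 2006 | Power-law waiting time, power-law jump length | Bank Note | Brockmann et al.[20] | - |
Other important models | Periodic random walk model | 2011 | Fixed home and work locations for each individual, lower bound on individual commuting time, constant individual travel speed, only one non-work activity per individual per day | Travel Survey | Yan et al.[36] | - |
Other important models | Container Model | 2020 | Hierarchical spatial-scale distance between locations, attractiveness of locations at different spatial-scale levels | GPS | Alessandretti et al.[33] | https://github.com/lalessan/scales_human_mobility/ |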
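The EPR row of Tab. 1 names three mechanisms: power-law waiting times, exploration with power-law jump lengths, and frequency-based preferential return. The following is a minimal sketch of how these mechanisms can be combined in a simulation, not the implementation from Song et al.[25] or from any library. The function names, the parameter defaults (`rho`, `gamma`, `alpha`, `beta`), the truncation bounds of the power laws, and the planar coordinates are all assumptions chosen for illustration.

```python
import math
import random

def sample_power_law(rng, exponent, x_min, x_max):
    """Draw x with p(x) ~ x^(-exponent) on [x_min, x_max] (inverse-CDF, exponent != 1)."""
    a = 1.0 - exponent
    u = rng.random()
    return (x_min ** a + u * (x_max ** a - x_min ** a)) ** (1.0 / a)

def simulate_epr(n_steps, rho=0.6, gamma=0.21, alpha=0.55, beta=0.8, seed=0):
    """Simulate one individual; returns (trajectory, visit counts).

    All default values are illustrative assumptions, not taken from the article.
    """
    rng = random.Random(seed)
    pos = (0.0, 0.0)
    visits = {pos: 1}            # location -> number of visits so far
    t = 0.0
    trajectory = [(t, pos)]
    for _ in range(n_steps):
        # Power-law waiting time, p(dt) ~ dt^-(1+beta), truncated to [1, 168] hours
        t += sample_power_law(rng, 1.0 + beta, 1.0, 168.0)
        n_distinct = len(visits)
        if rng.random() < rho * n_distinct ** (-gamma):
            # Exploration: jump of power-law length in a uniformly random direction
            r = sample_power_law(rng, 1.0 + alpha, 0.1, 100.0)
            theta = rng.uniform(0.0, 2.0 * math.pi)
            pos = (pos[0] + r * math.cos(theta), pos[1] + r * math.sin(theta))
        else:
            # Preferential return: pick a known location in proportion to its visit count
            locations = list(visits)
            weights = [visits[loc] for loc in locations]
            pos = rng.choices(locations, weights=weights, k=1)[0]
        visits[pos] = visits.get(pos, 0) + 1
        trajectory.append((t, pos))
    return trajectory, visits
```

For example, `simulate_epr(1000)` returns 1001 time-stamped positions and a dictionary of visit counts. As a simplification, a return step in this sketch may select the current location again. The extensions listed in the table change individual pieces of this loop, for instance how the exploration target or the set of return candidates is chosen.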
Tab. 2 Individual trajectory generation methods based on machine learning
Type | Model | Year | Main modules | Experimental data | Reference | Code link |
---|---|---|---|---|---|---|
GAN-based models | OuyangGAN | 2018 | GAN, CNN | MDC | Ouyang et al.[53] | - |
GAN-based models | - | 2018 | SRNN, RNN-LSTM, RHN, PSMM, SGAN, RGAN | MDC | Kulkarni et al.[55] | - |
GAN-based models | LSTM-TrajGAN | 2020 | LSTM, GAN | Foursquare | Rao et al.[54] | https://github.com/GeoDS/LSTM-TrajGAN |
GAN-based models | MoveSim | 2020 | GAN, Attention, CNN | Mobile phone positioning data, GeoLife | Feng et al.[51] | https://github.com/FIBLAB/MoveSim |
GAN-based models | TSG | 2021 | GAN, LSTM, CNN | Taxi GPS | Wang et al.[57] | - |
GAN-based models | TS-TrajGen | 2023 | GAN, LSTM | Taxi GPS | Jiang et al.[58] | https://github.com/WenMellors/TS-TrajGen |
GAN-based models | DP-TrajGAN | 2022 | GAN, LSTM, POMDP | GeoLife, Taxi GPS | Zhang et al.[61] | - |
GAN-based models | TrajGen | 2021 | DCGAN, GRU | Taxi GPS | Cao et al.[59] | https://github.com/caochuntu/KDD2021_guizu |
VAE-based models | SVAE | 2019 | VAE, LSTM | GPS | Huang et al.[63] | - |
VAE-based models | TrajSynVAE | 2023 | VAE, LSTM | Mobile phone positioning data, GeoLife, Foursquare | Wang et al.[65] | https://github.com/tsinghua-fib-lab/TrajSynVAE |
VAE-based models | VOLUNTEER | 2023 | VAE, Transformer, LSTM, MLP | Mobile phone positioning data | Long et al.[64] | - |
Diffusion-based models | Diff-Traj | 2023 | Diffusion Model, U-Net | Taxi GPS | Zhu et al.[66] | - |
Other models | STAR | 2023 | GNN, GRU | Foursquare | Wang et al.[56] | - |
Other models | ActSTD | 2022 | NDE, GRU, LSTM | Foursquare, Mobile phone positioning data | Yuan et al.[60] | https://github.com/tsinghua-fib-lab/Activity-Trajectory-Generation |
Other models | MTNet, TNet | 2022 | Encoder, Decoder, LSTM | Taxi GPS | Wang et al.[52] | https://github.com/wangyong01/MTNet_Code |
Other models | AttnMove | 2021 | Attention | GeoLife, GPS, Tencent location data | Xia et al.[50] | https://github.com/XTxiatong/AttnMove |
Tab. 3 Collective flow generation methods based on mechanistic models
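Tab. 4 Collective flow generation methods based on machine learning
Type | Model | Year | Main modules | Application scale | Reference | Code link |
---|---|---|---|---|---|---|
Predicting collective flows between pairs of regions | ANN, XGBoost | 2018 | - | County / country | Robinson and Dilkina[77] | - |
Predicting collective flows between pairs of regions | RF, ANN | 2019 | - | Intra-city census areas | Pourebrahim et al.[70] | - |
Predicting collective flows between pairs of regions | DeepGravity | 2021 | FFNN | Census areas within a country / state | Simini et al.[71] | https://github.com/scikit-mobility/DeepGravity |
Predicting collective flows between pairs of regions | pop2flow | 2023 | Attention, GCN, MLP | Intra-city 1 km and 3 km grids | Rong et al.[78] | - |
Predicting collective flows between pairs of regions | GMEL | 2020 | GAT, GBRT | Intra-city census areas | Liu et al.[79] | https://github.com/jackmiemie/GMEL |
Predicting collective flows between pairs of regions | SI-GCN | 2021 | GCN | Intra-city 1 km grid | Yao et al.[82] | https://github.com/s3pku/flow-imputation |
Predicting collective flows between pairs of regions | ConvGCN-RF | 2023 | CNN, GCN | Intra-city 500 m grid | Yin et al.[80] | - |
Predicting collective flows between pairs of regions | R2F-GCN | 2023 | GCN | City | Wang et al.[81] | - |
Generating the collective flow matrix among all regions | ODGN | 2023 | GNN, GAN, TCN | Intra-city census areas | Rong et al.[83] | - |
Generating the collective flow matrix among all regions | DiffODGen | 2023 | Diffusion model | Intra-city census areas | Rong et al.[84] | - |
Generating the collective flow matrix among all regions | GODDAG | 2023 | GAT, GIN, MLP | Intra-city census areas | Rong et al.[85] | - |
Generating the collective flow matrix among all regions | MoGAN | 2022 | CNN | Intra-city grid | Mauro et al.[86] | https://github.com/jonpappalord/GAN-flow |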
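The pairwise-flow predictors in Tab. 4 are learned from data. For comparison, the classical gravity model that the name DeepGravity alludes to can be written down in a few lines. The sketch below is an origin-constrained gravity model with power-law distance decay, offered as an illustrative baseline rather than as the method of any cited work. The function name `gravity_flows`, the exponent `beta=2.0`, and the use of planar Euclidean distance are assumptions.

```python
import numpy as np

def gravity_flows(outflow, mass, coords, beta=2.0):
    """Origin-constrained gravity model with power-law distance decay.

    T[i, j] = outflow[i] * mass[j] * d_ij^(-beta) / sum_k(mass[k] * d_ik^(-beta))

    outflow : total trips leaving each region
    mass    : attractiveness of each region (e.g. population)
    coords  : (n, 2) planar coordinates of region centroids (assumed)
    beta    : distance-decay exponent (illustrative default)
    """
    outflow = np.asarray(outflow, dtype=float)
    mass = np.asarray(mass, dtype=float)
    coords = np.asarray(coords, dtype=float)

    # Pairwise Euclidean distances between region centroids
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)   # inf ** (-beta) == 0, so self-flows vanish

    weights = mass[None, :] * dist ** (-beta)
    probs = weights / weights.sum(axis=1, keepdims=True)
    return outflow[:, None] * probs
```

Each row of the result sums to the corresponding origin's outflow, so the model only redistributes known departures across destinations. With three equally attractive regions on a line at x = 0, 1, and 3, an origin at x = 0 sends nine times as many trips to the region at x = 1 as to the one at x = 3 when `beta=2.0`.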
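Tab. 5 Comparison of mechanistic models and machine learning models for human mobility data generation
Criterion | Mechanistic models (white box) | Machine learning (black box) |
---|---|---|
Interpretability | Strong | Weak |
Realism of generated data | Low | High |
Training data requirements | Low | High |
Extrapolation and generalization ability | Strong | Weak |
Number of model parameters | Few | Many |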
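[1]
[2] Yin Ling, Liu Kang, Zhang Hao, et al. Research progress on COVID-19 epidemic models coupled with human mobility[J]. Journal of Geo-information Science, 2021, 23(11):1894-1909. (in Chinese)
[3]
[4]
[5]
[6] Liu Lin, Du Fangye, Song Guangwen, et al. Identifying the types of crime symbiotic spaces and analyzing their characteristics[J]. Scientia Geographica Sinica, 2018, 38(8):1199-1209. (in Chinese)
[7] Sun Weiwei, Mao Jiangyun. Trajectory prediction techniques and their applications: starting from the Shanghai Bund stampede[J]. Science & Technology Review, 2016, 34(9):48-54. (in Chinese)
[8]
[9]
[10] Li Deren, Shao Zhenfeng, Yu Wenbo, et al. Public epidemic prevention and control services based on spatiotemporal location big data make cities smarter[J]. Geomatics and Information Science of Wuhan University, 2020, 45(4):475-487,556. (in Chinese)
[11] Lu Feng, Liu Kang, Chen Jie. Research on human mobility in the big data era[J]. Journal of Geo-information Science, 2014, 16(5):665-672. (in Chinese)
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19] Zhou Tao, Han Xiaopu, Yan Xiaoyong, et al. Statistical mechanics of the spatiotemporal characteristics of human behavior[J]. Journal of University of Electronic Science and Technology of China, 2013, 42(4):481-540. (in Chinese)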
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]
[31]
[32]
[33]
[34]
[35]
[36]
[37]
[38]
[39]
[40]
[41]
[42]
[43]
[44]
[45]
[46]
[47]
[48]
[49]
[50]
[51]
[52]
[53]
[54]
[55]
[56]
[57]
[58]
[59]
[60]
[61]
[62]
[63]
[64]
[65]
[66]
[67]
[68]
[69]
[70]
[71]
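[72] Liu Yu, Yao Xin, Gong Yongxi, et al. Revisiting analytical methods and applications of spatial interaction in the big data era[J]. Acta Geographica Sinica, 2020, 75(7):1523-1538. (in Chinese)
[73] Liu Erjian, Yan Xiaoyong. Research progress on intervening-opportunity-class models for predicting human mobility[J]. Acta Physica Sinica, 2020, 69(24):60-67. (in Chinese)
[74]
[75]
[76]
[77]
[78]
[79]
[80]
[81]
[82]
[83]
[84]
[85]
[86]
[87]
[88] Dong Weihua, Liao Hua, Zhan Zhicheng, et al. New progress in eye-tracking and visual cognition research in cartography since 2008[J]. Acta Geographica Sinica, 2019, 74(3):599-614. (in Chinese)
[89] Zhong Ershun. Deep mapping: on the combination of cartography and neuroscience[J]. Geomatics and Information Science of Wuhan University, 2022, 47(12):1988-2002. (in Chinese)
[90]
[91]
[92]
[93]
[94]
[95] Li Dawei, Feng Siqi, Cao Qi, et al. Modeling route choice behavior in the context of big data[J]. China Journal of Highway and Transport, 2021, 34(12):161-174. (in Chinese)
[96] Cheng Changxiu, Shen Shi, Li Qiangkun. Big data support and methodological exploration for research on the human-land system of the Yellow River Basin[J]. Bulletin of National Natural Science Foundation of China, 2021, 35(4):529-536. (in Chinese)
[97]
[98] Zhang Tong, Liu Renyu, Wang Peixiao, et al. Physical-prior-aware machine learning and its research prospects in geospatial intelligence[J]. Journal of Geo-information Science, 2023, 25(7):1297-1311. (in Chinese)
[99] Li Feng, Wang Qi, Hu Jianxiong, et al. Research progress on jointly data- and knowledge-driven methods and prospects for their application in power systems[J]. Proceedings of the CSEE, 2021, 41(13):4377-4390. (in Chinese)
[100] Xiao Lizhi. Fusion of data-driven machine learning with mechanistic models and the interpretability problem[J]. Geophysical Prospecting for Petroleum, 2022, 61(2):205-212. (in Chinese)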
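[101]
[102]
[103]
[104]
[105]
[106]
[107]
[108]
[109]
[110]
[111]
[112] Liu Yu, Wang Keli, Xing Xiaoyue, et al. Spatial effects in geographic analysis[J]. Acta Geographica Sinica, 2023, 78(3):517-531. (in Chinese)
[113] Gao Song. A review and reflection on recent research in geospatial artificial intelligence[J]. Geomatics and Information Science of Wuhan University, 2020, 45(12):1865-1874. (in Chinese)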