• Theory and Methods of Geo-information Science •

### A Spatio-temporal Semantic Clustering Algorithm for OD Flow Direction based on LDA and Ant Colony Optimization

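1. Key Lab of Spatial Data Mining and Information Sharing of Ministry of Education, Fuzhou University, Fuzhou 350108, China
2. National & Local Joint Engineering Research Center of Satellite Geospatial Information Technology, Fuzhou 350108, China
3. The Academy of Digital China (Fujian), Fuzhou University, Fuzhou 350003, China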
• Received: 2021-09-06 Revised: 2021-11-10 Online: 2022-05-25 Published: 2022-07-25
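• Corresponding author: * WU Qunyong (b. 1973), male, from Zhucheng, Shandong; Ph.D., research fellow; works mainly on spatio-temporal data mining and geographic information services. E-mail: qywu@fzu.edu.cn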
• About the author: ZHANG Han (b. 1994), male, from Yong'an, Fujian; master's student; works mainly on spatio-temporal data mining. E-mail: zh_curry@163.com
• Supported by:
National Natural Science Foundation of China (41471333); Central Guided Local Development of Science and Technology Project (2021H0036)

### A Spatio-temporal Semantic Clustering Algorithm for OD Flow Direction based on LDA and Ant Colony Optimization

ZHANG Han1,2,3, WU Qunyong1,2,3,*

1. Key Lab of Spatial Data Mining and Information Sharing of Ministry of Education, Fuzhou University, Fuzhou 350108, China
2. National & Local Joint Engineering Research Center of Satellite Geospatial Information Technology, Fuzhou 350108, China
3. The Academy of Digital China (Fujian), Fuzhou 350003, China
• Received: 2021-09-06 Revised: 2021-11-10 Online: 2022-05-25 Published: 2022-07-25
• Supported by:
National Natural Science Foundation of China (41471333); Central Guided Local Development of Science and Technology Project (2021H0036)

Abstract:

Existing OD flow clustering algorithms do not fully consider semantic information, which makes the semantics of OD flows difficult to mine. To address this, this paper proposes an OD flow clustering algorithm based on the Latent Dirichlet Allocation (LDA) model and the ant colony optimization algorithm. First, the LDA topic model is used to extract the semantics of OD flows, and the Jensen-Shannon (JS) divergence is used to quantify the semantic similarity between OD flows. A spatiotemporal semantic similarity measure that integrates temporal, spatial, and semantic similarity is then constructed to provide the data basis for flow clustering. Next, a graph is built from the spatiotemporal semantic similarity, and Gaussian function mapping and the connected components of the graph are used to simplify the data and eliminate noise. Drawing on the idea of the CFDP algorithm (clustering by fast search and find of density peaks), the betweenness centrality of nodes is used to optimize the selection of the ant colony's initial positions. Finally, the Multi-path Normalized Cut (MNCUT) graph criterion is used to make the ant colony search more goal-directed and improve its clustering results, achieving spatiotemporal semantic clustering of OD flow directions. The proposed method is verified on an open Xiamen taxi data set and Xiamen map POI data.
The experimental results show that: (1) compared with existing methods, the proposed method effectively extracts the semantic information of flow directions and measures the similarity between them more comprehensively; (2) the Gaussian function mapping strategy and the connected-component property of the graph effectively eliminate noise in the flow direction data, reducing the running time of undirected graph construction by 88.5% to 88.8%; (3) compared with existing algorithms, the proposed algorithm partitions clusters more finely and supports convenient and effective correlation analysis of flow semantics.
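
To make the semantic similarity and noise-removal steps more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each OD flow already has an LDA topic distribution, measures the JS divergence between pairs of flows, maps that divergence to a similarity with a Gaussian function, links flows whose similarity exceeds a threshold, and treats single-node connected components as noise. The names `TOPICS`, `sigma`, `threshold`, and `split_clusters_and_noise`, the specific Gaussian form, and the use of semantic similarity alone (without the temporal and spatial terms) are all illustrative assumptions; the abstract does not specify them.

```python
import math
from collections import defaultdict


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2, so the value lies in [0, 1])."""
    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0 here.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def gaussian_similarity(distance, sigma):
    """Map a non-negative distance to a similarity in (0, 1].

    Assumption: a plain Gaussian kernel with bandwidth sigma.
    """
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def connected_components(n, edges):
    """Connected components of an undirected graph on nodes 0..n-1."""
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


def split_clusters_and_noise(topic_dists, sigma=0.2, threshold=0.5):
    """Link similar flows, then treat isolated flows as noise (assumed rule)."""
    n = len(topic_dists)
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if gaussian_similarity(
            js_divergence(topic_dists[i], topic_dists[j]), sigma
        ) >= threshold
    ]
    components = connected_components(n, edges)
    clusters = [c for c in components if len(c) > 1]
    noise = [c[0] for c in components if len(c) == 1]
    return clusters, noise


# Hypothetical topic distributions for five OD flows over three topics.
TOPICS = [
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.10, 0.10, 0.80],
    [0.10, 0.20, 0.70],
    [0.05, 0.90, 0.05],
]

if __name__ == "__main__":
    clusters, noise = split_clusters_and_noise(TOPICS)
    print("clusters:", clusters)
    print("noise:", noise)
```

With these made-up inputs, flows 0 and 1 share one dominant topic, flows 2 and 3 share another, and flow 4 has no sufficiently similar neighbour, so it ends up as an isolated component. In the paper's pipeline the edge weights would come from the combined spatiotemporal semantic similarity, and the remaining components would then be clustered by the optimized ant colony search.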