Journal of Geo-information Science >
A Fuzzy C-Means Clustering Algorithm Based on Reachable Distance
Received date: 2024-01-23
Revised date: 2024-07-29
Online published: 2024-09-10
Supported by
National Natural Science Foundation of China Youth Project(62107013)
Hunan Provincial Education Department Scientific Research Project(22B0511)
Facility location is of great significance for improving residents’ quality of life, and geographic accessibility indicators, such as the road network, are often used as the main decision-making factors. Clustering analysis based on geographic accessibility is an important tool for solving such problems. However, existing clustering algorithms often fail to guarantee the accuracy of clustering results, the accessibility of cluster centers, or the selectivity of cluster centers, making them less effective in solving the facility location problem in real scenarios. This paper proposes a Fuzzy C-Means clustering algorithm based on Reachable Distance (FCM-RD), which modifies the objective function, the membership function, and the cluster center function of the classical FCM. It employs reachable distance as a measure of geographic reachable similarity and iterates the cluster centers during the clustering process. Specifically, to capture the true relationships and connectivity between different elements, FCM-RD takes into account physical and spatial barriers, employs the shortest path distance along the road network as the reachable distance, and aligns geographic coordinates with the road network. It is possible for one position on the road network to correspond to multiple positions in geographic coordinates. Consequently, when multiple candidate positions for cluster centers are obtained, a cluster center correction mechanism is designed to iterate the accessible cluster center with reachable distance during the clustering process. Mathematical analysis and experiments in actual scenarios both show the validity of the cluster center iteration mechanism, showing the selected cluster centers in each iteration of FCM-RD are the unique and minimum value points of the intra-cluster objective function. The rationality of FCM-RD is further verified through experiments, and it is compared with baseline algorithms from three aspects: experimental results, convergence, and performance. The results indicate that, compared to the baseline algorithms, FCM- RD improves performance on both the mean and maximum indicators of the shortest reachable distance, with some indicators even improving by up to 38.9%. In a few experiments, there are slight improvements in the DB index and silhouette coefficient indicators, and 100% of the cluster centers selected by FCM- RD are located on the road network. FCM- RD overcomes the shortcomings of ignoring geographical obstacles and unreachable cluster centers. In conclusion, FCM-RD not only obtains accessible cluster centers without location restrictions but also achieves better clustering results. FCM-RD provides an effective and precise solution for geographical spatial clustering in practical scenarios.
CUI Junchao , ZHANG Qiongbing , LI Xiaolong . A Fuzzy C-Means Clustering Algorithm Based on Reachable Distance[J]. Journal of Geo-information Science, 2024 , 26(9) : 2038 -2051 . DOI: 10.12082/dqxxkx.2024.240053
表1 符号说明Tab. 1 Symbol description |
| 符号 | 说明 |
|---|---|
| X = { x1, x2, ⋯, xj, xj+1, ⋯, xn },n∈N* | 聚类元素集合,n表示聚类元素个数 |
| C = {c1, c2, ⋯, ci, ci+1, ⋯, cd},d∈N*且d≤n | 簇中心集合,d表示簇中心个数 |
| F = {f1, f2, ⋯, fk, fk+1, ⋯, fv },v ∈ N* | 路网分叉点集合,v表示路网分叉点个数 |
| R(fk) = { r1, r2, ⋯, rq, rq+1, ⋯, rs },s∈N* | 分叉点搜索方向上的分支集合,s表示分支条数 |
| Y = {y | y ∈ X∪C∪F} | 聚类场景元素集合 |
| L(y) | 元素 y(包括聚类元素、分叉点、簇中心等)的路网坐标,以其与原点O之间的沿路网的最短距离和方向表示其路网坐标 |
| Point(L(y)) | 路网坐标为 L(y) 的点集 |
| FPoint(fk) | 位于分叉点 fk 搜索方向各分支上的聚类元素集合 |
| RPoint(fk, rq) = {xj | rq∈R(fk), xj∈FPoint(fk)} | 位于分叉点 fk 搜索方向 rq 分支上的聚类元素集合 |
| TopFork(y1, y2) | 元素 y1 到 y2 路径上的第一个分叉点 |
| 簇中心 ci 所在分支与聚类元素 xj 所在分支的公共分叉点 | |
| uij | 聚类元素 xj 隶属于簇中心 ci 的隶属度 |
| d(xj, ci) | 以聚类元素 xj 与簇中心点 ci 之间的路网坐标差值计算的有向可达距离 |
| P(y1, y2) | 元素y1与y2之间沿路网的最短路径 |
| cit | 第 t 代第 i 个簇中心 |
| cit' | 第 t 代第 i 个簇中心迭代修正过程中的簇中心 |
表2 簇中心位置结果Tab. 2 Cluster center location results |
| 实验 | 簇中心 总数/个 | 位于路网上的簇中心数/个 | 位于路网上的簇中心数占比/% | 路网周围簇中心数/个 | 路网周围簇中心数占比/% | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FCM-RD | 欧氏FCM | Mean Shift | FCM-RD | 欧氏FCM | Mean Shift | FCM-RD | 欧氏FCM | Mean Shift | FCM-RD | 欧氏FCM | Mean Shift | |||||
| G1 | 120 | 120 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | |||
| G2 | 120 | 120 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 30 | 0 | 0 | 25.0 | |||
| G3 | 120 | 120 | 0 | 0 | 100 | 0 | 0 | 0 | 30 | 30 | 0 | 25 | 25.0 | |||
| G4 | 60 | 60 | 0 | 0 | 100 | 0 | 0 | 0 | 24 | 0 | 0 | 40 | 0.0 | |||
| G5 | 90 | 90 | 0 | 0 | 100 | 0 | 0 | 0 | 11 | 0 | 0 | 12 | 0.0 | |||
| G6 | 150 | 150 | 0 | 0 | 100 | 0 | 0 | 0 | 12 | 60 | 0 | 8 | 40.0 | |||
| G7 | 180 | 180 | 0 | 0 | 100 | 0 | 0 | 0 | 31 | 30 | 0 | 17 | 16.7 | |||
| G8 | 120 | 120 | 0 | 0 | 100 | 0 | 0 | 0 | 30 | 60 | 0 | 25 | 50.0 | |||
注:位于路网周围是指在1:60 000 m的比例尺下,簇中心与路网接近重合的场景;加粗数据表示对比项中的较优结果。 |
表3 实验结果性能指标分析Tab. 3 Performance analysis of experimental results |
| 实验 | 算法 | 均值/m | 最大值/m | 最小值/m | DB指数 | 轮廓系数 |
|---|---|---|---|---|---|---|
| G1 | FCM-RD | 974.67 | 2 230.55 | 473.48 | 0.44 | 0.76 |
| 欧氏FCM | 1 107.67 | 3 363.04 | 116.71 | 0.47 | 0.76 | |
| Mean Shift | 1 159.41 | 3 317.53 | 355.77 | 0.70 | 0.82 | |
| 层次聚类 | 未生成簇中心 | 0.76 | ||||
| G2 | FCM-RD | 1 805.31 | 3 849.25 | 289.43 | 0.66 | 0.70 |
| 欧氏FCM | 2 123.84 | 6 295.07 | 45.94 | 0.87 | 0.66 | |
| Mean Shift | 2 128.23 | 5 563.67 | 57.74 | 1.11 | 0.73 | |
| 层次聚类 | 未生成簇中心 | 0.70 | ||||
| G3 | FCM-RD | 3 220.10 | 7 273.50 | 131.72 | 1.60 | 0.59 |
| 欧氏FCM | 3 327.96 | 8 955.90 | 152.05 | 0.70 | 0.58 | |
| Mean Shift | 3 490.11 | 9 794.47 | 50.97 | 1.42 | 0.59 | |
| 层次聚类 | 未生成簇中心 | 0.58 | ||||
| G4 | FCM-RD | 4 892.31 | 9 808.73 | 534.47 | 2.32 | 0.39 |
| 欧氏FCM | 5 070.98 | 11 102.91 | 93.41 | 1.24 | 0.37 | |
| Mean Shift | 7 605.33 | 15 304.62 | 64.80 | 3.29 | 0.23 | |
| 层次聚类 | 未生成簇中心 | 0.37 | ||||
| G5 | FCM-RD | 3 984.76 | 9 808.73 | 92.30 | 1.12 | 0.52 |
| 欧氏FCM | 4 112.86 | 10 169.18 | 175.67 | 0.86 | 0.49 | |
| Mean Shift | 4 174.03 | 9 821.42 | 83.40 | 0.93 | 0.53 | |
| 层次聚类 | 未生成簇中心 | 0.53 | ||||
| G6 | FCM-RD | 2 744.64 | 6 319.25 | 269.05 | 1.31 | 0.64 |
| 欧氏FCM | 2 883.95 | 9 798.69 | 35.25 | 0.83 | 0.62 | |
| Mean Shift | 2 904.36 | 6 987.29 | 355.14 | 1.49 | 0.64 | |
| 层次聚类 | 未生成簇中心 | 0.63 | ||||
| G7 | FCM-RD | 974.67 | 2 230.55 | 473.48 | 0.44 | 0.76 |
| 欧氏FCM | 1 107.67 | 3 363.04 | 116.71 | 0.47 | 0.76 | |
| Mean Shift | 1 159.41 | 3 317.53 | 355.77 | 0.70 | 0.82 | |
| 层次聚类 | 未生成簇中心 | 0.76 | ||||
| G8 | FCM-RD | 2 472.79 | 5 453.20 | 365.17 | 1.47 | 0.67 |
| 欧氏FCM | 2 539.43 | 7 161.58 | 24.80 | 0.93 | 0.67 | |
| Mean Shift | 2 536.71 | 6 353.28 | 95.46 | 1.55 | 0.68 | |
| 层次聚类 | 未生成簇中心 | 0.68 | ||||
注:加粗数据表示对比项中的较优结果。 |
| [1] |
王鹏洲, 赵志远, 姚伟, 等. 基于地理流空间的巡游车与网约车人群出行模式研究[J]. 地球信息科学学报, 2023, 25(4):726-740.
[
|
| [2] |
王浩成, 向隆刚, 关雪峰, 等. 基于出租车上下客数据流与分布式多阶段网格聚类的城市热点区域实时探测方法[J]. 地球信息科学学报, 2023, 25(7):1514-1530..
[
|
| [3] |
张亚, 刘纪平, 周亮, 等. 基于DBSCAN算法的北京市顺丰快递服务设施集群识别与空间特征分析[J]. 地球信息科学学报, 2020, 22(8):1630-1641.
[
|
| [4] |
|
| [5] |
Pooja,
|
| [6] |
|
| [7] |
|
| [8] |
|
| [9] |
|
| [10] |
|
| [11] |
|
| [12] |
|
| [13] |
|
| [14] |
|
| [15] |
|
| [16] |
|
| [17] |
|
| [18] |
|
| [19] |
|
| [20] |
|
| [21] |
|
| [22] |
|
| [23] |
|
| [24] |
|
| [25] |
|
| [26] |
|
| [27] |
|
| [28] |
|
| [29] |
|
| [30] |
|
| [31] |
|
| [32] |
|
/
| 〈 |
|
〉 |