A Spatial-temporal Model for Identifying Tidal Shared-bicycle Stops and Bicycle Sharing Demand Prediction based on KNN-LightGBM
KE Rihong (1998–), male, from Sanming, Fujian; master's student working mainly on geographic information services and spatial-temporal data mining. E-mail: 820916024@qq.com
Received date: 2022-09-08
Revised date: 2022-12-06
Online published: 2023-04-19
Supported by
Strategic Priority Research Program of the Chinese Academy of Sciences (Category A), No.XDA23100502
Construction of University Discipline Alliance of Digital Economy of Fujian Province, No.Min Jiao Gao(2022)15
Cite this article: KE Rihong, WU Sheng, KE Weiwen. A Spatial-temporal Model for Identifying Tidal Shared-bicycle Stops and Bicycle Sharing Demand Prediction based on KNN-LightGBM[J]. Journal of Geo-information Science, 2023, 25(4): 741-753. DOI: 10.12082/dqxxkx.2023.220673
With the rise of internet-based rental bicycles (shared bicycles), "shared bicycle + subway" and "shared bicycle + bus" have become the main feeder modes for urban commuting, but the "tidal effect" of shared bicycles has become a pain point for their management and for resource allocation. Discovering the "tidal pattern" of shared bicycles and accurately predicting rental and return demand at parking areas (electronic fences) is therefore important for the orderly and standardized development of shared bicycles and for improving the riding experience and environment. Based on shared-bicycle order data and spatial data of electronic fences, this paper first proposes a spatial-temporal model that identifies tidal shared-bicycle stops (tidal points) and analyzes their spatial-temporal tidal characteristics. The model defines a tidal point as an electronic fence where, because many shared bicycles are rented or returned within a short time, there are no bicycles left to rent or no spaces left to park. Electronic fences are then classified by their status in a given period and assigned the corresponding lacking-bike/lacking-parking indexes. The results show that the model can accurately identify the tidal points that appear in specific periods.
Then, based on spatial-temporal data such as shared-bicycle orders, points of interest (POI), roads, population, land use, temperature, and wind speed, and considering the correlation between electronic fences within a local area, a KNN-LightGBM model is built to predict rental and return demand: (1) Principal Component Analysis (PCA) is used for feature extraction; (2) the K Nearest Neighbors (KNN) algorithm is used to compute correlation information between electronic fences within a local area; (3) the feature vectors extracted by PCA and the correlation information are combined as input features, and LightGBM is used to predict rental and return demand; (4) the importance of the features affecting the prediction is evaluated. The results show that, compared with four commonly used machine learning methods, KNN-LightGBM has the smallest mean RMSE and MAE and the largest mean R² and r in prediction experiments at different time scales. Using KNN to compute the correlation between electronic fences within a local area effectively improves prediction accuracy: compared with LightGBM, KNN-LightGBM reduces RMSE and MAE by 10% and 11%, respectively, and increases R² and r by 3% and 4%, respectively. Historical shared-bicycle order data are the most important feature for predicting rental and return demand, followed by the distance to the nearest public transport station. These results demonstrate the potential of the proposed model.
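The abstract does not spell out how the KNN step turns neighbouring fences into "correlation information", so the following is only a minimal sketch of one plausible reading: for each fence, take the k nearest other fences and use their mean historical demand as an extra input feature. The function name `knn_neighbor_feature`, the planar coordinates in metres, and the sample fence IDs and demand values are all hypothetical; the PCA and LightGBM steps are omitted.

```python
import math


def knn_neighbor_feature(coords, demand, k=3):
    """Mean historical demand of the k nearest other fences.

    coords: {fence_id: (x_m, y_m)} planar coordinates in metres (assumed).
    demand: {fence_id: historical rental or return count}.
    This aggregation is an assumption, not the paper's stated method.
    """
    feature = {}
    for fid, xy in coords.items():
        # Sort all other fences by straight-line distance (ties broken by id).
        others = sorted(
            (math.dist(xy, other_xy), oid)
            for oid, other_xy in coords.items()
            if oid != fid
        )
        nearest = [oid for _, oid in others[:k]]
        if nearest:
            feature[fid] = sum(demand[o] for o in nearest) / len(nearest)
        else:
            feature[fid] = 0.0  # a lone fence has no neighbours
    return feature


# Hypothetical toy data: three clustered fences and one far away.
coords = {"a": (0, 0), "b": (10, 0), "c": (20, 0), "d": (1000, 0)}
demand = {"a": 4, "b": 6, "c": 8, "d": 100}
neighbor_mean = knn_neighbor_feature(coords, demand, k=2)
```

In a full pipeline this value would be appended to each fence's PCA-reduced feature vector before the gradient-boosting model is trained.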
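Tab. 1 Electronic fence status and lacking-bike / lacking-parking index
Electronic fence status | Lacking-bike index | Lacking-parking index |
---|---|---|
No shared bicycles | 2 | 0 |
Only a few shared bicycles | 1 | 0 |
Sufficient shared bicycles and parking spaces | 0 | 0 |
Only a few parking spaces | 0 | 1 |
Light accumulation | 0 | 2 |
Heavy accumulation | 0 | 3 |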
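Table 1 can be read as a lookup from fence status to a pair of indexes. The sketch below encodes that lookup directly; the index values come from the table, while the `classify` helper and its thresholds (`few`, `light_excess`) are hypothetical placeholders, since this section does not give the rules that separate "a few" from "sufficient" or "light" from "heavy" accumulation.

```python
from enum import Enum


class FenceStatus(Enum):
    NO_BIKES = "no shared bicycles"
    FEW_BIKES = "only a few shared bicycles"
    BALANCED = "sufficient shared bicycles and parking spaces"
    FEW_SPACES = "only a few parking spaces"
    LIGHT_PILEUP = "light accumulation"
    HEAVY_PILEUP = "heavy accumulation"


# (lacking-bike index, lacking-parking index), copied from Table 1.
INDEXES = {
    FenceStatus.NO_BIKES: (2, 0),
    FenceStatus.FEW_BIKES: (1, 0),
    FenceStatus.BALANCED: (0, 0),
    FenceStatus.FEW_SPACES: (0, 1),
    FenceStatus.LIGHT_PILEUP: (0, 2),
    FenceStatus.HEAVY_PILEUP: (0, 3),
}


def classify(bikes, spaces, few=1, light_excess=2):
    """Hypothetical rule of thumb; these thresholds are NOT from the paper.

    bikes: bicycles currently parked; spaces: parking capacity of the fence.
    """
    if bikes == 0:
        return FenceStatus.NO_BIKES
    if bikes <= few:
        return FenceStatus.FEW_BIKES
    free = spaces - bikes
    if free > few:
        return FenceStatus.BALANCED
    if free >= 0:
        return FenceStatus.FEW_SPACES
    if -free <= light_excess:
        return FenceStatus.LIGHT_PILEUP
    return FenceStatus.HEAVY_PILEUP
```

For example, `INDEXES[classify(0, 5)]` gives the index pair for an empty five-space fence under these assumed thresholds.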
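Tab. 2 Data and sources
Data | Time | Source |
---|---|---|
Shared-bicycle orders | 6:00–10:00, 21–25 December 2020 | Competition data of the 2021 Digital China Innovation Contest |
Electronic fences | December 2020 | Competition data of the 2021 Digital China Innovation Contest |
Road network | June 2021 | OpenStreetMap (openstreetmap.org) |
POI | June 2021 | Amap open API (lbs.amap.com) |
Population | May 2021 | Hongheiku population database (hongheiku.com) |
Land use | 2020 | Geographic science data website (csdn.store) |
Weather | December 2020 | US National Oceanic and Atmospheric Administration (NOAA) open data (www1.ncdc.noaa.gov/pub/data/noaa) |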
Tab. 3 Example of trajectory data for a single order
Order_ID | Positioning time | Latitude/°N | Longitude/°E |
---|---|---|---|
sbo000001 | 2020/12/21 6:00:12 | 24.521 046 82 | 118.161 503 7 |
sbo000001 | 2020/12/21 6:00:27 | 24.518 092 38 | 118.163 777 4 |
... | ... | ... | ... |
sbo000001 | 2020/12/21 6:26:08 | 24.479 891 29 | 118.186 703 0 |
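Each order in Table 3 is a sequence of timestamped position fixes. A minimal sketch of how such records could be reduced to one rental point and one return point per order follows. It assumes that the earliest fix marks the rental and the latest fix marks the return, which this section does not state; `order_endpoints` is a hypothetical helper name, and the sample rows are the three visible rows of Table 3.

```python
from datetime import datetime


def order_endpoints(records):
    """Return {order_id: (first_fix, last_fix)}, each fix being (time, lat, lon).

    records: iterable of (order_id, "YYYY/MM/DD H:MM:SS", lat, lon).
    Assumption: the first fix is the rental and the last fix is the return.
    """
    spans = {}
    for order_id, stamp, lat, lon in records:
        fix = (datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S"), lat, lon)
        first, last = spans.get(order_id, (fix, fix))
        # Tuples compare by time first, so min/max pick earliest/latest fixes.
        spans[order_id] = (min(first, fix), max(last, fix))
    return spans


# The three rows shown in Table 3 (digit-group spaces removed).
rows = [
    ("sbo000001", "2020/12/21 6:00:12", 24.52104682, 118.1615037),
    ("sbo000001", "2020/12/21 6:00:27", 24.51809238, 118.1637774),
    ("sbo000001", "2020/12/21 6:26:08", 24.47989129, 118.1867030),
]
endpoints = order_endpoints(rows)
```

The two endpoints could then be matched to the nearest electronic fence to count rentals and returns per fence.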
Tab. 4 Example of electronic fence data
Electronic fence ID | Center latitude/°N | Center longitude/°E | Area/m² | Parking spaces |
---|---|---|---|---|
故宫路0_R_2 | 24.462 412 61 | 118.079 007 88 | 5.9 | 4 |
观日路(望海路至会展路段 )_R_1 | 24.488 102 32 | 118.181 194 59 | 8.0 | 5 |
... | ... | ... | ... | ... |
安岭路_L_9_B | 24.534 229 23 | 118.151 338 23 | 9.1 | 6 |
Tab. 5 Net flow of electronic fences in each time period
Electronic fence ID | 2020-12-21 6:00:00–7:00:00 | 2020-12-21 7:00:00–8:00:00 | ... | 2020-12-25 9:00:00–10:00:00 |
---|---|---|---|---|
双浦0_R_A20001 | 6 | -2 | ... | -2 |
枋湖北二路0_L_A21001 | 5 | -1 | ... | -1 |
... | ... | ... | ... | ... |
云顶中路0_L_A03003 | 8 | 1 | ... | -2 |
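A table like Table 5 can be built by bucketing rental and return events into hourly slots per fence. The sketch below assumes that net flow means returns minus rentals within each hour; the sign convention is not defined in this section, and the event tuple layout, the `net_flow` name, and the sample fence ID are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def net_flow(events):
    """Hourly net flow per fence, assumed here to be returns minus rentals.

    events: iterable of (fence_id, "YYYY/MM/DD H:MM:SS", kind),
            where kind is "rent" or "return".
    Returns {fence_id: {hour_start: net_flow}}.
    """
    table = defaultdict(lambda: defaultdict(int))
    for fence_id, stamp, kind in events:
        t = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
        slot = t.replace(minute=0, second=0)  # start of the hourly period
        if kind == "return":
            table[fence_id][slot] += 1
        elif kind == "rent":
            table[fence_id][slot] -= 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return {fence: dict(slots) for fence, slots in table.items()}


# Hypothetical events for one fence.
events = [
    ("fence_A", "2020/12/21 6:05:00", "return"),
    ("fence_A", "2020/12/21 6:40:10", "return"),
    ("fence_A", "2020/12/21 6:50:00", "rent"),
    ("fence_A", "2020/12/21 7:10:00", "rent"),
]
flow = net_flow(events)
```

Under this convention a positive value means bicycles accumulate at the fence during that hour, and a negative value means it is being drained.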
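Tab. 6 Description of input characteristics
Feature | Description |
---|---|
Temporal features | Number of bicycles rented/returned at the electronic fence, 6:00–10:00, 21–24 December 2020 |
Spatial features | Number of POIs of each type (government agencies, residences, enterprises, catering services, financial services, daily-life services, sports and leisure services) within 200 m of the electronic fence |
Spatial features | Distance from the electronic fence to the nearest scenic spot, hospital, school, shopping mall, bus stop, subway station, elevated bicycle lane entrance/exit, etc./m |
Spatial features | Class of the road on which the electronic fence is located |
Spatial features | Population density of the subdistrict (administrative unit) containing the electronic fence/(persons/m²) |
Spatial features | Land-use type of the area containing the electronic fence |
Weather features | Weather condition (sunny/cloudy/overcast/rainy) |
Weather features | Temperature/℃ |
Weather features | Wind speed/(m/s) |
Weather features | Visibility/m |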
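Two of the spatial features in Table 6, the POI counts within 200 m and the distance to the nearest facility, can be computed from latitude/longitude pairs with a great-circle distance. The sketch below uses the haversine formula with a mean Earth radius of 6 371 000 m; the function names, tuple layouts, and sample coordinates are hypothetical, and the paper may have used a projected coordinate system instead.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def poi_counts(fence, pois, radius_m=200.0):
    """Count POIs per category within radius_m of fence = (lat, lon).

    pois: iterable of (lat, lon, category).
    """
    counts = {}
    for lat, lon, category in pois:
        if haversine_m(fence[0], fence[1], lat, lon) <= radius_m:
            counts[category] = counts.get(category, 0) + 1
    return counts


def nearest_distance_m(fence, places):
    """Distance in metres from the fence to the closest (lat, lon) in places."""
    return min(haversine_m(fence[0], fence[1], lat, lon) for lat, lon in places)


# Hypothetical fence and POIs: about 111 m and about 334 m north of the fence.
fence = (24.4624, 118.0790)
pois = [(24.4634, 118.0790, "catering"), (24.4654, 118.0790, "catering")]
counts = poi_counts(fence, pois)
```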