Armed Conflict Risk Prediction and Influencing Factors Analysis Based on the Random Forest Model at the Grid-month Scale: A Case Study of Indochina Peninsula
Du Shukun (1998-), male, from Dongying, Shandong; master's student, mainly engaged in research on human geography and geographic big data. E-mail: a412309072@sina.com
Received date: 2023-03-27
Revised date: 2023-06-02
Online published: 2023-09-22
Supported by
National Natural Science Foundation of China (41301125)
Major Program of the National Social Science Foundation of China (20&ZD138)
Du Shukun, Zhang Jing, Han Zhijun, Gong Maoyu. Armed conflict risk prediction and influencing factors analysis based on the random forest model at the grid-month scale: A case study of Indochina Peninsula[J]. Journal of Geo-information Science, 2023, 25(10): 2026-2038. DOI: 10.12082/dqxxkx.2023.230152
Understanding the armed conflict risk in neighboring regions is essential for advancing China's Belt and Road Initiative and its overseas investment and construction. Because armed conflict risk involves many influencing factors and much of the available data has limited spatiotemporal resolution, most existing studies work at the "country-year" level and cannot predict armed conflict risk at the sub-national scale. In this study, we matched armed conflict data with multi-source political, economic, social, and geographic thematic data in a unified "grid-month" spatiotemporal framework and built several armed conflict risk prediction models based on the random forest method. Taking the Indochina Peninsula as an example, we compared the prediction accuracy of each thematic model and the integrated model, compared the predictions against the actual spatiotemporal distribution of armed conflict risk, and calculated the weight of each influencing factor and analyzed its effect. The results show that: (1) the random forest models predicted conflict more accurately than the traditional logistic regression model; for the integrated model, the accuracy, area under the ROC curve, and area under the PR curve increased by 0.0177, 0.4362, and 0.1712, respectively; (2) armed conflict risk in the Indochina Peninsula was strongly influenced by political, economic, and social factors and only weakly related to geographic factors, although the degree of influence of each factor changed with the risk level; (3) with the basic thematic data in place, accounting for the spatiotemporal dependence of conflicts markedly improved prediction accuracy; (4) compared with coarser-scale studies, predictions at the "grid-month" scale were more accurate and more interpretable. This study provides a reference and basis for China's overseas investment and for local conflict risk prevention, control, and governance.
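As a rough illustration of the "grid-month" framework described in the abstract, the sketch below assigns point events to a (cell, month) unit and derives a binary conflict label. It is a minimal sketch under stated assumptions, not the authors' pipeline: the 0.5° cell size is borrowed from PRIO-GRID's resolution, and the study start date, the event records, and all function names are hypothetical.

```python
import math
from collections import Counter
from datetime import date

CELL_DEG = 0.5            # assumed cell size in degrees (PRIO-GRID's resolution)
START = date(2010, 1, 1)  # hypothetical first month of the study period


def cell_id(lon, lat, size=CELL_DEG):
    """Return the (row, col) index of the grid cell containing a point."""
    return math.floor((lat + 90.0) / size), math.floor((lon + 180.0) / size)


def month_index(day, start=START):
    """Return the number of calendar months from `start` to `day`."""
    return (day.year - start.year) * 12 + (day.month - start.month)


def count_events(events):
    """Count events per (cell, month) unit; units with no events are absent."""
    return Counter((cell_id(lon, lat), month_index(day)) for lon, lat, day in events)


# Hypothetical event records: (longitude, latitude, date).
events = [
    (100.52, 13.75, date(2010, 3, 5)),
    (100.60, 13.90, date(2010, 3, 20)),
    (96.15, 16.80, date(2011, 1, 10)),
]
counts = count_events(events)
labels = {unit: 1 for unit in counts}  # binary label: 1 if any event occurred
```

Covariates that are only available per country and year would then be broadcast to every cell and month they cover, so that all inputs share the same unit of analysis.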
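Tab. 1 Data sources
Theme | Indicator | Meaning | Data source |
---|---|---|---|
Political | Public protests | Reflects the level of tension before a conflict | ACLED[24] |
Political | Interactions between the government and opposition groups | Reflects the level of tension before a conflict | GDELT[25] |
Political | Distance to the capital | Reflects the government's capacity to control the locality | PRIO-GRID[26] |
Political | Distance to major cities | Reflects the government's capacity to control the locality | PRIO-GRID[26] |
Political | Distance to the national border | Reflects the government's capacity to control the locality | PRIO-GRID[26] |
Political | Military expenditure | Reflects the government's overall control over the country | World Bank[27] |
Economic | Per capita income | Reflects the opportunity cost of rebellion for the population | World Bank[27] |
Economic | Per capita income growth rate | Reflects the opportunity cost of rebellion for the population | World Bank[27] |
Economic | Gross domestic product | Reflects the overall level of national economic development | World Bank[27] |
Economic | Economic growth rate | Reflects the overall level of national economic development | World Bank[27] |
Economic | Nighttime lights | Reflects the level of local economic development | DMSP[28], EOG[29] |
Social | Ethnic discrimination | Reflects inter-ethnic tensions | EPR[30] |
Social | Child malnutrition rate | Reflects the degree of social inequality | PRIO-GRID |
Social | Neonatal mortality rate | Reflects the degree of social inequality | PRIO-GRID |
Social | Drug cultivation | Reflects the degree of social instability | PRIO-GRID |
Social | Unemployment rate | Reflects the degree of social instability | World Bank |
Social | National population | Reflects the overall willingness to break away from the state | World Bank |
Social | National population density | Reflects the overall willingness to break away from the state | World Bank |
Social | Local population size | Reflects population-induced resource scarcity | WorldPop[31] |
Geographic | Mountainous terrain | Can serve as a safe haven for rebel groups | PRIO-GRID |
Geographic | Forest cover | Can serve as a safe haven for rebel groups | PRIO-GRID |
Geographic | Barren land | Reflects the opportunity cost of rebellion for local residents | PRIO-GRID |
Geographic | Natural resources | Reflects the strength of rebel groups' motivation | PRIO-GRID |
Geographic | Natural disasters | Increase public grievances and lower the opportunity cost | SEDAC[32] |
Spatiotemporal dependence | Historical conflict | Reflects the temporal dependence of conflict | ACLED |
Spatiotemporal dependence | Neighboring conflict | Reflects the spatial dependence of conflict | ACLED |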
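The two spatiotemporal dependence indicators in Tab. 1 (historical conflict and neighboring conflict) can be illustrated with simple lag features. This is only a sketch of one plausible construction: the 12-month history window, the one-month spatial lag, the 8-cell neighborhood, and the function names are all assumptions, since the table does not specify how the indicators were computed.

```python
def history_feature(conflict, cell, month, window=12):
    """Count months with conflict in `cell` during the `window` months before `month`."""
    return sum((cell, m) in conflict for m in range(month - window, month))


def neighbour_feature(conflict, cell, month, lag=1):
    """Count the 8 surrounding cells that had conflict `lag` months earlier."""
    row, col = cell
    return sum(
        ((row + dr, col + dc), month - lag) in conflict
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )


# Hypothetical set of (cell, month) units in which conflict was observed.
conflict = {((5, 5), 0), ((5, 5), 2), ((5, 6), 3), ((4, 4), 3), ((9, 9), 3)}

past = history_feature(conflict, (5, 5), month=4)      # months 0 and 2 count
nearby = neighbour_feature(conflict, (5, 5), month=4)  # cells (5, 6) and (4, 4) count
```

Both features look only at months strictly before the target month, so the value being predicted never leaks into its own predictors.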
Tab. 2 Comparison of prediction accuracy
Model | Accuracy | AUROC | AUPR |
---|---|---|---|
Logistic regression model | 0.9558±0 | 0.5000±0 | 0.5221±0 |
Political thematic model | 0.9693±0.00024 | 0.9186±0.00061 | 0.6208±0.00104 |
Economic thematic model | 0.9690±0.00039 | 0.9114±0.00164 | 0.6141±0.00283 |
Social thematic model | 0.9678±0.00011 | 0.9314±0.00128 | 0.6574±0.00057 |
Geographic thematic model | 0.9634±0.00003 | 0.8984±0.00038 | 0.5083±0.00030 |
Spatiotemporal dependence thematic model | 0.9693±0.00023 | 0.9185±0.00065 | 0.6208±0.00103 |
All-factor model | 0.9721±0.00019 | 0.9334±0.00082 | 0.6744±0.00070 |
Integrated model | 0.9735±0.00017 | 0.9362±0.00087 | 0.6933±0.00086 |
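The three metrics in Tab. 2 can be sketched in plain Python as follows. This is an illustrative implementation, not the authors' evaluation code: the labels and scores are made up, the 0.5 decision threshold is an assumption, and the area under the PR curve is computed here as step-wise average precision, which may differ from the interpolation used in the paper. The last lines recompute the gains of the integrated model over logistic regression from the table's mean values.

```python
def accuracy(y_true, y_pred):
    """Share of units whose predicted class matches the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the PR curve; assumes all scores are distinct."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / hits


# Made-up, imbalanced example: 2 conflict units out of 10.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.80, 0.70, 0.90]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
roc = auroc(y_true, scores)
ap = average_precision(y_true, scores)

# Predicting "no conflict" everywhere is still 80% accurate on this example,
# yet constant scores carry no ranking information (AUROC 0.5).
acc_constant = accuracy(y_true, [0] * len(y_true))
roc_constant = auroc(y_true, [0.0] * len(y_true))

# Gains of the integrated model over logistic regression, from Tab. 2 means.
logit = (0.9558, 0.5000, 0.5221)
integrated = (0.9735, 0.9362, 0.6933)
gains = [round(i - l, 4) for i, l in zip(integrated, logit)]
```

On rare-event data such as conflict occurrence, accuracy is dominated by the majority class, which is one reason to read it together with AUROC and AUPR.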
[20] Chen Chong, Hu Jingtian. Spatial dependence and armed conflict prediction[J]. Quarterly Journal of International Politics, 2022, 7(2):86-123. (in Chinese)
[25] GDELT. The GDELT Project[EB/OL]. [2023-03-10]. https://www.gdeltproject.org/.
[28] National Geophysical Data Center. Operational Linescan System: Data Description[EB/OL]. (2008-09-06) [2023-03-10]. http://www.ngdc.noaa.gov/dmsp/sensors/ols.html.
[42] Chen Chong. Rethinking opportunity, greed, grievance and civil conflict: An analysis of political violence in Africa based on spatiotemporal models[J]. World Economics and Politics, 2018(8):94-127,158. (in Chinese)
[49] Zhang Jiaqi, Du Kaihu, Ren Shuliang, et al. Offline advertising site selection for home-decoration brands in multi-source spatial big data scenarios[J]. Geomatics and Information Science of Wuhan University, 2022, 47(9):1406-1415. (in Chinese)
[50] Liu Jian, Li Shulin, Chen Tao. Landslide susceptibility assessment based on an optimized random forest model[J]. Geomatics and Information Science of Wuhan University, 2018, 43(7):1085-1091. (in Chinese)
[52] Wang Yunsheng, Xie Bingyan, Wan Fanghao, et al. Application of ROC curve analysis in evaluating invasive species distribution models[J]. Biodiversity Science, 2007(4):365-372. (in Chinese)