Journal of Geo-information Science, 2023, Vol. 25, Issue (1): 163-176. doi: 10.12082/dqxxkx.2023.220468

• Theory and Methods of Geo-information Science •

Research on a Method for Identifying Village Development Types Based on the SMOTE-RF Algorithm

PAN Yupiao1, ZHAO Xiang1,*, WANG Jing1,2, ZHANG Yiqing1, LIU Yaolin1

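  1. School of Resources and Environmental Sciences, Wuhan University, Wuhan 430079, China
    2. College of Water Sciences, Beijing Normal University, Beijing 100875, China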
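  • Received: 2022-07-02 Revised: 2022-10-08 Online: 2023-01-25 Published: 2023-03-25
  • Corresponding author: *ZHAO Xiang (born 1985), male, a native of Xinshao, Hunan; Ph.D., associate professor; research interests: territorial spatial planning and intelligent spatial optimization for decision-making. E-mail: zhaoxiang@whu.edu.cn
  • Author profile: PAN Yupiao (born 1999), female, a native of Sansui, Guizhou; master's student; research interests: machine learning and decision support for territorial spatial planning. E-mail: yupiaopan@whu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (41971336); National Key Research and Development Program of China (2018YFD1100801)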

Identifying the Class of the Villages based on SMOTE-RF Algorithm

PAN Yupiao1, ZHAO Xiang1,*, WANG Jing1,2, ZHANG Yiqing1, LIU Yaolin1

  1. School of Resources and Environmental Sciences, Wuhan University, Wuhan 430079, China
    2. College of Water Sciences, Beijing Normal University, Beijing 100875, China
  • Received: 2022-07-02 Revised: 2022-10-08 Online: 2023-01-25 Published: 2023-03-25
  • Contact: ZHAO Xiang
  • Supported by:
    National Natural Science Foundation of China (41971336); National Key Research and Development Program of China (2018YFD1100801)

Abstract (Chinese version):

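Accurately grasping the laws of regional development and understanding village development types quantitatively and objectively is of great practical significance for advancing rural revitalization by "adapting to local conditions and proceeding by category". To identify the development types of villages in a region automatically and accurately, this study proposes a village development type identification model based on the SMOTE-RF algorithm. First, an index system expressing the multi-dimensional characteristics of village development is proposed, covering terrain, location, socioeconomics, agricultural production, and the ecological environment. On this basis, to address the imbalanced distribution of village samples, the SMOTE oversampling technique is used to analyze and simulate minority-class samples and synthesize a balanced village classification sample set. The random forest algorithm then automatically constructs the nonlinear relationship between the multi-dimensional attributes of village development and village types, yielding an intelligent classifier that can automatically identify the development types of villages in a region. To verify the effectiveness of the model, an empirical study was carried out with Zhaoyuan City, Shandong Province, as the test area. The results show that the random forest classification model coupled with SMOTE oversampling effectively ensures the reliability and accuracy of the village classification results. In the test area, the agreement between the model's automatic identification results and the planning experts' classification reached 88.27%, with a Kappa coefficient of 0.78, indicating good overall consistency. Compared with manual classification, the SMOTE-RF-based automatic identification method reduces the uncertainty introduced by classification that depends on human experience and ensures consistent classification results, providing a reliable basis for decision-making in territorial spatial planning and special plans for rural revitalization.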
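The agreement rate and Kappa coefficient reported above both compare the model's labels with the planning experts' labels for the same villages. As a minimal illustration (not the authors' code), the sketch below computes both with the standard unweighted Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's class frequencies. The function name `agreement_and_kappa` and the village labels are hypothetical, and the abstract does not state which kappa variant the authors used.

```python
from collections import Counter


def agreement_and_kappa(model_labels, expert_labels):
    """Return (observed agreement p_o, Cohen's kappa) for two equal-length label lists."""
    if not model_labels or len(model_labels) != len(expert_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(model_labels)
    # Observed agreement p_o: share of villages given the same type by both raters.
    p_o = sum(m == e for m, e in zip(model_labels, expert_labels)) / n
    # Chance agreement p_e from each rater's marginal class frequencies.
    model_counts = Counter(model_labels)
    expert_counts = Counter(expert_labels)
    p_e = sum(model_counts[c] * expert_counts[c] for c in model_counts) / (n * n)
    if p_e == 1.0:
        # Both raters used one and the same class for every village: kappa is undefined.
        return p_o, float("nan")
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical labels for ten villages ("A", "B", "C" are placeholder types).
model = ["A", "A", "B", "C", "B", "A", "C", "C", "B", "A"]
expert = ["A", "A", "B", "C", "A", "A", "C", "B", "B", "A"]
p_o, kappa = agreement_and_kappa(model, expert)
print(f"agreement = {p_o:.2%}, kappa = {kappa:.4f}")  # agreement = 80.00%, kappa = 0.6923
```

Kappa comes out lower than the raw agreement because it discounts the matches that two raters with these class frequencies would reach by chance (here p_e = 0.35).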

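Keywords: village classification, random forest, SMOTE method, multi-source data, oversampling, rural revitalization, territorial spatial planning, Zhaoyuan City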

Abstract:

To achieve sustainable development and revitalization of rural areas, it is important to identify the development pattern of each village according to its natural, social, and economic conditions. To identify these development patterns accurately, this study develops a village classification method based on the SMOTE-RF algorithm. First, we designed a multi-dimensional index system covering topography, location, socioeconomics, agricultural production, construction land, ecosystem services, and the characteristics of rural settlements to quantify and assess the development characteristics of villages. Second, village classifications made by planning experts were collected as a sample dataset for model training and validation. To address the overfitting of classification algorithms caused by imbalanced sample sets, the oversampling algorithm SMOTE (Synthetic Minority Over-sampling Technique) was applied to the expert-labeled samples, using a K-nearest-neighbor strategy to synthesize a balanced sample set. Third, this balanced sample set was used to train the village classifier, with the Random Forest (RF) algorithm learning the nonlinear relationship between the multi-dimensional development characteristics of villages and their development patterns. Finally, Zhaoyuan City in Shandong Province, China, was selected as the study area to evaluate the performance of the model. The experimental results show that the classification model built on the SMOTE-RF algorithm can automatically extract multi-dimensional, nonlinear expert knowledge for village classification from a small number of samples.
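SMOTE, as summarized above, creates synthetic minority-class samples by interpolating between a minority sample and one of its K nearest minority-class neighbors. The following is a minimal, dependency-free sketch of that idea, not the authors' implementation. The function `smote`, the random choice of the base sample (a common variant of the original per-sample loop), squared Euclidean distance on already-standardized indicators, and the toy data are all assumptions for illustration.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Basic SMOTE sketch: create n_synthetic points by interpolating between a
    minority sample and one of its k nearest neighbours within the minority class."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other samples

    def sq_dist(a, b):
        # Squared Euclidean distance; assumes indicators are already standardized.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # k nearest minority-class neighbours of every minority sample (brute force).
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # base sample, drawn at random
        j = rng.choice(neighbours[i])      # one of its k nearest neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        base, nb = minority[i], minority[j]
        # New point on the segment between the base sample and its neighbour.
        synthetic.append([b + gap * (v - b) for b, v in zip(base, nb)])
    return synthetic


# Hypothetical toy data: two standardized indicators per village.
majority = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.4, 0.3], [0.5, 0.5], [0.3, 0.6]]
minority = [[0.9, 0.8], [1.0, 0.9], [0.8, 1.0]]
new_samples = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
print(len(minority) + len(new_samples))  # 6, the size of the majority class
```

Because every synthetic point lies on a segment between two real minority samples, oversampling fills in the minority region rather than duplicating existing villages, which is why it tends to overfit less than simple replication.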
Compared with unsupervised classification methods such as the SOFM (self-organizing feature map) algorithm, the classification results produced by our model better support spatial planning decision-making, because the SMOTE-RF algorithm presents the classification rules intuitively in a tree structure. In addition, compared with the model trained without oversampling, applying the oversampling algorithm increased the overall accuracy of the classification model from 0.93 to 0.99, the accuracy from 0.73 to 0.88, and the AUC from 0.895 to 0.982. The village classification results for Zhaoyuan also show that the results of the SMOTE-RF algorithm are broadly consistent with those of the planning experts: the agreement between the model and the experts reached 88.27%, with a Kappa coefficient of about 0.78. The village classification model developed in this study can substantially reduce the uncertainty of classification results, providing a reliable basis for decision-making in territorial planning and rural revitalization.
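The AUC values above summarize how well the classifier's predicted probabilities rank villages of a given type above villages of other types. The abstract does not say how the multi-class AUC was averaged, so the sketch below assumes an unweighted (macro) mean of one-vs-rest AUCs; each binary AUC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counting one half. The function names and the scores are hypothetical, and the quadratic pairwise loop is chosen for clarity, not speed.

```python
def binary_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative
    (ties count one half); labels are 1 for positive and 0 for negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(prob_rows, true_labels, classes):
    """Unweighted mean of one-vs-rest AUCs; prob_rows[i][c] is the predicted
    probability that sample i belongs to classes[c]."""
    aucs = []
    for c_idx, c in enumerate(classes):
        scores = [row[c_idx] for row in prob_rows]
        labels = [1 if y == c else 0 for y in true_labels]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / len(aucs)


# Hypothetical scores for five villages, two of which belong to the positive type.
print(round(binary_auc([0.9, 0.4, 0.35, 0.6, 0.2], [1, 1, 0, 0, 0]), 4))  # 0.8333
```

In the example, 5 of the 6 positive/negative pairs are ranked correctly (0.4 falls below 0.6), giving an AUC of 5/6.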

Key words: village classification, random forest, SMOTE method, multi-source data, oversampling, rural revitalization, territorial spatial planning, Zhaoyuan city