Journal of Geo-information Science ›› 2023, Vol. 25 ›› Issue (9): 1765-1773. doi: 10.12082/dqxxkx.2023.230239

• Theory and Methods of Geo-information Science •

Research on Quality Inspection Methods for the Comprehensive Risk Survey of Natural Disasters

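WANG Juanle1,4,*, LI Shuhan1,2, WANG Yujie1, DUAN Bowen1, ZHOU Jialing1,3

  1. State Key Laboratory of Resources and Environmental Information Systems, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101
    2. Institute of Disaster Prevention, Sanhe 065201
    3. School of Marine Technology and Geomatics, Jiangsu Ocean University, Lianyungang 222005
    4. Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023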
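  • Received: 2023-04-30 Revised: 2023-06-24 Publication date: 2023-09-25 Release date: 2023-09-05
  • About the author: WANG Juanle (born 1976), male, from Luoyang, Henan Province; researcher and doctoral supervisor, mainly engaged in research on the integration and sharing of resource and environmental data. E-mail: wangjl@igsnrr.ac.cn
  • Funding:
    Fundamental Research Funds for the Central Universities (ZY20180101); National Pilot Survey Data Quality Inspection and Verification of the Emergency Management System (O7M79890); Construction Project of China Knowledge Centre for Engineering Sciences and Technology (CKCEST-2022-1-41)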

Data Quality Inspection Method for Comprehensive Risk Survey of Natural Disasters

WANG Juanle1,4,*, LI Shuhan1,2, WANG Yujie1, DUAN Bowen1, ZHOU Jialing1,3

  1. State Key Laboratory of Resources and Environmental Information Systems, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
    2. Institute of Disaster Prevention, Sanhe 065201, China
    3. School of Marine Technology and Geomatics, Jiangsu Ocean University, Lianyungang 222005, China
    4. Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
  • Received: 2023-04-30 Revised: 2023-06-24 Online: 2023-09-25 Published: 2023-09-05
  • Contact: * WANG Juanle, E-mail: wangjl@igsnrr.ac.cn
  • Supported by:
    Fundamental Research Funds for the Central Universities (ZY20180101); National Pilot Survey Data Quality Inspection and Verification of the Emergency Management System (O7M79890); Construction Project of China Knowledge Centre for Engineering Sciences and Technology (CKCEST-2022-1-41)

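Abstract (translated from the Chinese):

China is one of the countries most susceptible to natural disasters. To strengthen its comprehensive disaster prevention capacity, the State Council launched the First National Comprehensive Risk Survey of Natural Disasters in 2020 (hereinafter the "disaster survey"). Inspecting and controlling the quality of disaster survey data underpins data sharing and use, disaster assessment, emergency relief, and even international cooperation in disaster prevention and reduction. The survey covers a wide range of tasks divided among multiple departments; China's emergency management system is responsible for three of them: the survey of hazard-affected bodies, the survey of historical disasters, and the survey of comprehensive disaster reduction capacity. Addressing the quality inspection needs of the emergency management system's disaster survey data, and drawing on the ideas of geographical big data, this paper explores a system of quality inspection methods for natural disaster survey data. Specifically, it: 1) designs a quality inspection workflow framework for the emergency management system at four levels (national, provincial, municipal, and county); 2) establishes a four-in-one technical rule system covering completeness, standardization, consistency, and rationality; 3) constructs a new quality inspection rule coding system based on 11-digit codes, forming a rule base that software systems can update and call; 4) proposes a combined attribute and spatial inspection method that couples classical statistics such as standard deviation and median with spatial correlation analysis; 5) explores a new mode of cross-checking outliers against third-party big data and similar sources; and 6) establishes a mechanism for handling special survey data suited to China's national conditions, together with the corresponding software system responses. The usability of these methods was demonstrated through quality inspection practice in national pilot counties and a county-level survey application in Jiangxi Province. The practice revealed ten types of data anomalies, of which three types contained the most anomalous records: suspected violations of rationality, values outside the 99% confidence interval, and third-party verification anomalies. The study shows that this system of methods moves quality inspection forward in the survey workflow, so that many errors are found automatically at the point of data collection; this reduces the burden of later quality control and thereby improves inspection efficiency and survey data quality. The approach is expected to serve as a reference for quality inspection in future disaster surveys and related surveys.

Keywords: natural disaster survey, emergency management system, survey data, data quality control, quality inspection method, outlier detection, geographical big data, disaster data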

Abstract:

China is one of the countries most vulnerable to natural disasters. To enhance its comprehensive capacity for disaster prevention, the Chinese State Council initiated the first national comprehensive survey of natural disaster risks in 2020 (hereinafter referred to as the “disaster survey”). Data quality inspection and control for the disaster survey is fundamental to data sharing, disaster assessment, emergency response, and even international cooperation in disaster prevention and reduction. This extensive survey is carried out by multiple departments, among which the emergency management system is responsible for three investigation tasks: investigation of hazard-affected bodies, historical disaster investigation, and comprehensive disaster reduction capacity investigation. To meet the quality inspection requirements of the emergency management system's disaster survey data, this study integrates the concept of geographical big data and explores a framework for quality inspection of natural disaster survey data.
Specifically, the work includes: 1) designing a business process framework for data quality inspection of the emergency management system at four levels, i.e., national, provincial, municipal, and county levels; 2) developing a technical rule system for data quality inspection following the principles of integrity, standardization, consistency, and rationality; 3) building a new coding system of 11-digit quality inspection rule codes, which forms a rule database that computer software systems can update and call; 4) developing a combined (attribute and spatial) outlier detection method that couples classical mathematical statistics (e.g., standard deviation and median) with spatial correlation analysis (e.g., local indicators of spatial association); 5) proposing a new model for cross-detection of outliers using third-party big data; and 6) establishing a special survey data processing mechanism and a software system response mode adapted to China's national conditions. The feasibility of these methods was empirically demonstrated through quality inspection practice in national pilot counties and an application to a county-level survey in Jiangxi Province. In total, ten types of data anomalies were found in practice, and three types contained the most anomalous records: suspected violations of rationality, data exceeding the 99% confidence interval, and anomalies found through third-party verification. Our research shows that this data quality inspection framework moves inspection to the early stages of the survey, so that many errors can be found automatically during data collection; this reduces the pressure of later-stage data quality control and thereby improves inspection efficiency and survey data quality. This methodology is expected to provide a valuable reference for ongoing natural disaster risk surveys and related surveys in the future.
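
As a rough illustration of the combined attribute and spatial outlier check described in item 4, the sketch below pairs a standard deviation rule and a median-based rule with local Moran's I, one common local indicator of spatial association. It is not the authors' implementation: the 2.576 cutoff (a two-sided 99% bound under a normal assumption), the MAD-based robust score, the row-standardized neighbor weights, the rule for combining the two checks, and the sample values are all assumptions made for this example. No significance test is applied to the local statistic.

```python
import statistics

Z_99 = 2.576  # assumed cutoff: two-sided 99% bound for a normal distribution


def sigma_outliers(values, z=Z_99):
    """Indices of values more than z sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) > z * sd]


def median_outliers(values, z=Z_99):
    """Indices flagged by a robust score built on the median and the MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [i for i, v in enumerate(values) if abs(v - med) > z * scale]


def local_morans_i(values, neighbors):
    """Local Moran's I per unit, using row-standardized binary neighbor weights."""
    n = len(values)
    mean = statistics.fmean(values)
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n
    if m2 == 0:
        return [0.0] * n
    result = []
    for i in range(n):
        nbrs = neighbors[i]
        lag = sum(dev[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(dev[i] * lag / m2)
    return result


def combined_check(values, neighbors):
    """Flag units that are attribute outliers and also unlike their neighbors."""
    attribute_flags = set(sigma_outliers(values)) | set(median_outliers(values))
    moran = local_morans_i(values, neighbors)
    # A negative local Moran's I means the unit differs from its surroundings.
    return sorted(i for i in attribute_flags if moran[i] < 0)


# Hypothetical indicator values for 12 survey units arranged along a line,
# where each unit neighbors the units immediately before and after it.
values = [10, 12, 11, 13, 95, 12, 11, 10, 12, 11, 13, 10]
neighbors = [
    [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
    for i in range(len(values))
]
flagged = combined_check(values, neighbors)
print(flagged)
```

Requiring both an attribute flag and a negative local statistic is only one way to combine the two views; a production rule base would more likely record each result separately so that reviewers can see which check was triggered.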

Key words: natural disaster survey, emergency management system, survey data, data quality control, quality inspection method, outlier detection, geographical big data, disaster data