Journal of Geo-information Science
Data Quality Inspection Method for Comprehensive Risk Survey of Natural Disasters
Received date: 2023-04-30
Revised date: 2023-06-24
Online published: 2023-09-05
Supported by
Fundamental Research Funds for the Central Universities (ZY20180101)
National Pilot Survey Data Quality Inspection and Verification of the Emergency Management System (O7M79890)
Construction Project of China Knowledge Centre for Engineering Sciences and Technology (CKCEST-2022-1-41)
China is one of the countries most vulnerable to natural disasters. To enhance its comprehensive capacity for disaster prevention, the Chinese State Council launched the first national comprehensive survey of natural disaster risks in 2020 (hereinafter the “disaster survey”). Data quality inspection and control are fundamental to ensuring that the disaster survey can support data sharing, disaster assessment, emergency response, and even international cooperation in disaster prevention and reduction. This large-scale survey was carried out by multiple departments, among which the emergency management system was responsible for three investigation tasks: vulnerability investigation of affected areas, historical disaster investigation, and comprehensive disaster reduction capacity investigation. Based on the quality inspection requirements for disaster survey data, this study incorporated the concept of geographical big data and explored a quality inspection framework for natural disaster survey data.
Specifically, our objectives include: 1) designing a business process framework for data quality inspection in the emergency management system at four levels, i.e., national, provincial, municipal, and county; 2) developing a technical rule system for data quality inspection based on the principles of completeness, standardization, consistency, and rationality; 3) building a new rule database with 11-character rule codes that can be updated and used by computer software systems; 4) developing a combined (spatial and non-spatial) outlier detection method that integrates classical mathematical statistics (e.g., standard deviation and median) with spatial correlation analysis (e.g., local indicators of spatial association); 5) proposing a new model for cross-checking outliers against third-party big data; and 6) establishing a processing mechanism for special survey data and a software system response mode to continuously evaluate China's national conditions. The feasibility of these methods was demonstrated empirically through nationwide quality inspection practice and a county-level application in Jiangxi Province. In total, ten types of data anomalies were found, the most common falling into three categories: suspected violations of rationality, data outside the 99% confidence interval, and anomalies verified by third parties. Our research shows that this framework enables quality inspection of disaster survey data at an early stage, so that many errors can be found automatically during data collection; this reduces the pressure on later-stage data quality control, thereby improving efficiency and saving costs. The methodology is expected to provide valuable references for ongoing natural disaster risk surveys and related surveys in the future.
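Item 4) pairs classical statistics with local indicators of spatial association (LISA). As a minimal sketch, and not necessarily the authors' implementation, the local Moran's I of unit i can be written as I_i = (z_i / m_2) Σ_j w_ij z_j, where z_i is the deviation of unit i from the mean, m_2 = Σ z_i² / n, and w_ij are row-standardized binary weights. The adjacency list `neighbors` (e.g., counties sharing a border) and the function name are hypothetical. A unit with a negative I_i in the HL or LH quadrant is a candidate spatial outlier. The sketch omits the permutation-based significance test normally applied in practice; PySAL's esda package (Moran_Local) provides one.

```python
from typing import Dict, List, Tuple


def local_morans_i(values: List[float],
                   neighbors: Dict[int, List[int]]) -> List[Tuple[float, str]]:
    """Local Moran's I with row-standardized binary weights (hypothetical helper).

    Returns (I_i, quadrant) for every unit. The quadrant letters compare the
    unit's own value, then its spatial lag, with the overall mean (H = above,
    L = not above). Units without neighbours, or data without variation, get
    (0.0, "--").
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]            # deviations from the mean
    m2 = sum(d * d for d in z) / n            # second moment of the deviations
    results: List[Tuple[float, str]] = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        if not nbrs or m2 == 0:
            results.append((0.0, "--"))
            continue
        lag = sum(z[j] for j in nbrs) / len(nbrs)   # w_ij = 1 / number of neighbours
        quadrant = ("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((z[i] / m2 * lag, quadrant))
    return results
```

For values [10, 12, 11, 60, 12] on a chain of five counties (each bordering only the next), county 3 falls in the HL quadrant with the most negative I_i, while its neighbours 2 and 4 come out LH: local statistics also flag the neighbourhood of a true outlier, which is one reason to cross-check spatial results against the non-spatial methods.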
WANG Juanle, LI Shuhan, WANG Yujie, DUAN Bowen, ZHOU Jialing. Data Quality Inspection Method for Comprehensive Risk Survey of Natural Disasters[J]. Journal of Geo-information Science, 2023, 25(9): 1765-1773. DOI: 10.12082/dqxxkx.2023.230239
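Tab. 1 Survey objects of the emergency management system in the disaster survey
Hazard-affected bodies | Historical disasters | Comprehensive disaster reduction capacity
---|---|---
Schools | Annual historical natural disasters | Government disaster management capacity
Medical and health institutions | Major historical natural disasters (floods) | Government and enterprise full-time fire brigades and equipment
Social service institutions providing accommodation | Major historical natural disasters (earthquakes) | Forest fire brigades and equipment
Public cultural venues | Major historical natural disasters (typhoons) | Aviation forest protection station teams and equipment
Tourist attractions | Major historical natural disasters (forest fires) | Professional earthquake rescue teams and equipment
Star-rated hotels | | Mine/tunnel industry rescue teams and equipment
Sports venues | | Hazardous chemical/oil and gas industry rescue teams and equipment
Religious activity venues | | Maritime rescue teams and equipment
Large supermarkets, department stores, and commodity trading markets with transactions over 100 million yuan | | Disaster relief material reserve depots (sites)
Basic indicator statistics of counties/townships | | Emergency shelters
Coal mines (enterprises) | | Geological hazard monitoring and prevention projects
Metal and non-metal underground mines | | Rescue equipment and professional rescue teams of large enterprises
Metal and non-metal open-pit mines | | Comprehensive disaster reduction capacity of insurance and reinsurance companies
Tailings ponds | | Disaster reduction capacity of social organizations
Chemical industry parks | | Disaster reduction capacity of townships (subdistricts)
Hazardous chemical enterprises | | Disaster reduction capacity of communities (administrative villages)
Refueling, gas, and hydrogen stations | | Household disaster reduction capacity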
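Tab. 2 Quality inspection element attributes in the disaster survey
Rule type (level 1) | Rule type (level 2) | Description
---|---|---
Completeness | Completeness | Includes requirements such as mandatory and optional fields
Completeness | Duplicate reporting | Includes cases where the same object is reported more than once within a jurisdiction
Standardization | Data format standardization | Includes restrictions on the type of reported data (e.g., character, floating point)
Standardization | File format standardization | Includes whether uploaded files meet the format requirements
Consistency | Logical consistency | Includes logical-relationship constraints among survey indicators and among survey forms
Consistency | Temporal consistency | Includes consistency between reported times and the facts
Consistency | Attribute consistency | Includes consistency of indicators across forms
Consistency | Spatial consistency | Includes whether reported longitude/latitude falls within the higher-level administrative region, whether drawn layers self-intersect, and whether layers drawn for objects of the same type overlap
Rationality | Value-range rationality | Includes whether reported data fall within the valid value range
Rationality | Outlier rationality | Includes whether reported data are outliers
Rationality | Spatial clustering rationality | Includes whether reported data show clustering in their spatial distribution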
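Two of the rule types in Tab. 2, duplicate reporting and the coordinate part of spatial consistency, can be illustrated with a minimal Python sketch. It assumes hypothetical record fields `name`, `lon`, and `lat`, treats two records with the same whitespace-normalized name as the same object, and represents the higher-level administrative boundary as a single simple polygon in longitude/latitude. The ray-casting test is a standard point-in-polygon technique, not necessarily the one used in the authors' system; self-intersection and overlap checks on drawn layers are not covered.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(lon: float, lat: float, polygon: Sequence[Point]) -> bool:
    """Ray casting: count edge crossings of a ray cast eastward from the point."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def check_records(records: List[Dict], boundary: Sequence[Point]) -> List[Tuple[int, str]]:
    """Return (record index, rule type) for every failed check (hypothetical helper)."""
    issues: List[Tuple[int, str]] = []
    seen = set()
    for idx, rec in enumerate(records):
        # Duplicate reporting: the same normalized object name was already reported.
        key = "".join(rec["name"].split())
        if key in seen:
            issues.append((idx, "duplicate_reporting"))
        seen.add(key)
        # Spatial consistency: coordinates must fall inside the boundary polygon.
        if not point_in_polygon(rec["lon"], rec["lat"], boundary):
            issues.append((idx, "spatial_consistency"))
    return issues
```

In a real deployment the boundary would come from administrative-division data with holes and multi-part polygons, for which a geometry library is the more robust choice; the sketch only shows the shape of the checks.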
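Tab. 3 Example of the disaster survey data quality inspection rule library
Indicator name | Rule code | Quality inspection rule | Value range | Rule type (level 1) | Rule type (level 2) | Self-check at collection stage
---|---|---|---|---|---|---
Number of equipment units valued at over 10,000 yuan | GGSS.A01.09.01 | Mandatory | ≥0 | Completeness | Completeness | Yes
Number of equipment units valued at over 10,000 yuan | GGSS.A01.09.02 | Integer | ≥0 | Standardization | Data format standardization | Yes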
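A rule library like Tab. 3 maps naturally onto records that software can load and evaluate during data collection. The minimal Python sketch below encodes the two example rules, GGSS.A01.09.01 (mandatory) and GGSS.A01.09.02 (integer), plus the value range ≥0. The field name `equipment_count`, the check vocabulary, and the function names are hypothetical, and the sketch does not interpret the internal structure of the rule codes.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    code: str         # rule code in the style of Tab. 3
    field: str        # hypothetical machine-readable field name
    check: str        # "required" or "integer" (illustrative subset)
    level1: str       # rule type, level 1
    level2: str       # rule type, level 2
    self_check: bool  # run at the data-collection stage?


def passes(rule: Rule, value: Any) -> bool:
    """Evaluate one rule against one reported value."""
    if rule.check == "required":
        return value is not None and value != ""
    if value is None or value == "":
        return True  # missing values are reported by the "required" rule only
    if rule.check == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    raise ValueError(f"unknown check type: {rule.check}")


RULES = [
    Rule("GGSS.A01.09.01", "equipment_count", "required",
         "Completeness", "Completeness", True),
    Rule("GGSS.A01.09.02", "equipment_count", "integer",
         "Standardization", "Data format standardization", True),
]
# Value range per field as (min, max); None means unbounded. ">= 0" as in Tab. 3.
VALUE_RANGES: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "equipment_count": (0, None),
}


def self_check(form: Dict[str, Any], rules: List[Rule],
               value_ranges: Dict[str, Tuple[Optional[float], Optional[float]]]) -> List[str]:
    """Return the codes of failed self-check rules plus any value-range violations."""
    failures = [r.code for r in rules
                if r.self_check and not passes(r, form.get(r.field))]
    for field, (lo, hi) in value_ranges.items():
        v = form.get(field)
        if isinstance(v, (int, float)) and (
                (lo is not None and v < lo) or (hi is not None and v > hi)):
            failures.append(f"{field}: out of range")
    return failures
```

Keeping each rule as a data record rather than hard-coded logic is what lets the library be updated without redeploying the collection software; only the small set of check types needs code.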
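Tab. 4 Classical statistical methods used in the disaster survey
Statistic / method | Formula | Eq. | Variable description | Principle
---|---|---|---|---
Mean | $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$ | (1) | μ is the mean of all reported values of a survey indicator; n is the number of reported values; $x_i$ is the i-th reported value | Indicates the central tendency of the survey data
Maximum | $x_{\max} = \max(x_1, x_2, \ldots, x_n)$ | (2) | $x_{\max}$ is the largest reported value of the indicator; $x_n$ is the n-th reported value | Describes the dispersion of the survey data
Minimum | $x_{\min} = \min(x_1, x_2, \ldots, x_n)$ | (3) | $x_{\min}$ is the smallest reported value of the indicator | Describes the dispersion of the survey data
Standard deviation | $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}$ | (4) | σ is the standard deviation of all reported values | Reflects the dispersion of the survey data; has the same unit as the survey indicator
Coefficient of variation | $CV = \sigma / \mu$ | (5) | σ/μ is the coefficient of variation of all reported values | Measures the dispersion of the survey data
Variance | $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$ | (6) | σ² is the variance of all reported values | Reflects the dispersion of the survey data
Median | $M = x_{m+1}$ if $n = 2m + 1$; $M = \frac{x_m + x_{m+1}}{2}$ if $n = 2m$ | (7) | M is the median of all reported values; $x_m$ and $x_{m+1}$ are the m-th and (m+1)-th of the n reported values sorted in ascending order | Measures the central tendency of ordered survey data; unaffected by extreme values
Quartile interval: upper quartile | $Q_3$ (75th percentile) | (8) | $Q_3$ is the value at the 75% position of all reported values sorted in ascending order | Values outside the interval (B, A) are usually defined as outliers
Quartile interval: lower quartile | $Q_1$ (25th percentile) | (9) | $Q_1$ is the value at the 25% position of all reported values sorted in ascending order |
Quartile interval: upper fence | $A = Q_3 + 1.5(Q_3 - Q_1)$ | (10) | A is the right endpoint of the quartile interval |
Quartile interval: lower fence | $B = Q_1 - 1.5(Q_3 - Q_1)$ | (11) | B is the left endpoint of the quartile interval |
3σ rule: lower bound | $\mu - 3\sigma$ | (12) | μ-3σ is the left endpoint of the 3σ interval of all reported values | Data outside the interval (μ-3σ, μ+3σ) are judged anomalous
3σ rule: upper bound | $\mu + 3\sigma$ | (13) | μ+3σ is the right endpoint of the 3σ interval |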
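The quartile fences and the 3σ rule in Tab. 4 can be sketched with Python's standard `statistics` module. This is a minimal illustration, not the authors' code: it assumes the conventional Tukey coefficient of 1.5 for the fences, the population form of σ, and the "inclusive" quartile method of `statistics.quantiles`; other quartile definitions give slightly different fences. The function name `classical_outliers` is hypothetical.

```python
import statistics
from typing import Dict, List


def classical_outliers(values: List[float], k_iqr: float = 1.5,
                       k_sigma: float = 3.0) -> Dict[str, object]:
    """Summary statistics plus outliers by quartile fences and by the 3-sigma rule."""
    mu = statistics.mean(values)                      # mean, Eq. (1)
    sigma = statistics.pstdev(values, mu)             # population std, Eq. (4)
    median = statistics.median(values)                # median, Eq. (7)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k_iqr * iqr                    # B
    upper_fence = q3 + k_iqr * iqr                    # A
    lo_3s, hi_3s = mu - k_sigma * sigma, mu + k_sigma * sigma
    return {
        "mean": mu,
        "std": sigma,
        "median": median,
        "fences": (lower_fence, upper_fence),
        # Values beyond the fences B and A.
        "iqr_outliers": [v for v in values if v < lower_fence or v > upper_fence],
        # Values outside (mu - 3 sigma, mu + 3 sigma).
        "sigma_outliers": [v for v in values if v < lo_3s or v > hi_3s],
    }
```

For the reported values [12, 15, 14, 13, 16, 15, 14, 90, 13, 15], the fences are (10.625, 17.625) and flag 90, whereas μ + 3σ ≈ 90.08 does not: in a small sample one extreme value inflates σ enough to mask itself. This is one argument for combining order-statistic methods (median, quartiles) with the 3σ rule rather than relying on either alone.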