地球信息科学学报 (Journal of Geo-information Science), 2017, Vol. 19, Issue (12): 1653-1660. doi: 10.3724/SP.J.1047.2017.01653

• Monitoring Techniques and Methods for Flash Flood and Debris Flow Disasters •

自然灾害调查数据的多尺度异常检测方法研究及应用

刘业森1, 张晓蕾2,3, 郭良2,3,*

    1. 天津大学 水利工程仿真与安全国家重点实验室,天津 300072
    2. 中国水利水电科学研究院,北京 100038
    3. 水利部防洪抗旱减灾工程技术研究中心,北京 100038
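  • Received: 2017-07-10  Revised: 2017-09-06  Online: 2017-12-25  Published: 2017-12-25
  • Corresponding author: GUO Liang (郭良)  E-mail: yesenl@lreis.ac.cn; guol@iwhr.com
  • About the author:

    LIU Yesen (刘业森, born 1980), male, PhD candidate. His research covers remote sensing, GIS, and their applications in the water resources sector. E-mail: yesenl@lreis.ac.cn

  • Funding:
    Special Research Project of the China Institute of Water Resources and Hydropower Research (JZ0145B042016, JZ0145C022017); National Natural Science Foundation of China (51579131)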

Study and Application of the Method of Multi-scale Outliers Detection of Natural Disaster Investigation Data

LIU Yesen1, ZHANG Xiaolei2,3, GUO Liang2,3,*

    1. State Key Laboratory of Hydraulic Engineering Simulation and Safety, Tianjin University, Tianjin 300072, China
    2. China Institute of Water Resources and Hydropower Research, Beijing 100038, China
    3. Research Center on Flood & Drought Disaster Reduction of the Ministry of Water Resources, Beijing 100038, China
  • Received: 2017-07-10  Revised: 2017-09-06  Online: 2017-12-25  Published: 2017-12-25
  • Contact: GUO Liang  E-mail: yesenl@lreis.ac.cn; guol@iwhr.com

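Abstract (translated from the Chinese):

Large-scale natural disaster surveys cover regions with widely differing environments, rely on diverse data acquisition methods, and involve many participants. As a result, the aggregated results at each administrative level contain some anomalous survey units whose plausibility must be judged manually, yet relying on manual work alone to identify anomalies in massive data is unrealistic. This paper designs a multi-scale anomaly detection method for natural disaster survey data. It combines outlier detection with spatial data mining algorithms to detect anomalous values and anomalous spatial distribution patterns, respectively, and can quickly extract anomalous values and anomalous survey units at each scale from massive survey data to support manual review. The method was applied to the audit of the aggregated data from the national flash flood disaster investigation and evaluation. Taking the audit of national historical flash flood disaster points and of township areas in prevention zones as examples, it quickly extracted the units with anomalous flash flood disaster point density at the county and township levels, and the township units with anomalous area values. Analysis of the detection results showed that the anomalies were caused by inconsistent reporting criteria, unit errors, duplicated records, and similar problems. Finally, the conditions and approaches for applying the method in large-scale natural disaster surveys are analyzed.

Keywords: disaster investigation, flash flood disasters, data quality, anomaly detection, spatial clustering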
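
The abstract does not spell out the outlier test used for anomalous values. As a minimal sketch, assuming a normal-distribution (z-score) test of the kind named in the English abstract, the Python below flags township areas that lie far from the mean. The function name `zscore_outliers`, the threshold of 3 standard deviations, and the sample areas are all hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return the indices of values whose |z-score| exceeds the threshold.

    Assumes the values are roughly normally distributed; the threshold of
    3 standard deviations is a common convention, not the paper's setting.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical township areas in km^2. The value at index 12 imitates a
# unit error (an area entered in hectares instead of km^2).
areas_km2 = [52.0, 61.5, 48.3, 57.9, 66.2, 49.8, 55.1, 60.4, 53.7, 58.6,
             62.0, 51.2, 5630.0, 56.8, 59.3, 54.4, 50.9, 63.1, 47.5, 57.0]

outliers = zscore_outliers(areas_km2)
print(outliers)
```

A single extreme value inflates the mean and standard deviation, so in practice such a test is often run within each county or province and repeated after flagged records are reviewed; that grouping is an assumption here, in the spirit of the multi-scale idea.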

Abstract:

Natural disasters are events in which the interaction between human society and the natural environment causes loss of life and property; they result from the combined action of the disaster-forming environment, disaster-causing factors, and disaster-bearing bodies. Studying the processes, mechanisms, and impacts of natural disasters, and reducing the losses they cause, requires large-scale surveys of basic data and disaster events, and the authenticity and consistency of these data are essential to the reliability and validity of research results. However, the large number of organizations and investigators involved, the large regional differences, and the large spatial extent make it difficult to control data quality and to verify consistency across survey units. Ensuring correctness and consistency requires manual inspection, yet relying entirely on manual work to identify anomalies in massive survey data is unrealistic. We therefore design a multi-scale anomaly detection method for natural disaster survey data. It combines univariate outlier detection based on the normal distribution with Anselin Local Moran's I spatial clustering to detect abnormal values and abnormal spatial distribution patterns. The method extracts abnormal values and abnormal survey units at each scale, helps identify the causes of the anomalies, and thereby supports manual checking of the survey data. Taking the flash flood disaster investigation and evaluation project in mainland China as an example, we apply the method to audit the historical flash flood disaster events and the areas of townships within the prevention zones. It quickly extracts the units with anomalous flash flood disaster point density and the township units with anomalous area values. Further analysis shows that the anomalies were caused by inconsistent reporting criteria, unit errors, duplicated records, and similar problems. The method resolved inconsistencies in the massive flash flood survey data, and it offers an effective approach to checking the quality of other large-scale disaster datasets. The approach still has limitations: the outlier check only compares survey units delineated by administrative divisions, and regions are not grouped by economic development or natural conditions. Finally, we analyze the conditions under which the method is applicable to large-scale natural disaster investigations.

Key words: disaster investigation, flash flood disasters, data quality, anomaly detection, spatial clustering
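
To illustrate the spatial part of the method, the following Python is a minimal sketch of the Local Moran's I statistic in its common textbook form, with row-standardized neighbor weights. It is not the authors' implementation: the function name `local_morans_i`, the neighbor dictionary, and the sample densities are hypothetical, and the significance testing (for example by permutation) that a full Anselin Local Moran's I analysis includes is omitted.

```python
def local_morans_i(values, neighbors):
    """Compute Local Moran's I and a quadrant label for each unit.

    values    : list of floats, one per survey unit (e.g. disaster point density)
    neighbors : dict mapping a unit index to the indices of its adjacent units
    Returns a list of (local_i, label); label is one of HH, LL, HL, LH,
    or "NA" when a unit has no neighbors or the values have no variance.
    """
    n = len(values)
    mu = sum(values) / n
    z = [v - mu for v in values]        # deviations from the global mean
    m2 = sum(d * d for d in z) / n      # second moment used for scaling
    results = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        if not nbrs or m2 == 0:
            results.append((0.0, "NA"))
            continue
        # Spatial lag: mean deviation of the neighbors (row-standardized weights)
        lag = sum(z[j] for j in nbrs) / len(nbrs)
        local_i = z[i] * lag / m2
        label = ("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((local_i, label))
    return results


# Hypothetical densities for six counties arranged in a line; county 3 is
# far denser than its neighbors, as a duplicated-record error might cause.
densities = [1.0, 1.0, 1.0, 9.0, 1.0, 1.0]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

results = local_morans_i(densities, chain)
for idx, (local_i, label) in enumerate(results):
    print(idx, round(local_i, 3), label)
```

A negative local value with an HL or LH label marks a unit that differs sharply from its surroundings, which is the kind of unit a reviewer would then inspect manually. Running the same function on county-level and township-level aggregates separately is one way to approximate the multi-scale screening described in the abstract.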