Journal of Geo-information Science, 2017, Vol. 19, Issue (5): 605-612. doi: 10.3724/SP.J.1047.2017.00605

• Theory and Methods of Geo-information Science •

A Method for Detecting Outliers of Soil Heavy Metal Data Based on Spatial Autocorrelation and Probability Theory

WANG Jingyun1, YANG Jun2,*, YANG Junxing2, LEI Mei2, WAN Xiaoming2, ZHOU Xiaoyong2, CHEN Tongbin2, ZHANG Hongri1, ZHAO Xiangwei1

  1. College of Geomatics, Shandong University of Science and Technology, Qingdao 266590, China
  2. Center for Environmental Remediation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
  • Received: 2016-09-23 Revised: 2016-11-01 Online: 2017-05-27 Published: 2017-05-20
  • Contact: YANG Jun
  • About the author: WANG Jingyun (1990-), male, from Linyi, Shandong Province; master's student; research interests: evolution and analysis of the geographical environment. E-mail: wjy603@126.com
  • Supported by:
    General Program of the National Natural Science Foundation of China, "Exploring the regulatory mechanism of remediation of Pb-contaminated soil by Pteris vittata using isotope labelling" (41271478); National "863" Program project "On-site monitoring technology and equipment for soil heavy metal pollution" (2014AA06A513)

Abstract:

Data are the foundation of research on soil environmental quality. During laboratory analysis, however, systematic and human errors can produce anomalous values that lower data quality and lead to misjudgments in subsequent pollution assessment, remediation, and management decisions. This problem has so far received little in-depth study. This study therefore proposes a method for identifying outliers in soil heavy metal data and verifies its effectiveness using soil Cd concentration data from Beijing, China. Of the 651 soil Cd values for Beijing, 34 were identified as outliers. Chemical re-analysis of these samples showed that 76.5% of the Cd outliers (26) were caused by systematic and human errors, while 20.6% (7) were genuine anomalies that exist objectively. After the original data were corrected, interpolation accuracy improved significantly: for the outliers themselves, the mean relative error fell by 44.56% and the root mean square error by 33.33%; for the neighboring points affected by the outliers, the mean relative error fell by 20.59% and the root mean square error by 17.33%. The results indicate that the proposed method can effectively identify outliers in soil heavy metal data and improve the quality of survey data at the cost of only a limited increase in sample size and analysis time, providing an effective tool for regional soil surveys and for data quality assurance.

Key words: soil heavy metal, outlier data, verification method, cross-validation, prediction accuracy
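
The abstract reports cross-validation results as mean relative error and root mean square error but does not spell out the detection algorithm. The Python sketch below is therefore only an illustrative stand-in, not the authors' method. It assumes leave-one-out inverse distance weighting (IDW) as the interpolator and a median/MAD robust z-score as the flagging rule; all function names, the neighbor count `k`, the IDW `power`, and the 3.5 threshold are hypothetical choices made for this example.

```python
import math
from statistics import median


def loo_idw_predictions(samples, k=4, power=2.0):
    """Predict each sample from its k nearest neighbors, leaving the sample itself out.

    samples: list of (x, y, value) tuples. IDW is an assumed interpolator here.
    """
    predictions = []
    for i, (xi, yi, _) in enumerate(samples):
        neighbors = sorted(
            (math.hypot(xi - xj, yi - yj), vj)
            for j, (xj, yj, vj) in enumerate(samples)
            if j != i
        )[:k]
        # Inverse-distance weights; the floor avoids division by zero
        # when two samples share the same coordinates.
        weights = [1.0 / max(d, 1e-12) ** power for d, _ in neighbors]
        total = sum(w * v for w, (_, v) in zip(weights, neighbors))
        predictions.append(total / sum(weights))
    return predictions


def mean_relative_error(observed, predicted):
    """Average of |predicted - observed| / |observed|."""
    return sum(abs(p - o) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_square_error(observed, predicted):
    """Square root of the mean squared difference."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def outlier_scores(samples, k=4, power=2.0):
    """Robust z-score of each leave-one-out residual (assumed flagging statistic)."""
    predicted = loo_idw_predictions(samples, k, power)
    residuals = [v - p for (_, _, v), p in zip(samples, predicted)]
    center = median(residuals)
    mad = median([abs(r - center) for r in residuals])
    scale = mad if mad > 0 else 1e-12  # guard against a zero MAD
    return [0.6745 * (r - center) / scale for r in residuals]


def flag_outliers(samples, threshold=3.5, k=4, power=2.0):
    """Indices of samples whose absolute robust z-score exceeds the threshold."""
    scores = outlier_scores(samples, k, power)
    return [i for i, s in enumerate(scores) if abs(s) > threshold]
```

With a rule of this kind, samples adjacent to a gross error may be flagged as well, because their own leave-one-out predictions are contaminated by it. This is in line with the abstract's observation that neighboring points are also affected by outliers. Flagged samples would then be candidates for chemical re-analysis rather than for automatic deletion.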