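Genetic Programming Method for Malaria Prediction and Its Application: A Case Study of County-Level Malaria Incidence in Anhui Province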
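About the author: Song Yongze (1988-), male, from Chengde, Hebei Province, master's student; research interests: remote sensing information processing and applications. E-mail: songyz@lreis.ac.cn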
Received date: 2014-11-30
Revision requested date: 2015-02-05
Online publication date: 2015-08-05
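Funding
National Basic Research Program of China ("973" Program) (2012CB955503)
National Key Technology R&D Program of China, project "Resource and environment monitoring and assessment and ecological value evaluation technology for poverty-stricken areas" (2012BAH33B01)
National Key Technology R&D Program of China, project "Key technologies for dynamic monitoring and information acquisition of the migrant population" (2012BAI32B06)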
Application of Genetic Programming on Predicting and Mapping Malaria in Anhui Province
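Malaria is the most serious parasitic disease in the world, and Anhui Province is a typical mid-latitude region with high malaria incidence. Taking the malaria incidence reported for county-level administrative units in Anhui Province as an example, this paper derives potential driving factors of malaria from remote sensing monitoring data, uses genetic programming to model the relationship between remotely sensed environmental factors and malaria incidence, predicts the spatial distribution of malaria incidence, analyzes the prediction results, and evaluates the model accuracy. The results show that genetic programming predicts malaria incidence more accurately (R² = 0.558 for training data, R² = 0.429 for test data) than linear stepwise regression (R² = 0.470 for training data, R² = 0.408 for test data). Genetic programming therefore helps improve the accuracy of predicting the spatial distribution of malaria incidence, and it provides a basis for research that uses remote sensing monitoring data to predict the spatial distribution and change of malaria.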
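Song Yongze, Ge Yong, Peng Junhuan, Wang Jinfeng, Ren Zhoupeng, Liao Yilan. Genetic programming method for malaria prediction and its application: a case study of county-level malaria incidence in Anhui Province[J]. Journal of Geo-information Science, 2015, 17(8): 954-962. DOI: 10.3724/SP.J.1047.2015.00954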
Malaria is considered the most severe parasitic disease, and Anhui Province is a typical mid-latitude area with high malaria risk. This paper uses the genetic programming (GP) method to model the relationship between remote sensing monitoring indexes and malaria incidence, based on factors derived from remote sensing data. The spatial distribution of malaria incidence is then predicted, the prediction results are analyzed, and the modeling precision is evaluated. GP is an optimization method that searches for suitable solutions to complex problems through evolutionary algorithms. The approach is illustrated with the monthly average malaria incidences of each county in Anhui Province from 2004 to 2010. Remote sensing data serve as the main source of factors because they cover large areas, can be acquired quickly, and can be converted into various meteorological and environmental indexes. The factors include remote sensing indexes, namely the normalized difference vegetation index (NDVI) and land surface temperature (LST), together with a natural attribute (elevation) and social attributes (population, immigrant population, and GDP) at the county level. The results show that NDVI and LST act on malaria incidence with lags of two months and one month, respectively. Compared with linear regression (R² = 0.470 for training data and R² = 0.408 for test data), the GP method improves prediction precision (R² = 0.558 for training data and R² = 0.429 for test data), which is attributed to its ability to express the non-linear relation between remote sensing indexes and malaria incidence. The GP method thus helps increase the precision of predicting the spatial distribution of malaria incidence, and this paper provides a basis for future research on predicting and mapping the spatial distribution of malaria using remote sensing data.
Key words: genetic programming; malaria; remote sensing data; spatial analysis; prediction
Fig. 1 Flowchart of GP-based malaria prediction model
Fig. 2 Flowchart of GP method
Fig. 3 Geographical condition of the study area and spatial distribution of annual average malaria incidences in each county
Fig. 4 Monthly average malaria incidences of each county in Anhui Province from 2004 to 2010
Fig. 5 Spatial distributions and monthly statistics of NDVI and LST
Fig. 6 Spatial distributions of auxiliary data
Tab. 1 Spearman correlation coefficients between monthly average malaria incidences and remote sensing indexes
Lag (months) | NDVI | Mean LST | N
---|---|---|---
0 | 0.129** | 0.293** | 936
1 | 0.210** | 0.333** | 936
2 | 0.229** | 0.279** | 936
Note: ** indicates that the correlation is significant at the 0.01 level.
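As an illustration of how lagged rank correlations like those in Tab. 1 can be computed, the following is a minimal sketch in plain Python. It pairs the incidence in month t with the factor value in month t - lag and then takes the Spearman coefficient (Pearson correlation of average ranks). The function names and the toy series are hypothetical; the source pools county-month samples (N = 936) and does not describe its implementation, so this is only one plausible way to arrange the calculation.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def lagged_pairs(factor, incidence, lag):
    """Pair incidence in month t with the factor value in month t - lag."""
    return factor[: len(factor) - lag], incidence[lag:]


# Toy monthly series (made-up numbers): incidence follows the factor by one month.
factor = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
incidence = [0.0, 2.0, 6.0, 4.0, 10.0, 8.0]
for lag in (0, 1, 2):
    x, y = lagged_pairs(factor, incidence, lag)
    print(lag, round(spearman(x, y), 3))
```

In this toy series the lag of 1 gives a coefficient of 1.0, because the incidence is an exact monotone function of the previous month's factor value. In practice a library routine such as `scipy.stats.spearmanr` would also return the p-value needed for the significance marks.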
Tab. 2 Spearman correlation coefficients between annual average malaria incidences and auxiliary factors in each county
Auxiliary variable | Correlation coefficient | P | N
---|---|---|---
Ln(elevation) | 0.041 | 0.723 | 78
Ln(population) | 0.615** | 0.000 | 78
Ln(immigrant population) | 0.202 | 0.076 | 78
Ln(GDP) | -0.328** | 0.003 | 78
Note: ** indicates that the correlation is significant at the 0.01 level.
Tab. 3 Statistics of variables in GP prediction
Variable | Description | Minimum | Mean | Median | Maximum
---|---|---|---|---|---
X1 | Ln(population) | 11.472 | 13.466 | 13.509 | 14.552
X2 | Ln(GDP) | 8.276 | 9.593 | 9.527 | 11.074
X3 | NDVI (lag = 2) | 0.202 | 0.555 | 0.547 | 0.847
X4 | Mean LST (lag = 1) | -0.678 | 15.706 | 17.007 | 28.250
Fig. 7 Spatial distribution of the counties corresponding to the randomly selected training data and test data
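The sample counts reported in this record are consistent with a split made by county: 936 samples over 78 counties is 12 monthly values per county, and the 660 training and 276 test samples in Tab. 5 correspond to 55 and 23 counties. The sketch below shows such a county-level random split. That the split was made by county, the county IDs, and the seed are all assumptions for illustration; the source only states that the data were selected at random.

```python
import random


def split_by_county(county_ids, n_train, seed=0):
    """Randomly assign whole counties to the training or the test set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(county_ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


counties = list(range(78))  # hypothetical county IDs
months_per_county = 12      # 936 samples / 78 counties
train, test = split_by_county(counties, n_train=55)
print(len(train) * months_per_county, len(test) * months_per_county)  # 660 276
```

Keeping all months of a county on the same side of the split means the test error measures how well the model transfers to counties it has never seen.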
Fig. 8 GP fitness varying process and the comparison between precision and complexity
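Tab. 4 Parameter settings for GP
Parameter | Description and setting
---|---
Terminal set | Variables X1, X2, X3, X4
Function set | +, -, ×, /, power, log, exp, sqrt
Population size | 200 individuals
Generations | 100
Fitness function | Sum of absolute differences (SAD)
Genetic operators | Crossover, mutation
Initial operator probabilities | [0.85, 0.15]
Operator probabilities | Dynamically varying
Tree depth | Dynamic depth selection
Dynamic maximum depth | 15
Maximum depth in results | 17
Sampling (selection) method | Lexictour
Survival method | Totalelitism (elitism)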
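To make the settings in Tab. 4 more concrete, the following sketch shows three building blocks of a tree-based GP run: evaluating an expression tree over the terminals X1 to X4, scoring it with the sum of absolute differences (SAD), and a lexicographic tournament that prefers the lower SAD and breaks ties in favor of the smaller tree. The tuple representation, the protected-operator conventions (for example, division by zero returning 1), and the tournament size are assumptions made for this illustration, not details taken from the source.

```python
import math
import operator
import random


# Protected operators (conventions chosen for this sketch) so that
# division by zero and the logarithm of zero do not raise.
def protected_div(a, b):
    return a / b if b != 0 else 1.0


def protected_log(a):
    return math.log(abs(a)) if a != 0 else 0.0


def protected_pow(a, b):
    try:
        result = abs(a) ** b
    except (OverflowError, ZeroDivisionError):
        return 1.0
    return result if math.isfinite(result) else 1.0


BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul,
          "/": protected_div, "power": protected_pow}
UNARY = {"log": protected_log,
         "exp": lambda a: math.exp(min(a, 50.0)),  # cap the argument
         "sqrt": lambda a: math.sqrt(abs(a))}


def evaluate(tree, row):
    """Evaluate a tree; leaves are variable names such as 'X1'."""
    if isinstance(tree, str):
        return row[tree]
    op, *args = tree
    values = [evaluate(arg, row) for arg in args]
    return BINARY[op](*values) if op in BINARY else UNARY[op](*values)


def size(tree):
    """Number of nodes in the tree, used here as the complexity measure."""
    return 1 if isinstance(tree, str) else 1 + sum(size(a) for a in tree[1:])


def sad(tree, rows, targets):
    """Sum of absolute differences between predictions and targets."""
    return sum(abs(evaluate(tree, r) - t) for r, t in zip(rows, targets))


def lexictour(population, rows, targets, k, rng):
    """Tournament of k random trees: lowest SAD wins, then smallest size."""
    contestants = rng.sample(population, k)
    return min(contestants, key=lambda t: (sad(t, rows, targets), size(t)))


# Two trees with identical predictions; the tournament keeps the smaller one.
rows = [{"X1": 13.0, "X2": 9.0, "X3": 0.5, "X4": 16.0}]
targets = [0.5]
small = "X3"
bloated = ("+", "X3", ("-", "X1", "X1"))
winner = lexictour([bloated, small], rows, targets, k=2, rng=random.Random(0))
print(winner)  # X3
```

A full run would add random tree generation, crossover, mutation, and elitist survival on top of these pieces, with the depth limits from the table applied when new trees are created.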
Fig. 9 Comparison between GP-based predicted malaria incidences and original data
Fig. 10 Mapping of GP-based predicted results
Tab. 5 Prediction errors of GP-based model and linear stepwise regression method
Method | Training data (660 samples): ARE (%) | Training data (660 samples): R² | Test data (276 samples): ARE (%) | Test data (276 samples): R²
---|---|---|---|---
GP | 13.335 | 0.558 | 17.365 | 0.429
Linear stepwise regression | 28.785 | 0.470 | 29.739 | 0.408
Note: the regression equation of the linear stepwise regression is [equation missing in the source].
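For readers who want to reproduce error measures of the kind shown in Tab. 5, here is a minimal sketch. The source does not define ARE or state which form of R² it uses, so both definitions below are assumptions: ARE is taken as the mean absolute relative error in percent, and R² as the coefficient of determination, 1 - SS_res / SS_tot. The numbers in the example are made up.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def average_relative_error(observed, predicted):
    """Mean of |predicted - observed| / |observed|, in percent (assumed definition).

    Every observed value must be nonzero for the ratio to be defined.
    """
    ratios = [abs(p - o) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


observed = [2.0, 4.0]   # made-up values
predicted = [1.0, 5.0]
print(average_relative_error(observed, predicted))  # 37.5
print(r_squared(observed, predicted))               # 0.0
```

If the authors instead used the squared Pearson correlation for R², the two definitions agree only for a least-squares linear fit evaluated on its own training data, so values computed this way may differ from the table.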
The authors have declared that no competing interests exist.