Journal of Geo-information Science
Application of Genetic Programming on Predicting and Mapping Malaria in Anhui Province
Received date: 2014-11-30
Request revised date: 2015-02-05
Online published: 2015-08-05
This paper models the relationship between remote sensing monitoring indexes and malaria incidence using the genetic programming (GP) method, with predictors derived mainly from remote sensing data. The model is then used to predict the spatial distribution of malaria incidence, and the prediction results and modeling precision are evaluated. Malaria is considered the most severe parasitic disease, and Anhui Province is a typical mid-latitude area facing high malaria risk. GP is an evolutionary optimization method capable of searching for suitable solutions to complex problems, which motivates its use for predicting the spatial distribution of malaria. The approach is demonstrated with the monthly average malaria incidence in each county of Anhui Province from 2004 to 2010. Remote sensing data serve as the main source of factors because they cover large spatial extents, are acquired quickly, and can be converted into various meteorological and environmental indexes. The factors include remote sensing indexes, namely the normalized difference vegetation index (NDVI) and land surface temperature (LST), together with a natural attribute (elevation) and social attributes (population, immigrant population, and GDP) at the county level. The results show that NDVI and LST influence malaria incidence with lags of two months and one month, respectively. Compared with linear regression (R2 = 0.470 for training data and R2 = 0.408 for test data), the GP method improves prediction precision (R2 = 0.558 for training data and R2 = 0.429 for test data), a gain attributed to its ability to capture the non-linear relationship between remote sensing indexes and malaria incidence. This paper thus provides a basis for future research on predicting and mapping the spatial distribution of malaria using remote sensing data.
Key words: genetic programming; malaria; remote sensing data; spatial analysis; prediction
SONG Yongze, GE Yong, PENG Junhuan, WANG Jinfeng, REN Zhoupeng, LIAO Yilan. Application of Genetic Programming on Predicting and Mapping Malaria in Anhui Province[J]. Journal of Geo-information Science, 2015, 17(8): 954-962. DOI: 10.3724/SP.J.1047.2015.00954
Fig. 1 Flowchart of the GP-based malaria prediction model
Fig. 2 Flowchart of the GP method
Fig. 3 Geographical conditions of the study area and spatial distribution of annual average malaria incidence in each county
Fig. 4 Monthly average malaria incidence of each county in Anhui Province from 2004 to 2010
Fig. 5 Spatial distributions and monthly statistics of NDVI and LST
Fig. 6 Spatial distributions of auxiliary data
Tab. 1 Spearman correlation coefficients between monthly average malaria incidence and remote sensing indexes
Lag (months) | NDVI | Mean LST | N
---|---|---|---
0 | 0.129** | 0.293** | 936
1 | 0.210** | 0.333** | 936
2 | 0.229** | 0.279** | 936
Note: ** indicates that the correlation is significant at the 0.01 level.
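The lag comparison in Tab. 1 can be illustrated with a short sketch. It assumes one monthly incidence series and one monthly index series, pairs incidence in month t with the index in month t - lag, and computes Spearman's rank correlation for lags of 0 to 2 months. The data are synthetic and the helper names (`rank`, `spearman`, `lagged_spearman`) are hypothetical, not taken from the paper.

```python
from statistics import mean

def rank(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def lagged_spearman(incidence, index, lag):
    """Correlate incidence in month t with the index in month t - lag."""
    if lag == 0:
        return spearman(incidence, index)
    return spearman(incidence[lag:], index[:-lag])

# Synthetic example: incidence follows the index with a 2-month delay.
index = [0.30, 0.35, 0.45, 0.60, 0.72, 0.80, 0.78, 0.65, 0.50, 0.40, 0.33, 0.31]
incidence = [0.0, 0.0] + [v * 10 for v in index[:-2]]
for lag in range(3):
    print(lag, round(lagged_spearman(incidence, index, lag), 3))
```

In this toy series the correlation peaks at the lag that was built into the data, which mirrors how the lag with the largest coefficient in Tab. 1 is selected for each index.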
Tab. 2 Spearman correlation coefficients between annual average malaria incidence and auxiliary factors in each county
Auxiliary variable | Correlation coefficient | P | N
---|---|---|---
Ln(elevation) | 0.041 | 0.723 | 78
Ln(population) | 0.615** | 0.000 | 78
Ln(immigrant population) | 0.202 | 0.076 | 78
Ln(GDP) | -0.328** | 0.003 | 78
Note: ** indicates that the correlation is significant at the 0.01 level.
Tab. 3 Statistics of variables in GP prediction
Variable | Description | Minimum | Mean | Median | Maximum
---|---|---|---|---|---
X1 | Ln(population) | 11.472 | 13.466 | 13.509 | 14.552
X2 | Ln(GDP) | 8.276 | 9.593 | 9.527 | 11.074
X3 | NDVI (lag = 2) | 0.202 | 0.555 | 0.547 | 0.847
X4 | Mean LST (lag = 1) | -0.678 | 15.706 | 17.007 | 28.250
Fig. 7 Spatial distribution of the counties corresponding to the randomly selected training and test data
Fig. 8 Evolution of GP fitness and comparison between precision and complexity
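Tab. 4 Parameter settings for GP
Parameter | Description and setting
---|---
Terminal set | Variables X1, X2, X3, X4
Function set | +, -, ×, /, power, log, exp, sqrt
Population size | 200 individuals
Number of generations | 100
Fitness function | Sum of absolute differences (SAD)
Genetic operators | Crossover, mutation
Initial operator probabilities | [0.85, 0.15]
Operator probabilities | Dynamically adapted
Tree depth | Dynamic depth selection
Dynamic maximum depth | 15
Maximum depth in results | 17
Selection (sampling) method | Lexictour
Survival method | Totalelitism (elitism)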
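To make the settings in Tab. 4 concrete, the following is a minimal sketch of GP-based symbolic regression, not the authors' implementation. It keeps the SAD fitness, the crossover and mutation operators, and the population size and generation count as defaults, but simplifies the rest under stated assumptions: only the four arithmetic functions are used, the [0.85, 0.15] split is read as fixed crossover/mutation probabilities, a plain size-3 tournament replaces Lexictour, a fixed `max_depth` replaces dynamic depth control, and survival keeps the best individuals from parents plus offspring. All function names and the synthetic data are hypothetical.

```python
import math
import random

OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,  # protected division
}

def random_tree(rng, n_vars, levels):
    """Grow a random expression tree stored as nested tuples."""
    if levels == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_vars))
        return ("const", rng.uniform(-1.0, 1.0))
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, n_vars, levels - 1), random_tree(rng, n_vars, levels - 1))

def evaluate(tree, row):
    if tree[0] == "x":
        return row[tree[1]]
    if tree[0] == "const":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], row), evaluate(tree[2], row))

def sad(tree, X, y):
    """Fitness: sum of absolute differences (lower is better)."""
    total = 0.0
    for row, target in zip(X, y):
        pred = evaluate(tree, row)
        if not math.isfinite(pred):
            return float("inf")
        total += abs(pred - target)
    return total

def depth(tree):
    return 0 if tree[0] not in OPS else 1 + max(depth(tree[1]), depth(tree[2]))

def paths(tree, prefix=()):
    """List the path (child indices from the root) to every node."""
    out = [prefix]
    if tree[0] in OPS:
        out += paths(tree[1], prefix + (1,)) + paths(tree[2], prefix + (2,))
    return out

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = put(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, a, b):
    return put(a, rng.choice(paths(a)), get(b, rng.choice(paths(b))))

def mutate(rng, tree, n_vars):
    return put(tree, rng.choice(paths(tree)), random_tree(rng, n_vars, 2))

def evolve(X, y, pop_size=200, generations=100, max_depth=6, seed=0):
    rng = random.Random(seed)
    n_vars = len(X[0])
    pop = [random_tree(rng, n_vars, 3) for _ in range(pop_size)]
    scored = sorted(((sad(t, X, y), t) for t in pop), key=lambda s: s[0])
    history = [scored[0][0]]  # best SAD per generation
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1 = min(rng.sample(scored, 3), key=lambda s: s[0])[1]
            p2 = min(rng.sample(scored, 3), key=lambda s: s[0])[1]
            child = crossover(rng, p1, p2) if rng.random() < 0.85 else mutate(rng, p1, n_vars)
            if depth(child) <= max_depth:  # reject oversized trees
                children.append((sad(child, X, y), child))
        # Elitist survival: keep the best pop_size of parents and offspring.
        scored = sorted(scored + children, key=lambda s: s[0])[:pop_size]
        history.append(scored[0][0])
    return scored[0][1], history

# Synthetic demo (not malaria data): the target is x0 * x1 + x2.
data_rng = random.Random(1)
X = [[data_rng.uniform(0, 2) for _ in range(3)] for _ in range(30)]
y = [r[0] * r[1] + r[2] for r in X]
best, history = evolve(X, y, pop_size=60, generations=20)
print("best SAD, first and last generation:", round(history[0], 3), round(history[-1], 3))
```

Because survival is elitist, the best SAD never increases from one generation to the next. The demo uses a smaller population and fewer generations than Tab. 4 only to keep the run short.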
Fig. 9 Comparison between GP-based predicted malaria incidence and the original data
Fig. 10 Map of GP-based prediction results
Tab. 5 Prediction errors of the GP-based model and the linear stepwise regression method
Method | Training data (660 samples), ARE (%) | Training data (660 samples), R2 | Test data (276 samples), ARE (%) | Test data (276 samples), R2
---|---|---|---|---
GP | 13.335 | 0.558 | 17.365 | 0.429
Linear stepwise regression | 28.785 | 0.470 | 29.739 | 0.408
Note: the regression equation of the linear stepwise regression is
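
The two error measures reported in Tab. 5 can be computed as in the sketch below. The definitions are assumptions, since this page does not spell them out: ARE is taken to be the average relative error in percent, and R2 is taken to be the coefficient of determination. The function names and the numbers are illustrative only.

```python
def average_relative_error(observed, predicted):
    """Mean of |predicted - observed| / |observed|, in percent.

    Observations equal to zero are skipped to avoid division by zero.
    """
    terms = [abs(p - o) / abs(o) for o, p in zip(observed, predicted) if o != 0]
    return 100.0 * sum(terms) / len(terms)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Illustrative values, not data from the study.
observed = [2.0, 4.0, 5.0, 8.0]
predicted = [2.2, 3.6, 5.5, 7.6]
print(round(average_relative_error(observed, predicted), 3))
print(round(r_squared(observed, predicted), 3))
```

Evaluating both measures separately on the training and test samples, as Tab. 5 does, shows how much of the fit carries over to unseen data.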
The authors have declared that no competing interests exist.