
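  • SONG Yongze (宋泳泽) 1, 2,
  • GE Yong (葛咏) 2, 3, *,
  • PENG Junhuan (彭军还) 1,
  • WANG Jinfeng (王劲峰) 2,
  • REN Zhoupeng (任周鹏) 2,
  • LIAO Yilan (廖一兰) 2
  • 1. School of Land Science and Technology, China University of Geosciences (Beijing), Beijing 100083, China
  • 2. State Key Lab of Resources and Environmental Information System, Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
  • 3. Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
*Corresponding author: GE Yong (b. 1972), female, from Kuitun, Xinjiang; research professor and doctoral supervisor. Research interests: spatial data analysis and quality assessment. E-mail: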


Received: 2014-11-30

  Revision requested: 2015-02-05

  Published online: 2015-08-05





Application of Genetic Programming on Predicting and Mapping Malaria in Anhui Province

  • SONG Yongze , 1, 2 ,
  • GE Yong , 2, 3, * ,
  • PENG Junhuan 1 ,
  • WANG Jinfeng 2 ,
  • REN Zhoupeng 2 ,
  • LIAO Yilan 2
  • 1. School of Land Science and Technology, China University of Geosciences (Beijing), Beijing 100083, China
  • 2. State Key Lab of Resources and Environmental Information System, Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
  • 3. Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
*Corresponding author: GE Yong, E-mail:

Received date: 2014-11-30

  Revision requested date: 2015-02-05

  Online published: 2015-08-05


© Editorial Office of Journal of Geo-information Science. All rights reserved.


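Malaria is one of the world's most serious parasitic diseases, and Anhui Province is a typical mid-latitude region with high malaria incidence. Taking the malaria incidence reported for county-level administrative units in Anhui Province as an example, this paper derives data on potential drivers of malaria from remote sensing observations, uses genetic programming (GP) to model the relationship between remotely sensed environmental factors and malaria incidence, predicts the spatial distribution of malaria incidence, analyzes the predictions, and evaluates model accuracy. The results show that GP predicts malaria incidence more accurately (R² = 0.558 for training data and R² = 0.429 for test data) than linear stepwise regression (R² = 0.470 for training data and R² = 0.408 for test data). GP therefore helps improve the accuracy of predicted spatial distributions of malaria incidence, and it provides a basis for research that uses remote sensing data to predict the spatial distribution and variation of malaria.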


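SONG Yongze, GE Yong, PENG Junhuan, WANG Jinfeng, REN Zhoupeng, LIAO Yilan. Genetic programming for malaria prediction and its application: a case study of county (city) malaria incidence in Anhui Province[J]. Journal of Geo-information Science, 2015, 17(8): 954-962. DOI: 10.3724/SP.J.1047.2015.00954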


This paper models the relationship between remote sensing indexes and malaria incidence using the genetic programming (GP) method, with predictor factors derived from remote sensing data. On this basis, the spatial distribution of malaria incidence is predicted, the predictions are analyzed, and the modeling precision is evaluated. Malaria is considered one of the most severe parasitic diseases, and Anhui Province is a typical mid-latitude area with high malaria risk. GP is an optimization method that uses evolutionary algorithms to search for suitable solutions to complex problems. The approach is illustrated with the monthly average malaria incidence in each county of Anhui Province from 2004 to 2010. Remote sensing data are the main source of factors because they cover large areas, can be acquired quickly, and can be converted into a variety of meteorological and environmental indexes. The factors include remote sensing indexes, namely the normalized difference vegetation index (NDVI) and land surface temperature (LST), together with a natural attribute (elevation) and social attributes (population, immigrant, and GDP data) at the county level. The results show that NDVI and LST affect malaria incidence with lags of two months and one month, respectively. Compared with linear regression (R² = 0.470 for training data and R² = 0.408 for test data), GP improves prediction precision (R² = 0.558 for training data and R² = 0.429 for test data), which benefits from its ability to capture the non-linear relation between remote sensing indexes and malaria incidence. GP thus helps increase the precision of predicted spatial distributions of malaria incidence.
In conclusion, this paper provides a basis for future research on predicting the spatial distribution of malaria and mapping it with remote sensing data.

1 Introduction

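Malaria is a parasitic disease caused by Plasmodium parasites and transmitted by Anopheles mosquitoes. It is one of the most serious parasitic diseases in the world, with a wide transmission range and a large population at risk. More than 2.3 billion people (41% of the world population) in over 100 countries and regions are threatened by malaria to varying degrees (World Health Organization, WHO, http://www.who.int/en/). WHO estimates that 216 million people were infected with malaria in 2010 and that about 655 000 malaria deaths occurred across 106 countries (World Malaria Report 2011, http://www.who.int/malaria/world_malaria_report_2011/en/). Although countries have made progress in fighting and eliminating malaria in recent years, the spatial extent of malaria incidence has not changed significantly compared with half a century ago. Studies show that in 2010, 1.13 billion and 1.44 billion people worldwide lived in areas of unstable Plasmodium vivax risk (incidence of 1/100 000 to 10/100 000) and stable risk (incidence > 10/100 000), respectively [1], while 2.5 billion people were threatened by Plasmodium falciparum malaria [2].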
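Parts of China have high malaria incidence, and the Huai River basin, where Anhui Province is located, is one of the world's typical mid-latitude high-incidence regions [1-2]. Plasmodium vivax malaria is the most widespread form in China, followed by Plasmodium falciparum malaria. By 2000, the reported incidence had fallen to 1.94/100 000, and the number of reported cases had fallen from more than 24 million in the early 1970s to 24 088. From 2000 to 2006, there were more than 64 000 malaria cases nationwide, with an average incidence of 5.0/100 000. Malaria in China shows a marked regional distribution, and the Huai River basin in North China is one of the country's main endemic areas [3]. Within Anhui Province, malaria shows significant spatial clustering [4].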
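At present, linear regression or improved linear regression methods are most often used to relate malaria to meteorological and environmental factors [17-18], for example to identify how strongly different NDVI values affect malaria incidence [19-25]. Genetic programming (GP) is an optimization method for building non-linear relationships with complex structure. It expresses the non-linear relationship of a given problem as a mathematical expression. When solving complex non-linear problems, it can discard independent variables that have no functional relationship with the dependent variable, so the resulting function structure is closer to reality [26-27]. It is therefore a practical and effective method for predicting malaria.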

2 GP-based prediction model for malaria incidence

Fig. 1 Flowchart of the GP-based malaria prediction model

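With malaria incidence as the dependent variable and the selected predictors as independent variables, GP is used to build a non-linear relationship that accounts for spatial characteristics. GP is an optimization method that uses an evolutionary algorithm to search for solutions to problems with complex functional structure; a solution can be expressed as a tree built from a terminal set and a function set [28]. GP extends the genetic algorithm (GA): it can build functional relationships for complex non-linear problems and, during evolution, drops independent variables unrelated to the dependent variable, which yields a reasonable function structure closer to reality.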
Fig. 2 Flowchart of the GP method

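The fitness function is

$$fitness = \sum_{i=1}^{N} \left| y_i - p_i \right|$$

where $N$ is the total number of individuals, and $y_i$ and $p_i$ are the observed value and the GP-predicted value, respectively. A new population is generated by the following steps: (1) two parent individuals are selected at random according to fitness, and randomly chosen parts of the two parents are exchanged with a given probability to produce new individuals (crossover); (2) a parent individual is selected at random according to fitness and mutated with a given probability to produce a new individual (mutation). These steps are repeated until the termination criterion is met. Finally, the precision and reliability of the GP predictions are analyzed. GPLAB, a genetic programming toolbox for MATLAB, is currently one of the main tools for GP modeling (GPLAB-A Genetic Programming Toolbox for MATLAB, http://gplab.sourceforge.net). For example, Karakus, Shen, Johari, and Olague used GPLAB to solve a range of engineering and scientific problems with complex functional structure [30-33].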
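
The evolutionary loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the GPLAB implementation used in the paper: the tuple-based tree encoding, the four-operator function set with protected division, plain tournament selection, the depth cap, the default parameter values, and all names (`random_tree`, `sad_fitness`, `evolve`, and so on) are hypothetical choices made for the example. Only the fitness function (sum of absolute differences) and the crossover and mutation steps follow the text.

```python
import math
import random

# Illustrative function set (assumption); division is "protected" against zero.
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}

def random_tree(n_vars, depth, rng):
    """Grow a random tree: ("x", i) is a terminal, (op, left, right) a function node."""
    if depth == 0 or rng.random() < 0.3:
        return ("x", rng.randrange(n_vars))
    op = rng.choice(sorted(FUNCTIONS))
    return (op, random_tree(n_vars, depth - 1, rng), random_tree(n_vars, depth - 1, rng))

def evaluate(tree, x):
    """Evaluate a tree for one sample x (a sequence of variable values)."""
    if tree[0] == "x":
        return float(x[tree[1]])
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, x), evaluate(right, x))

def depth(tree):
    return 0 if tree[0] == "x" else 1 + max(depth(tree[1]), depth(tree[2]))

def sad_fitness(tree, X, y):
    """Sum of absolute differences between observations and predictions (lower is better)."""
    return sum(abs(yi - evaluate(tree, xi)) for xi, yi in zip(X, y))

def nodes(tree, path=()):
    """Yield (path, subtree) pairs; a path is a tuple of child positions (1 or 2)."""
    yield path, tree
    if tree[0] != "x":
        yield from nodes(tree[1], path + (1,))
        yield from nodes(tree[2], path + (2,))

def replace_at(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(a, b, rng):
    """Replace a random subtree of `a` with a random subtree of `b`."""
    path, _ = rng.choice(list(nodes(a)))
    _, donor = rng.choice(list(nodes(b)))
    return replace_at(a, path, donor)

def mutate(tree, n_vars, rng):
    """Replace a random subtree with a freshly grown one."""
    path, _ = rng.choice(list(nodes(tree)))
    return replace_at(tree, path, random_tree(n_vars, 2, rng))

def tournament(pop, fits, rng, k=3):
    """Pick the fittest of k randomly drawn individuals."""
    return pop[min(rng.sample(range(len(pop)), k), key=lambda i: fits[i])]

def evolve(X, y, pop_size=50, generations=20, p_cross=0.85, max_depth=15, seed=0):
    rng = random.Random(seed)
    n_vars = len(X[0])
    pop = [random_tree(n_vars, 3, rng) for _ in range(pop_size)]
    best, best_fit = None, math.inf
    for _ in range(generations):
        fits = [sad_fitness(t, X, y) for t in pop]
        for t, f in zip(pop, fits):
            if f < best_fit:
                best, best_fit = t, f
        nxt = [best]  # elitism: the best individual found so far always survives
        while len(nxt) < pop_size:
            parent = tournament(pop, fits, rng)
            if rng.random() < p_cross:
                child = crossover(parent, tournament(pop, fits, rng), rng)
            else:
                child = mutate(parent, n_vars, rng)
            # Reject children that exceed the depth cap and keep the parent instead.
            nxt.append(child if depth(child) <= max_depth else parent)
        pop = nxt
    return best, best_fit
```

Calling `evolve(X, y)` with a list of predictor tuples `X` and observed values `y` returns the best tree found and its SAD fitness. Because the best individual is carried over unchanged, the returned fitness cannot get worse as the number of generations grows.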

3 GP-based prediction and analysis of malaria incidence in Anhui Province

3.1 Geographical characteristics of the study area and data sources

Fig. 3 Geography of the study area and spatial distribution of annual average malaria incidence in each county

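The malaria data are monthly case counts for each county in Anhui Province from 2004 to 2010; Fig. 3 overlays the spatial distribution of annual average malaria incidence in each county. Over the 7 years, Anhui Province recorded 108 266 malaria cases, with an annual average incidence of 162.58/100 000, which exceeds both the national average and the 10/100 000 threshold for stable risk areas [1-2]. Fig. 4 shows the average incidence in each county for each month during 2004-2010, mapped using the thresholds for unstable risk areas (average incidence of 1/100 000) and stable risk areas (average incidence of 10/100 000). Box plots summarize the distribution of incidence in each month, and the black line connects the province-wide average incidence for each month. The figure shows that June to October is the high-incidence period and that, spatially, high-incidence areas are concentrated in northern Anhui, that is, the Huai River basin.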
Fig. 4 Monthly average malaria incidence of each county in Anhui Province from 2004 to 2010

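This study uses Terra MODIS products to obtain remote sensing indexes, which are preprocessed into remote sensing variables: monthly NDVI and monthly mean land surface temperature (LST), both at a spatial resolution of 1 km × 1 km. Each variable is averaged spatially over every county, so the preprocessed variables describe the meteorological and environmental conditions of each county in each month from 2004 to 2010 and match the malaria incidence data in both time and space. Fig. 5 shows the spatial distributions and monthly statistics of the remotely sensed NDVI and LST; monthly mean NDVI peaks in August, and mean LST peaks in July.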
Fig. 5 Spatial distributions and monthly statistics of NDVI and LST
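
The county-level averaging of the 1 km raster variables amounts to a zonal mean. The sketch below shows the idea in plain Python, assuming the raster and a county-ID grid of the same shape are already loaded as nested lists; the function name `zonal_mean`, the `nodata` convention, and the toy values are hypothetical, and the paper does not state which software performed this step.

```python
def zonal_mean(values, zones, nodata=None):
    """Average raster cells per zone (for example, per county).

    values: 2-D list of cell values (such as NDVI or LST for one month).
    zones:  2-D list of the same shape holding the zone ID of each cell.
    nodata: optional marker for missing cells, which are skipped.
    """
    sums, counts = {}, {}
    for row_values, row_zones in zip(values, zones):
        for value, zone in zip(row_values, row_zones):
            if nodata is not None and value == nodata:
                continue
            sums[zone] = sums.get(zone, 0.0) + value
            counts[zone] = counts.get(zone, 0) + 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


# Toy 2 x 2 raster: county "A" covers the top row, county "B" the bottom row.
ndvi = [[0.2, 0.4],
        [0.6, -9999]]
county = [["A", "A"],
          ["B", "B"]]
county_ndvi = zonal_mean(ndvi, county, nodata=-9999)
```

Repeating this for every month gives one value per county and month, which is the form needed to pair the predictors with the monthly incidence records.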

Fig. 6 Spatial distributions of auxiliary factors

3.2 Malaria prediction with genetic programming

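Correlation analysis shows that, across Anhui Province, the three types of remote sensing data are significantly and positively correlated with malaria incidence. When lags of 0, 1, and 2 months in the effect of each remote sensing variable on malaria incidence are considered, NDVI has its largest correlation coefficient at a lag of 2 months and LST at a lag of 1 month (Tab. 1). Among the four types of auxiliary data, population and GDP are significantly correlated with malaria incidence: population is positively correlated and GDP negatively correlated (Tab. 2). After preprocessing and selection, four independent variables remain and serve as the variables of the GP experiment (Tab. 3): Ln(population), Ln(GDP), NDVI (lag = 2), and mean LST (lag = 1).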
Tab. 1 Spearman correlation coefficients between monthly average malaria incidence and remote sensing indexes

| Lag (months) | NDVI | Mean LST | N |
|---|---|---|---|
| 0 | 0.129** | 0.293** | 936 |
| 1 | 0.210** | 0.333** | 936 |
| 2 | 0.229** | 0.279** | 936 |
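
The lag analysis behind Tab. 1 can be illustrated with a small, dependency-free sketch. It computes Spearman's coefficient as the Pearson correlation of average ranks and pairs incidence in month t with the factor in month t - lag. The function names and the toy series are hypothetical. Unlike the paper, where N stays at 936 for every lag (which suggests that factor values from the preceding months were available), this single-series version simply drops the first `lag` months.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; assumes neither series is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman rank correlation of two equally long series."""
    return pearson(ranks(a), ranks(b))


def lagged_spearman(incidence, factor, lag):
    """Correlate incidence in month t with the factor in month t - lag."""
    if lag == 0:
        return spearman(incidence, factor)
    return spearman(incidence[lag:], factor[:-lag])


# Toy series in which incidence follows the factor with a two-month delay.
factor = [5.0, 1.0, 4.0, 2.0, 8.0, 3.0, 7.0, 6.0]
incidence = [0.0, 0.0] + factor[:-2]
by_lag = {lag: lagged_spearman(incidence, factor, lag) for lag in (0, 1, 2)}
```

Choosing the lag with the largest coefficient in `by_lag` mirrors how the lag for each remote sensing variable is selected in the text.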


Tab. 2 Spearman correlation coefficients between annual average malaria incidence and auxiliary factors in each county

| Auxiliary variable | Correlation coefficient | P | N |
|---|---|---|---|
| Ln(elevation) | 0.041 | 0.723 | 78 |
| Ln(population) | 0.615** | 0.000 | 78 |
| Ln(immigrant population) | 0.202 | 0.076 | 78 |
| Ln(GDP) | -0.328** | 0.003 | 78 |


Tab. 3 Statistics of variables in GP prediction

| Variable | Description | Minimum | Mean | Median | Maximum |
|---|---|---|---|---|---|
| X1 | Ln(population) | 11.472 | 13.466 | 13.509 | 14.552 |
| X2 | Ln(GDP) | 8.276 | 9.593 | 9.527 | 11.074 |
| X3 | NDVI (lag = 2) | 0.202 | 0.555 | 0.547 | 0.847 |
| X4 | Mean LST (lag = 1) | -0.678 | 15.706 | 17.007 | 28.250 |

Fig. 7 Spatial distribution of the counties corresponding to the randomly selected training and test data

Fig. 8 Evolution of GP fitness and comparison between precision and complexity

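Tab. 4 Parameter settings for GP

| Parameter | Description and setting |
|---|---|
| Terminal set | Variables X1, X2, X3, X4 |
| Function set | +, -, ×, /, power, log, exp, sqrt |
| Population size | 200 individuals |
| Number of generations | 100 |
| Fitness function | Sum of absolute differences (SAD) |
| Genetic operators | Crossover, mutation |
| Initial probabilities | [0.85, 0.15] |
| Operator probabilities | Dynamically varying |
| Tree depth | Dynamic depth selection |
| Dynamic maximum depth | 15 |
| Maximum depth in results | 17 |
| Random selection method | Lexictour |
| Survival method | Totalelitism (elitism) |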

4 Analysis and evaluation of the malaria prediction results

4.1 Evaluation of training and test results

Fig. 9 Comparison between GP-predicted malaria incidence and the original data

Fig. 10 Map of the GP-based prediction results

4.2 Comparison of results from different methods

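$$ARE = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| O_i - P_i \right|}{O_i + 1} \times 100 \qquad (1)$$
where $O_i$ and $P_i$ are the observed and predicted values of the $i$-th data point, and $N$ is the number of observations in the data set. $R^2$ is the coefficient of determination of the model. Tab. 5 shows that, for both the training data and the test data, GP predicts more accurately than linear stepwise regression.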
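Tab. 5 Prediction errors of the GP-based model and the linear stepwise regression method

| Method | Training data (660 samples): ARE (%) | Training data (660 samples): R² | Test data (276 samples): ARE (%) | Test data (276 samples): R² |
|---|---|---|---|---|
| GP | 13.335 | 0.558 | 17.365 | 0.429 |
| Linear stepwise regression | 28.785 | 0.470 | 29.739 | 0.408 |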
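
The two accuracy measures reported in Tab. 5 can be computed as in the sketch below. `are` follows Eq. (1), reading the error term as an absolute difference. For R², the source does not say which definition it uses, so `r_squared` assumes the common form 1 - SS_res / SS_tot. The function names and the toy numbers are hypothetical.

```python
def are(observed, predicted):
    """ARE of Eq. (1), in percent; the +1 in the denominator avoids division by zero."""
    n = len(observed)
    total = sum(abs(o - p) / (o + 1) for o, p in zip(observed, predicted))
    return total / n * 100


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy check with three observations.
observed = [0.0, 1.0, 3.0]
predicted = [0.5, 1.0, 2.0]
print(round(are(observed, predicted), 3))        # prints 25.0
print(round(r_squared(observed, predicted), 3))  # prints 0.732
```

A lower ARE and a higher R² both indicate a closer fit, which is how the GP and stepwise regression rows of Tab. 5 are compared.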



5 Conclusions


The authors have declared that no competing interests exist.

Gething P W, Patil A P, Smith D L, et al. A new world malaria map: Plasmodium falciparum endemicity in 2010[J]. Malar J, 2011,10:378.

Gething P W, Elyazar I R F, Moyes C L, et al. A long neglected world malaria map: Plasmodium vivax endemicity in 2010[J]. PLoS neglected tropical diseases, 2012,6(9):e1814.

Zhang W, Wang L, Fang L, et al. Spatial analysis of malaria in Anhui province, China[J]. Malar J, 2008,7(4):398-408.

Wang Liping. Spatio-temporal analysis of the malaria epidemic in Anhui and its influencing factors[D]. Beijing: Chinese Center for Disease Control and Prevention, 2008.

Thomson M C, Connor S J, Milligan P, et al. Mapping malaria risk in Africa: What can satellite data contribute?[J]. Parasitology Today, 1997,13(8):313-318.

Hay S I. An overview of remote sensing and geodesy for epidemiology and public health application[J]. Advances in parasitology, 2000,47:1-35.

Hay S I, Snow R W, Rogers D J. Predicting malaria seasons in Kenya using multitemporal meteorological satellite sensor data[J]. Transactions of the Royal Society of Tropical Medicine and Hygiene, 1998,92(1):12-20.

Snow R W, Gouws E, Omumbo J, et al. Models to predict the intensity of Plasmodium falciparum transmission: Applications to the burden of disease in Kenya[J]. Transactions of the Royal Society of Tropical Medicine and Hygiene, 1998,92(6):601-606.

Omumbo J A, Hay S I, Goetz S J, et al. Updating historical maps of malaria transmission intensity in East Africa using remote sensing[J]. Photogrammetric engineering and remote sensing, 2002,68(2):161-166.

Perlmann H, Helmby H, Hagstedt M, et al. IgE elevation and IgE anti-malarial antibodies in Plasmodium falciparum malaria; association of high IgE levels with cerebral malaria[J]. Clinical & Experimental Immunology, 1994,97(2):284-292.

Lindblade K A, Walker E D, Onapa A W, et al. Land use change alters malaria transmission parameters by modifying temperature in a highland area of Uganda[J]. Tropical Medicine & International Health, 2000,5(4):263-274.

Anthony R L, Bangs M J, Hamzah N, et al. Heightened transmission of stable malaria in an isolated population in the highlands of Irian Jaya, Indonesia[J]. The American journal of tropical medicine and hygiene, 1992,47(3):346-356.

Béguin A, Hales S, Rocklöv J, et al. The opposing effects of climate change and socio-economic development on the global distribution of malaria[J]. Global Environmental Change, 2011,21(4):1209-1214.

Zucker J R. Changing patterns of autochthonous malaria transmission in the United States: a review of recent outbreaks[J]. Emerg Infect Dis, 1996,2(1):37-43.

Ndao M, Bandyayera E, Kokoskin E, et al. Comparison of blood smear, antigen detection, and nested-PCR methods for screening refugees from regions where malaria is endemic after a malaria outbreak in Quebec, Canada[J]. Journal of clinical microbiology, 2004,42(6):2694-2700.

Macgreevy P B, Dietze R, Prata A, et al. Effects of immigration on the prevalence of malaria in rural areas of the Amazon basin of Brazil[J]. Memorias do Instituto Oswaldo Cruz, 1989,84(4):485-491.

Brooker S, Clements A C A, Hotez P J, et al. The co-distribution of Plasmodium falciparum and hookworm among African schoolchildren[J]. Malaria journal, 2006,5(1):99.

Nihei N, Hashida Y, Kobayashi M, et al. Analysis of malaria endemic areas on the Indochina Peninsula using remote sensing[J]. Japanese journal of infectious diseases, 2002,55(5):160-166.

Liu J, Chen X. Relationship of remote sensing normalized differential vegetation index to Anopheles density and malaria incidence rate[J]. Biomedical and Environmental Sciences, 2006,19(2):130-132.

Gomez-Elipe A, Otero A, Van Herp M, et al. Forecasting malaria incidence based on monthly case reports and environmental factors in Karuzi, Burundi, 1997-2003[J]. Malaria Journal, 2007,6(1):129.

Gaudart J, Touré O, Dessay N, et al. Modelling malaria incidence with environmental dependency in a locality of Sudanese savannah area, Mali[J]. Malar J, 2009,8:61.



Midekisa A, Senay G, Henebry G M, et al. Remote sensing-based time series models for malaria early warning in the highlands of Ethiopia[J]. Malar J, 2012,11(1):165.

Wimberly M C, Midekisa A, Semuniguse P, et al. Spatial synchrony of malaria outbreaks in a highland region of Ethiopia[J]. Tropical Medicine & International Health, 2012,17(10):1192-1201.

Koza J R. Genetic programming: on the programming of computers by means of natural selection[M]. Cambridge, MA: MIT press, 1992.

Yun Qingxia, Huang Guangqiu, Wang Zhanquan. Genetic algorithms and genetic programming: a search and optimization technique[M]. Beijing: Metallurgical Industry Press, 1997.

Holland J H. Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence[M]. Ann Arbor, MI: University of Michigan Press, 1975.

Liao Y, Wang J, Meng B, et al. Integration of GP and GA for mapping population distribution[J]. International Journal of Geographical Information Science, 2010,24(1):47-67.

Karakus M. Function identification for the intrinsic strength and elastic properties of granitic rocks via genetic programming (GP)[J]. Computers & Geosciences, 2011,37(9):1318-1323.

Shen J, Karakus M, Xu C. Direct expressions for linearization of shear strength envelopes given by the Generalized Hoek-Brown criterion using genetic programming[J]. Computers and Geotechnics, 2012,44:139-146.

Johari A, Habibagahi G, Ghahramani A. Prediction of soil-water characteristic curve using genetic programming[J]. Journal of Geotechnical and Geoenvironmental Engineering, 2006,132(5):661-665.

Olague G, Trujillo L. Evolutionary-computer-assisted design of image operators that detect interest points using genetic programming[J]. Image and Vision Computing, 2011,29(7):484-498.

