A Bayesian Network Method Considering Spatial Cluster to Evaluate Health Risk of Hand, Foot and Mouth Disease
About the author: Qiu Wenyang (b. 1991), male, master's degree; research interests: spatial analysis and spatial statistics. E-mail: qiuwy@lreis.ac.cn
Received date: 2017-02-01
Request revised date: 2017-05-06
Online published: 2017-08-20
Funding
National Natural Science Foundation of China (41471376, 41171344)
Open Project of the Shanghai Key Laboratory of Atmospheric Particle Pollution and Prevention
Hand, foot and mouth disease (HFMD) is a common infectious disease. Previous studies have shown that it is associated with meteorological, geographical, environmental and socio-economic factors, and that these associations are complex. HFMD incidence also shows strong spatial clustering and autocorrelation, so ordinary linear risk models cannot capture either the complex, non-linear effects of the influencing factors or the spatial clustering. Taking Shandong Province, China, where HFMD is a typical childhood disease, as a case study and building on previous work, we propose a Bayesian network based integrated risk modelling approach to examine the relationship between HFMD incidence risk and meteorological parameters, land use, socio-economic status and air pollution. The spatial clusters obtained from spatial scan statistics are incorporated into the Bayesian network to strengthen its spatial reasoning, reduce model bias and improve estimation accuracy. The results show that the proposed spatial Bayesian network risk model achieved higher accuracy than the other methods tested, that spatial clustering was integrated well into the Bayesian probabilistic inference, and that the model established reasonable relationships between the predictors and HFMD incidence risk. By interpreting the marginal probabilities of the influencing factors, we analysed their effects on HFMD incidence risk, in particular those of meteorological parameters, socio-economic factors and air pollution. The approach and its results provide useful information for early warning, prevention and control of HFMD outbreaks.
Qiu Wenyang, Li Lianfa, Zhang Jiehao, Wang Jinfeng. A Bayesian network method considering spatial cluster to evaluate health risk of hand, foot and mouth disease[J]. Journal of Geo-Information Science, 2017, 19(8): 1036-1048. DOI: 10.3724/SP.J.1047.2017.01036
Fig. 1 Study area
Fig. 2 Bayesian network topology of HFMD risk modelling with spatial clustering
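Tab. 1 Methods of Bayesian network topology and parameter learning[21]
Stage | Algorithm | Method | Applicability to HFMD risk assessment |
---|---|---|---|
Structure learning | K2 | Derives the links between nodes from a fixed topological ordering of the variables[28]; the initial variable order is based on a naive Bayes model | Local optimization algorithm with fast computation; suitable for processing massive data to find the relationships between each influencing factor and HFMD incidence risk |
Structure learning | Hill climbing | Iteratively selects the network structure with the highest score[29] | Same as K2 |
Structure learning | Tabu | An optimized hill-climbing method; adds Markov blanket links to the learned structure[30] | Same as K2 |
Structure learning | TAN | Computes the maximum-weight spanning tree[31] | Same as K2 |
Structure learning | Simulated annealing | Randomly generates a candidate network from the previous model and adopts the candidate if it is better than the previous model[30] | Local optimization algorithm, but more complex and slower to search; not suitable for large data volumes, although it can search the relationships between the influencing factors and HFMD incidence risk well |
Structure learning | Genetic algorithm | Searches for the optimal network structure with a genetic algorithm[32] | Same as simulated annealing |
Structure fine-tuning | Domain knowledge | Using domain knowledge of HFMD transmission sources and influencing factors[2,8,16], removes links with no substantive meaning and adds new, meaningful links | Particularly suited to the complex influencing factors of HFMD; domain knowledge removes bias from the learned relationships and corrects the network |
Parameter learning | Simple Bayes | Computes probabilities from the data under a Dirichlet distribution[30] | Basic parameter estimation method |
Parameter learning | Expectation maximization | EM algorithm based on maximum likelihood; can estimate parameters when data are missing[33] | Suitable when some independent variables have missing values |
Parameter learning | Gibbs sampling | Computes conditional probabilities by Monte Carlo sampling; suited to large data volumes[34] | Suitable for learning the HFMD risk assessment model from massive data |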
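The "simple Bayes" parameter learning row in Tab. 1 can be illustrated with a minimal sketch, assuming a single parent node, a symmetric Dirichlet prior expressed as a pseudo-count `alpha`, and toy observations. The function name `estimate_cpt`, the value of `alpha` and the data are assumptions made for illustration and are not taken from the study.

```python
from collections import Counter


def estimate_cpt(child, parent, child_states, parent_states, alpha=1.0):
    """Estimate P(child | parent) from paired discrete observations.

    alpha is a symmetric Dirichlet pseudo-count added to every cell, so
    combinations that never occur in the data still get a small
    non-zero probability.
    """
    counts = Counter(zip(parent, child))
    cpt = {}
    for p in parent_states:
        total = sum(counts[(p, c)] for c in child_states)
        total += alpha * len(child_states)
        cpt[p] = {c: (counts[(p, c)] + alpha) / total for c in child_states}
    return cpt


# Toy, made-up observations (NOT the study's data).
humidity = ["high", "high", "high", "low", "low", "low", "low", "high"]
risk = ["high", "high", "low", "low", "low", "high", "low", "high"]

cpt = estimate_cpt(risk, humidity, ["high", "low"], ["high", "low"], alpha=1.0)
for level, row in cpt.items():
    print(level, {state: round(prob, 3) for state, prob in row.items()})
```

Each row of the resulting table sums to 1, which is the property a conditional probability table attached to a Bayesian network node has to satisfy.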
Fig. 3 Weekly incidence rates of HFMD in Shandong Province
Fig. 4 Spatial cluster levels of HFMD in Shandong Province
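Tab. 2 Variable selection of HFMD Bayesian network risk model
Category | Explanatory variable | Attribute importance (Gain Ratio) |
---|---|---|
Meteorology | Daily mean temperature | 0.022 |
Meteorology | Daily maximum temperature | 0.213 |
Meteorology | Daily minimum temperature | 0.016 |
Meteorology | Wind speed | 0.101 |
Meteorology | Relative humidity | 0.114 |
Meteorology | Air pressure | 0.027 |
Socio-economic | GDP (gross domestic product) | 0.227 |
Socio-economic | Hospital beds per capita | 0.152 |
Socio-economic | Proportion of primary school students | 0.190 |
Air pollution | PM2.5 concentration | 0.125 |
NDVI | District/county mean NDVI | 0.017 |
Land cover | Proportion of artificial cover | 0.086 |
Road network | Density of major and minor roads | 0.168 |
Spatial clustering | Spatial cluster level | 0.219 |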
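Tab. 2 ranks the candidate variables by Gain Ratio. The page does not say which software or discretization produced those values, so the following is only a sketch of the standard definition (information gain divided by split information) for a discrete attribute; the helper names and the toy inputs are assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def gain_ratio(attribute, labels):
    """Information gain of `attribute` about `labels`, divided by split info."""
    n = len(labels)
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    # Expected entropy of the labels after splitting on the attribute.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(attribute)
    # An attribute with a single value carries no information.
    return info_gain / split_info if split_info > 0 else 0.0


# Toy, made-up examples (NOT the study's data).
risk = ["high", "high", "low", "low"]
informative = ["a", "a", "b", "b"]    # separates the two risk classes
uninformative = ["a", "b", "a", "b"]  # independent of the risk class

print(gain_ratio(informative, risk))
print(gain_ratio(uninformative, risk))
```

Continuous predictors such as temperature or GDP would have to be discretized before a function like this applies.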
Fig. 5 Bayesian network topology of HFMD incidence risk and explanatory variables
Tab. 3 Performance comparison of different Bayesian network risk models without spatial clusters
Learning algorithm | True positive rate (risk: high/low) | False positive rate (risk: high/low) | Precision (risk: high/low) | Accuracy | ROC area |
---|---|---|---|---|---|
Integrated BN + domain knowledge | 0.57/0.85 | 0.15/0.43 | 0.63/0.82 | 0.77 | 0.78 |
BN K2 | 0.52/0.82 | 0.17/0.48 | 0.56/0.80 | 0.74 | 0.79 |
BN hill climbing | 0.52/0.88 | 0.12/0.48 | 0.67/0.80 | 0.76 | 0.79 |
BN Tabu | 0.52/0.87 | 0.13/0.48 | 0.65/0.80 | 0.76 | 0.78 |
BN simulated annealing | 0.45/0.90 | 0.10/0.55 | 0.68/0.79 | 0.77 | 0.68 |
Decision tree: J48 | 0.38/0.98 | 0.02/0.62 | 0.84/0.78 | 0.80 | 0.62 |
Random forest | 0.48/0.85 | 0.15/0.52 | 0.59/0.79 | 0.74 | 0.78 |
Logistic regression | 0.48/0.91 | 0.10/0.52 | 0.59/0.80 | 0.77 | 0.70 |
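The per-class columns in Tab. 3 (true positive rate, false positive rate and precision, each given for the high-risk and low-risk class) follow from a two-class confusion matrix. The sketch below uses hypothetical counts, not the study's, to show how each class is treated as "positive" in turn. It also shows why the false positive rate of one class equals one minus the true positive rate of the other, as in the 0.57 and 0.43 pair in the first row.

```python
def class_metrics(tp, fn, fp, tn):
    """Metrics for one class treated as the 'positive' class."""
    return {
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical confusion matrix (NOT the study's counts).
# Rows are the actual class, columns the predicted class.
high_as_high, high_as_low = 60, 40
low_as_high, low_as_low = 20, 180

# High risk as the positive class.
high = class_metrics(tp=high_as_high, fn=high_as_low,
                     fp=low_as_high, tn=low_as_low)
# Low risk as the positive class: the roles of the cells swap.
low = class_metrics(tp=low_as_low, fn=low_as_high,
                    fp=high_as_low, tn=high_as_high)

for name, metrics in (("high", high), ("low", low)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Accuracy is the same whichever class is taken as positive, which is why the table reports it as a single value per model.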
Fig. 6 Bayesian network topology of HFMD risk and predictors after incorporating spatial clusters
Tab. 4 Performance comparison of different Bayesian network risk models with spatial clusters
Learning algorithm | True positive rate (risk: high/low) | False positive rate (risk: high/low) | Precision (risk: high/low) | Accuracy | ROC area |
---|---|---|---|---|---|
BN + domain knowledge | 0.64/0.85 | 0.15/0.36 | 0.70/0.83 | 0.80 | 0.79 |
BN K2 | 0.62/0.85 | 0.15/0.38 | 0.65/0.84 | 0.78 | 0.78 |
BN hill climbing | 0.54/0.87 | 0.12/0.45 | 0.60/0.80 | 0.78 | 0.79 |
BN Tabu | 0.52/0.87 | 0.14/0.48 | 0.71/0.81 | 0.76 | 0.79 |
BN simulated annealing | 0.43/0.93 | 0.06/0.58 | 0.75/0.79 | 0.78 | 0.71 |
Decision tree: J48 | 0.40/0.95 | 0.05/0.60 | 0.77/0.78 | 0.77 | 0.62 |
Random forest | 0.52/0.84 | 0.15/0.50 | 0.78/0.79 | 0.74 | 0.78 |
Logistic regression | 0.50/0.86 | 0.15/0.52 | 0.60/0.80 | 0.74 | 0.69 |
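Tab. 5 Marginal conditional probability table of explanatory variables and HFMD incidence risk
Explanatory variable | Level | Range | Incidence risk: high | Incidence risk: low |
---|---|---|---|---|
Relative humidity/% | High | 71.04~73.52 | 0.736 | 0.264 |
Relative humidity/% | Low | 60.58~71.04 | 0.304 | 0.696 |
Daily minimum temperature/℃ | High | 17.05~20.03 | 0.747 | 0.253 |
Daily minimum temperature/℃ | Low | 13.44~17.05 | 0.291 | 0.709 |
GDP (yuan per capita) | High | 42 146~176 826 | 0.553 | 0.447 |
GDP (yuan per capita) | Medium | 20 847~42 146 | 0.386 | 0.614 |
GDP (yuan per capita) | Low | 2200~20 847 | 0.252 | 0.748 |
Hospital beds per capita | High | 62.15~72.23 | 0.752 | 0.248 |
Hospital beds per capita | Medium | 28.30~62.15 | 0.325 | 0.675 |
Hospital beds per capita | Low | 10.81~28.30 | 0.172 | 0.828 |
Proportion of primary school students/% | High | 0.831~1.082 | 0.103 | 0.897 |
Proportion of primary school students/% | Low | 0.416~0.831 | 0.317 | 0.683 |
Road network density/(km/km²) | High | 0.56~2.60 | 0.481 | 0.519 |
Road network density/(km/km²) | Low | 0.21~0.56 | 0.223 | 0.777 |
NDVI | High | 0.43~0.61 | 0.160 | 0.840 |
NDVI | Low | 0.18~0.43 | 0.358 | 0.642 |
Land cover (proportion of artificial land/%) | High | 26.18~99.05 | 0.944 | 0.056 |
Land cover (proportion of artificial land/%) | Medium | 10.29~26.18 | 0.677 | 0.323 |
Land cover (proportion of artificial land/%) | Low | 1.16~10.29 | 0.210 | 0.780 |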
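One way to read Tab. 5 is to compare, for each variable, the probability of high incidence risk at its high level with that at its low level. The sketch below does this for three rows copied from the table. The ratio is only a descriptive summary chosen here for illustration; it is not a measure reported by the authors, and the function name is an assumption.

```python
# P(high incidence risk | level of the variable), copied from Tab. 5.
p_high_risk = {
    "relative humidity": {"high": 0.736, "low": 0.304},
    "daily minimum temperature": {"high": 0.747, "low": 0.291},
    "NDVI": {"high": 0.160, "low": 0.358},
}


def probability_ratio(levels, numerator="high", denominator="low"):
    """Ratio of P(high risk) at one level to P(high risk) at another."""
    return levels[numerator] / levels[denominator]


ratios = {name: probability_ratio(levels)
          for name, levels in p_high_risk.items()}

# A ratio above 1 means high risk is more probable at the variable's
# high level; a ratio below 1 means it is more probable at the low level.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.2f}")
```

Read this way, the humidity and temperature rows point in one direction (ratio above 1) and the NDVI row in the other (ratio below 1).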
The authors have declared that no competing interests exist.
[2] Bie Q Q, Qiu D S, Hu H, et al. Spatial and temporal distribution characteristics of Hand-Foot-Mouth disease in China[J]. Journal of Geo-Information Science, 2010, 12(3): 380-384.
[26] A Study on the spread model and spatial distribution of HFMD based on GIS[D]. Ningbo: Ningbo University, 2010.