Journal of Geo-information Science (地球信息科学学报), 2018, Vol. 20, Issue 7: 906-917. doi: 10.12082/dqxxkx.2018.180062
YANG Tengfei1,2, XIE Jibo1,*, LI Zhenyu3, LI Guoqing1
Received: 2018-01-18
Revised: 2018-03-20
Publication date: 2018-07-20
Release date: 2018-07-13
Corresponding author: XIE Jibo
E-mail: yangtf@radi.ac.cn; xiejb@radi.ac.cn
About the author: YANG Tengfei (1988-), male, PhD candidate; research interests: natural language processing and disaster information mining.
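Abstract:
Social media plays an increasingly important role in the real-time release and dissemination of disaster information. During a disaster, the real-time damage information carried by social media is valuable for timely response and assessment. However, disaster-related posts are highly fragmented, their textual features are sparse, and annotated corpora are scarce, so traditional supervised-learning methods struggle to extract damage information from them effectively. To address this, this paper proposes a method that quickly identifies and classifies the different categories of disaster loss information in social media by expanding contextual features and matching feature words. The method first applies Chinese grammatical rules to a small set of microblog posts in each loss category, extracts the disaster-related keywords, and builds feature-word collocation pairs. It then supplements and expands these pairs using a word-vector model and existing lexicons. In parallel, following Chinese word co-occurrence rules, it introduces an external corpus to optimize the semantic collocation relations between feature words. Finally, a typhoon loss classification knowledge base is built on this basis and used to identify and classify the loss information contained in disaster-related texts. The landfall of Typhoon Meranti on 15 September 2016 serves as the case study for evaluating the method. The results show that the method identifies and classifies the different categories of typhoon loss information in microblog texts well (the F-measure exceeds 0.74 for every category). Based on the classification results, the paper maps the spatiotemporal distribution of the typhoon's impact, further illustrating the method's usefulness for disaster loss assessment and for disaster reduction and relief.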
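The matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the knowledge base contents, the category names, and the helpers `split_clauses` and `classify` are hypothetical, and the sketch replaces the grammar-based extraction, word-vector expansion, and co-occurrence optimization with a small hand-written set of collocation pairs. A post is assigned a category when both words of some pair occur in the same clause.

```python
import re

# Hypothetical knowledge base: loss category -> set of (object word, damage word)
# collocation pairs. The real knowledge base would be built and expanded
# automatically; these entries are invented for illustration only.
KNOWLEDGE_BASE = {
    "Power supply affected": {("小区", "停电"), ("全城", "停电")},
    "Water supply affected": {("小区", "停水")},
    "Forestry impact": {("树", "倒"), ("大树", "折断")},
}


def split_clauses(text):
    """Split a post into clauses on common Chinese and ASCII punctuation."""
    return [c for c in re.split(r"[,。!?;,.!?;\s]+", text) if c]


def classify(text, kb=KNOWLEDGE_BASE):
    """Return the sorted list of categories whose collocation pairs match.

    A pair matches when both of its words appear in the same clause, which
    is a crude stand-in for a checked semantic collocation.
    """
    found = set()
    for clause in split_clauses(text):
        for category, pairs in kb.items():
            if any(a in clause and b in clause for a, b in pairs):
                found.add(category)
    return sorted(found)


if __name__ == "__main__":
    post = "台风过后小区停电了,路边的树都倒了"
    print(classify(post))
```

Requiring both words in the same clause, rather than anywhere in the post, is one simple way to reduce false matches in fragmented microblog text; a single post can still receive several categories.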
Cite this article: YANG Tengfei, XIE Jibo, LI Zhenyu, LI Guoqing. A Method of Typhoon Disaster Loss Identification and Classification Using Micro-blog Information[J]. Journal of Geo-information Science, 2018, 20(7): 906-917. DOI: 10.12082/dqxxkx.2018.180062
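Table 5. Comparison of experimental results

Category | Precision P (%) | Recall R (%) | F1 (%) |
---|---|---|---|
Class 1: Casualties | 68.00 | 89.47 | 77.27 |
Class 2: Water supply affected | 87.32 | 95.48 | 91.22 |
Class 3: Building damage | 76.10 | 85.14 | 80.37 |
Class 4: Commercial impact | 100.00 | 75.00 | 85.71 |
Class 5: Forestry impact | 79.00 | 84.61 | 81.71 |
Class 6: Traffic disruption | 78.74 | 87.71 | 82.98 |
Class 7: Vehicle damage | 74.19 | 88.46 | 80.70 |
Class 8: Power supply affected | 90.29 | 93.93 | 92.07 |
Class 9: Damage to power facilities | 78.54 | 70.53 | 74.32 |
Class 10: Communications affected | 86.95 | 71.42 | 78.43 |
Class 11: Infrastructure damage | 76.47 | 72.22 | 74.28 |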
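
As a reading aid for Table 5, the short Python sketch below recomputes the F1 column from the precision and recall columns. It assumes F1 is the usual harmonic mean of P and R, which the tabulated values appear consistent with; the function name `f1_score` and the row selection are illustrative choices, not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R) in percent for three rows of Table 5.
rows = {
    "Class 1": (68.00, 89.47),
    "Class 4": (100.00, 75.00),
    "Class 8": (90.29, 93.93),
}

if __name__ == "__main__":
    for name, (p, r) in rows.items():
        print(f"{name}: F1 = {f1_score(p, r):.2f}%")
```

Because the harmonic mean is pulled toward the smaller of the two inputs, Class 4 reaches an F1 of only about 85.7% despite perfect precision: its recall of 75% dominates the score.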