Journal of Geo-information Science ›› 2018, Vol. 20 ›› Issue (7): 906-917. doi: 10.12082/dqxxkx.2018.180062

• Theory and Methods of Geo-information Science •

A Method of Typhoon Disaster Loss Identification and Classification Using Micro-blog Information

YANG Tengfei1,2, XIE Jibo1,*, LI Zhenyu3, LI Guoqing1

  1. Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100049, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. Shandong University of Science and Technology, Qingdao 266000, China
  • Received: 2018-01-18 Revised: 2018-03-20 Online: 2018-07-20 Published: 2018-07-13
  • Contact: XIE Jibo E-mail: yangtf@radi.ac.cn; xiejb@radi.ac.cn
  • About the author:
    YANG Tengfei (1988-), male, PhD candidate; research interests: natural language processing and disaster information mining. E-mail: yangtf@radi.ac.cn
  • Supported by:
    National Key Research and Development Program of China, No. 2016YFE0122600; National Natural Science Foundation of China, No. 41771476

Abstract:

Social media plays an increasingly important role in the real-time release and dissemination of disaster information. During a disaster, the real-time loss information carried by social media is valuable for timely response and loss assessment. However, disaster-related posts are highly fragmented, have sparse text features, and lack annotated corpora, which makes it difficult for traditional supervised learning methods to extract disaster loss information from them effectively. This paper therefore proposes a method that quickly identifies and classifies the different categories of disaster loss information in social media by extending context features and matching feature words. First, based on Chinese grammar rules, disaster-related keywords are extracted from a small sample of micro-blog texts in each loss category and used to construct feature word collocation pairs. Next, a word vector model and existing lexicons are used to supplement and expand these pairs. An external corpus is also introduced to optimize the semantic collocation relationships between feature words according to Chinese word co-occurrence rules. Finally, a typhoon disaster loss classification knowledge base is built on this foundation and used to identify and classify the categories of loss information in disaster-related texts. An experiment system was developed to evaluate the method, with the landfall of Typhoon "Meranti" on 15 September 2016 as the case study. The results show that the method identifies and classifies the different categories of typhoon loss information in micro-blog texts effectively: the comprehensive evaluation index of every category is above 0.74. Based on the classification results, we mapped the spatio-temporal distribution of the typhoon's influence, which further illustrates that the classification output and maps can support disaster loss assessment and disaster mitigation and relief.

Key words: social media, typhoon disaster, short text classification, identification of disaster loss information, assessment of disasters
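
The matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The category names, the word pairs, the `SYNONYMS` table (a stand-in for word-vector neighbours or an existing lexicon), the `window` parameter, and all function names are hypothetical. The input is assumed to be already segmented into tokens, and the co-occurrence optimization against an external corpus is omitted.

```python
from itertools import product

# Hypothetical seed knowledge base: each loss category maps to
# (object word, damage word) collocation pairs. Illustrative only;
# these categories and words are not taken from the paper.
SEED_PAIRS = {
    "traffic": {("road", "flooded"), ("flight", "cancelled")},
    "power": {("power", "outage")},
    "vegetation": {("tree", "uprooted")},
}

# Hypothetical expansion table standing in for word-vector neighbours
# or a synonym lexicon.
SYNONYMS = {
    "road": {"street"},
    "flooded": {"waterlogged"},
    "tree": {"trees"},
    "uprooted": {"toppled"},
}


def expand_pairs(pairs, synonyms):
    """Expand each pair with every combination of its words' synonyms."""
    expanded = set()
    for obj, dmg in pairs:
        objs = {obj} | synonyms.get(obj, set())
        dmgs = {dmg} | synonyms.get(dmg, set())
        expanded.update(product(objs, dmgs))
    return expanded


def build_knowledge_base(seed_pairs, synonyms):
    """Build the category -> expanded collocation pairs mapping."""
    return {cat: expand_pairs(pairs, synonyms) for cat, pairs in seed_pairs.items()}


def classify(tokens, knowledge_base, window=5):
    """Return sorted categories with a pair whose two words occur within `window` tokens."""
    positions = {}
    for i, tok in enumerate(tokens):
        positions.setdefault(tok, []).append(i)
    matched = set()
    for cat, pairs in knowledge_base.items():
        for obj, dmg in pairs:
            if any(abs(i - j) <= window
                   for i in positions.get(obj, [])
                   for j in positions.get(dmg, [])):
                matched.add(cat)
                break  # one matching pair is enough for this category
    return sorted(matched)


kb = build_knowledge_base(SEED_PAIRS, SYNONYMS)
post = "the street outside is waterlogged and several trees were toppled".split()
print(classify(post, kb))  # ['traffic', 'vegetation']
```

Matching expanded pairs rather than single keywords lets a post count toward a category only when an affected object and a damage word appear close together, which is one plausible way to cope with the sparse features of short texts without a large annotated corpus.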