Journal of Geo-information Science ›› 2022, Vol. 24 ›› Issue (1): 114-126. doi: 10.12082/dqxxkx.2022.210575

• Theory and Methods of Geo-information Science •

Typhoon Disaster Network Emotion Analysis Method Based on Semantic Rules and Word Vectors

LIN Xiaoyan1,2, WU Sheng1,2,3,*

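  1. Academy of Digital China (Fujian), Fuzhou University, Fuzhou 350003, China
  2. Key Laboratory of Spatial Data Mining and Information Sharing, Ministry of Education, Fuzhou 350003, China
  3. Fujian Collaborative Innovation Center for Big Data Applications in Governments, Fuzhou 350003, China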
  • Received: 2021-09-24 Revised: 2021-11-29 Online: 2022-01-25 Published: 2022-03-25
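  • Corresponding author: * WU Sheng (born 1972), male, from Songxi, Fujian; Ph.D., Professor. His work focuses on spatiotemporal data analysis and visualization, digital planning, and smart emergency management. E-mail: ws0110@163.com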
  • About the author: LIN Xiaoyan (born 1997), female, from Minhou, Fujian; master's student working mainly on disaster information mining. E-mail: 542726737@qq.com
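  • Supported by:
    Fujian Science and Technology Innovation Platform Project (〔2015〕75); Fujian Science and Technology Innovation Platform Project (〔2017〕675)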

Typhoon Disaster Network Emotion Analysis Method based on Semantic Rules and Word Vector

LIN Xiaoyan1,2, WU Sheng1,2,3,*

  1. Academy of Digital China (Fujian), Fuzhou University, Fuzhou 350003, China
  2. Key Laboratory of Spatial Data Mining and Information Sharing, Ministry of Education, Fuzhou 350003, China
  3. Fujian Collaborative Innovation Center for Big Data Applications in Governments, Fuzhou 350003, China
  • Received: 2021-09-24 Revised: 2021-11-29 Online: 2022-01-25 Published: 2022-03-25
  • Supported by:
    Fujian Science and Technology Innovation Platform Project (〔2015〕75); Fujian Science and Technology Innovation Platform Project (〔2017〕675)

Abstract:

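Guiding public opinion during disasters helps maintain social stability. Social media is an important channel for the spread of public opinion. Analyzing Weibo comments to understand users' online emotions and the topics they care about can help public opinion monitoring departments track what the public is focusing on, choose appropriate points at which to intervene in online public opinion, and ease public emotions, which is of practical value for emergency management. Most existing studies use supervised machine learning for sentiment classification, which requires manually annotated corpora and a heavy workload. Based on the characteristics of Weibo comment text, this paper considers multiple sources of emotion, such as emotion words and emoticons, and builds an emotion dictionary for the typhoon disaster domain. On this basis, it proposes a method for computing emotional tendency based on semantic rules for emotion words, together with a topic clustering method based on word vectors. First, more than 400,000 Weibo comments posted during five recent typhoon disasters were collected. A typhoon-disaster emotion dictionary was built by extending the Dalian University of Technology emotion vocabulary ontology, an emoticon dictionary was built with the PMI method, emotional tendency was determined according to the semantic rules, and 3,500 comments were used to verify the effectiveness of the method. Then, a clustering method based on word vectors, TF-IDF, and K-means was used to explore hot topics during the disasters. Finally, taking Typhoon Hagupit, the fourth typhoon of 2020, as an example, a public opinion and emotion analysis was carried out on more than 50,000 Weibo comments posted during the typhoon, and six categories of typhoon-related topics were identified. Spatiotemporal analysis shows that the number of Weibo comments changed over time, that the regions with many comments were mostly coastal or economically developed, and that fear in Zhejiang Province peaked on the day the typhoon made landfall. The results show that the typhoon disaster network emotion analysis method based on semantic rules and word vectors can assist government departments in tracking and guiding online public opinion when similar disaster events occur.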

Keywords: typhoon, sentiment analysis, topic detection, network public opinion, text clustering, semantic rules, emotion dictionary, word vector
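The abstract says the emoticon dictionary was built with the PMI (pointwise mutual information) method but gives no formula. A minimal sketch of one common variant follows: an emoticon's orientation is taken as its summed PMI with positive seed words minus its summed PMI with negative seed words. The function name, the seed words, the per-comment co-occurrence counting, and the additive smoothing are all assumptions made for illustration and may differ from the paper's actual procedure.

```python
import math
from collections import Counter

def so_pmi(docs, target, pos_seeds, neg_seeds, smoothing=1.0):
    """Orientation of `target`: PMI with positive seeds minus PMI with negative seeds.

    docs is a list of token lists (one per comment); co-occurrence is counted
    once per comment. `smoothing` is an illustrative assumption that keeps the
    logarithm finite when a pair never co-occurs.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    df = Counter(tok for s in doc_sets for tok in s)  # comments containing each token

    def pmi(a, b):
        co = sum(1 for s in doc_sets if a in s and b in s)
        p_ab = (co + smoothing) / (n + smoothing)
        p_a = (df[a] + smoothing) / (n + smoothing)
        p_b = (df[b] + smoothing) / (n + smoothing)
        return math.log2(p_ab / (p_a * p_b))

    return (sum(pmi(target, w) for w in pos_seeds)
            - sum(pmi(target, w) for w in neg_seeds))

# Toy, hand-made comments (hypothetical data, English stand-ins for segmented Chinese).
toy_docs = [
    ["[pray]", "safe", "hope"],
    ["[pray]", "hope"],
    ["[cry]", "afraid"],
    ["[cry]", "afraid", "flood"],
    ["hope", "safe"],
    ["afraid"],
]
pray_score = so_pmi(toy_docs, "[pray]", ["hope"], ["afraid"])
cry_score = so_pmi(toy_docs, "[cry]", ["hope"], ["afraid"])
```

On this toy corpus the `[pray]` emoticon comes out positive and `[cry]` negative, so the sign of the score could be used to assign an emoticon to a polarity class.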

Abstract:

During natural disasters, guiding public opinion contributes to maintaining social stability. Social media is an important channel for the dissemination of public opinion. Understanding users' online emotions and topics of concern through microblog comments can help public opinion monitoring departments track the hot spots of public concern, select appropriate intervention points for dealing with online public opinion, and ease public emotions, which is of practical significance for emergency management. Most existing studies use supervised machine learning methods for emotion classification, which require manual labeling of the corpus and involve a heavy workload. Unsupervised methods, by contrast, rely mainly on existing emotion dictionaries; they can reflect the unstructured characteristics of the text and are easy to understand and explain. According to the characteristics of microblog comments, this paper constructs an emotion dictionary for the typhoon disaster domain by considering multiple emotional sources such as emotion words and emoticons. On this basis, it proposes a method for calculating emotional tendency based on semantic rules for emotion words and a topic clustering method based on word vectors. First, this study collected more than 400 000 comments posted on Sina Weibo during five typhoon disasters in recent years and constructed the typhoon-disaster emotion dictionary by extending the DUTIR emotion vocabulary ontology of Dalian University of Technology. We built the emoticon dictionary using the Pointwise Mutual Information (PMI) method, determined emotional tendencies according to the semantic rules, and used 3500 comments to verify the effectiveness of the proposed method. Second, using a clustering method based on word vectors, TF-IDF, and K-means, we explored the hot topics during these disasters. Finally, taking Typhoon Hagupit, the fourth typhoon of 2020, as an example, this paper analyzed more than 50 000 Weibo comments posted during the typhoon and identified 6 categories of typhoon-related topics. The spatial-temporal analysis found that the number of Weibo comments changed over time and that the areas with many comments were mostly coastal areas and areas with a high level of economic development. On the day Typhoon Hagupit made landfall, fear in Zhejiang Province reached its highest level. The results show that the typhoon disaster network emotion analysis method based on semantic rules and word vectors can help government departments track and guide online public opinion when similar disaster events occur.
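The abstract does not list the semantic rules used to determine emotional tendency. The sketch below shows the kind of rule that lexicon-based approaches commonly apply: a negation word directly before an emotion word flips its sign, and a degree adverb scales its intensity. The lexicon entries, the weights, the two-token window, and the function name are hypothetical, and English tokens stand in for segmented Chinese text.

```python
def score_comment(tokens, lexicon, negations, degrees, window=2):
    """Sum lexicon intensities, adjusting each by the modifiers directly before it.

    Only a contiguous run of modifiers (at most `window` tokens) immediately
    preceding an emotion word is applied; scanning stops at the first token
    that is neither a negation nor a degree adverb.
    """
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in lexicon:
            continue
        value = lexicon[tok]
        j = i - 1
        while j >= 0 and i - j <= window:
            prev = tokens[j]
            if prev in negations:
                value = -value              # negation flips polarity
            elif prev in degrees:
                value *= degrees[prev]      # degree adverb scales intensity
            else:
                break
            j -= 1
        total += value
    return total

# Hypothetical lexicon: emotion words and emoticons share one score table.
toy_lexicon = {"safe": 1.0, "scared": -1.0, "[cry]": -0.5}
toy_negations = {"not"}
toy_degrees = {"very": 1.5, "slightly": 0.5}

s1 = score_comment(["very", "scared", "[cry]"], toy_lexicon, toy_negations, toy_degrees)
s2 = score_comment(["not", "very", "safe"], toy_lexicon, toy_negations, toy_degrees)
```

Here `s1` is negative because the intensified "scared" and the crying emoticon both contribute negative scores, while `s2` is negative because the negation flips the intensified "safe". A category-level version (for example, a separate score for fear) would keep one total per emotion class instead of a single number.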

Key words: typhoon, sentiment analysis, topic detection, network public opinion, text clustering, semantic rules, emotion dictionary, word vector
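The topic clustering step combines word vectors, TF-IDF, and K-means, but the abstract does not say how the three fit together. One plausible reading, sketched below with the standard library only, represents each comment as the TF-IDF-weighted mean of its word vectors and then clusters those comment vectors with K-means. The two-dimensional toy vectors, the smoothed IDF formula, and the deterministic seeding from the first k points are assumptions for illustration; a real pipeline would use trained embeddings and a library implementation.

```python
import math
from collections import Counter

def tfidf_doc_vectors(docs, word_vecs):
    """Represent each document as the TF-IDF-weighted mean of its word vectors."""
    n = len(docs)
    df = Counter(tok for d in docs for tok in set(d))
    dim = len(next(iter(word_vecs.values())))
    out = []
    for d in docs:
        tf = Counter(t for t in d if t in word_vecs)
        vec, total = [0.0] * dim, 0.0
        for tok, count in tf.items():
            # Smoothed IDF (an assumed variant): ln((1 + n) / (1 + df)) + 1
            w = (count / len(d)) * (math.log((1 + n) / (1 + df[tok])) + 1.0)
            total += w
            for k in range(dim):
                vec[k] += w * word_vecs[tok][k]
        out.append([v / total for v in vec] if total else vec)
    return out

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical 2-D "embeddings" and comments, for illustration only.
toy_vecs = {"rain": [1.0, 0.0], "wind": [0.9, 0.1],
            "school": [0.0, 1.0], "holiday": [0.1, 0.9]}
toy_comments = [["rain", "wind"], ["school", "holiday"],
                ["wind", "rain", "rain"], ["holiday", "school"]]
topic_labels = kmeans(tfidf_doc_vectors(toy_comments, toy_vecs), k=2)
```

With these toy inputs the two weather comments fall into one cluster and the two school-closure comments into the other. Choosing the number of clusters (six in the Hagupit case study) is a separate decision that this sketch does not address.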