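Analysis of Public Opinion Evolution in COVID-19 Pandemic from a Perspective of Sentiment Variation
ZHANG Chen (b. 1997), male, from Xinyang, Henan Province, master's student, engaged in research on place-name and address matching and knowledge graph construction. E-mail: czhang0315@whu.edu.cn
Received date: 2020-05-20
Revised date: 2020-07-21
Online published: 2021-04-25
Supported by
China Postdoctoral Science Foundation (2019M663070)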
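As a public health emergency of international concern, the COVID-19 pandemic has drawn intense attention on social media. Weibo comments aggregate users' perceptions, attitudes, tendencies, and behaviors regarding events during the pandemic, and they provide a highly current, time-ordered text corpus for studying the evolution of public opinion through user sentiment analysis. This paper uses the Weibo comments on the daily pandemic notifications posted by People's Daily from January 23 to April 8, 2020 as its data. First, SnowNLP, a Chinese natural language processing tool, extracts sentiment polarity from the corpus and classifies comments as positive or negative. Next, the Single-Pass clustering algorithm clusters the corpus to identify hot topics related to the pandemic. Finally, the Louvain community detection algorithm is used to mine information on the degree of public attention. (1) In the temporal dimension, the daily sentiment trend shows that users passed through three successive phases: anxiety and fear (January 24 to February 18), steadiness and confidence (February 19 to March 15), and tension and concern (March 16 to April 8). (2) In the spatial dimension, a joint analysis of the number of participating users, the emotional state of users' home regions, and the emotional projection onto the regions discussed in comments shows clear differences in attention to the pandemic and in sentiment across administrative regions: Weibo users in more severely affected areas participated more and had lower emotional state and projection values. By introducing natural language processing and network community detection algorithms, the study builds a methodological framework for public opinion analysis of social media comment text, offering theoretical support and new ideas for research on public opinion around major public events.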
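ZHANG Chen, MA Xiangyuan, ZHOU Yang, GUO Renzhong. Analysis of public opinion evolution in COVID-19 pandemic from a perspective of sentiment variation[J]. Journal of Geo-information Science, 2021, 23(2): 341-350. DOI: 10.12082/dqxxkx.2021.200248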
As a Public Health Emergency of International Concern (PHEIC), the COVID-19 pandemic has drawn great attention on social media worldwide. Weibo comments aggregate users' perceptions, attitudes, tendencies, and behaviors regarding the pandemic, and they provide a timely, time-ordered text corpus for research on public opinion evolution based on sentiment analysis. In this paper, we use comments on the People's Daily Weibo account posted during the COVID-19 pandemic (January 23 - April 8, 2020) as our research data. First, we extract sentiment polarity with SnowNLP, a Chinese natural language processing tool, and classify comments as positive or negative. Second, we apply the Single-Pass clustering algorithm to the corpus to explore hot topics about the pandemic. Third, we use the Louvain community detection algorithm to mine information about public attention. (1) In the temporal dimension, the daily sentiment trend shows that the public went through three emotional phases: anxiety and fear (January 23 - February 18), steadiness and confidence (February 19 - March 15), and tension and concern (March 16 - April 8). (2) In the spatial dimension, a joint analysis of the number of users, emotional states, and emotional projections across provinces shows clear differences in public attention to, and sentiment about, the COVID-19 pandemic. For Weibo users in affected areas, online participation is positively correlated with pandemic severity, while emotional state and emotional projection values are lower. Meanwhile, users in the worst-hit areas tend to have a greater influence on the evolution of public opinion.
The results show that Weibo users in Guangdong Province and Heilongjiang Province pay close attention to the pandemic and have low average emotional state and emotional projection values, which suggests that the two provinces still face great pressure in pandemic prevention and control. Hubei Province, although the most affected by the pandemic, has a low emotional state value but a high emotional projection value, which suggests that Weibo users' comments about Hubei Province are largely encouraging and appreciative. In addition, the northwestern region has relatively few confirmed cases and lower comment participation than other regions, but higher average emotional state and emotional projection values. The research applies natural language processing and network community detection algorithms to construct a methodological framework of public opinion analysis for social media comments. The framework shows promise, as it provides theoretical and practical support for related research on major public events.
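The Single-Pass clustering step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the bag-of-words cosine similarity, the summed-count centroids, and the `threshold` value are all assumptions chosen for illustration, and real Weibo comments would first need Chinese word segmentation.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace-separated tokens. Chinese comments would need
    # a word-segmentation step before this point.
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(texts, threshold=0.5):
    """Cluster texts in one pass; returns clusters as lists of text indices."""
    centroids = []  # summed term counts of each cluster (hypothetical choice)
    clusters = []
    for i, text in enumerate(texts):
        vec = Counter(tokenize(text))
        best, best_sim = None, 0.0
        # Find the most similar existing cluster.
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            # Similar enough: join the cluster and update its centroid.
            clusters[best].append(i)
            centroids[best].update(vec)
        else:
            # Otherwise the text seeds a new cluster.
            clusters.append([i])
            centroids.append(Counter(vec))
    return clusters
```

Each text is compared only with the clusters that exist when it arrives, so the result depends on input order and on the threshold; a higher threshold yields more, smaller topics.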
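Tab. 1 Data format of Weibo content ("pandemic notification" posts)
Topic ID | Date | Reposts | Comments | Likes | Weibo content |
---|---|---|---|---|---|
4464171633076141 | 2020-01-24 | 3,146 | 10,028 | 188,687 | #Confirmed novel pneumonia cases nationwide# [#830 cumulative confirmed novel coronavirus pneumonia cases in 29 provinces#] From 00:00 to 24:00 on January 23 …… |
4464534334753932 | 2020-01-25 | 16,249 | 75,005 | 184,301 | #Confirmed novel pneumonia cases nationwide# [#444 new confirmed novel pneumonia cases nationwide# 1,287 cumulative confirmed cases] The National Health Commission reported that from 00:00 to 24:00 on January 24 …… |
4468152664621792 | 2020-02-04 | 894 | 1,563 | 8,871 | [#3,235 new confirmed cases nationwide#, #20,438 cumulative confirmed cases nationwide#] From 00:00 to 24:00 on February 3 …… |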
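Tab. 2 Data format of Weibo comments ("pandemic notification" posts)
Commenter ID | Gender | Location | Date | Example comment |
---|---|---|---|---|
1939099823 | m | Other | 2020-01-30 | Thanks to the medical workers, police officers, epidemic prevention staff, and soldiers fighting on the front line in the affected area, and to the builders of Huoshenshan Hospital working day and night |
6996680055 | f | Shanghai | 2020-01-30 | No way to get back to work; the losses are heavy |
6047870405 | f | Sichuan | 2020-02-07 | Get well soon; stay strong, China |
5470355309 | m | Hainan | 2020-03-03 | Guard against imported cases from outside and against spread within |
5492496027 | f | Guangdong | 2020-04-05 | Why are there so many new cases |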
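To show how comment records in the Tab. 2 layout could feed a daily sentiment trend, here is a minimal sketch. It assumes each comment has already been given a positivity score in [0, 1] by some scorer passed in as `score_fn`; the 0.5 cut-off in `label` and the function names are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def label(score, threshold=0.5):
    """Map a positivity score in [0, 1] to a polarity label.

    The 0.5 threshold is an assumption for illustration.
    """
    return "positive" if score >= threshold else "negative"


def daily_mean_sentiment(records, score_fn):
    """Average the sentiment score of all comments posted on each date.

    records: iterable of (commenter_id, gender, location, date, text)
    tuples, following the column order of Tab. 2.
    score_fn: callable mapping a comment text to a score in [0, 1].
    """
    totals = defaultdict(lambda: [0.0, 0])  # date -> [score sum, count]
    for _commenter_id, _gender, _location, date, text in records:
        bucket = totals[date]
        bucket[0] += score_fn(text)
        bucket[1] += 1
    # ISO dates sort chronologically as plain strings.
    return {date: s / n for date, (s, n) in sorted(totals.items())}
```

With SnowNLP installed, a scorer such as `lambda text: SnowNLP(text).sentiments` could be passed as `score_fn`; grouping on the location field instead of the date would give a per-region average in the same way.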