Journal of Geo-information Science ›› 2019, Vol. 21 ›› Issue (10): 1510-1517. doi: 10.12082/dqxxkx.2019.190037

• Theory and Methods of Geo-information Science •

基于联合主题特征的网络新闻文本蕴含环境污染事件检测

黄宗财1,2,仇培元3,陆锋2,3,吴升1,2,*

  1. 福州大学数字中国研究院(福州),福州 350002
    2. 海西政务大数据应用协同创新中心,福州 350002
    3. 中国科学院地理科学与资源研究所 资源与环境信息系统国家重点实验室,北京 100101
  • 收稿日期:2019-01-21 修回日期:2019-06-21 出版日期:2019-10-25 发布日期:2019-10-29
  • 通讯作者: 吴升 E-mail:ws0110@163.com
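  • About the author: HUANG Zongcai (1992-), male, born in Xingguo, Jiangxi, is a master's student whose research focuses on geographic information extraction and on spatiotemporal data mining and visualization. E-mail: 1262686237@qq.com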
  • 基金资助:
    国家自然科学基金重点项目(41631177);数字福建建设项目(闽发改网数字函)([2014]191号);数字福建建设项目(闽发改网数字函)([2016]23号);数字福建建设项目(闽发改网数字函)([2016]77号);福建省科技创新平台项目(2015H2001)

Detection of Environmental Pollution Events in News Corpora based on Joint Thematic Features

HUANG Zongcai1,2, QIU Peiyuan3, LU Feng2,3, WU Sheng1,2,*

  1. Digital China Research Institute of Fuzhou University (Fujian), Fuzhou 350002, China
    2. Fujian Collaborative Innovation Center for Big Data Applications in Governments, Fuzhou 350002, China
    3. State Key Lab of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
  • Received: 2019-01-21   Revised: 2019-06-21   Online: 2019-10-25   Published: 2019-10-29
  • Contact: WU Sheng   E-mail: ws0110@163.com
  • Supported by:
    National Natural Science Foundation of China (No. 41631177); Digital Fujian Construction Project (No. [2014]191); Digital Fujian Construction Project (No. [2016]23); Digital Fujian Construction Project (No. [2016]77); Fujian Science and Technology Innovation Platform Project (No. 2015H2001)

摘要:

网络新闻文本在环境污染事件感知方面具有重要的应用价值。然而,由于环境污染事件的“多米诺效应”,网络新闻文本往往存在对多类型污染事件的混合描述,现有事件检测方法容易导致文本分类错误。本文提出一种基于联合主题特征的网络新闻文本蕴含环境污染事件检测方法,通过兼顾环境网络新闻文本的全局特征和主题分布特征来改善检测分类效果。该方法采用词频-逆文档频率向量对文档进行全局特征表示,并结合文档的主题分布特征向量,构建联合主题特征向量作为监督分类模型的输入,实现环境污染事件检测。实验结果表明,使用联合主题特征的支持向量机方法进行事件类别检测平均F1值相较于全局特征提高15%,相较于主题特征提高36%。本文提出的网络新闻文本蕴含环境污染事件检测方法可支持污染事件类型检测和影响信息抽取,有助于环境污染事件的时空统计与变化趋势预测。

关键词: 网络新闻文本, 事件检测, 环境污染事件, 联合主题特征向量, 词频-逆文档频率向量, 支持向量机

Abstract:

Online news texts are a valuable source for detecting environmental pollution events. However, because of the "domino effect" of such events, a single news text often describes several types of pollution incident at once, and existing event detection methods are prone to misclassifying it. This paper proposes a method for detecting environmental pollution events in online news texts based on joint topic features, which improves classification by accounting for both the global features and the topic distribution features of the text. The method represents the global features of each document with a TF-IDF (Term Frequency-Inverse Document Frequency) vector and combines it with the document's topic distribution feature vector to build a joint topic feature vector, which serves as the input to a supervised classification model. Experimental results show that, with an SVM (Support Vector Machine) classifier, the joint topic features raise the average F1 value for event type detection by 15% over global features alone and by 36% over topic features alone. The proposed method supports pollution event type detection and impact information extraction, and can help with spatiotemporal statistics and trend prediction for environmental pollution events.

Key words: news corpora, event detection, environmental pollution event, joint thematic feature vector, TF-IDF, SVM
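
The feature construction described in the abstract (a TF-IDF vector joined with a topic distribution vector) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which topic model or which TF-IDF weighting was used, so the smoothed IDF formula, the toy documents, and the per-document topic distributions below are all hypothetical placeholders. The helper names `tfidf_vectors` and `joint_topic_features` are likewise invented for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF: one common variant, assumed here, not taken from the paper.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        vec = [counts[t] / length * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def joint_topic_features(tfidf_vec, topic_dist):
    """Concatenate global (TF-IDF) features with topic distribution features."""
    return list(tfidf_vec) + list(topic_dist)


# Toy tokenized news texts (hypothetical).
docs = [
    ["river", "sewage", "fish", "river"],
    ["smog", "haze", "factory", "emission"],
    ["factory", "sewage", "smog", "river"],
]
# Hypothetical topic distributions over 2 topics; in practice these would
# come from a topic model fitted on the corpus.
topic_dists = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

vocab, tfidf = tfidf_vectors(docs)
joint = [joint_topic_features(v, t) for v, t in zip(tfidf, topic_dists)]
print(len(vocab), len(joint[0]))
```

Each row of `joint` would then be passed, together with its event type label, to a supervised classifier such as an SVM. The third toy document mixes water and air pollution terms, which is the kind of mixed description the topic distribution part of the vector is meant to capture.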