Journal of Geo-information Science, 2019, Vol. 21, Issue (1): 2-13. doi: 10.12082/dqxxkx.2019.180680

Special topic: Geographic Big Data

• Methods and Applications of Spatiotemporal Pattern Mining for Geographic Big Data •

Public Security Event Themed Web Text Structuring

Tao PEI (1,2,7), Sihui GUO (1,2), Yecheng YUAN (1,*), Xueying ZHANG (3,7), Wen YUAN (1), Ang GAO (4), Zhiyuan ZHAO (5), Cunjin XUE (6)

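  1. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. Key Laboratory of Virtual Geographic Environment, Nanjing Normal University, Ministry of Education, Nanjing 210023, China
  4. China National Institute of Standardization, Beijing 100088, China
  5. State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing of Wuhan University, Wuhan 430079, China
  6. Key Laboratory of Digital Earth Science, Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100094, China
  7. Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China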
  • Received: 2018-12-01  Revised: 2018-12-24  Online: 2019-01-20  Published: 2019-01-20
  • Contact: Tao PEI, E-mail: peit@lreis.ac.cn; corresponding author: Yecheng YUAN, E-mail: yuanyc@lreis.ac.cn
  • About the author:

    Tao PEI (1972-), male, born in Yangzhou, Jiangsu; research professor; research interest: geographic big data mining. E-mail: peit@lreis.ac.cn

  • Supported by:
    National Natural Science Foundation of China (No. 41525004, 41421001)

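Abstract (Chinese edition):

Information contained in web text has become an important source for the emergency rescue and impact assessment of public security events. Existing methods can extract specific types of event element information from text, but because they lack an event-oriented framework for holistic modeling and parsing, they struggle to obtain systematic, structured information on event elements from web text: the extracted element information is either incomplete or not matched to the target event, and the resulting omissions and errors cannot support systematic analysis of public security event information. To address this problem, this paper proposes a theoretical framework for structuring web text big data about public security events. First, a semantic framework for public security events is established, and a corresponding structured table schema is built using earthquake events as an example. Second, association annotation of the training corpus is used to overcome the difficulty of matching event elements to their events. Finally, a text parsing algorithm that can incorporate this association information is used to systematically extract the event type, event name, event time, event location, and other attributes, largely achieving the structuring of information on different events in web text. Taking the Ludian earthquake in Zhaotong, Yunnan, as an example, the paper demonstrates the process and results of structuring web text about an earthquake event, providing an important reference for analyzing the public attention the earthquake received and the status of the relief effort. Building on this work, a web text information mining system for public security events was developed, which demonstrates structured parsing of earthquake event text and the resulting analysis of event attention.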
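The central idea of the second step, that every extracted element carries a label tying it to one specific event so that elements from different events are never merged, can be illustrated with a minimal Python sketch. The `Mention` type, its field names, and the `to_rows` helper are hypothetical and are not the authors' implementation; in the paper the association is learned from the annotated corpus, whereas here the labels are simply given as input.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One annotated text span (hypothetical structure)."""
    event_id: str  # association label: the event this span belongs to
    element: str   # element slot, e.g. "type", "name", "time", "location"
    text: str      # raw span as it appears in the web text


def to_rows(mentions: list[Mention]) -> dict[str, dict[str, str]]:
    """Group labeled mentions into one structured row per event.

    Because every mention carries an event_id, a location belonging to a
    second event mentioned in the same article is not attached to the first.
    """
    rows: dict[str, dict[str, str]] = defaultdict(dict)
    for m in mentions:
        # The first value seen for a slot wins; later repeats are ignored.
        rows[m.event_id].setdefault(m.element, m.text)
    return dict(rows)


# Toy input invented for illustration: one article mentioning two earthquakes.
mentions = [
    Mention("e1", "type", "earthquake"),
    Mention("e1", "location", "Place A"),
    Mention("e2", "type", "earthquake"),
    Mention("e2", "location", "Place B"),
    Mention("e1", "location", "Place A (repeated)"),
]
print(to_rows(mentions))
# {'e1': {'type': 'earthquake', 'location': 'Place A'}, 'e2': {'type': 'earthquake', 'location': 'Place B'}}
```

Keeping the first value per slot is only one possible conflict rule; a real system might instead keep all candidates and rank them by extraction confidence.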

Keywords: semantic framework, text parsing, event attention, earthquake events, spatial search engine

Abstract:

Information about public security events contained in web text can serve as a data source for impact assessment and relief efforts if it can be structured into a relational database. Although previous research can extract event information into different attributes, determining which specific event each extracted attribute belongs to remains unsolved. To solve this problem, this paper proposes a theoretical framework for structuring web text on public security events, composed of three parts. First, an event semantic model is used to construct a seismic event semantic framework, which defines the abstract elements of an event and their semantic relationships. Using earthquakes as the example, the spatial element, time element, attribute element, and source element are defined as the basic elements. The spatial element includes the earthquake's latitude, longitude, depth, and location. The attribute element is further subdivided into four sub-elements: cause, result, behavior, and influence. Next, an annotation system is applied to typical event texts to label semantic elements (e.g., the place name where an earthquake occurred), that is, to instantiate the abstract elements. The key to this step is labeling the relations between each element and the specific event it belongs to. Finally, the event text is structured into event type, event name, event time, event location, and other attributes by a text information extraction algorithm. The algorithm uses the materials labeled in the previous step as training data to optimize its parameters, which allows it to incorporate the linked information. The extracted event text (e.g., words and phrases) is then normalized into structured information for further analysis. An event information mining platform implementing the whole framework was developed; it includes modules for web page searching, text cleaning, event information extraction, visualization, and analysis. The platform processed the full set of Chinese web pages from 2014 and found 85 506 earthquake reports.
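The seismic semantic framework described above can be read as a nested table schema. The following Python sketch encodes the elements the abstract names (spatial, time, attribute with its four sub-elements, and source) as dataclasses, then adds one normalization step that turns raw spatial phrases into numbers. All class and field names, the kilometre depth unit, and the regular-expression patterns for Chinese coordinate phrasing are assumptions for illustration; the abstract does not specify the authors' schema or normalization rules.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpatialElement:
    latitude: Optional[float] = None   # decimal degrees (only 北纬 / north handled below)
    longitude: Optional[float] = None  # decimal degrees (only 东经 / east handled below)
    depth_km: Optional[float] = None   # focal depth in kilometres
    location: Optional[str] = None     # place name as reported


@dataclass
class AttributeElement:
    # The four sub-elements named in the abstract; each holds text snippets.
    cause: list[str] = field(default_factory=list)
    result: list[str] = field(default_factory=list)
    behavior: list[str] = field(default_factory=list)
    influence: list[str] = field(default_factory=list)


@dataclass
class SeismicEvent:
    spatial: SpatialElement = field(default_factory=SpatialElement)
    time: Optional[str] = None    # e.g. an ISO 8601 timestamp after normalization
    attribute: AttributeElement = field(default_factory=AttributeElement)
    source: Optional[str] = None  # e.g. the URL of the report


# Assumed patterns for common Chinese phrasings:
# 北纬 = north latitude, 东经 = east longitude, 深度 = depth, 千米/公里 = km.
_NUM = r"(\d+(?:\.\d+)?)"
_LAT = re.compile(r"北纬\s*" + _NUM + r"\s*度")
_LON = re.compile(r"东经\s*" + _NUM + r"\s*度")
_DEPTH = re.compile(r"深度\s*" + _NUM + r"\s*(?:千米|公里)")


def normalize_spatial(text: str, location: Optional[str] = None) -> SpatialElement:
    """Convert raw coordinate and depth phrases into numeric fields."""

    def grab(pattern: re.Pattern) -> Optional[float]:
        m = pattern.search(text)
        return float(m.group(1)) if m else None

    return SpatialElement(grab(_LAT), grab(_LON), grab(_DEPTH), location)


# Illustrative report sentence (not taken from the paper's data).
event = SeismicEvent(spatial=normalize_spatial("北纬27.1度，东经103.3度，震源深度12千米"))
print(event.spatial)
# SpatialElement(latitude=27.1, longitude=103.3, depth_km=12.0, location=None)
```

Fields that the text does not mention stay `None`, so a partially filled record remains distinguishable from one whose values were extracted.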
Taking the Ludian earthquake in Yunnan as an example, we present the structuring process and results for the related web text, which can serve as an important reference for disaster relief and for analyzing public concern. Using the platform, we demonstrate the structuring results for earthquake text and the corresponding social concern across China; the platform can serve as a new tool for mining and analyzing event information.
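One simple way to turn structured records into a measure of social concern is to count the reports about one event per region and convert the counts to shares. The sketch below assumes each record has hypothetical `event_name` and `region` keys (for example, the region of the publishing site); the abstract does not state how the authors define or compute social concern, so this report-count proxy is an assumption.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping


def attention_by_region(records: Iterable[Mapping[str, str]], event_name: str) -> Counter:
    """Count structured reports about one event per region (report-count proxy)."""
    return Counter(
        r["region"]
        for r in records
        if r.get("event_name") == event_name and r.get("region")
    )


def attention_share(counts: Counter) -> dict[str, float]:
    """Turn raw counts into each region's share of all reports on the event."""
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()} if total else {}


# Toy records invented for illustration; they are not results from the paper.
records = [
    {"event_name": "Ludian earthquake", "region": "Yunnan"},
    {"event_name": "Ludian earthquake", "region": "Yunnan"},
    {"event_name": "Ludian earthquake", "region": "Beijing"},
    {"event_name": "another event", "region": "Beijing"},
]
counts = attention_by_region(records, "Ludian earthquake")
print(counts.most_common())  # [('Yunnan', 2), ('Beijing', 1)]
```

Shares rather than raw counts make regions comparable across events that received very different total coverage.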

Key words: semantic framework, text parsing, social concern about events, seismic events, spatial search engine