Journal of Geo-information Science, 2015, Vol. 17, Issue (2): 127-134. doi: 10.3724/SP.J.1047.2015.00127


Extracting Geographic Information from Web Texts: Status and Development

YU Li1,2, LU Feng1,*, ZHANG Hengcai1

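  1. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
  2. University of Chinese Academy of Sciences, Beijing 100101, China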
  • Received: 2014-08-28 Revised: 2014-10-29 Published: 2015-02-10 Online: 2015-02-10
  • About the author:

    YU Li (1986-), PhD candidate, specializing in Internet spatial information search. E-mail: yul@lreis.ac.cn

  • Funding:
    National High Technology Research and Development Program of China ("863" Program) (2012AA12A211, 2013AA120305)

  • Contact: LU Feng
  • About author:

    *The author: CHEN Nan, E-mail:fjcn99@163.com

Abstract:

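The spread of the Internet has produced a large volume of text rich in geographic semantics, creating major opportunities for in-depth mining of geographic information and knowledge discovery. At the same time, the heterogeneity and dynamism of such text cause a sharp increase in the number and variety of geographic entity attributes and make geographic semantic relations complex, posing serious challenges to geographic information retrieval, spatial analysis and reasoning, and intelligent location-based services. This paper describes the technical workflow for extracting the geographic information embedded in web texts; summarizes research progress and key technical bottlenecks in five areas: geographic entity recognition, geographic entity locating, geographic entity attribute extraction, geographic entity relation construction, and geographic event extraction; analyzes the open resources available for this task; and outlines future research directions.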

Keywords: web text, geographic information, natural language processing, information extraction, geolocation

Abstract:

The Internet generates a large volume of texts that contain abundant geographic semantic information and bring great opportunities for deep mining and knowledge discovery. Meanwhile, the heterogeneity and dynamism of web texts lead to a surge in the number and types of geographic entity attributes and increase the complexity of geographic semantic relations, which presents an unprecedented challenge to geographic information retrieval, spatial analysis and reasoning, and intelligent location-based services. First, we describe the process of extracting geographic information from web texts and summarize the research status and major issues in geographic entity recognition, locating, attribute extraction, relation construction, and event extraction. Second, we introduce popular open resources used for geographic information extraction. Finally, we discuss future development trends in this domain.

Key words: web text, geographic information, natural language processing, information extraction, geographical location
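The first two stages named in the abstract, geographic entity recognition and geographic entity locating, can be illustrated with a minimal gazetteer-lookup sketch. This is not the authors' method: the toy gazetteer `GAZETTEER`, its rounded coordinates, the `GeoEntity` record, and the functions `recognize` and `locate` are all hypothetical, and greedy longest-match string lookup is a deliberately simple stand-in for the recognition techniques the paper surveys.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical toy gazetteer: place name -> (latitude, longitude).
# Coordinates are rounded and for illustration only.
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "Beijing": (39.90, 116.40),
    "Haidian District": (39.96, 116.30),
    "Shanghai": (31.23, 121.47),
}


@dataclass
class GeoEntity:
    name: str
    start: int                   # character offset where the mention begins
    end: int                     # character offset just past the mention
    lat: Optional[float] = None  # filled in by locate()
    lon: Optional[float] = None


def recognize(text: str, gazetteer: Dict[str, Tuple[float, float]]) -> List[GeoEntity]:
    """Entity recognition: scan left to right, preferring the longest gazetteer name."""
    names = sorted(gazetteer, key=len, reverse=True)
    found: List[GeoEntity] = []
    i = 0
    while i < len(text):
        for name in names:
            if text.startswith(name, i):
                found.append(GeoEntity(name, i, i + len(name)))
                i += len(name)  # skip past the matched mention
                break
        else:
            i += 1  # no gazetteer name starts here; advance one character
    return found


def locate(entities: List[GeoEntity], gazetteer: Dict[str, Tuple[float, float]]) -> List[GeoEntity]:
    """Entity locating: attach the gazetteer coordinates to each recognized mention."""
    for e in entities:
        e.lat, e.lon = gazetteer[e.name]
    return entities


if __name__ == "__main__":
    text = "A new park opened in Haidian District, Beijing, last week."
    for e in locate(recognize(text, GAZETTEER), GAZETTEER):
        print(e.name, e.start, e.end, e.lat, e.lon)
```

Plain string matching ignores word boundaries and ambiguity: one name can refer to several places, so a practical system would also need toponym disambiguation before locating, plus further stages for attributes, relations, and events.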