Journal of Geo-Information Science ›› 2023, Vol. 25 ›› Issue (6): 1176-1185. doi: 10.12082/dqxxkx.2023.230034

• Special Issue: Theory, Methods, and Applications of Geographic Spatiotemporal Knowledge Graphs •

M2T: A Framework for Automatic Generation of Spatial Scene Description Text through Multi-source Knowledge Graph Fusion

CHEN Huixuan1,2, GUO Danhuai3,*, GE Shiyin1,2, WANG Jing1, WANG Yangang1,2, CHEN Feng4, YANG Weishi5,6

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
    2. University of Chinese Academy of Sciences, Beijing 100049, China
    3. Spatial Temporal Data Intelligence Research Lab, College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
    4. Department of East Asian Studies, The University of Arizona, Tucson, AZ 85721-0105, USA
    5. School of Urban Planning and Design, Peking University, Shenzhen 518055, China
    6. Shenzhen Research Center for Natural Resources and Real Estate Evaluation and Development, Shenzhen 518034, China
  • Received: 2023-01-28; Revised: 2023-04-01; Online: 2023-06-25; Published: 2023-06-02
  • Corresponding author: *GUO Danhuai (1973— ), male, from Nankang, Jiangxi, PhD, professor, mainly engaged in research on the theory and applications of geographic artificial intelligence. E-mail: gdh@buct.edu.cn
  • About the first author: CHEN Huixuan (1999— ), female, from Linfen, Shanxi, master's student, mainly engaged in research on applications of geographic artificial intelligence. E-mail: hxchen@cnic.cn
  • Supported by:
    National Natural Science Foundation of China (41971366); National Natural Science Foundation of China (91846301); Fundamental Research Funds for the Central Universities (BUCTRC: 202132)

M2T: A Framework for Spatial Scene Description Text Generation Based on Multi-source Knowledge Graph Fusion

CHEN Huixuan1,2, GUO Danhuai3,*, GE Shiyin1,2, WANG Jing1, WANG Yangang1,2, CHEN Feng4, YANG Weishi5,6

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
    2. University of Chinese Academy of Sciences, Beijing 100049, China
    3. Spatial Temporal Data Intelligence Research Lab, College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
    4. Department of East Asian Studies, The University of Arizona, Tucson, AZ 85721-0105, USA
    5. School of Urban Planning and Design, Peking University, Shenzhen 518055, China
    6. Shenzhen Research Center for Natural Resources and Real Estate Evaluation and Development, Shenzhen 518034, China
  • Received: 2023-01-28; Revised: 2023-04-01; Online: 2023-06-25; Published: 2023-06-02
  • Contact: *GUO Danhuai, E-mail: gdh@buct.edu.cn
  • Supported by:
    National Natural Science Foundation of China (41971366); National Natural Science Foundation of China (91846301); Fundamental Research Funds for the Central Universities (BUCTRC: 202132)

Abstract:

Describing geographic spatial scenes in natural language has long been an important research direction in geographic information science. Traditional methods focus on traversal-style descriptions of spatial relationships, which are difficult to reconcile with human spatial cognition and differ considerably from natural human language. In essence, natural language description of a geographic spatial scene is the process of transforming two-dimensional vectors of geographic space into one-dimensional vectors in word space. This paper proposes M2T, a framework for expressing spatial scenes in natural language, which generates spatial scene description text under a fusion mechanism over three knowledge graphs: spatial scene understanding, language synthesis, and attention awareness. The spatial scene description knowledge graph solves the pruning problem of traversing spatial relationships and, by building a spatial relationship graph, establishes associations between spatial scenes to support continuous expression of spatial scenes; the natural language style knowledge graph links spatial expression with language style, achieving diversified language styles appropriate to spatial natural language expression; and the spatial attention knowledge graph builds an attention matrix from the interaction state between the subject and objects of a spatial scene to capture the nuances of natural language spatial expression. A prototype system designed for the Forbidden City in Beijing shows that the generated results are close to human travel notes, with more complete content coverage and more diverse styles, verifying the effectiveness of the M2T framework and demonstrating the potential value of natural language description of spatial scenes.
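As an illustration of the attention-based pruning idea above, the following Python sketch keeps only the spatial relations whose subject-object attention exceeds a threshold, instead of verbalizing every relation obtained by traversal. It is a minimal sketch under our own assumptions, not the paper's implementation; all entity names, weights, and the threshold are invented for illustration.

from dataclasses import dataclass

@dataclass
class SpatialRelation:
    subject: str    # located or observing entity
    relation: str   # e.g., "faces", "north_of", "west_of"
    obj: str        # reference entity

# Hypothetical scene: a visitor standing before the Hall of Supreme Harmony.
relations = [
    SpatialRelation("visitor", "faces", "Hall of Supreme Harmony"),
    SpatialRelation("Hall of Supreme Harmony", "north_of", "Gate of Supreme Harmony"),
    SpatialRelation("bronze lion", "west_of", "visitor"),
    SpatialRelation("drainage channel", "south_of", "bronze lion"),  # low-salience detail
]

# Assumed attention weights derived from subject-object interaction
# (visibility, distance, landmark salience); the numbers are illustrative only.
attention = {
    ("visitor", "Hall of Supreme Harmony"): 0.95,
    ("Hall of Supreme Harmony", "Gate of Supreme Harmony"): 0.70,
    ("bronze lion", "visitor"): 0.55,
    ("drainage channel", "bronze lion"): 0.10,
}

def prune_relations(rels, attn, threshold=0.5):
    # Keep only relations salient enough to be mentioned by a human observer.
    return [r for r in rels if attn.get((r.subject, r.obj), 0.0) >= threshold]

for r in prune_relations(relations, attention):
    print(r.subject, r.relation, r.obj)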

Keywords: spatial scene description, geographic knowledge graph, natural language generation, spatial cognition, spatial attention, spatial expression, spatial understanding

Abstract:

Natural language is an effective tool for humans to describe the world: it is diverse, easy to disseminate, and able to carry the results of human spatial cognition. How to describe geographic spatial scenes in natural language has long been an important research direction in spatial cognition and geographic information science, with important applications in personalized automated tour guides, navigation for the visually impaired, interpretation of virtual spatial scenes, and other areas. In essence, natural language description of a geographic spatial scene is the process of transforming two-dimensional vectors of geographic space into one-dimensional vectors in word space. Traditional models perform well in handling spatial relationships but fall short in natural language description: (1) spatial relationship description models are one-way descriptions of the environment by humans and do not consider the influence of the environment on the description; (2) spatial scenes are described by traversing spatial relationships, with every relationship weighted equally, which is inconsistent with the varying attention humans pay to geographic entities and spatial relationships in the environment; (3) the spatial relationship computation of traditional models is a static description of a single scene, which can hardly meet the need for dynamic description of continuous scenes in practical applications; and (4) the natural language style of traditional models is mechanical and lacks the necessary knowledge support. This article proposes Map2Text (M2T), a spatial scene natural language generation framework that fuses multiple knowledge graphs. The framework builds knowledge graphs for spatial relationships, language generation style, and spatial attention, and realizes both the fusion of these knowledge graphs and the generation of natural language descriptions of spatial scenes within a unified framework. The spatial scene description knowledge graph solves the pruning problem of traversing spatial relationships and, by building a spatial relationship graph, establishes associations between spatial scenes to support continuous expression of spatial scenes; the natural language style knowledge graph links spatial expression with language style, achieving diversified language styles appropriate to spatial natural language expression; and the spatial attention knowledge graph captures the nuances of natural language spatial expression by building an attention matrix from the interaction state between the subject and objects of a spatial scene. A prototype system built for the Forbidden City in Beijing demonstrates that the generated results are close to human travel notes, with more complete content coverage and more diverse styles, verifying the effectiveness of the M2T framework and demonstrating the potential value of natural language description of spatial scenes.
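To make the fusion mechanism more concrete, the Python sketch below shows one way the three knowledge graphs could interact at generation time: the scene graph supplies candidate triples, the attention graph ranks them, and the style graph selects a surface template. The graph contents, template strings, and function names are our illustrative assumptions, not the published M2T code.

# Spatial scene knowledge graph: (subject, relation, object) triples.
scene_graph = [
    ("Meridian Gate", "south_of", "Gate of Supreme Harmony"),
    ("Hall of Supreme Harmony", "north_of", "Gate of Supreme Harmony"),
]

# Spatial attention knowledge graph: salience of each triple for the current viewer.
attention_graph = {
    ("Meridian Gate", "south_of", "Gate of Supreme Harmony"): 0.9,
    ("Hall of Supreme Harmony", "north_of", "Gate of Supreme Harmony"): 0.6,
}

# Language style knowledge graph: relation -> surface template for each style.
style_graph = {
    "guide": {
        "south_of": "Walking north from the {s}, you soon reach the {o}.",
        "north_of": "The {s} rises just north of the {o}.",
    },
    "formal": {
        "south_of": "The {s} is located to the south of the {o}.",
        "north_of": "The {s} is located to the north of the {o}.",
    },
}

def generate_description(style, top_k=1):
    # Verbalize the most salient triples in the requested language style.
    ranked = sorted(scene_graph, key=lambda t: attention_graph.get(t, 0.0), reverse=True)
    sentences = []
    for s, rel, o in ranked[:top_k]:
        sentences.append(style_graph[style][rel].format(s=s, o=o))
    return " ".join(sentences)

print(generate_description("guide", top_k=2))
print(generate_description("formal"))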

Key words: spatial scene description, geographic knowledge graph, natural language generation, spatial cognition, spatial attention, spatial expression, spatial understanding