Journal of Geo-information Science, 2017, Vol. 19, Issue (11): 1405-1414. doi: 10.3724/SP.J.1047.2017.01405

• Theory and Methods of Geo-information Science •

Extracting and Analyzing Latent Semantic Characteristics of Locations Using Social Media Data

CHEN Yuanyuan, GAO Yong*

  1. Institute of Remote Sensing and Geographic Information System, Peking University, Beijing 100871, China
  • Received: 2017-07-04 Revised: 2017-09-07 Online: 2017-11-10 Published: 2017-11-10
  • Corresponding author: GAO Yong E-mail: ygrittechen@pku.edu.cn; gaoyong@pku.edu.cn
  • About the author: CHEN Yuanyuan (1994-), female, master's student; research interest: spatial data mining. E-mail: ygrittechen@pku.edu.cn
  • Supported by: National Natural Science Foundation of China (41625003)

Abstract:

Social media record the opinions and sentiments of urban residents in a timely, large-scale, and wide-ranging way. Geo-tagged check-in texts in particular link the spaces and facilities of a city to the attitudes people hold toward them, so they express the characteristics of locations directly from a human-centered perspective and concentrate the semantic information of places. In this paper, geo-tagged check-in texts from Weibo are used to explore the hidden semantic characteristics of locations, with a focus on measuring the semantic similarity among regions. Specifically, Latent Semantic Analysis (LSA), a method from natural language processing, is introduced to transform the unstructured regional and semantic features in social media into vectors in a latent semantic space. Spatial analysis methods, including factor analysis, spatial autocorrelation analysis, and cluster analysis, are then employed to mine the hidden characteristics of locations. First, the latent topics of the study area and their spatial distribution across the city are uncovered. Second, similarity indices for selected locations are obtained by comparing their latent semantic features. Baidu Baike (Baidu-pedia) entries are further used as prior descriptions to extend the semantic similarity between locations, and spatial autocorrelation analysis is applied to identify hot-spot regions of different functional types. In addition, spatial clusters are obtained with the K-means method in the latent semantic space, and the clustering is validated by the differences in POI density among clusters. This study demonstrates how the semantic meaning of a space can be harvested from crowd-generated content in social media, which helps capture the distinctive themes that shape a location and supports urban planning.

Key words: location semantics, social media, latent semantic analysis, place sensing
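
The pipeline summarized in the abstract (LSA on location texts, similarity in the latent space, then K-means) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the location names `loc_A` to `loc_D`, the raw-count weighting, the choice of `k = 2`, and the helper `kmeans` are all assumptions made for illustration, and only NumPy is used.

```python
import numpy as np

# Hypothetical toy corpus: one "document" per location, built from
# already-tokenized check-in words (not the paper's Weibo data).
docs = {
    "loc_A": "noodle restaurant dinner tasty noodle",
    "loc_B": "restaurant dinner tasty hotpot",
    "loc_C": "park lake walk tree",
    "loc_D": "park walk tree flower lake",
}
names = list(docs)
vocab = sorted({w for text in docs.values() for w in text.split()})
row = {w: i for i, w in enumerate(vocab)}

# Term-by-location count matrix (raw counts; the weighting is an assumption).
X = np.zeros((len(vocab), len(names)))
for j, name in enumerate(names):
    for w in docs[name].split():
        X[row[w], j] += 1.0

# LSA: a truncated SVD keeps the k strongest latent dimensions.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
loc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional vector per location

# Cosine similarity between locations in the latent semantic space.
unit = loc_vecs / np.linalg.norm(loc_vecs, axis=1, keepdims=True)
sim = unit @ unit.T


def kmeans(points, n_clusters, n_iter=20):
    """Plain Lloyd's K-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < n_clusters:
        # Distance from every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
    return labels


labels = kmeans(loc_vecs, n_clusters=2)
print(np.round(sim, 2))
print(dict(zip(names, labels.tolist())))
```

In this toy setting the two dining locations share a latent dimension and fall into one cluster, while the two park locations fall into the other. With real check-in text, the tokenization, term weighting, and the number of latent dimensions would all need to be chosen and validated separately.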