Journal of Geo-information Science, 2017, Vol. 19, Issue (9): 1238-1244. doi: 10.3724/SP.J.1047.2017.01238

• Pan-spatial Information System Applications •

Analysis of Classification Methods and Activity Characteristics of Urban Population based on Social Media Data

ZHOU Yan1,2, LI Yanxi1,*, HUANG Yueying1, GENG Erhui1

  1. School of Resources and Environment, University of Electronic Science and Technology of China, Chengdu 611731, China
  2. Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 611731, China
  • Received: 2017-04-30 Revised: 2017-07-20 Online: 2017-10-09 Published: 2017-10-09
  • Contact: LI Yanxi (corresponding author) E-mail: zhouyan_gis@uestc.edu.cn; liyanxi_gis@163.com
  • About the author:

    ZHOU Yan (born 1976), female, from Xi'an, Shaanxi; PhD, associate professor; research interests: GIS applications and spatial big data analysis. E-mail: zhouyan_gis@uestc.edu.cn

  • Funding:
    National Key Research and Development Program of China (2016YFB0502300); National Natural Science Foundation of China (41471332, 41571392); Fundamental Research Funds for the Central Universities (ZYGX2015J113)

Abstract:

Spatial information technology is entering the stage of the Pan-spatial Information System, which extends the scope of spatial information systems from traditional surveying and mapping space to outer space, indoor space, microscopic space and other measurable spaces. Location big data is not only one of the key research objects of the Pan-spatial Information System but also an effective way to understand people's lifestyles and urban dynamics across wide-area pan-space. Using location check-in data from social media, this paper proposes an urban population classification method that differs from traditional approaches based on socioeconomic attributes. First, a matrix model is built from the time series of check-in data. Then, by analyzing the temporal characteristics of users' check-in activities, the K-means clustering algorithm and the K-nearest neighbor (K-NN) algorithm are applied: the analysis starts from users' spatiotemporal profiles, learns their distinct behaviors, and annotates each profile with a city user category (static resident, dynamic resident, commuter, or visitor). Finally, based on the classification results, the spatiotemporal behavior of each category is analyzed to reveal the differences and latent regularities among them. The method offers a new perspective for characterizing the composition and characteristics of the urban population and for studying urban spatiotemporal structure.

Key words: LBS check-in data, time series, spatiotemporal behavior, urban population classification
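The abstract names three steps (a matrix model built from check-in time series, K-means clustering, and K-NN annotation) without giving their exact form. The following is a minimal standard-library Python sketch of one plausible pipeline, not the authors' implementation. Assumptions: the 2 × 24 weekday/weekend-by-hour matrix, the normalization, using the K-means cluster ids as K-NN training labels, and every name (`build_profile`, `kmeans`, `knn_predict`, `toy_user`) are hypothetical; the check-ins are toy input, not data from the paper.

```python
import math
import random
from collections import Counter
from datetime import datetime

HOURS = 24  # hour-of-day bins per matrix row


def build_profile(timestamps):
    """Hypothetical matrix model: a 2 x 24 count matrix (row 0 = weekdays,
    row 1 = weekends, columns = hour of day), flattened and normalized to
    sum to 1 so users with different activity levels are comparable."""
    matrix = [[0] * HOURS for _ in range(2)]
    for ts in timestamps:
        row = 1 if ts.weekday() >= 5 else 0  # Saturday/Sunday -> weekend row
        matrix[row][ts.hour] += 1
    flat = [count for row in matrix for count in row]
    total = sum(flat)
    return [count / total for count in flat] if total else [0.0] * len(flat)


def dist(a, b):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every profile.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training profiles nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def toy_user(hours, weekend=False):
    """Hypothetical toy check-ins: one per listed hour on a single day."""
    day = datetime(2017, 4, 8) if weekend else datetime(2017, 4, 5)  # Sat / Wed
    return [day.replace(hour=h) for h in hours]


# Toy input only: three weekday-daytime users and three weekend users.
users = {
    "u1": toy_user([8, 9, 18]),
    "u2": toy_user([8, 9, 18, 18]),
    "u3": toy_user([8, 8, 9, 18]),
    "u4": toy_user([13, 15, 20], weekend=True),
    "u5": toy_user([13, 15, 20, 20], weekend=True),
    "u6": toy_user([13, 13, 15, 20], weekend=True),
}
ids = list(users)
profiles = [build_profile(users[u]) for u in ids]
labels, _ = kmeans(profiles, k=2)

# K-NN treats the clustered profiles as training data to label a new user.
new_user = build_profile(toy_user([8, 9, 18, 18, 18]))
predicted = knn_predict(profiles, labels, new_user, k=3)
print(dict(zip(ids, labels)), "new user ->", predicted)
```

In the paper the resulting groups correspond to static residents, dynamic residents, commuters and visitors; in this sketch the cluster ids stay as integers, and `k=2` only fits the two toy patterns. Normalizing each profile is a design choice that makes the clustering respond to when a user checks in rather than how often.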