Journal of Geo-information Science ›› 2021, Vol. 23 ›› Issue (10): 1778-1786. doi: 10.12082/dqxxkx.2021.210025

• Theory and Methods of Geo-information Science •

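User's Home Location Prediction Based on Filtered Text and Social Networks

WANG Haiqi, KONG Haoran*, LI Xuewei

  1. College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China
  • Received: 2021-01-17 Revised: 2021-03-19 Online: 2021-10-25 Published: 2021-12-25
  • Corresponding author: * KONG Haoran (born 1997), male, from Jining, Shandong, master's student, mainly engaged in research on geographic named entity recognition. E-mail: konghr_upc@163.com
  • About the author: WANG Haiqi (born 1972), male, from Nanyang, Henan, Ph.D., associate professor, mainly engaged in research on data mining algorithms and applications for spatial, spatiotemporal, and text big data, geographic information and machine learning, and spatial and spatiotemporal statistical analysis. E-mail: wanghaiqi@upc.edu.cn
  • Supported by:
    National Natural Science Foundation of China (41471322)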

User's Home Location Prediction Based on Filtered Text and Social Networks

WANG Haiqi, KONG Haoran*, LI Xuewei

  1. College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China
  • Received: 2021-01-17 Revised: 2021-03-19 Online: 2021-10-25 Published: 2021-12-25
  • Supported by:
    National Natural Science Foundation of China (41471322)

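Abstract:

The texts of social media users vary geographically, and users with close social ties tend to live nearer to one another, so both text and social networks can be used to infer a user's home location. Existing home location prediction methods based on text and social networks do not sufficiently mine the location-indicative features of text, even though location-indicative information such as toponyms in user text provides the most useful location signal. This paper therefore proposes a location prediction method for social media users based on geographic named entity recognition (GER) and graph convolutional networks (GCN). First, user text is filtered with a geographic named entity recognition method to highlight location-indicative features. Second, a social network is extracted from mention relationships and follower/followee relationships. Third, the social network and the user text content are combined, and a GCN-based method is used to predict users' home locations. Finally, GER-GCN is compared with GCN and with the latest research results, and the small-sample learning ability of the model and its influencing factors are examined. Experiments on the GeoText dataset and two microblog datasets show that: (1) the GER text filtering method significantly improves user location prediction accuracy; (2) GER-GCN achieves the highest prediction accuracy in all experiments and improves on the latest research results by 1% to 2% on the GeoText benchmark dataset; (3) in a realistic scenario with minimal supervision, the small-sample learning ability of the GER-GCN model is confirmed, and the quality of the social network is found to play a decisive role in that ability. The experimental results confirm that the GER-GCN method is state of the art and that it meets the application requirements of real-world social media scenarios.

Key words: social users, home location, toponym, social networks, multi-view, geographic named entity recognition, graph convolutional network, small-sample learning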

Abstract:

The home locations of social media users are essential for a wide range of real-world applications. Text posted by users from different regions differs considerably in expression, semantics, and other content, and users with close social relationships generally live closer to each other. Therefore, both text and social networks can be used to infer users' home locations. Existing home location prediction methods based on social networks and text do not sufficiently mine the location-indicative features in user text, even though location-indicative information such as toponyms provides the most useful location signals. We therefore propose a location prediction method for social media users based on Geographic Entity Recognition (GER) and Graph Convolutional Network (GCN). First, user text is filtered by the geographic entity recognition method to highlight location-indicative words. Then, social networks are extracted from mention relationships and following relationships. Next, the social network is combined with the filtered user text, which contains the location-indicative words, and a GCN-based method is used to predict each user's home location. Finally, we compare the GER-GCN method with the GCN method and the latest research results, and explore the small-sample learning ability of the model and its influencing factors. Experimental results on the GeoText dataset and two microblog datasets show the following. First, the GER text filtering method significantly improves the accuracy of user location prediction. The improvement is larger for datasets with more microblogs per user, which suggests that GER text filtering is better suited to social media datasets in which users post more.
Second, across the experiments on the different datasets, the GER-GCN method consistently achieves the highest prediction accuracy among all methods. On the GeoText benchmark dataset, the prediction accuracy of GER-GCN is 1.03% and 1.87% higher than that of the GCN and MENET methods, respectively, which indicates that GER-GCN is competitive with the latest research results. Third, in a realistic scenario with minimal supervision, we confirm that the GER-GCN model has a certain small-sample learning ability, and find that the quality of the social network plays a decisive role in that ability. The experimental results demonstrate the strong performance of the GER-GCN method and show that it meets the application requirements of realistic social media scenarios.

Key words: social users, home location, toponym, social networks, multi-view, geographic entity recognition, graph convolutional network, small sample learning
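
The pipeline summarized in the abstract (filter text for toponyms, extract a social network from mentions, then propagate text features over the graph) can be illustrated with a minimal sketch. This is not the authors' implementation: the gazetteer lookup is a crude stand-in for a trained GER model, the propagation step is a single parameter-free graph convolution using the commonly used symmetric normalization D^-1/2 (A + I) D^-1/2, and all names and sample posts (`GAZETTEER`, `filter_toponyms`, `mention_edges`, `posts`, and so on) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy gazetteer; a real system would use a trained GER model.
GAZETTEER = {"beijing", "qingdao", "shanghai"}
MENTION = re.compile(r"@(\w+)")

def filter_toponyms(text):
    """Keep only tokens found in the gazetteer (a crude proxy for GER filtering)."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t in GAZETTEER]

def mention_edges(posts):
    """Build undirected user-user edges from @mentions between known users."""
    users = set(posts)
    edges = set()
    for author, texts in posts.items():
        for text in texts:
            for target in MENTION.findall(text):
                if target in users and target != author:
                    edges.add(frozenset((author, target)))
    return edges

def normalized_adjacency(users, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list of lists."""
    index = {u: i for i, u in enumerate(users)}
    n = len(users)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for edge in edges:
        u, v = tuple(edge)
        a[index[u]][index[v]] = a[index[v]][index[u]] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def propagate(a_hat, features):
    """One graph convolution step without learned weights: A_hat times X."""
    n, d = len(features), len(features[0])
    return [[sum(a_hat[i][k] * features[k][j] for k in range(n))
             for j in range(d)] for i in range(n)]

# Made-up sample data: bob never names a place but is mentioned by alice.
posts = {
    "alice": ["Lunch in Qingdao with @bob"],
    "bob": ["Nice weather today"],
    "carol": ["Rainy Beijing morning"],
}
users = sorted(posts)
vocab = sorted(GAZETTEER)

# Toponym-count feature vector per user, built from the filtered text.
features = []
for user in users:
    counts = Counter(t for text in posts[user] for t in filter_toponyms(text))
    features.append([float(counts[t]) for t in vocab])

a_hat = normalized_adjacency(users, mention_edges(posts))
smoothed = propagate(a_hat, features)
for user, row in zip(users, smoothed):
    print(user, dict(zip(vocab, row)))
```

In this toy run, bob has no toponym in his own text, yet after one propagation step he receives part of alice's "qingdao" signal through the mention edge. That is the intuition behind combining filtered text with the social network; an actual GCN would add learned weight matrices, nonlinearities, and a classification layer over location labels.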