Journal of Geo-information Science, 2018, Vol. 20, Issue (7): 978-987. doi: 10.12082/dqxxkx.2018.170430

• Comprehensive Applications of Geospatial Analysis •

Spatio-temporal Pattern Analysis of Pseudo Base Stations in the Main Urban Area of Beijing

WANG Wei (汪伟), TAO Haiyan* (陶海燕), ZHUO Li (卓莉), LI Min (李敏), LI Xuliang (李旭亮), WANG Keli (汪珂丽), SHI Qingli (史清丽)

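  1. School of Geography and Planning, Sun Yat-sen University; Guangdong Provincial Key Laboratory of Urbanization and Geo-simulation / Center of Integrated Geographic Information Analysis, Guangzhou 510275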
  • Received: 2017-09-17  Revised: 2018-01-22  Publication date: 2018-07-20  Release date: 2018-07-13
  • About the author:

    WANG Wei (1996-), male, from Anqing, Anhui Province; undergraduate student; research interest: spatio-temporal data mining. E-mail: wangw227@mail2.sysu.edu.cn

  • Funding:
    National Natural Science Foundation of China (41371499); Team Project of the Natural Science Foundation of Guangdong Province (2014A030312010)

Spatio-temporal Analysis of Pseudo Base Stations in Beijing Downtown

WANG Wei, TAO Haiyan*, ZHUO Li, LI Min, LI Xuliang, WANG Keli, SHI Qingli

  1. Guangdong Provincial Key Laboratory of Urbanization and Geo-simulation/ Center of Integrated Geographic Information Analysis, School of Geography and Planning, Sun Yat-sen University, Guangzhou 510275, China
  • Received: 2017-09-17  Revised: 2018-01-22  Online: 2018-07-20  Published: 2018-07-13
  • Contact: TAO Haiyan
  • Supported by:
    National Natural Science Foundation of China, No. 41371499; Guangdong Province Natural Science Foundation Research Team Project, No. 2014A030312010

Abstract:

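With the rapid development of public mobile communications, the proliferation of pseudo base stations not only disrupts normal telecommunications order and endangers public safety, but also seriously harms people's property rights and infringes on citizens' privacy; it has become a major public nuisance. How to mine the spatio-temporal patterns of pseudo base station activity from big data on spam text messages, find effective prevention and control strategies, and crack down on the problem at its source has become a shared focus of administrative authorities and researchers. Using spam-message data from Beijing, this paper applies non-negative matrix factorization (NMF) to analyze the spatio-temporal distribution of pseudo base stations. It also builds a TF-IDF-based classification model to sort the spam messages into types and, combined with land use data, analyzes the spatio-temporal distribution of pseudo base stations when they send each type of spam. The results show that spam messages in Beijing are concentrated along the road network and in the central urban area; far more spam messages are sent during the day than at night; over the course of the day, the distribution of spam messages gradually contracts inward along the road network; pseudo base stations sending different types of spam differ to some extent in their spatio-temporal distributions; and the NMF results match well with those obtained from spam-message classification. The study shows that NMF is simple to implement and interpretable in both the form and the results of the decomposition. It can support targeted recommendations to the relevant authorities on effective ways to combat pseudo base stations and is of practical value for curbing their illegal use.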
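The abstract does not spell out how the spam data were arranged for factorization. As a minimal sketch, assume spam counts are aggregated into a non-negative matrix V with one row per spatial unit (for example a grid cell) and one column per time slot. NMF then approximates V ≈ WH with W, H ≥ 0, so each row of H can be read as a temporal pattern and each column of W as the spatial loading of that pattern. The solver below is the standard Lee-Seung multiplicative update for squared Frobenius error; the paper's actual solver, rank, and preprocessing are not given here, and the toy matrix is hypothetical.

```python
import random

def matmul(A, B):
    """Product of an (n x k) and a (k x m) matrix stored as lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, rank, iters=2000, seed=0, eps=1e-9):
    """Approximate non-negative V (n x m) as W (n x rank) times H (rank x m),
    with W, H >= 0, using Lee-Seung multiplicative updates for the
    squared Frobenius error."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise; eps avoids division by zero
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H

def frobenius_error(V, W, H):
    """||V - WH||_F, the reconstruction error the updates reduce."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5

# Hypothetical spam counts: 4 grid cells (rows) x 6 time slots (columns)
V = [
    [3, 6, 9, 9, 6, 3],
    [2, 4, 6, 6, 4, 2],
    [0, 0, 2, 4, 6, 6],
    [1, 2, 4, 5, 5, 4],
]
W, H = nmf(V, rank=2)
print(round(frobenius_error(V, W, H), 3))
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout, which is what makes each component readable as an additive spatial pattern with its own time profile.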

Keywords: non-negative matrix factorization, spam messages, pseudo base stations, spatio-temporal patterns, Beijing

Abstract:

Rampant pseudo base stations have become a major public hazard. They undermine normal telecommunications order, endanger public safety, seriously infringe on people's property rights, and violate citizens' privacy. How to mine the spatio-temporal patterns of pseudo base station activity from massive volumes of spam messages, design effective prevention and control programs, and fight the crime at its source has become a focus of government agencies and researchers. Traditional methods that identify pseudo base stations through user terminals, however, face serious challenges in accuracy, comprehensiveness, and analytical capability, and no longer meet the requirements of identifying small-scale, mobile pseudo base stations. Using spam-message data for Beijing from February 23, 2017 to April 26, 2017, this paper analyzes the spatio-temporal distribution of pseudo base stations through non-negative matrix factorization. We also construct a spam-message classification model on TF-IDF (Term Frequency-Inverse Document Frequency) features, compare several classifiers (k-Nearest Neighbors, K-Support Vector Machine, Random Forest, and Single-Layer Neural Network), and select random forest as the most accurate. Combined with land use data, we analyze the spatio-temporal distribution of pseudo base stations that send different types of spam messages, and examine the results of non-negative matrix factorization and spam-message classification in detail. The results show that most spam messages in Beijing are sent along the road network and in the central city, and that far more spam messages are sent during the day than in the evening. As the day progresses, the distribution of spam messages gradually contracts inward along the road network.
Pseudo base stations that send different types of spam messages differ in their spatio-temporal distributions, but all of them favor transportation facilities and residential areas within the Fourth Ring Road. Non-negative matrix factorization produces reliable results that agree well with those of conventional spam-message classification, and it is simple to perform and interpretable in both the form and the result of the decomposition. It can help reveal the spatio-temporal patterns of different types of spam messages and provide evidence-based suggestions for government agencies to combat pseudo base stations effectively. By targeting the source of the spam messages, it can also help governments crack down on illegal activities that rely on pseudo base stations.
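The classification step pairs TF-IDF features with a supervised classifier. The paper reports that random forest performed best; a random forest is too long to write by hand here, so the sketch below swaps in k-Nearest Neighbors, one of the other classifiers the abstract lists, over L2-normalized TF-IDF vectors with cosine similarity. It assumes messages are already segmented into tokens (Chinese spam would first need word segmentation), and the messages, category labels, and function names are hypothetical. The smoothed IDF form ln((1 + n) / (1 + df)) + 1 is one common choice, not necessarily the one the authors used.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Smoothed IDF over tokenised documents: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}

def tfidf_vector(tokens, idf):
    """Sparse L2-normalised TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec

def cosine(a, b):
    # Both vectors are already L2-normalised, so the dot product is the cosine
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def knn_predict(tokens, train_vecs, train_labels, idf, k=3):
    """Majority vote among the k most cosine-similar training messages."""
    q = tfidf_vector(tokens, idf)
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(q, train_vecs[i]), reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, pre-segmented training messages and category labels
train_docs = [
    ["discount", "sale", "mall", "coupon"],
    ["sale", "apartment", "discount", "price"],
    ["bank", "account", "verify", "link"],
    ["prize", "winner", "account", "link"],
    ["invoice", "tax", "receipt", "issue"],
    ["invoice", "receipt", "company", "tax"],
]
labels = ["promotion", "promotion", "fraud", "fraud", "invoice", "invoice"]

idf = fit_idf(train_docs)
train_vecs = [tfidf_vector(d, idf) for d in train_docs]
print(knn_predict(["verify", "bank", "link", "now"], train_vecs, labels, idf))  # prints: fraud
```

Swapping k-NN for a random forest would change only the classifier; the TF-IDF feature construction stays the same.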

Key words: visualization analysis, non-negative matrix factorization, spam messages, spatio-temporal patterns, Beijing