Journal of Geo-information Science
Spatio-temporal Analysis of Pseudo Base Stations in Beijing Downtown
Received date: 2017-09-17
Request revised date: 2018-01-22
Online published: 2018-07-13
Supported by
National Natural Science Foundation of China, No. 41371499; Guangdong Province Natural Science Foundation research team project, No. 2014A030312010
Rampant pseudo base stations have become a major public hazard. They disrupt normal telecommunications, endanger public safety, seriously infringe on the public's property rights, and violate citizens' privacy. How to mine the spatio-temporal patterns of pseudo base station activity from massive volumes of spam messages, design effective prevention and control programs, and fight the crime at its source has become a focus for government agencies and researchers. Traditional methods that identify pseudo base stations through the user terminal, however, face great challenges in accuracy, comprehensiveness, and analytical capability, and no longer meet the requirements of identifying small-scale, mobile pseudo base stations. Using spam message data collected in Beijing from February 23, 2017 to April 26, 2017, this paper analyzes the spatio-temporal distribution of pseudo base stations through non-negative matrix factorization (NMF). We also built a spam message classification model on TF-IDF (Term Frequency-Inverse Document Frequency) features, compared four classifiers (k-Nearest Neighbors, K-Support Vector Machine, Random Forest, and a Single-Layer Neural Network), and selected Random Forest, the most accurate of them. Combined with land use data, we then analyzed the spatio-temporal distribution of pseudo base stations that send different types of spam messages. The NMF results and the classification results are analyzed in detail. The results show that most spam messages in Beijing are sent along the road network and in the central city. Far more spam messages are sent during the day than in the evening. Over the course of the day, the distribution of spam messages along the road network gradually shrinks inward.
Pseudo base stations that send different types of spam messages differ in their spatio-temporal distribution, but all of them favor traffic facilities and residential areas within the Fourth Ring Road. NMF yields reliable results that agree with traditional spam message classification, is simple to apply, and produces a decomposition whose form and results are easy to interpret. It helps reveal the spatio-temporal patterns of different types of spam messages and provides evidence-based suggestions for government agencies to combat pseudo base stations effectively. By targeting the source of spam messages, it also helps governments combat the illegal activities that rely on pseudo base stations.
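To make the decomposition step concrete, the following is a minimal sketch of non-negative matrix factorization using the standard multiplicative update rules. The abstract does not state how the input matrix was built or which NMF solver was used, so the layout here (rows as spatial grid cells, columns as time slots), the toy counts, and the helper names `matmul`, `transpose`, `frobenius_error`, and `nmf` are all assumptions for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius distance between V and the product WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (cells x slots) into W (cells x rank) and
    H (rank x slots) with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start so no entry is stuck at zero.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical message counts: 4 grid cells (rows) x 4 time slots (columns).
counts = [
    [9, 8, 1, 0],    # cell mostly active in the early slots
    [18, 16, 2, 0],  # same temporal profile, twice the volume
    [0, 1, 7, 9],    # cell mostly active in the late slots
    [1, 0, 14, 18],
]
w, h = nmf(counts, rank=2)
```

Under this layout, each row of `h` can be read as a temporal component and each column of `w` as the matching spatial component, which is the kind of pairing the temporal and spatial component figures refer to.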
WANG Wei, TAO Haiyan, ZHUO Li, LI Min, LI Xuliang, WANG Keli, SHI Qingli. Spatio-temporal Analysis of Pseudo Base Stations in Beijing Downtown[J]. Journal of Geo-information Science, 2018, 20(7): 978-987. DOI: 10.12082/dqxxkx.2018.170430
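Tab. 1 The field names and definitions of the raw data
Field name | Definition |
---|---|
phone | Sender phone number spoofed by the pseudo base station |
content | Body text of the message |
md5 | MD5 hash of the message body |
recitime | Timestamp at which the spam message was received |
conntime | Timestamp of the connection to the pseudo base station |
lng | Approximate longitude of the pseudo base station when the message was sent |
lat | Approximate latitude of the pseudo base station when the message was sent |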
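As an illustration of how one raw record with these fields might be loaded, here is a small sketch. The source gives only the field names and meanings, so several details are assumptions: timestamps are treated as Unix seconds, the MD5 digest is assumed to cover the UTF-8 bytes of `content`, and the class `SpamRecord`, the function `parse_record`, and the sample values are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # China Standard Time, UTC+8


@dataclass
class SpamRecord:
    phone: str
    content: str
    md5: str
    recitime: int
    conntime: int
    lng: float
    lat: float

    def received_at(self) -> datetime:
        # Assumption: recitime is a Unix timestamp in seconds.
        return datetime.fromtimestamp(self.recitime, tz=BEIJING)

    def md5_matches(self) -> bool:
        # Assumption: the digest was computed over the UTF-8 bytes of content.
        digest = hashlib.md5(self.content.encode("utf-8")).hexdigest()
        return digest == self.md5.lower()


def parse_record(row: dict) -> SpamRecord:
    """Convert a dict of raw string values into a typed record."""
    return SpamRecord(
        phone=str(row["phone"]),
        content=str(row["content"]),
        md5=str(row["md5"]),
        recitime=int(row["recitime"]),
        conntime=int(row["conntime"]),
        lng=float(row["lng"]),
        lat=float(row["lat"]),
    )


# Made-up sample row; none of these values come from the paper's data.
text = "example message body"
record = parse_record({
    "phone": "10086",
    "content": text,
    "md5": hashlib.md5(text.encode("utf-8")).hexdigest(),
    "recitime": "1487808000",  # 2017-02-23 00:00:00 UTC
    "conntime": "1487807990",
    "lng": "116.40",
    "lat": "39.90",
})
```

Converting `recitime` to local time first makes it straightforward to bin messages by hour of day for a temporal analysis.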
Fig. 1 The study area: Beijing, China
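Tab. 2 The classification of spam messages
Category | Category ID | Subcategory | Subcategory ID |
---|---|---|---|
Fraud | 1 | In the name of a bank | 1 |
Fraud | 1 | In the name of a telecom operator | 2 |
Fraud | 1 | Other | 3 |
Illegal advertising | 2 | Trade in prohibited goods | 4 |
Illegal advertising | 2 | Pornographic services | 5 |
Illegal advertising | 2 | Fake certificates and fake invoices | 6 |
Harassment | 3 | Malicious harassment | 7 |
Harassment | 3 | Mild disturbance | 8 |
Ordinary advertising | 4 | Real estate agency | 9 |
Ordinary advertising | 4 | Finance and wealth management | 10 |
Ordinary advertising | 4 | Other advertising | 11 |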
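The idea of assigning a message to one of these subcategories from TF-IDF features can be sketched as follows. This is not the paper's pipeline: it uses a 1-nearest-neighbor rule with cosine similarity for brevity (the paper reports Random Forest as most accurate), one common IDF variant, `log(N/df) + 1`, and pre-tokenized toy messages in English. The functions `fit_tfidf`, `vectorize`, `cosine`, and `predict_1nn` and the training examples are hypothetical.

```python
import math
from collections import Counter


def fit_tfidf(docs):
    """Learn IDF weights from tokenized docs and return (vectors, idf)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # one common IDF variant
    return [vectorize(doc, idf) for doc in docs], idf


def vectorize(doc, idf):
    """Map a tokenized doc to a sparse {term: tf * idf} vector."""
    tf = Counter(t for t in doc if t in idf)  # ignore unseen terms
    return {t: (c / len(doc)) * idf[t] for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_1nn(doc, train_vectors, labels, idf):
    """Return the label of the most similar training document."""
    query = vectorize(doc, idf)
    best = max(range(len(train_vectors)),
               key=lambda i: cosine(query, train_vectors[i]))
    return labels[best]


# Toy, hand-made training messages labeled with subcategory IDs from Tab. 2.
train_docs = [
    ["bank", "account", "frozen", "verify", "card"],  # 1: in the name of a bank
    ["invoice", "receipt", "issue", "tax"],           # 6: fake invoices
    ["apartment", "sale", "subway", "price"],         # 9: real estate agency
]
train_labels = [1, 6, 9]
train_vectors, idf = fit_tfidf(train_docs)

label_a = predict_1nn(["bank", "card", "verify", "now"], train_vectors, train_labels, idf)
label_b = predict_1nn(["apartment", "price"], train_vectors, train_labels, idf)
```

Real Chinese messages would first need word segmentation, which this sketch leaves out by assuming the text is already tokenized.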
Tab. 3 The classification results and their accuracy
Classifier | Metric | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | Mean |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
RF | p | 0.98 | 0.95 | 0.12 | 1 | 0.98 | 0.99 | 0.98 | 0.5 | 0.97 | 0.98 | 0.91 | 0.85 |
RF | r | 1 | 0.77 | 0.06 | 0.69 | 0.91 | 0.99 | 0.85 | 0.98 | 0.96 | 0.93 | 0.69 | 0.8 |
RF | F1 | 0.99 | 0.84 | 0.08 | 0.8 | 0.94 | 0.99 | 0.91 | 0.66 | 0.97 | 0.96 | 0.78 | 0.81 |
KNN | p | 0.99 | 0.9 | 0.58 | 0.83 | 0.99 | 0.98 | 0.96 | 0.16 | 1 | 0.95 | 0.99 | 0.85 |
KNN | r | 0.98 | 0.51 | 0.3 | 0.27 | 0.88 | 0.98 | 0.7 | 0.67 | 0.73 | 0.68 | 0.36 | 0.64 |
KNN | F1 | 0.98 | 0.65 | 0.39 | 0.39 | 0.93 | 0.98 | 0.8 | 0.25 | 0.84 | 0.79 | 0.52 | 0.69 |
KSVM-linear | p | 0.99 | 0.92 | 0.52 | 0.99 | 0.98 | 0.98 | 1 | 0.41 | 0.98 | 0.99 | 0.89 | 0.88 |
KSVM-linear | r | 1 | 0.83 | 0.3 | 0.73 | 0.91 | 1 | 0.85 | 0.74 | 0.96 | 0.94 | 0.61 | 0.81 |
KSVM-linear | F1 | 1 | 0.86 | 0.37 | 0.83 | 0.94 | 0.99 | 0.92 | 0.52 | 0.97 | 0.96 | 0.72 | 0.83 |
nnet | p | 0.98 | 0.77 | 0.13 | 0.87 | 0.92 | 0.98 | 0.96 | 0.49 | 0.91 | 0.94 | 0.87 | 0.8 |
nnet | r | 0.99 | 0.79 | 0.1 | 0.68 | 0.89 | 0.99 | 0.89 | 0.59 | 0.97 | 0.95 | 0.6 | 0.77 |
nnet | F1 | 0.99 | 0.77 | 0.11 | 0.74 | 0.9 | 0.99 | 0.92 | 0.47 | 0.94 | 0.94 | 0.7 | 0.77 |
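For readers who want to relate the three metrics, the sketch below computes F1 as the harmonic mean of precision and recall, assuming that p, r, and F1 in the table carry their usual meanings. The function name `f1_score` is illustrative. Recomputing F1 from the printed p and r can differ slightly from the printed F1 in some columns, which may reflect rounding of the table entries.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


rf_c8 = f1_score(0.5, 0.98)   # RF, column C8: p = 0.5, r = 0.98
knn_c9 = f1_score(1.0, 0.73)  # KNN, column C9: p = 1, r = 0.73
```

The harmonic mean is dominated by the smaller of the two inputs, which is why a class with high recall but low precision still ends up with a modest F1.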
Tab. 4 The accuracy indices of the classification
Classifier | Accuracy | Kappa | P-value |
---|---|---|---|
RF | 0.95 | 0.93 | 0 |
KNN | 0.86 | 0.82 | 0 |
KSVM-linear | 0.94 | 0.92 | 0 |
nnet | 0.93 | 0.91 | 0 |
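Overall accuracy and Cohen's Kappa can both be derived from a confusion matrix, as the following sketch shows. The paper's confusion matrices are not given here, so the 2 x 2 matrix below is invented purely to demonstrate the arithmetic, and the function name `accuracy_and_kappa` is hypothetical.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix
    whose rows are true classes and columns are predicted classes."""
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Invented example, not data from the paper.
toy_confusion = [
    [40, 10],
    [5, 45],
]
accuracy, kappa = accuracy_and_kappa(toy_confusion)
```

Kappa discounts the agreement that would occur by chance, so it is always at most the raw accuracy, as in every column of the table.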
Fig. 2 The flow chart of the spam message classification model
Fig. 3 The spatial distribution of spam messages
Fig. 4 The temporal distribution of spam messages
Fig. 5 The temporal component of NMF
Fig. 6 The spatial component of NMF
Fig. 7 The proportion of spam messages by type
Fig. 8 The spatial distribution of spam messages by type
Fig. 9 The land use map of Beijing within the Sixth Ring Road
Fig. 10 The spam message statistics by type of land use
Fig. 11 The sending area statistics by type of spam message
Fig. 12 The temporal distribution of spam messages by type
The authors have declared that no competing interests exist.