Networked Mining of GDELT and International Relations Analysis
About the author: QIN Kun (1972-), male, born in Suizhou, Hubei; Ph.D., professor; research interests: spatio-temporal data mining and big data analysis. E-mail: qink@whu.edu.cn
Received date: 2018-12-01
Revision requested: 2018-12-30
Online published: 2019-01-20
Supported by
National Key Research and Development Program of China, No. 2017YFB0503604
National Natural Science Foundation of China, No. 41471326, 41525004
International relations since the beginning of the 21st century have been intricate and fast-changing, bringing profound changes to the world's economy, security, and diplomacy. These changes have had a major impact on China's domestic and foreign policies. A comprehensive and timely analysis of international relations and their changing characteristics is an important reference for China's economic and diplomatic development planning. Research on international relations is complex, time-sensitive, and spatio-temporal in nature, and it urgently needs spatio-temporal big data analysis to provide new ideas and technical means. Mass media such as newspapers and radio record all kinds of events happening in the world and contain a wealth of information. Compared with social media data, which record personal activities, they are better suited to large-scale and long-term analysis of human society. The Global Database of Events, Language, and Tone (GDELT) is a free and open news database that monitors print, broadcast, and online news media around the world in real time, analyzes the text, and extracts key information such as people, places, organizations, and events. This paper applies the theory and methods of complex networks to mine GDELT as a network and then analyzes the relations between countries. First, national interaction networks are constructed from GDELT; then, the interactions between countries are analyzed through network characteristic statistics; finally, temporal changes in the interaction network of national conflict events are detected. The results show that: (1) The national interaction network has scale-free characteristics, and its connections are unevenly distributed both globally and locally: a few countries have a large number of interactions with other countries while most have very few, and a given country interacts heavily with a few countries but only slightly with most others. (2) Sudden changes in the interaction network of national conflict events often correspond to major events. This study offers a new perspective for exploring international relations in the era of big data and a reference for the analysis of news media data.
QIN Kun, LUO Ping, YAO Borui. Networked mining of GDELT and international relations analysis[J]. Journal of Geo-information Science, 2019, 21(1): 14-24. DOI: 10.12082/dqxxkx.2019.180674
Tab. 1 Example of data in the event database
Actor1Geo_CountryCode | Actor2Geo_CountryCode |
---|---|
CH | US |
SY | RS |
US | RS |
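The rows of Tab. 1 can be turned into a weighted, undirected network in a few lines. The sketch below is a minimal illustration, not the authors' implementation: it assumes that each event record adds a weight of 1 to the edge between its two country codes and that records with a blank code or with identical codes are skipped. The function names `build_event_network` and `node_strength` are hypothetical.

```python
from collections import Counter

def build_event_network(rows):
    """Count events per unordered country pair (assumed weighting: 1 per event)."""
    edges = Counter()
    for actor1, actor2 in rows:
        a, b = actor1.strip(), actor2.strip()
        if not a or not b or a == b:
            continue  # assumption: skip blank codes and within-country events
        edges[tuple(sorted((a, b)))] += 1
    return edges

def node_strength(edges):
    """Node strength = sum of the weights of all edges attached to the node."""
    strength = Counter()
    for (a, b), weight in edges.items():
        strength[a] += weight
        strength[b] += weight
    return strength

# The three sample rows of Tab. 1 (Actor1Geo_CountryCode, Actor2Geo_CountryCode).
rows = [("CH", "US"), ("SY", "RS"), ("US", "RS")]
edges = build_event_network(rows)
strength = node_strength(edges)
```

With the sample rows, the network has three edges of weight 1, and US and RS each have node strength 2.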
Tab. 2 Example of data in the global knowledge graph
Count | Locations |
---|---|
2 | 1#China#CH#CH#35#105#CH; 1#United States#US#US#39.828175#-98.5795#US; 1#Canada#CA#CA#60#-96#CA |
4 | 1#United States#US#US#39.828175#-98.5795#US; 1#Russia#RS#RS#60#100#RS |
5 | 1#United States#US#US#34.04#-118.15#US; 1#Canada#CA#CA#60#-96#CA |
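A co-occurrence network can be derived from records like those in Tab. 2. The following sketch rests on two assumptions that are not stated in this section: the third `#`-separated field of each location entry is the country code (as the sample rows suggest), and every pair of distinct countries mentioned in the same record receives an edge whose weight is increased by the record's count. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def parse_countries(locations):
    """Return the set of country codes found in a ';'-separated locations field."""
    codes = set()
    for entry in locations.split(";"):
        fields = entry.strip().split("#")
        if len(fields) >= 3 and fields[2]:  # empty or malformed entries are skipped
            codes.add(fields[2])
    return codes

def cooccurrence_edges(records):
    """Add each record's count to every unordered pair of co-mentioned countries."""
    edges = Counter()
    for count, locations in records:
        for a, b in combinations(sorted(parse_countries(locations)), 2):
            edges[(a, b)] += count  # assumed weighting: the record's count
    return edges

# The three sample rows of Tab. 2 as (count, locations) pairs.
records = [
    (2, "1#China#CH#CH#35#105#CH; 1#United States#US#US#39.828175#-98.5795#US; "
        "1#Canada#CA#CA#60#-96#CA"),
    (4, "1#United States#US#US#39.828175#-98.5795#US; 1#Russia#RS#RS#60#100#RS"),
    (5, "1#United States#US#US#34.04#-118.15#US; 1#Canada#CA#CA#60#-96#CA"),
]
gkg_edges = cooccurrence_edges(records)
```

Under these assumptions the Canada-United States edge accumulates weight from the first and third records, while the other edges each come from a single record.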
Fig. 1 Example of a national interaction network constructed from the sample data
Tab. 3 Topological characteristics of the national interaction networks
Data source | Period | N | M | k | D | C | L | A |
---|---|---|---|---|---|---|---|---|
Event database | 1 day | 213 | 2420 | 22.723 | 0.107 | 0.514 | 2.080 | -0.200 |
Event database | 1 week | 233 | 5385 | 46.223 | 0.199 | 0.613 | 1.877 | -0.201 |
Event database | 1 month | 246 | 8861 | 72.041 | 0.294 | 0.678 | 1.732 | -0.194 |
Global knowledge graph | 1 day | 242 | 17032 | 140.760 | 0.584 | 0.813 | 1.424 | -0.154 |
Global knowledge graph | 1 week | 254 | 25023 | 197.031 | 0.779 | 0.923 | 1.222 | -0.111 |
Global knowledge graph | 1 month | 257 | 27960 | 217.588 | 0.850 | 0.941 | 1.150 | -0.059 |
Note: N is the number of nodes; M is the number of edges; k is the average degree; D is the graph density; C is the average clustering coefficient; L is the average path length; A is the degree assortativity coefficient.
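The quantities in Tab. 3 can be reproduced on a small graph with the standard library alone. The sketch below assumes that the statistics are computed on the undirected, unweighted version of the network, which is consistent with the first row of the table (2 × 2420 / 213 ≈ 22.723 and 2 × 2420 / (213 × 212) ≈ 0.107). It also assumes that L is averaged over reachable node pairs only. The function name `topology_stats` and the four-node example graph are hypothetical.

```python
from collections import deque

def topology_stats(edge_list):
    """Return N, M, k, D, C, L, A for an undirected, unweighted simple graph."""
    adj = {}
    for a, b in edge_list:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    n = len(adj)
    if n < 2:
        raise ValueError("need at least one edge between two distinct nodes")
    m = sum(len(nb) for nb in adj.values()) // 2
    k = 2 * m / n                 # average degree
    d = 2 * m / (n * (n - 1))     # graph density

    def local_clustering(u):
        nb, deg = adj[u], len(adj[u])
        if deg < 2:
            return 0.0
        # every link between two neighbours is seen twice, hence // 2
        links = sum(1 for v in nb for w in adj[v] if w in nb) // 2
        return 2 * links / (deg * (deg - 1))

    c = sum(local_clustering(u) for u in adj) / n

    # Average shortest path length via breadth-first search from every node.
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    l = total / pairs

    # Degree assortativity: Pearson correlation of the degrees at the two
    # ends of each edge, with every edge counted in both directions.
    xs = [len(adj[u]) for u in adj for _ in adj[u]]
    ys = [len(adj[v]) for u in adj for v in adj[u]]
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    a = cov / var if var else float("nan")
    return {"N": n, "M": m, "k": k, "D": d, "C": c, "L": l, "A": a}

# Hypothetical example: a triangle A-B-C with a pendant node D attached to C.
stats = topology_stats([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
```

A negative A, as in every row of Tab. 3, means that high-degree nodes tend to connect to low-degree nodes; the toy graph above shows the same sign because the hub C is linked to the pendant node D.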
Fig. 2 CCDF of node strength in the national interaction networks on double logarithmic axes. Note: S is the node strength; C(S) is the probability that a node's strength is at least S; in panels (c) and (d), a shift parameter is added to every node strength value.
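The complementary cumulative distribution C(S) described in the note to Fig. 2 can be computed directly from a list of node strengths. The sketch below is illustrative only: the `shift` argument mirrors the shift parameter mentioned for panels (c) and (d), and the least-squares slope on log-log axes is a rough diagnostic chosen here for illustration, not necessarily the fitting method used in the paper. The function names and sample values are hypothetical.

```python
import math

def ccdf(values, shift=0.0):
    """Return (S, C(S)) pairs, where C(S) is the share of values that are >= S.

    `shift` is added to every value first (assumed role of the shift parameter).
    """
    shifted = [v + shift for v in values]
    n = len(shifted)
    return [(s, sum(1 for v in shifted if v >= s) / n)
            for s in sorted(set(shifted))]

def loglog_slope(points):
    """Least-squares slope of log10 C(S) against log10 S."""
    xs = [math.log10(s) for s, _ in points]
    ys = [math.log10(c) for _, c in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical node strengths.
points = ccdf([1, 1, 2, 4])
slope = loglog_slope(points)
```

An approximately straight line on log-log axes is the usual visual sign of a heavy-tailed, scale-free-like distribution; for these sample values the points lie on a line of slope -1.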
Fig. 3 Top 20 countries by node strength in the national interaction network, December 2017. Note: the horizontal axis lists the 20 countries with the highest node strength; the vertical axis is each country's node strength as a share of the total node strength of the network.
Fig. 4 CCDF of the edge strength of countries connected to China on double logarithmic axes
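Fig. 5 Top 20 countries by interaction strength with China, December 2017. Note: the horizontal axis lists the 20 countries with the highest interaction strength with China; the vertical axis is each country's interaction strength with China as a share of China's total interaction strength with all other countries.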
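The shares plotted in Fig. 5 are simple ratios of edge weights. As a minimal sketch, assuming the network is stored as a mapping from unordered country pairs to edge weights, the ranking for one country could be computed as follows. The function name `partner_shares` and the edge weights are hypothetical and are not values from the paper.

```python
def partner_shares(edges, country, top=20):
    """Rank a country's partners by their share of its total interaction strength."""
    weights = {}
    for (a, b), w in edges.items():
        if a == country:
            weights[b] = weights.get(b, 0) + w
        elif b == country:
            weights[a] = weights.get(a, 0) + w
    total = sum(weights.values())
    if total == 0:
        return []
    # Sort by descending weight; break ties alphabetically for a stable order.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(partner, w / total) for partner, w in ranked[:top]]

# Hypothetical edge weights, for illustration only.
sample_edges = {("CH", "US"): 6, ("CH", "RS"): 3, ("JA", "CH"): 1, ("RS", "US"): 5}
shares = partner_shares(sample_edges, "CH")
```

The same function applied to every node, with node strength in place of edge weight, would give the network-wide shares described in the note to Fig. 3.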
Fig. 6 Month-over-month change in the number of material conflict events in 2017. Note: the vertical axis is the change in the number of conflict events relative to the previous month; red bars indicate an increase over the previous month, and blue bars indicate a decrease.
Fig. 7 Distribution of changes in node and edge strength, December 2017 versus November 2017
Fig. 8 Nodes and edges with the largest increase or decrease in strength for each month of 2017. Note: the vertical axis is the change in node or edge strength relative to the previous month; red points mark the node or edge with the largest increase over the previous month, and blue points mark the node or edge with the largest decrease.
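The monthly comparison summarized in Fig. 7 and Fig. 8 amounts to differencing two strength tables and picking the extremes. The sketch below assumes that a node or edge missing from one month has strength 0 in that month; the function names and the strength values are hypothetical and are not taken from the paper.

```python
def strength_change(previous, current):
    """Change in strength per key; a key absent from a month counts as 0."""
    keys = set(previous) | set(current)
    return {key: current.get(key, 0) - previous.get(key, 0) for key in keys}

def extremes(change):
    """Return ((key, largest increase), (key, largest decrease))."""
    top = max(change, key=change.get)
    bottom = min(change, key=change.get)
    return (top, change[top]), (bottom, change[bottom])

# Hypothetical node strengths for two consecutive months.
november = {"US": 120, "SY": 80, "RS": 60}
december = {"US": 150, "SY": 45, "RS": 62, "IS": 40}
delta = strength_change(november, december)
largest_increase, largest_decrease = extremes(delta)
```

The same two functions work for edges if the dictionary keys are country pairs instead of single country codes.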
The authors have declared that no competing interests exist.