Journal of Geo-information Science
A Method of Typhoon Disaster Loss Identification and Classification Using Micro-blog Information
Received date: 2018-01-18
Request revised date: 2018-03-20
Online published: 2018-07-13
Supported by
National Key Research and Development Program of China, No.2016YFE0122600; National Natural Science Foundation of China, No.41771476
Social media plays an increasingly important role in the real-time distribution and dissemination of disaster information. During a disaster, social media generates a large volume of real-time information on disaster losses, which is valuable for timely disaster response and loss assessment. However, social media data have several shortcomings, including highly fragmented information, sparse text features, and a lack of annotated corpora, which make traditional supervised learning methods difficult to apply effectively to disaster information extraction. This paper proposes a fast method for identifying and classifying disaster loss information in social media data by extending context features and matching feature words. First, we extracted keywords from a small sample of micro-blog texts covering different disaster loss categories, using Chinese grammar rules, and constructed feature-word collocation pairs. Then, we used a word vector model and an existing lexicon to supplement and expand these collocation pairs. An external corpus was also introduced to optimize the semantic collocation relationships between feature words according to the co-occurrence rules of Chinese words. Finally, we built a classification knowledge base for identifying and classifying typhoon-related disaster loss information in micro-blogs. An experimental system was developed to evaluate the method, with Typhoon "Meranti", which made landfall on 15 September 2016, selected as a case study. The results show that the method is effective at identifying and classifying the different categories of disaster loss information from social media: the comprehensive evaluation index (F1) of every category is greater than 0.74.
We mapped the spatio-temporal distribution of the typhoon's impact based on the disaster loss classification results from social media. The experiment shows that the classification output and the resulting maps can be used for disaster loss evaluation and mitigation.
YANG Tengfei, XIE Jibo, LI Zhenyu, LI Guoqing. A Method of Typhoon Disaster Loss Identification and Classification Using Micro-blog Information[J]. Journal of Geo-information Science, 2018, 20(7): 906-917. DOI: 10.12082/dqxxkx.2018.180062
Fig. 1 Algorithm flow
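Tab. 1 Lexical rule patterns
Pattern rule | Sample text |
---|---|
v-n | 到处都是被打碎的玻璃 (shattered glass is everywhere) |
n-v | 整个树被吹倒在地了 (the whole tree was blown to the ground) |
a-n | 一地的碎窗玻璃 (broken window glass all over the ground) |
n-a | 道路一直不畅通 (the road has remained blocked) |
d-vi | 很快小区就不再供水了 (the residential compound soon lost its water supply) |
v-vi | 即将停止供电 (the power supply is about to be cut off) |
r-v | 看见他被树给砸了 (saw him get hit by a tree) |
v-r | 树枝被风吹断刚好砸到他 (a branch snapped by the wind happened to hit him) |
vi | 今天停电一天 (the power has been out all day today) |
Note: v = verb; n = noun; a = adjective; d = adverb; r = pronoun; vi = intransitive verb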
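The patterns in Tab. 1 can be illustrated with a minimal sketch that scans part-of-speech-tagged tokens for matching tag pairs. This is not the authors' implementation: the function name `extract_collocations`, the `window` parameter, and the hand-assigned tags in the sample are all assumptions made for illustration, and a real pipeline would obtain the tags from a Chinese segmenter and tagger.

```python
from typing import List, Tuple

# Tag-pair patterns taken from Tab. 1; "vi" also counts on its own.
PAIR_PATTERNS = {
    ("v", "n"), ("n", "v"), ("a", "n"), ("n", "a"),
    ("d", "vi"), ("v", "vi"), ("r", "v"), ("v", "r"),
}
SINGLE_PATTERNS = {"vi"}


def extract_collocations(
    tagged: List[Tuple[str, str]], window: int = 3
) -> List[Tuple[str, Tuple[str, ...]]]:
    """Return (pattern, words) for every match within `window` tokens.

    `window` is an assumed parameter: the source does not state how far
    apart the two words of a collocation may be.
    """
    found = []
    for i, (word, tag) in enumerate(tagged):
        if tag in SINGLE_PATTERNS:
            found.append((tag, (word,)))
        # Look ahead at most `window` tokens for a partner word.
        for j in range(i + 1, min(i + 1 + window, len(tagged))):
            other, other_tag = tagged[j]
            if (tag, other_tag) in PAIR_PATTERNS:
                found.append((f"{tag}-{other_tag}", (word, other)))
    return found


# Hand-tagged tokens for the n-v sample sentence (tags are illustrative).
sample = [
    ("整个", "b"), ("树", "n"), ("被", "p"), ("吹倒", "v"),
    ("在", "p"), ("地", "n"), ("了", "u"),
]
pairs = extract_collocations(sample)
```

With these hand-assigned tags, `pairs` contains the n-v collocation of 树 (tree) and 吹倒 (blown down), which is the kind of feature-word pair the method collects.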
Fig. 2 Structure of the Skip-gram model
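Tab. 2 Example of the results computed by the word vector model
树 (tree) | 倒 (fall) |
---|---|
大树 (big tree) | 吹 (blow) |
整棵 (whole tree) | 大树 (big tree) |
折断 (snap off) | 断 (break) |
应声 (at the sound) | 压垮 (crush) |
断 (break) | 树干 (trunk) |
倒 (fall) | 一棵 (one tree) |
根 (root) | 电线杆 (utility pole) |
一棵 (one tree) | 棵 (measure word for trees) |
树枝 (branch) | 42棵 (42 trees) |
枝干 (branches and trunk) | 砸 (smash) |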
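Lists such as those in Tab. 2 are typically obtained by ranking vocabulary words by their vector similarity to a seed word. The sketch below shows that ranking step with cosine similarity. The three-dimensional vectors in `TOY_VECTORS` are invented for illustration and are not output of the paper's Skip-gram model, and the helper names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(
    word: str, vectors: Dict[str, List[float]], topn: int = 3
) -> List[Tuple[str, float]]:
    """Rank all other words by cosine similarity to `word`."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:topn]


# Invented toy embeddings (assumption): real vectors would come from a
# Skip-gram model trained on a micro-blog corpus.
TOY_VECTORS = {
    "树": [0.9, 0.1, 0.0],
    "大树": [0.8, 0.2, 0.1],
    "树枝": [0.7, 0.3, 0.0],
    "停电": [0.0, 0.1, 0.9],
}
neighbours = most_similar("树", TOY_VECTORS, topn=2)
```

In this toy setup the two nearest neighbours of 树 are 大树 and 树枝, while 停电 (power outage) ranks last, mirroring how semantically related words are used to expand the feature-word pairs.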
Fig. 3 Low-frequency word processing flow
Fig. 4 Optimization process for collocation relationships
Tab. 3 Example of the structure of the classification knowledge base
Code position | 1 | 2 | 3 | 4 |
---|---|---|---|---|
Example symbol | B | a | 01 | w1/w2 |
Symbol meaning | Major category | Subcategory | Word group | Atomic word group |
Level | Level 1 | Level 2 | Level 3 | Level 4 |
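The four-level coding in Tab. 3 can be pictured as a small lookup structure. The following sketch assumes that the first three positions are concatenated into a key such as `Ba01` and that the atomic word group `w1/w2` is a slash-separated collocation pair; neither detail is confirmed by the source. The names `Entry`, `parse_entry`, and `match_entries`, the sample codes, and the substring-based matching are likewise illustrative simplifications.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Assumed layout: one uppercase letter, one lowercase letter, two digits.
CODE_RE = re.compile(r"^([A-Z])([a-z])(\d{2})$")


@dataclass(frozen=True)
class Entry:
    major: str             # level 1: major category
    minor: str             # level 2: subcategory
    group: str             # level 3: word group
    pair: Tuple[str, str]  # level 4: atomic word group (w1, w2)


def parse_entry(code: str, pair_text: str) -> Entry:
    """Build an entry from a code like 'Ba01' and a pair like 'w1/w2'."""
    match = CODE_RE.match(code)
    if match is None:
        raise ValueError(f"malformed code: {code!r}")
    w1, w2 = pair_text.split("/")
    return Entry(*match.groups(), (w1, w2))


def match_entries(text: str, entries: List[Entry]) -> List[Entry]:
    """Return entries whose two words both occur in the text.

    Substring search stands in for matching on segmented words.
    """
    return [e for e in entries if all(w in text for w in e.pair)]


# Hypothetical knowledge-base rows; codes and pairs are made up.
entries = [parse_entry("Ba01", "树/倒"), parse_entry("Cb02", "停止/供电")]
hits = match_entries("整个树被吹倒在地了", entries)
```

Here only the `Ba01` entry matches, so the text would be assigned to whatever category that code denotes in the knowledge base.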
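Tab. 4 Distribution of the corpus across categories
Category No. | Disaster loss category | Number of posts |
---|---|---|
1 | Casualties | 34 |
2 | Water supply disruption | 337 |
3 | Building damage | 154 |
4 | Commercial impact | 63 |
5 | Forestry impact | 181 |
6 | Traffic obstruction | 138 |
7 | Vehicle damage | 107 |
8 | Power supply disruption | 402 |
9 | Power facility damage | 138 |
10 | Communication disruption | 163 |
11 | Infrastructure damage | 104 |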
Fig. 5 Classification of disaster loss information
Tab. 5 Comparison of experimental results
Category | P/% | R/% | F1/% |
---|---|---|---|
Category 1: Casualties | 68.00 | 89.47 | 77.27 |
Category 2: Water supply disruption | 87.32 | 95.48 | 91.22 |
Category 3: Building damage | 76.10 | 85.14 | 80.37 |
Category 4: Commercial impact | 100.00 | 75.00 | 85.71 |
Category 5: Forestry impact | 79.00 | 84.61 | 81.71 |
Category 6: Traffic obstruction | 78.74 | 87.71 | 82.98 |
Category 7: Vehicle damage | 74.19 | 88.46 | 80.70 |
Category 8: Power supply disruption | 90.29 | 93.93 | 92.07 |
Category 9: Power facility damage | 78.54 | 70.53 | 74.32 |
Category 10: Communication disruption | 86.95 | 71.42 | 78.43 |
Category 11: Infrastructure damage | 76.47 | 72.22 | 74.28 |
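The F1 column in Tab. 5 is consistent with the standard harmonic mean of precision and recall, F1 = 2PR / (P + R). A short sketch for checking individual rows follows; the function name `f1_score` is illustrative, and the inputs are the percentages from the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows taken from Tab. 5 as (P/%, R/%).
rows = {
    "Casualties": (68.00, 89.47),
    "Commercial impact": (100.00, 75.00),
    "Power supply disruption": (90.29, 93.93),
}
f1_by_category = {name: round(f1_score(p, r), 2) for name, (p, r) in rows.items()}
```

For these three rows the computed values agree with the tabulated F1 figures of 77.27, 85.71, and 92.07.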
Fig. 6 Recall of each category
Fig. 7 Precision of each category
Fig. 8 Variation in the number of Sina Weibo posts over time
Fig. 9 Real-time track of Typhoon "Meranti"
Fig. 10 Geospatial distribution of "traffic obstruction" information
Fig. 11 Geospatial distribution of disaster loss information in each time period
Fig. 12 Overall geospatial distribution of disaster loss information
The authors have declared that no competing interests exist.