Journal of Geo-information Science ›› 2019, Vol. 21 ›› Issue (8): 1152-1160. doi: 10.12082/dqxxkx.2019.190046

Application and Comparison of Topic Model in Identifying Latent Topics from Disaster-Related Tweets

SU Kai1, CHENG Changxiu1,*, Nikita Murzintcev2, ZHANG Ting1

  1. Center for Geodata and Analysis, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
  2. Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
  • Received: 2019-01-25 Revised: 2019-04-24 Online: 2019-08-25 Published: 2019-08-25
  • Contact: CHENG Changxiu, E-mail: chengcx@bnu.edu.cn
  • Supported by:
    National Key Research and Development Program of China (2017YFB0504102); Fundamental Research Funds for the Central Universities

Abstract:

From 1990 to 2010, natural disasters occurred with increasing frequency in countries along the "One Belt and One Road", most of which are developing countries with underdeveloped economies and weak disaster resistance. When disasters strike, people in those countries tweet about them in real time. These tweets contain important information for emergency rescue, disaster assessment, and disaster reduction and prevention, so mining and analyzing them can provide strong support for China's international rescue and relief work. However, Twitter data are fragmented and unstructured, and the topics that tweets cover are numerous and miscellaneous, which makes rapidly screening out relevant information a research challenge. Topic models, which do not require a prior labeled corpus, can rapidly aggregate information that is valuable for disaster relief and assessment from large numbers of disaster-related tweets. In this paper, two models widely used in natural language processing, the biterm topic model (BTM) and latent Dirichlet allocation (LDA), were adopted to cluster Typhoon Haiyan-related tweets into fine-grained topics. We then verified and compared the accuracy of the two models and tested their ability to distinguish similar disaster topics. In addition, using place-name matching on the "demand-related" tweets obtained from topic categorization, we analyzed the spatial distribution of the demand for materials and medical care in the Philippines during Typhoon Haiyan. The results show that: (1) In classifying Typhoon Haiyan-related tweets into fine-grained topics, the overall accuracy of BTM was 0.598, while that of LDA was only 0.321, indicating that BTM outperformed LDA. (2) The F1-measure values of BTM for "disaster location-related" and "blessing-related" tweets were 0.8 and 0.78, respectively, indicating that BTM can better identify tweets on those two topics. (3) After preliminary verification, the spatial distribution of material and medical needs derived from "demand-related" tweets was basically consistent with the actual demand. Our findings can help obtain first-hand disaster information quickly from Twitter when China lacks relevant data on disasters occurring in the "One Belt and One Road" region, and thereby provide data support for China's international rescue work. In addition, our methodology can be applied to domestic microblog posts during disasters.
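
The overall accuracy and per-topic F1-measure reported above can be illustrated with a minimal Python sketch. It assumes that each tweet carries exactly one manually assigned topic label and one model-assigned topic; the abstract does not describe how the authors mapped model output to labels, so the function names and the toy labels below are hypothetical and are not the paper's data.

```python
from collections import Counter


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of tweets whose predicted topic equals the manual label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def f1_per_topic(true_labels, predicted_labels):
    """Per-topic F1-measure: harmonic mean of precision and recall."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(true_labels, predicted_labels):
        if t == p:
            tp[t] += 1   # tweet correctly assigned to its topic
        else:
            fp[p] += 1   # predicted topic gained a wrong tweet
            fn[t] += 1   # true topic missed this tweet
    scores = {}
    for topic in set(true_labels) | set(predicted_labels):
        predicted_total = tp[topic] + fp[topic]
        actual_total = tp[topic] + fn[topic]
        precision = tp[topic] / predicted_total if predicted_total else 0.0
        recall = tp[topic] / actual_total if actual_total else 0.0
        denom = precision + recall
        scores[topic] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Toy labels for illustration only (hypothetical, not from the paper).
manual = ["location", "location", "blessing", "demand", "demand"]
model = ["location", "blessing", "blessing", "demand", "location"]

print(overall_accuracy(manual, model))
print(f1_per_topic(manual, model))
```

Accuracy summarizes performance across all topics at once, whereas the per-topic F1-measure shows which individual topics a model separates well, which is why the two figures are reported separately.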

Key words: Topic model, BTM, LDA, Tweet, Topic categorization, Natural hazard, Emergency management