ARTICLES
ZHANG Chunju, ZHANG Xueying, ZHU Shaonan, XU Xitao
A toponym database generally stores descriptive information about place names together with their spatial locations and feature types. It supplies basic information for national administration, economic development, and domestic and foreign exchanges, and it underpins public place name services, particularly Location-Based Services (LBS), for which demand is growing. A toponym database with complete and timely place name information is therefore a prerequisite for efficient LBS. At present, however, our national toponym database has several problems. Most of its place names are coarse-grained, fine-grained and non-standard place names are scarce, and the database contains no descriptions of the relative positions of place names. Moreover, updating relies on manual surveying, which has a long cycle, high cost, and low efficiency. This paper explores a new method for updating a toponym database that combines a search engine, a web crawler, and place name recognition. First, a large number of space-sensitive web pages are collected by a web crawler built on the Google search engine, using a spatial search subject of the form "place name" or "place name + spatial relation terms". Second, the web pages are parsed with a DOM tree method, and place names are recognized with a Conditional Random Fields (CRF) model. Finally, the spatial locations of place names are interpreted automatically from candidate web texts that contain new place names and their spatial location information. A case study uses the spatial search subject "Nanjing Normal University, Xianlin hotel + northwest". The experimental results show that the method is feasible and effective.
However, locating place names in web pages in a timely and accurate manner remains a challenge, because this paper considers neither the publishing time of web pages nor the time at which place names change as a result of events described in those pages. Place name information may therefore lag behind reality, and the completeness and consistency of the toponym database cannot be ensured. In recent years, Internet maps built with public participation, such as Google Maps, Google Earth, and OpenStreetMap, have become accurate, real-time sources of place names and especially of coordinates. Our future work will focus on interpreting the time attributes of place names from web pages and on obtaining place names and their coordinates from Internet maps. Moreover, integrating place names from different data sources will make toponym database updating more effective.
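The three steps summarized above (composing spatial search subjects, extracting text from web pages, and interpreting relative positions) can be illustrated with a minimal Python sketch. It is not the implementation used in the paper. It performs no crawling, the standard-library HTML parser stands in for the DOM tree analysis, and a hand-written regular expression stands in for the CRF recognition model. The list of relation terms, the sentence pattern, and all function names are assumptions made for illustration only.

```python
import re
from html.parser import HTMLParser

# Hypothetical relation terms; the paper does not enumerate its own list.
RELATION_TERMS = ("north", "south", "east", "west",
                  "northeast", "northwest", "southeast", "southwest")


def build_search_subjects(place_name, relation_terms=RELATION_TERMS):
    """Return the bare place name plus one 'place name + relation term' query per term."""
    return [place_name] + [f"{place_name} {term}" for term in relation_terms]


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style content.

    This is a crude stand-in for the DOM tree analysis named in the paper.
    """

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Longer terms come first so that 'northwest' is tried before 'north'.
_REL = "|".join(sorted(RELATION_TERMS, key=len, reverse=True))

# Assumed sentence shape: '<Target> is [located] [to the] <relation> of <Reference>'.
# A trained CRF model would replace this pattern in a real system.
_PATTERN = re.compile(
    rf"(?P<target>[A-Z][A-Za-z ]*?) is (?:located )?(?:to the )?"
    rf"(?P<rel>{_REL}) of (?P<ref>[A-Z][A-Za-z ]*[A-Za-z])"
)


def find_relative_positions(text):
    """Return (target, relation, reference) triples matched in the text."""
    return [(m["target"].strip(), m["rel"], m["ref"].strip())
            for m in _PATTERN.finditer(text)]
```

For example, passing a page whose body contains the sentence "Xianlin Hotel is located northwest of Nanjing Normal University." through `page_text` and then `find_relative_positions` gives one triple linking the hotel to the university by the relation "northwest". The pattern only covers this one sentence shape, which is exactly the kind of brittleness a statistical recognizer such as a CRF is meant to avoid.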