The Uncertainty of Polygon-based Statistical Data Spatial Analysis: Case of Census Data of Haidian District, Beijing

1. Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China;
3. Planning and Design Institute of Forest Products Industry, State Forestry Administration, Beijing 100714, China

Received date: 2012-07-10

Revised date: 2012-12-24

Online published: 2013-06-17


In statistical geographic information systems, census data stored as polygon attributes are a kind of polygon-based statistical data. In geography and the social sciences, polygon-based statistical data are a main source for uncovering the spatial patterns of social phenomena through spatial analysis. However, data limitations and limited computing power mean that the uncertainty of spatial analysis on polygon-based statistical data is usually ignored. No well-established methodology exists for analyzing this uncertainty. To address this problem, we developed a method based on the modifiable areal unit problem (MAUP) to evaluate that uncertainty. Population data collected for each building in Beijing make the method applicable. We treated the MAUP as two separate effects: scale and aggregation. As the polygon-based statistical data, we used census data of Haidian District, Beijing, georeferenced to building polygons. Within this method, we introduced scale and shape indices and applied visual analysis and data fitting to detect the uncertainty of five analysis methods: Sum, Mean, Standard deviation, Global Moran's I and Anselin Local Moran's I (LISA). We also revealed the relationships between the scale and shape indices and the five analysis methods, to demonstrate how the MAUP affects the spatial analysis of polygon-based statistical data.

The results show the following. (1) Results of census data spatial analysis that use conventional census tracts as the zone system are arbitrary and highly uncertain. (2) Results that use regular nets as the zone system describe the spatial patterns of the original data well, but still depend on the scale and zoning of the net system. (3) Results that use a regular grid as the zone system are functionally related to the grid scale, and their uncertainty reflects the multi-scale spatial patterns of the original data. (4) Aggregation, together with scale, affects census data spatial analysis: for a regular net system at a fixed scale, the number of neighbors of each polygon affects the analysis results. We therefore recommend re-aggregating census data onto a regular grid system of suitable scale and applying multi-scale methods when analyzing polygon-based statistical data.
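The re-aggregation and the five statistics named above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the helper names (`aggregate_to_grid`, `grid_weights`, `morans_i`, `local_morans_i`), the 8 km extent and the point data are hypothetical. The data are synthetic points standing in for building centroids, not Haidian census records. Each building is assigned wholly to the cell containing its centroid, which is an assumption of this sketch. Weights are row-standardised rook or queen contiguity held in a dense matrix, so the sketch suits small grids only. It also omits permutation inference; libraries such as PySAL provide tested implementations for real work.

```python
import numpy as np


def aggregate_to_grid(x, y, pop, cell_size, width, height):
    """Sum building populations into square cells of side `cell_size`.

    Assumption: each building goes wholly to the cell holding its centroid.
    Returns an (n_rows, n_cols) array of cell totals; empty cells hold 0.
    """
    n_cols = int(np.ceil(width / cell_size))
    n_rows = int(np.ceil(height / cell_size))
    # Clip so points on (or beyond) the extent boundary land in edge cells.
    col = np.clip((np.asarray(x) // cell_size).astype(int), 0, n_cols - 1)
    row = np.clip((np.asarray(y) // cell_size).astype(int), 0, n_rows - 1)
    grid = np.zeros((n_rows, n_cols))
    # np.add.at accumulates correctly when several buildings share a cell.
    np.add.at(grid, (row, col), np.asarray(pop, dtype=float))
    return grid


def grid_weights(n_rows, n_cols, queen=False):
    """Row-standardised contiguity weights for a square grid (needs >= 2 cells).

    Rook: up to 4 edge neighbours; queen also counts the 4 diagonal ones.
    Cells are indexed row-major, matching ndarray.ravel().
    """
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if queen:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    w = np.zeros((n_rows * n_cols, n_rows * n_cols))
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[r * n_cols + c, rr * n_cols + cc] = 1.0
    return w / w.sum(axis=1, keepdims=True)


def morans_i(values, w):
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    z = np.asarray(values, dtype=float).ravel()
    z = z - z.mean()
    return (z.size / w.sum()) * (z @ w @ z) / (z @ z)


def local_morans_i(values, w):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j, with m2 = sum_i z_i^2 / n."""
    z = np.asarray(values, dtype=float).ravel()
    z = z - z.mean()
    m2 = (z @ z) / z.size
    return z * (w @ z) / m2


# Synthetic building centroids (metres) and resident counts, NOT census data:
# half clustered around one hotspot, half spread uniformly.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal([2000, 6000], 800, size=(1000, 2)),
                 rng.uniform(0, 8000, size=(1000, 2))])
x, y = pts[:, 0], pts[:, 1]
pop = rng.poisson(50, len(x))

for cell in (500, 1000, 2000):
    g = aggregate_to_grid(x, y, pop, cell, 8000, 8000)
    w = grid_weights(*g.shape)
    lisa = local_morans_i(g, w)
    print(f"cell={cell} m  sum={g.sum():.0f}  mean={g.mean():.1f}  "
          f"sd={g.std():.1f}  I={morans_i(g, w):.3f}  max LISA={lisa.max():.2f}")
```

Sum stays constant across cell sizes because every building falls in exactly one cell. Mean changes because the number of zones changes. Standard deviation, Global Moran's I and the LISA values generally shift as well, which is the scale effect in its simplest form.

Passing `queen=True` changes only the number of neighbours per zone. On a 4 × 4 checkerboard of 0/1 values, this moves Global Moran's I from -1 (rook) to about -0.18 (queen). This is a toy analogue of finding (4).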

Cite this article

ZHANG Xiao-Hu, ZHONG Er-Shun, WANG Shao-Hua, ZHANG Xun, ZHANG Ji. The Uncertainty of Polygon-based Statistical Data Spatial Analysis: Case of Census Data of Haidian District, Beijing[J]. Journal of Geo-information Science, 2013, 15(3): 369-379. DOI: 10.3724/SP.J.1047.2013.00369

