社会经济数据空间化现状与发展趋势
作者简介:吴吉东(1981-),男,河南西峡人,博士,副教授,研究方向为自然灾害综合评价。E-mail: wujidong@bnu.edu.cn
收稿日期: 2018-03-15
要求修回日期: 2018-06-27
网络出版日期: 2018-09-25
基金资助
国家自然科学基金项目(41571492)
国家重点研发计划课题“全球变化人口与经济系统风险评估模型与模式研究”(2016YFA0602403)
The Status and Development Trend of Disaggregation of Socio-economic Data
Received date: 2018-03-15
Request revised date: 2018-06-27
Online published: 2018-09-25
Supported by
National Natural Science Foundation of China, No.41571492
National Key Research and Development Program, No.2016YFA0602403.
Copyright
行政单元的社会经济统计数据与地理单元的要素数据之间存在空间不匹配的问题,很难满足自然与人文交叉学科研究的需要。本文首先对社会经济数据空间化指标和方法进行了总结,发现空间化研究主要集中在人口与国内生产总值指标,对资本存量、房屋等其他社会经济指标的空间化研究相对较少;根据空间化的思路和方法差异,可将空间化方法归纳为面积权重法、统计模型法和多源数据融合法三类。最后通过对比分析不同空间化方法的原理和优缺点可知:社会经济研究指标多样化、空间化精度要求的多元化和大数据应用的广泛化是社会经济数据空间化的发展趋势。同时,大数据等新的辅助数据源的出现为空间化精度的提高带来了契机,在社会管理精细化要求不断提高的背景下,社会经济数据空间化也越来越成为研究热点。
吴吉东 , 王旭 , 王菜林 , 何鑫 , 叶梦琪 . 社会经济数据空间化现状与发展趋势[J]. 地球信息科学学报, 2018 , 20(9) : 1252 -1262 . DOI: 10.12082/dqxxkx.2018.180137
There is a spatial unit mismatch between statistical socio-economic data that based on administrative division statistics and geographic elements expressed in spatial grid units. It requires spatial processing technique to solve this mismatch. Research on the disaggregation of socio-economic data currently focuses on the indicators of population and gross domestic product. There is relatively few disaggregation of other socio-economic indicators, such as capital stock and housing which are essential input data for risk analysis. Dozens of spatial disaggregation models exist for different research objects. According to the differences in disaggregation ideas and methods of disaggregation, disaggregation models can be classified into three categories: area weighting method, statistical model method, and multi-source data fusion method. Area weighting method is simple but was criticized by its low resolution when applied on small scale studies. Statistical model method is widely used in disaggregation of large-scale socioeconomic statistical data, but needs sufficient spatial data for spatial statistics. While these methods can produce acceptable results, their actual resolution cannot be considered ideal. With the updating and appearing of new data sources for the disaggregation of socio-economic data, multi-source data fusion method has become the main disaggregation method recently. Moreover, the data and methods needed for disaggregation are continuously improved. Comparing principles, advantages and disadvantages of different disaggregation methods, we can see that diversification of socio-economic research indicators and spatial precision requirements, and wide application of big data are the development trend of disaggregation of socio-economic data. Meanwhile, the appearance of new data source is an important opportunity for improving spatial accuracy of the disaggregation. Overall, disaggregation of the socioeconomic data will be a hot subject for future study, one of the reasons for which is the increase of the research needs for high resolution grid data. Another reason is that research institutes and publishers have paid more attention to the scientific data which is reflected by new emerging scientific data journals.
Key words: socio-economic data; population; disaggregation; research progress; big data; raster dataset
Tab. 1 Comparison of major disaggregation models表1 主要空间化模型的综合比较 |
研究方法 | 相关模型 | 基本原理 | 优势 | 不足 | 主要数据 |
---|---|---|---|---|---|
面积权重法 | 无辅助数据的权重插值模型[36] | 假设研究对象在空间上均匀分布,根据次一级的格网占整体的面积比例计算其值 | 所需数据源少,便捷 | 与实际差异较大 | 行政区划 |
有辅助数据的权重插值模型[37] | 通过辅助数据限定研究对象在空间分布范围,并根据面积比重计算空间分布值,以期符合实际 | 缩小了社会经济数据的空间分布范围误差,更贴近实际 | 区域均匀分布的结果不能表达空间差异 | 行政区划、土地利用、居民点信息等 | |
统计模型法 | 数学函数 模型[38,39,40,41,42,43,44,45,46,47] | 通过已有的数学函数引入参数建立模型,模拟社会经济数据的空间分布 | 引入数学函数表达空间差异,科学性强,理论基础完备 | 数学函数较复杂,参数设置针对性强,不易理解 | 中心密度、带宽 |
分区密度制图 模型[48,49,50] | 不同地区密度存在差异,将研究区根据特征范围划分等级,不同等级建立不同的密度模型 | 分区概念符合实际,模型简单易懂 | 区域划分难度会随分区细化程度逐渐增加 | 人口密度、土地利用、居民点分布、地形、道路等 | |
多源数据融合法 | 因子权重 模型[30,51-57] | 确定不同影响因子对要素空间分布的权重系数,最后叠加得到空间分布结果 | 要素空间分布更加贴近实际 | 确定不同因子的权重过程复杂,可控性差 | 土地利用、居民点分布、夜间灯光、手机信令、道路等 |
多元回归模型[58] | 不同影响因子之间也存在相互影响,通过多种影响因子建立回归模型模拟社会经济数据空间分布 | 多源数据融合弥补了单一要素表达空间分布密度的不足 | 不同数据的融合过程容易产生误差,计算过程复杂 | 土地利用、夜间灯光、手机信令等 |
Fig. 1 Diagram of the areal weighting interpolation method图1 面积权重法图解[35] |
Fig. 2 Diagram of spatial population density disaggregation base on the random forest method图2 基于随机森林法的人口密度示意图[47] |
Fig. 3 Framework of multi-layer and multi-class dasymetric model图3 分区密度制图模型图解[49] |
Fig. 4 Building grids with resolutions of 500 m×500 m and 125 m×125 m and their relationship with flood risk assessment uncertainty图4 不同分辨率建筑格网表达及与洪水风险评估不确定性的关系[69] |
Fig. 5 Identified commuting pattern for extreme commuting trips and main commuting directions using the bus smartcard data图5 基于公交卡记录的北京市中心区极端出行和通勤链接形态[78] |
The authors have declared that no competing interests exist.
[1] |
[
|
[2] |
[
|
[3] |
|
[4] |
[
|
[5] |
[
|
[6] |
[
|
[7] |
|
[8] |
|
[9] |
|
[10] |
[
|
[11] |
[
|
[12] |
[
|
[13] |
[
|
[14] |
[
|
[15] |
[
|
[16] |
[
|
[17] |
[
|
[18] |
[
|
[19] |
|
[20] |
|
[21] |
|
[22] |
[
|
[23] |
[
|
[24] |
|
[25] |
|
[26] |
|
[27] |
[
|
[28] |
[
|
[29] |
[
|
[30] |
[
|
[31] |
|
[32] |
|
[33] |
|
[34] |
[
|
[35] |
|
[36] |
[
|
[37] |
[
|
[38] |
[
|
[39] |
[
|
[40] |
[
|
[41] |
[
|
[42] |
|
[43] |
[
|
[44] |
|
[45] |
|
[46] |
|
[47] |
[
|
[48] |
[
|
[49] |
|
[50] |
[
|
[51] |
|
[52] |
|
[53] |
[
|
[54] |
[
|
[55] |
[
|
[56] |
[
|
[57] |
[
|
[58] |
|
[59] |
[
|
[60] |
[
|
[61] |
|
[62] |
[
|
[63] |
|
[64] |
[
|
[65] |
[
|
[66] |
|
[67] |
|
[68] |
|
[69] |
|
[70] |
[
|
[71] |
|
[72] |
[
|
[73] |
|
[74] |
[
|
[75] |
|
[76] |
[
|
[77] |
[
|
[78] |
[
|
[79] |
[
|
[80] |
|
[81] |
|
[82] |
|
/
〈 | 〉 |