The Credibility and Evaluation of Volunteered Geographic Information

MA Chao; SUN Qun; XU Qing; WANG Zhijian

doi:10.3724/SP.J.1047.2016.01305

Journal of Geo-information Science, 2016, Vol. 18, Issue 10: 1305-1311


Original Article


Institute of Geospatial Information, Information Engineering University, Zhengzhou 450052, China

*Corresponding author: SUN Qun, E-mail: sunqun@371.net

Received date: 2016-04-13

Request revised date: 2016-06-20

Online published: 2016-10-25

Copyright

All rights reserved by the Editorial Office of Journal of Geo-information Science


Abstract

Volunteered geographic information (VGI) is mostly uploaded by volunteers without any quality guarantee, which has become a major obstacle to its application; data quality is therefore the primary problem to be solved, and quality evaluation is a focus of current research. Many studies have evaluated VGI quality against reference data, but few have done so without it. Because reference data are difficult or costly to obtain, quality evaluation without reference data is an important research topic. To address the unknown quality of VGI and the difficulty of obtaining high-quality reference data, we propose a credibility model that evaluates VGI quality from two aspects, the number and reputation of volunteers and the trend of data change, and that turns the qualitative analysis into a quantitative expression. On the one hand, a volunteer reputation model is built on a point-count statistic: the proportion of a volunteer's uploaded points that are preserved by later edits is taken as that volunteer's reputation. Adapting Linus's Law, the credibility of the geographic information in a study area is then measured by the sum of the reputations of the volunteers who contributed there. On the other hand, information quality is inferred from the data-change tendency, and credibility is measured by the degree of change of the data within the study area. Finally, to verify the rationality of the model, OpenStreetMap historical data for Beijing, Shanghai and other cities were used in experiments, with navigation data as the reference for comparison. The credibility computed by the model is highly consistent with the quality evaluated against the reference data.

Key words: volunteered geographic information; credibility; quality evaluation; reputation model

Cite this article

MA Chao, SUN Qun, XU Qing, WANG Zhijian. The Credibility and Evaluation of Volunteered Geographic Information[J]. Journal of Geo-information Science, 2016, 18(10): 1305-1311. DOI: 10.3724/SP.J.1047.2016.01305

1 Introduction

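Volunteered geographic information (VGI) is developing rapidly and is gradually becoming an important means of acquiring geographic data^[1]. VGI data cover the globe, are free to obtain and are updated frequently, and they can be applied in many fields such as emergency mapping, travel navigation, disease-spread tracking and public-opinion monitoring^[2-3]. Because most VGI data are uploaded by volunteers or imported from other sources without strict quality control, they often contain malicious, duplicate or low-accuracy data. Consequently, compared with data produced by professional agencies, the quality of VGI data is uncertain, and data quality has become the bottleneck restricting the wide application of VGI^[4]. Research on VGI quality evaluation is one of the key topics in the field. Depending on whether reference data are used, methods fall into two groups: quantitative methods based on reference data and qualitative methods based on data analysis.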

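Reference-based evaluation methods select quality elements as criteria and match and compare VGI data against reference data to obtain quantitative quality information; they are intuitive, quantitative and reliable. For example, Haklay used Ordnance Survey data as the reference to analyse the positional accuracy and completeness of VGI road data in London^[5]; Ciepluch et al. compared the completeness, currency and geometric accuracy of OSM, Google Maps and Bing Maps data in England^[6]; Al-Bakri, Maco et al. evaluated VGI quality through field surveys^[7-8]; Roberto proposed using satellite imagery to determine the geometric accuracy of VGI data and compared VGI, USGS and TIGER data^[9]; Wang Ming et al. analysed the quality of VGI data in Wuhan using three quality elements: completeness, attribute accuracy and positional accuracy^[10]. Overall, quantitative studies generally find that VGI quality is unevenly distributed: in economically developed areas the geometric accuracy and completeness can rival professional data, whereas data in remote suburbs are of poor quality^[11-15]. However, because reference data are often hard or costly to obtain, these methods are difficult to apply widely.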

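Evaluation based on data analysis is an indirect approach: without reference data, it analyses and mines the information inherent in the data to obtain indicators that reflect VGI quality, but it cannot yield quantitative results. Haklay, Arsanjani et al. studied the relationship between VGI quality and the number of users and found that areas with more volunteers tend to have higher positional accuracy^[16-17]; Christopher et al. proposed a framework for quality evaluation based on the intrinsic characteristics of VGI data, with different quality elements and methods for different application scenarios^[18]; Bishr and Kebler studied quality evaluation based on volunteer reputation^[19-20]; Mooney studied objects in VGI data that had been edited many times (more than 15 times)^[21]; Zhao Yijiang et al. proposed a volunteer reputation model based on version similarity^[22]. These qualitative methods are valuable references for VGI quality analysis and remove the dependence on reference data, but research in this direction is still exploratory, and the models and methods are not yet mature.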

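In practice, high-quality reference data are often difficult or costly to obtain. Although VGI quality evaluation has been studied, the results are mostly qualitative and offer data users little practical help. This paper therefore introduces the concept of data credibility, recasting the study of VGI quality as the study of VGI credibility; it makes full use of the VGI history database and expresses the qualitative analysis results quantitatively. We first briefly discuss the concept, content and measurement of VGI credibility, and then give the calculation method for each indicator. Experimental results show that VGI credibility is generally positively correlated with data quality and can serve as a reference for evaluating VGI quality when no reference data are available.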

2 Basic Characteristics of VGI Credibility and Factors Affecting It

2.1 Credibility and Data Quality

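Credibility is the degree to which a person or thing can be trusted, i.e. the degree of belief, based on experience, that something is true^[23]. The credibility of VGI data is the degree to which the data are accepted or trusted by their users. Without reference data it is hard to obtain a quantitative quality evaluation from VGI data themselves, but mining information associated with the data can yield indicators related to data quality. We call these indicators the credibility of VGI data; they reflect how far the data deserve trust.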

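Note that credibility is not the same as data quality: data with high credibility are not necessarily of high quality. Credibility can be regarded as the probability of high quality: the higher the credibility, the more likely the data are of high quality and the more readily users accept or trust them.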

2.2 Basic Characteristics of Credibility

VGI credibility has the following basic characteristics^[4]:

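(1) Dynamic. VGI data are updated frequently, so VGI credibility is a dynamic concept: an evaluation applies only to the current version of the data, and credibility changes as users continue to upload and modify data.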

(2) Spatial. VGI credibility refers to the credibility of the data within a given area, not to that of a single feature.

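(3) Probabilistic. VGI credibility can be regarded as the probability of quality: the higher the credibility, the more likely the data are of high quality.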

2.3 Factors Affecting Credibility

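VGI credibility is closely related to data quality, so any factor that affects data quality also affects credibility. Building on existing data-analysis methods for VGI, this paper examines VGI credibility from two aspects: the number and reputation of users, and the trend of data change.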

2.3.1 Number of Users and Their Reputation

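Linus's Law, well known in software engineering, states that "given enough eyeballs, all bugs are shallow"^[16]. It originally means that once code is made public, its errors have nowhere to hide as more people examine it. Applied to VGI: after data are uploaded, later users keep examining, modifying and improving them, so data quality improves over time. Hence, the more volunteers upload and edit data in an area, the more opportunities the data have to be improved, and the higher their quality becomes. Previous studies have shown a positive correlation between the number of users and VGI quality^[16-17], so the number of volunteers contributing data in an area can be used to measure the credibility of the data's accuracy.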

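VGI quality also depends on who uploaded the data. Users generally place more trust in data uploaded or edited by well-trained, experienced professionals, so VGI quality is related not only to the number of users but also to their expertise. Because information on users' expertise is hard to obtain, user reputation can be computed as a substitute and regarded as the weight of each user's influence on VGI quality. The sum of the reputations of all users in a study area thus captures both the number of users and their expertise, and can be used as a factor for measuring the credibility of that area.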

2.3.2 Trend of Data Change

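Take road data as an example of how VGI data develop. Data accumulate first from the main roads; growth is small and quality is low. As main roads multiply, secondary roads, minor roads and residential streets appear and the data enter a phase of rapid growth, while the attribute information of all kinds of roads is continually enriched and refined and quality gradually improves. Over time, fewer new roads are uploaded and fewer modifications are made until the data stop changing; the data then reach saturation and, after continual revision, are of high quality.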

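This process suggests that, over a given period, a large change in data volume indicates that the data are still developing and being replaced frequently, so their quality is low; conversely, a small change indicates that the data volume is approaching saturation and quality is high. The degree of change in data volume can therefore reflect VGI quality, and this trend of data change can be used to measure credibility.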

3 Computing VGI Credibility

3.1 User Reputation

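Reputation was originally a financial term; with the development of Web 2.0 and big data, it has been widely applied in logistics, services, e-commerce and many other fields^[24]. As the Internet has entered ordinary households, people increasingly communicate and cooperate online, for example in forums and e-commerce. Because the authenticity and trustworthiness of the other party's behaviour cannot be guaranteed, reputation systems were proposed. Such a system collects and analyses a user's historical behaviour to predict the authenticity of that user's future behaviour, helping people choose suitable partners. In essence, it scores user behaviour, aggregates the scores according to certain rules to obtain the user's reputation, and makes this reputation available to other users as a reference^[25].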

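A reputation system essentially evaluates users' historical behaviour, usually through other users, and evaluation can be direct or indirect^[26]. In direct evaluation, later users rate a user's behaviour by scoring, voting and similar means; for example, in online shopping customers rate a merchant's service, and the merchant's reputation is composed of all customers' ratings. In indirect evaluation, the behaviour is not scored directly but is judged through other users' subsequent responses to it; for example, in some online forums a user's reputation depends mainly on the number of replies other users make to that user's posts.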

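VGI platforms currently have no user reputation system and no mechanism for rating users, so a volunteer reputation model can only be built through indirect evaluation. As noted above, later users' modifications of earlier users' data can be seen as improvements to those data, and they can equally be read as later users' evaluation of the earlier data: extensive modification indicates a poor evaluation, slight modification a good one. Accordingly, the degree to which a volunteer's uploaded data are subsequently edited can be treated as other users' evaluation and used to build the volunteer reputation model.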

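To compute how far a volunteer's uploaded data are edited by later users, reference [22] introduced version similarity to reflect other users' evaluation of a user, with good results. Analysis of VGI data, however, shows that the basic unit of VGI data is the point: lines and polygons are composed of sequences of points, changes between versions are usually very small, and often they consist only of moving, adding or deleting points. Volunteers also edit data point by point, mainly through three actions: (1) creating a point, i.e. adding a new point; (2) modifying a point, i.e. changing an existing point's attributes or position; (3) deleting a point, i.e. removing an existing point.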

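Volunteers can therefore be evaluated point by point. After a volunteer completes an edit, the points created, modified or deleted in that edit constitute the uploaded data, and later volunteers perform various edits on these points. The proportion of these points that are retained without any further editing can be taken as the volunteer's reputation. Suppose user A creates 40 points, modifies 50 and deletes 10 in one edit session, i.e. 100 points in total. If 75 of these points are never edited again afterwards, user A's reputation is 75/100 = 0.75. The point-count-based volunteer reputation is given by Eq. (1).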

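$$R_i = \frac{\sum P_s}{\sum P_a} \tag{1}$$

where $R_i$ is the reputation of user $i$; $\sum P_a$ is the number of all points uploaded by the user; and $\sum P_s$ is the number of those uploaded points that were retained.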
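A minimal Python sketch of Eq. (1) follows. It assumes that, for each point a volunteer created, modified or deleted in an edit session, we already know whether any later user edited that point again; the `PointEdit` record and its fields are hypothetical and are not part of the OpenStreetMap history format. The example reproduces the worked case of user A.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PointEdit:
    """One point touched by a volunteer in an edit session (hypothetical layout)."""
    point_id: int
    action: str          # "create", "modify" or "delete"
    edited_later: bool   # True if any later user edited this point again


def reputation(edits: Iterable[PointEdit]) -> float:
    """Eq. (1): share of a user's uploaded points left untouched by later users."""
    edits = list(edits)
    if not edits:
        raise ValueError("user has no uploaded points")
    retained = sum(1 for e in edits if not e.edited_later)  # sum of P_s
    return retained / len(edits)                             # divided by sum of P_a


# User A: 40 created, 50 modified and 10 deleted points;
# 75 of the 100 points are never edited again by later users.
actions = ["create"] * 40 + ["modify"] * 50 + ["delete"] * 10
user_a = [PointEdit(i, a, edited_later=(i >= 75)) for i, a in enumerate(actions)]
print(reputation(user_a))  # 0.75
```

Deriving the `edited_later` flag from real history data (tracking each point ID through later versions) is left out of this sketch.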

3.2 Trend of Data Change

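Computing the change trend of VGI data requires the development history of the data in the study area. The VGI database publishes an update at fixed intervals, so comparing the data volume before and after each update reveals how the data in an area are developing. Suppose the data volume $N$ of an area has $m$ versions, denoted $(N_1, N_2, \ldots, N_m)$, and let $\Delta N_i$ denote the change between adjacent versions $N_{i-1}$ and $N_i$, obtained by comparing their data volumes. After the changes between all versions, $(\Delta N_2, \Delta N_3, \ldots, \Delta N_m)$, have been computed, the ratio of the latest change to the volume of the final version is taken as the change trend of version $N_m$, denoted $\delta$:

$$\delta = \frac{\Delta N_m}{N_m} \tag{2}$$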

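The larger δ is, the more the data volume is changing and the lower the credibility is likely to be; conversely, a smaller δ suggests higher credibility.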
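The following sketch computes Eq. (2) from a list of data volumes ordered by version, oldest first. It assumes that ΔN_i is the absolute difference between adjacent versions, |N_i − N_{i−1}|; the text does not say whether signed or absolute changes are intended. The road lengths in the example are hypothetical and are not the Fig. 1 data.

```python
from typing import List, Sequence


def version_changes(volumes: Sequence[float]) -> List[float]:
    """dN_2, ..., dN_m: assumed absolute change between adjacent versions."""
    return [abs(b - a) for a, b in zip(volumes, volumes[1:])]


def change_trend(volumes: Sequence[float]) -> float:
    """Eq. (2): delta = dN_m / N_m for data volumes N_1, ..., N_m."""
    if len(volumes) < 2:
        raise ValueError("need at least two versions")
    if volumes[-1] <= 0:
        raise ValueError("latest version must have a positive data volume")
    return version_changes(volumes)[-1] / volumes[-1]


# Hypothetical yearly road lengths (km) for one area, oldest first
lengths = [120.0, 480.0, 900.0, 1000.0]
print(version_changes(lengths))         # [360.0, 420.0, 100.0]
print(round(change_trend(lengths), 3))  # 0.1
```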

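Taking road data for San Francisco and New Delhi as examples, the total road length was tallied for each year from 2008 to 2015; the statistics are shown in Fig. 1.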


Fig. 1 The statistics of OSM road data


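Fig. 1 shows that road data in San Francisco grew rapidly once and then entered a long period of slow growth; δ is 0.007, indicating that the road network is mature and saturated and has high credibility. Roads in New Delhi are growing quickly; δ is 0.105, indicating that the road data are still in a phase of rapid accumulation and have relatively low credibility.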

In summary, VGI credibility is calculated as follows:

(1) Count all users who participated in the area to be evaluated; denote their number by m;

(2) Compute each user's reputation $R_i$ by Eq. (1), and sum the reputations of all users:

$$R = \sum_{i=1}^{m} R_i \tag{3}$$

(3) Compute the data change trend δ of the area by Eq. (2);

(4) Compute the credibility T of the area by Eq. (4):

$$T = w_1 R + w_2 (1 - \delta) \tag{4}$$

where $w_1$ and $w_2$, with $w_1 + w_2 = 1$, are the weights of the reputation sum and of the data change trend, respectively.
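Steps (1) to (4) can be sketched in Python as below. Because the raw sum R in Eq. (3) grows with the number of users, it is not on the same scale as 1 − δ; the sketch therefore assumes R has already been rescaled to [0, 1] before Eq. (4) is applied (Section 4.2.1 uses min-max normalization for this). All numeric inputs are hypothetical.

```python
from typing import Sequence


def reputation_sum(reputations: Sequence[float]) -> float:
    """Steps (1)-(2), Eq. (3): R = R_1 + ... + R_m over the m users of the area."""
    return float(sum(reputations))


def credibility(r: float, delta: float, w1: float = 0.5, w2: float = 0.5) -> float:
    """Step (4), Eq. (4): T = w1 * R + w2 * (1 - delta), with w1 + w2 = 1.

    `r` is assumed to be rescaled to [0, 1] so that it is comparable with 1 - delta.
    """
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("weights must satisfy w1 + w2 = 1")
    return w1 * r + w2 * (1.0 - delta)


reps = [0.75, 0.5, 0.25]                # hypothetical R_i for m = 3 users
print(reputation_sum(reps))             # 1.5
print(round(credibility(0.6, 0.1), 3))  # 0.75, using a rescaled R of 0.6
```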

4 Experiments and Analysis

4.1 Experimental Data

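To verify the relationship between the credibility model and actual data quality, experiments were conducted on OpenStreetMap road data for the central urban areas of Beijing, Wuhan, Shanghai and Shenzhen, taken from the OpenStreetMap history database^[27]. In each city a 36 km² experimental area was selected at random and its road changes from 2008 to 2015 were analysed; the data include the geometry of major roads, motorways, residential streets, footpaths and other roads (attribute data are not considered here). The reference data are navigation data published by NavInfo (Beijing) in 2014. For ease of analysis and comparison, all experimental areas were divided into 4 km² tiles, giving 36 tiles for the four areas. Part of the experimental area is shown in Fig. 2.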


Fig. 2 Map of part of the experimental area


4.2 Experimental Procedure and Analysis of Results

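The experiment proceeded in three steps: first, the credibility of each tile was computed by the method above; then, with the navigation data as reference, the completeness and accuracy of the data in the experimental areas were evaluated by the reference-based method^[11]; finally, the results of the two evaluation methods were compared.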

4.2.1 Credibility Calculation

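The OpenStreetMap history database records every modification made after data are uploaded, from which user names, modification times, edit contents and so on can be extracted. First, the users in each tile were identified from the OpenStreetMap history database and their reputations were computed with the user reputation model, giving each tile's reputation sum R. Next, the total road length in each tile was tallied for 2008 to 2015, and each tile's data change trend δ was obtained with the method described above. Both sets of values were linearly rescaled to [0, 1] by min-max normalization, and the credibility T of each tile was computed by Eq. (4) with $w_1 = w_2 = 0.5$. The results are listed in Table 1.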

Tab. 1 The statistics of the credibility of experiment area


No.	Reputation sum	Change trend	Credibility
1	0.522	0.316	0.730
2	0.382	0.262	0.665
3	0.750	0.238	0.851
4	0.737	0.375	0.833
5	0.168	0.167	0.565
6	0.347	0.238	0.649
7	0.333	0.042	0.659
8	0.517	0.000	0.754
9	0.558	0.250	0.754
…	…	…	…
35	0.933	0.119	0.952
36	0.618	0.054	0.832
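A sketch of the tile-level computation described in Section 4.2.1 follows: min-max normalization of R and δ across tiles, then Eq. (4). Mapping a constant series to 0 is an assumption of this sketch, and the example values are hypothetical; they are not the data behind Table 1.

```python
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Linearly rescale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def tile_credibility(r_sums: Sequence[float], deltas: Sequence[float],
                     w1: float = 0.5) -> List[float]:
    """Normalize R and delta across tiles, then T = w1*R + (1 - w1)*(1 - delta)."""
    if len(r_sums) != len(deltas):
        raise ValueError("one R and one delta per tile are required")
    w2 = 1.0 - w1
    return [w1 * r + w2 * (1.0 - d)
            for r, d in zip(min_max(r_sums), min_max(deltas))]


# Hypothetical raw reputation sums and change trends for three tiles
print([round(t, 3) for t in tile_credibility([10.0, 30.0, 50.0], [0.2, 0.1, 0.0])])
# [0.0, 0.5, 1.0]
```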

4.2.2 Quality Evaluation Based on Navigation Data

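Road completeness can be measured in two ways, depending on whether matching is performed: (1) without matching, by directly comparing the total length of VGI roads with that of the reference roads; (2) with matching, by taking the matched result as a proportion of the reference data. The two reflect the geometric completeness of VGI from different angles. The former is the ratio of the total VGI length to the total reference length and reflects the completeness of the VGI data as a whole: all features are included whether or not they match the reference, so both missing and surplus data are counted. The latter is the ratio of the matched VGI length to the total reference length and reflects how close the VGI data are to the reference data. Data can fail to match for various reasons: they may exist in reality but be absent from the reference data (e.g. new roads), may have disappeared in reality but remain in the reference data (e.g. removed roads), or may be false data uploaded maliciously by volunteers. Because the completeness evaluation here serves only as a comparison for the credibility results, this paper uses the non-matching method^[11-15] and takes the ratio of the total road lengths of the two datasets as the completeness of the VGI data, denoted C.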
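The non-matching completeness measure can be sketched as below, assuming roads are polylines in a projected coordinate system measured in metres; the `Point` alias and the example coordinates are hypothetical. Because no matching is done, C can exceed 1 when the VGI data contain more road length than the reference.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # projected x, y in metres (assumption)


def polyline_length(line: Sequence[Point]) -> float:
    """Sum of straight-segment lengths along a polyline."""
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))


def completeness(osm_roads: List[List[Point]], ref_roads: List[List[Point]]) -> float:
    """C = total OSM road length / total reference road length (no matching)."""
    ref_total = sum(polyline_length(r) for r in ref_roads)
    if ref_total == 0:
        raise ValueError("reference data contain no road length")
    return sum(polyline_length(r) for r in osm_roads) / ref_total


# Hypothetical tile: OSM has 600 m of road, the reference has 1000 m
osm_example = [[(0.0, 0.0), (300.0, 0.0)], [(0.0, 100.0), (0.0, 400.0)]]
ref_example = [[(0.0, 0.0), (500.0, 0.0)], [(0.0, 100.0), (0.0, 600.0)]]
print(completeness(osm_example, ref_example))  # 0.6
```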

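Road accuracy is obtained with buffers: a 20 m buffer is created around the navigation road data in each area, and the proportion of OSM roads falling within this buffer is taken as the road accuracy of the OSM data, denoted D.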
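The buffer test can be approximated without a GIS library by cutting each OSM segment into short pieces and checking whether each piece's midpoint lies within 20 m of a reference segment. This sketch assumes D is length-weighted and that coordinates are projected in metres; a production workflow would more likely use a geometry library's buffer and intersection operations, and `step_m` is a hypothetical parameter controlling the approximation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # projected x, y in metres (assumption)


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest Euclidean distance from point p to segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.dist(p, a)
    # Projection parameter of p onto ab, clamped to the segment
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def buffer_accuracy(osm_roads: List[List[Point]], ref_roads: List[List[Point]],
                    buffer_m: float = 20.0, step_m: float = 1.0) -> float:
    """D = share of OSM road length within buffer_m of a reference road (approximate).

    Each OSM segment is cut into pieces no longer than step_m; a piece counts as
    inside the buffer when its midpoint lies within buffer_m of a reference segment.
    """
    ref_segments = [(a, b) for road in ref_roads for a, b in zip(road, road[1:])]
    inside = total = 0.0
    for road in osm_roads:
        for a, b in zip(road, road[1:]):
            seg_len = math.dist(a, b)
            n = max(1, math.ceil(seg_len / step_m))
            piece = seg_len / n
            for k in range(n):
                t = (k + 0.5) / n
                mid = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
                total += piece
                if any(point_segment_distance(mid, s, e) <= buffer_m
                       for s, e in ref_segments):
                    inside += piece
    if total == 0:
        raise ValueError("OSM data contain no road length")
    return inside / total


# Hypothetical tile: one OSM road 10 m from the reference, one 100 m away
ref_example = [[(0.0, 0.0), (1000.0, 0.0)]]
osm_example = [[(0.0, 10.0), (500.0, 10.0)], [(0.0, 100.0), (500.0, 100.0)]]
print(round(buffer_accuracy(osm_example, ref_example), 3))  # 0.5
```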

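With these methods the completeness and accuracy of each tile were obtained. For comparison with the credibility results, the overall quality Q of each tile was computed by Eq. (5):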

$$Q = w_1 C + w_2 D \tag{5}$$

where $w_1 = w_2 = 0.5$. The quality of every tile was computed in this way; the results are listed in Table 2.

Tab. 2 The statistics of the data quality of experiment area


No.	Completeness	Accuracy	Quality
1	0.83	0.56	0.695
2	0.79	0.62	0.705
3	0.92	0.73	0.825
4	0.69	0.66	0.675
5	0.87	0.59	0.730
6	0.84	0.82	0.830
7	0.93	0.89	0.910
8	0.74	0.71	0.725
9	0.62	0.56	0.590
…	…	…	…
35	0.81	0.64	0.725
36	0.68	0.71	0.695
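As a consistency check, Eq. (5) with w1 = w2 = 0.5 reproduces the Quality column of Table 2 from its Completeness and Accuracy columns. The sketch below uses only the rows printed above; the elided rows 10 to 34 are not filled in.

```python
# Rows of Table 2 shown in the text: tile -> (completeness C, accuracy D, quality Q)
TABLE2 = {
    1: (0.83, 0.56, 0.695),
    2: (0.79, 0.62, 0.705),
    3: (0.92, 0.73, 0.825),
    4: (0.69, 0.66, 0.675),
    5: (0.87, 0.59, 0.730),
    6: (0.84, 0.82, 0.830),
    7: (0.93, 0.89, 0.910),
    8: (0.74, 0.71, 0.725),
    9: (0.62, 0.56, 0.590),
    35: (0.81, 0.64, 0.725),
    36: (0.68, 0.71, 0.695),
}


def quality(c: float, d: float, w1: float = 0.5, w2: float = 0.5) -> float:
    """Eq. (5): Q = w1 * C + w2 * D, with w1 + w2 = 1."""
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("weights must satisfy w1 + w2 = 1")
    return w1 * c + w2 * d


# Tiles whose listed Q differs from Eq. (5) by more than rounding noise
mismatches = [tile for tile, (c, d, q) in TABLE2.items() if abs(quality(c, d) - q) > 1e-9]
print(mismatches)  # []
```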

4.2.3 Analysis of Results

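The results for all tiles are plotted as a scatter diagram in Fig. 3. The two evaluation methods largely agree: the red line is the trend line, with R² = 0.8268, indicating an essentially linear relationship between the two sets of results. Agreement is especially close where data quality is below 0.5, which suggests that, when data quality is poor, the credibility method predicts the actual quality well; when data quality is higher, the results also remain broadly consistent, apart from a few tiles where the credibility fluctuates above or below the actual quality.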
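Assuming the trend line in Fig. 3 is an ordinary least-squares line with intercept, its R² equals the squared Pearson correlation between credibility T and quality Q. A minimal sketch of that calculation follows; the example data are hypothetical and lie on an exact line, so R² = 1. Applying the function to all 36 (T, Q) pairs is how the reported value would be checked.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of the least-squares line y = a + b*x (equals the squared Pearson r)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least two points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined for a constant series")
    return sxy * sxy / (sxx * syy)


# Hypothetical credibility (x) and quality (y) values lying on an exact line
print(r_squared([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # 1.0
```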


Fig. 3 Comparison of the results of the two evaluation methods


5 Conclusions

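VGI data show great potential and value, but quality problems have kept them from wide application. Because high-quality reference data are often hard to obtain, especially vector data for areas outside China, research on quality evaluation methods based on data analysis has important theoretical and practical value. By introducing the concept of VGI credibility, this paper expresses the qualitative results of data-analysis-based evaluation quantitatively, with the aim of giving users more intuitive evaluation results. Research on assessing VGI quality through data analysis is nevertheless still exploratory, and many problems remain to be studied.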

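The proposed VGI credibility and its measurement reflect VGI quality by mining characteristics of the VGI history database, and the experimental results agree well with reference-based evaluation. Because OpenStreetMap data are stored with the point as the basic unit, the method applies equally to point and polygon features. The model has limitations, however: it applies only to data with a sufficient version history, the reputation evaluation requires each user to have uploaded a certain amount of data, and the experiments reported here represent only the indicators of the areas studied. In addition, because the semantic relationships in attribute data are complex and their changes are hard to detect and measure, this paper analyses geometry only; the method can also be applied to attribute data, which is a focus of future work.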

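Studying VGI credibility not only provides an important reference for evaluating data quality but can also support the development of VGI by guiding volunteers' subsequent uploads. Future research should focus on new ways of measuring credibility. Moreover, owing to data-access limitations, all experimental areas in this paper are first-tier Chinese cities where OSM data have relatively high accuracy and completeness; future studies should extend the experiments to areas at different levels of development to further explore the relationship between credibility and data quality.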

The authors have declared that no competing interests exist.

References


[1]	Zhang H P, Gu X Y, Xiong P, et al. Development and application of volunteered geographic information[J]. Geomatics World, 2012, 8(4): 67-71.

[2]	Goodchild M. Citizens as sensors: the world of volunteered geography[J]. GeoJournal, 2007, 69(4): 211-221.

[3]	Li D, Qian X. A brief introduction of data management for volunteered geographic information[J]. Geomatics and Information Science of Wuhan University, 2010, 35(4): 379-383.

[4]	Shan J, Qin K, Huang C Q, et al. Methods of crowd sourcing geographic data processing and analysis[J]. Geomatics and Information Science of Wuhan University, 2014, 39(4): 390-394.

[5]	Haklay M. How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets[J]. Environment and Planning B: Planning and Design, 2010, 37(4): 682-703.

[6]	Ciepluch B, Mooney P. Sketches of generic framework for quality assessment of volunteered geographical data[C]. IEEE Geoscience and Remote Sensing Society, 2011.

[7]	Al-Bakri M, Fairbairn D. Assessing the accuracy of "crowdsourced" data and its integration with official spatial data sets[C]. In: Proceedings of the Ninth International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, 2010, 6: 317-320.

[8]	Maco H, Christof A. Comparative spatial analysis of positional accuracy of OpenStreetMap and proprietary geodata[M]. Geovizualisation, Society and Learning. Berlin: Herbert Wichmann Verlag, 2012: 24-32.

[9]	Roberto, Peggy. A photogrammetric approach for assessing positional accuracy of OpenStreetMap roads[J]. Geo-Information, 2013, 2(2): 276-301.

[10]	Wang M, Li Q Q, Hu Q W, et al. Quality analysis on crowd sourcing geographic data with OpenStreetMap data[J]. Geomatics and Information Science of Wuhan University, 2013, 38(12): 1490-1494.

[11]	Zheng S D, Zheng J H. Assessing the completeness and positional accuracy of OpenStreetMap in China[M]. Thematic Cartography for the Society. Springer, 2014.

[12]	Kounadi O. Assessing the quality of OpenStreetMap data[D]. London: University College London, 2009.

[13]	Zielstra D, Zipf A. A comparative study of proprietary geodata and volunteered geographic information for Germany[C]. Proceedings of the 13th AGILE International Conference on Geographic Information Science, 2010: 10-14.

[14]	Koukoletsos T, Haklay M. An automated method to assess data completeness and positional accuracy of OpenStreetMap[C]. GeoComputation, 2011: 236-241.

[15]	Mohammad, Mahmoud. A quality study of the OpenStreetMap dataset for Tehran[J]. Geo-Information, 2014, 3(2): 750-763.

[16]	Haklay M, Basiouka S. How many volunteers does it take to map an area well? The validity of Linus' law to volunteered geographic information[J]. The Cartographic Journal, 2010, 47(4): 315-322.

[17]	Arsanjani J, Barron. Assessing the quality of OpenStreetMap contributors together with their contributions[J]. AGILE, 2013, 5(1): 4-17.

[18]	Christopher, Pascal. A comprehensive framework for intrinsic OpenStreetMap quality analysis[J]. Transactions in GIS, 2014, 18(6): 877-895.

[19]	Kebler C, De Groot T A. Trust as a proxy measure for the quality of volunteered geographic information in the case of OpenStreetMap[M]. Geographic Information Science at the Heart of Europe. Berlin: Springer, 2013: 21-37.

[20]	Bishr, Mantelas. A trust and reputation model for filtering and classifying knowledge about urban growth[J]. GeoJournal, 2008, 72(3-4): 229-237.

[21]	Mooney P, Padraig C. Characteristics of heavily edited objects in OpenStreetMap[J]. Future Internet, 2012, 4(1): 285-305.

[22]	Zhao Y, Zhou X. Version similarity-based model for volunteers' reputation of volunteered geographic information: a case study of polygon[J]. Acta Geodaetica et Cartographica Sinica, 2015, 44(5): 578-584.

[23]	Xiong C Q, Ouyang Y, Mei Q. Argumentation model based on certainty-factor and algorithms of argument evaluation[J]. Journal of Software, 2014, 25(6): 1225-1238.

[24]	Sun S. Research on reputation evaluation model of web service[J]. Computer Engineering and Design, 2008, 29(9): 2259-2308.

[25]	Marmol F, Perez G. Towards pre-standardization of trust and reputation models for distributed and heterogeneous systems[J]. Computer Standards & Interfaces, 2010, 32(4): 185-196.

[26]	Lu Y Q. Research on quality evaluation model of user reputation and user generated content[D]. Beijing: Tsinghua University, 2014.

[27]	OpenStreetMap Full History Data[EB/OL].
