MFSTR-tree:面向Argo海洋浮标的时空数据索引
作者简介:杨明远(1992-),男,硕士生,地图制图学与地理信息系统专业,主要研究方向为数字地图制图。E-mail: 18695800631@163.com
收稿日期: 2017-08-31
要求修回日期: 2018-02-28
网络出版日期: 2018-05-20
基金资助
国家自然科学基金项目(41501446)
地理信息工程国家重点实验室开放基金项目(SKLGIE2015-M-4-3)
MFSTR-tree:An Spatio-temporal Index for Argo Floats Data
Received date: 2017-08-31
Request revised date: 2018-02-28
Online published: 2018-05-20
Supported by
National Natural Science Foundation of China, No.41501446
Open Research Fund Program of State Key Laboratory of Geoinformation Engineering, No.SKLGIE2015-M-4-3.
Copyright
针对Argo海洋浮标数据的准实时性、海量性、时空异变性等特点和多种查询应用需求,分析了当前时空索引方法的优势与不足,提出了一种多频率STR-tree索引与格网索引的混合索引结构MFSTR-tree。该方法在首先轨迹束层利用动态轨迹束作为叶节点生成STR-tree结构,将STR-tree索引结构灵活、数据冗余少的优势进一步扩大;接着通过轨迹束的多种频率在采样点层构建格网索引,实现在查询效率上的提升;同时给出了该结构插入算法和查询算法的具体描述。本文以中国Argo实时资料中心提供的2015年海洋浮标数据为例,将该方法与HR-tree和STR-tree方法进行了构建效率和查询效率的对比实验,结果表明该方法在保证了构建存储效率和时间效率的同时,有效改善了原有STR-tree应用于Argo数据中的查询效率问题。
杨明远 , 刘海砚 , 朱新铭 , 苏晨琛 . MFSTR-tree:面向Argo海洋浮标的时空数据索引[J]. 地球信息科学学报, 2018 , 20(5) : 665 -673 . DOI: 10.12082/dqxxkx.2018.170405
Management mode of relational database and file type database is difficult to support the data of Argo floats data. Because Argo buoy floats freely with sea current and data volume is huge in scale as the mobile object. In view of the quasi-real-time, massive nature, spatio-temporal variation and other characteristics of Argo ocean buoy data, as well as multiple query application requirements, the advantages and disadvantages of the current space-time indexing method are analyzed. The deficiencies of current spatio-temporal indexing method include: (1) The Argo buoy data has large volume and are observed over a long time span. When a STR-tree index is established on the trajectory of a buoy, the over-long trajectory tends to lead to a high overlap ratio between the MBB of the STR-tree, which further leads to the reduction of search efficiency. (2) The Argo floats data are sampled at different frequencies and with relatively stable frequency, but the influence of the update frequency on index structure optimization is often ignored. Therefore, a hybrid index structure called MFSTR-tree with multi-frequency STR-tree index and grid index is proposed. First, the dynamic trajectory beam is used as the leaf node to generate the STR-tree structure in the trajectory beam layer, taking advantage of the flexibility and the less data redundancy of the STR-tree index structure. Then, the improvement of query efficiency is realized by use of the multiple frequencies of the track beam at the sampling point layer according to the construction grid index. The corresponding interpolation algorithm and query algorithm are described in this paper. To verify the construction and query efficiency of MFSTR-tree, a comparison experiment was conducted with HR-tree and STR-tree for Argo floats in 2015 from China Argo real-time data center. The experimental results show that under the premise of guaranteeing construction time efficiency and storage efficiency, HR-tree still maintains natural advantages in single-time query and is much more efficient than the other two. After optimization, the MFSTR-tree had efficiency of 40% higher than the general STR-tree. The query efficiency of the HR-tree decreases significantly with the query window size expanded to 4% of the total range. MFSTR-tree is further optimized on the basis of the original STR-tree, which improves the efficiency of the sampling point selection process in the trajectory bundle. Therefore, the advantage is more obvious, and the verification of the algorithm is realized.
Fig. 1 Construction of MFSTR-tree index图1 MFSTR-tree索引结构 |
Fig. 2 Insertion of trajectory bundles in MFSTR-tree index图2 MFSTR-tree轨迹束插入 |
Fig. 3 Schematic diagram of query algorithm图3 采样点筛选过程示意图 |
算法1伪代码:ChooseLeaf(叶节点选择) |
---|
Input:Path,STR-tree Output:InsertNode Begin TNode←Root :目标节点TNode初始化为根节点 minExtChild.Clear( ) :minExtChild清空,minExtChild为TNode的子节点中插入轨迹束后MBB扩展最小的子节点 while TNode.child != NULL do:TNode有子节点时执行下列操作 minExtChild←TNode:minExtChild初始化为当前操作节点 for each (child in TNode ):对TNode的子节点进行遍历 if (minExtChild.Add(Path).MBB-minExtChild.MBB)> (child.Add(Path).MBB - child.MBB) :选择扩张最小的子节点child minExtChild←child:child赋值给minExtChild TNode←minExtChild:目标节点TNode更新为minExtChild,实现向下的遍历 return TNode:返回TNode得到最终扩张最小的叶节点 End |
算法2伪代码:AdjustTree(树结构调整) |
---|
Input:STR-tree, InsertNode, p Output:STR-tree Begin TNode←InsertNode:TNode为目标节点,初始化为插入节点InsertNode while TNode.father != NULL do:TNode父节点不为空时,执行下列操作 TNode.father.ExtensionMBB( ):调整TNode父节点的外界矩形MBB if TNode.child>p:如果TNode插入轨迹束后子节点数量大于阈值p TNode.SplitNode( ):TNode分裂 TNode←TNode.father:TNode更新为TNode的父节点,实现向上的递归 End |
Tab. 1 Efficiency comparison of index generation表1 索引构建效率对比结果 |
索引类型 | 3DR-tree节点数目 | 存储空间/KB | 采样点插补访问节点数 | 新轨迹束插入访问节点数 | 节点分裂次数 | 构建用时/s |
---|---|---|---|---|---|---|
HR-tree | 165 247 | 280 883 | 0 | 890 168 | 29 241 | 2.031 |
STR-tree | 29 707 | 276 437 | 4 621 861 | 167 889 | 6529 | 3.122 |
MFSTR-tree | 25 991 | 276 279 | 3 910 859 | 145 595 | 5678 | 2.783 |
Tab. 2 Efficiency comparison of time slices queries表2 单时间切片查询实验结果 |
索引类型 | 查询空间范围/% | 访问节点次数 | MBB求交次数 |
---|---|---|---|
HR-tree | 1 | 37 | 207 |
10 | 104 | 511 | |
20 | 207 | 966 | |
STR-tree | 1 | 93 | 464 |
10 | 635 | 1311 | |
20 | 1569 | 2645 | |
MFSTR-tree | 1 | 64 | 354 |
10 | 221 | 791 | |
20 | 469 | 1405 |
Fig. 4 Efficiency comparison of time slices queries图4 单时间切片查询效率对比 |
Fig. 5 Efficiency comparison of spatial-temporal range queries图5 时空范围查询效率对比 |
Tab. 3 Efficiency comparison of spatio-temporal range queries表3 时空范围查询实验结果 |
索引类型 | 查询时空范围/% | 访问节点次数 | MBB求交次数 |
---|---|---|---|
HR-tree | 1 | 39 | 216 |
10 | 811 | 1458 | |
20 | 3687 | 4989 | |
STR-tree | 1 | 96 | 474 |
10 | 650 | 1332 | |
20 | 1874 | 3475 | |
MFSTR-tree | 1 | 67 | 370 |
10 | 349 | 1107 | |
20 | 1076 | 2694 |
The authors have declared that no competing interests exist.
[1] |
[
|
[2] |
|
[3] |
|
[4] |
|
[5] |
[
|
[6] |
[
|
[7] |
[
|
[8] |
|
[9] |
[
|
[10] |
[
|
[11] |
[
|
[12] |
[
|
[13] |
[
|
[14] |
[
|
[15] |
|
[16] |
[
|
[17] |
[
|
[18] |
|
[19] |
|
[20] |
[
|
/
〈 | 〉 |