A Parallel Algorithm for Detecting Trajectory Outliers Based on MapReduce

Expand
  • School of Computer Science and Technology, Nanjing Normal University, Nanjing 210023, China

Received date: 2014-11-13

  Revised date: 2014-12-11

  Online published: 2015-05-10

Abstract

Trajectory outlier detection is significantly important in the field of data mining for moving object. TRAOD (TRAjectory Outlier Dectection Algorithm), a classic algorithm for detecting trajectory outliers, focuses on a new two-level trajectory partitioning strategy to enhance the efficiency of algorithm. The main advantage of TRAOD algorithm is the ability to detect outlying sub-trajectories. However, it has a low efficiency on abnormality detection for massive trajectory data. In order to improve the efficiency for mining trajectory outliers from massive datasets, a parallel algorithm for detecting trajectory outliers based on MapReduce framework, which is called PTRAOD (Parallel algorithm for TRAjectory Outlier Detection), is presented. It redesigns the TRAOD algorithm based on the MapReduce framework, and encapsulates the steps of TRAOD into its Map and Reduce functions. PTRAOD algorithm takes full advantages of the features from Hadoop platform. It firstly distributes the trajectory data into distributed computing nodes. While distributing the data, it also takes the load-balance into consideration. And after all, each node runs the same algorithms to detect abnormal trajectories. Based on PTRAOD algorithm, a grid-based parallel algorithm for detecting trajectory outliers, called GPTRAOD (Gridbased Parallel algorithm for TRAjectory Outlier Detection), is then proposed. GPTRAOD algorithm makes use of the grid index to realize regional query and reduce unnecessary calculations. At first, GPTRAOD algorithm divides the map into a series of equal- sized grids, whose size is determined with respect to each specific data. Then, the grid index is established to implement the regional query. Finally, the algorithm finds out the abnormal trajectory segments and judges whether the trajectories that contains the abnormal trajectory segments are abnormal. In general, GPTRAOD algorithm takes advantages of the gird index to realize regional query on the basis of PTRAOD algorithm, which furthermore can search abnormal trajectory on the cloud computing platform. To assess the performances of the proposed algorithms, extensive experiments were conducted. The experimental results demonstrate that the proposed two parallel detection algorithms can both successfully achieve the trajectory outlier detection. The efficiency of PTRAOD algorithm is higher than TRAOD algorithm, while GPTRAOD algorithm has the higher scalability and better speedup ratio than PTRAOD algorithm. In addition, with the rapidly expanding of datasets, GPTRAOD algorithm shows obvious advantages and increasing potentials.

Cite this article

TANG Mengmeng, JI Genlin, ZHAO Bin . A Parallel Algorithm for Detecting Trajectory Outliers Based on MapReduce[J]. Journal of Geo-information Science, 2015 , 17(5) : 523 -530 . DOI: 10.3724/SP.J.1047.2015.00523

References

[1] Liao L, Patterson D J, Fox D, et al. Learning and inferring transportation routines[J]. Artificial Intelligence, 2007,171(5):311-331.
[2] Yuan J, Zheng Y, Zhang C, et al. T-drive: Driving directions based on taxi trajectories[C]. In: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems. New York: ACM, 2010:99-108.
[3] Giannotti F, Nanni M, Pinelli F, et al. Trajectory pattern mining[C]. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2007:330-339.
[4] Yu Y, Cao L, Rundensteiner E A, et al. Detecting moving object outliers in massive-scale trajectory streams[C]. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2014:422-431.
[5] Li X, Li Z, Han J, et al. Temporal outlier detection in vehicle traffic data[C]. In: Proceedings of the 25th International Conference on Data Engineering. Washington: IEEE Computer Society, 2009:1319-1322.
[6] Ge Y, Xiong H, Liu C, et al. A taxi driving fraud detection system[C]. 2011 IEEE 11th International Conference on Data Mining (ICDM), 2011:181-190.
[7] Barnett V, Lewis T. Outliers in statistical data[M]. New York:Wiley, 1994.
[8] Knorr E M, Ng R T. Finding intentional knowledge of distance-based outliers[C]. In: Proceedings of the 25th International Conference on Very Large Data Bases. Edinburgh, Scotland: Springer-Verlag, 1999,99:211-222.
[9] Breunig M M, Kriegel H P, Ng R T, et al. LOF: Identifying density-based local outliers[C]. In: Proceedings of 2000 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2000,29(2):93-104.
[10] Aggarwal C C, Yu P S. Outlier detection for high dimensional data[C]. In: Proceedings of 2001 ACM SIGMOD International Conference on Management of Data. California: ACM, 2001,30(2):37-46.
[11] 姜金凤.移动对象轨道异常检测算法的研究[D].南京:南 京航空航天大学,2010:1-19.
[12] Knorr E M, Ng R T, Tucakov V. Distance-based outliers: algorithms and applications[J]. The VLDB Journal¾The International Journal on Very Large Data Bases, 2000,8(3-4):237-253.
[13] Li X, Han J, Kim S, et al. ROAM: Rule-and motif-based anomaly detection in massive moving object data sets[C]. In: Proceedings of the SIAM International Conference on Data Mining. Minneapolis. Minnesota: SIAM, 2007,7: 273-284.
[14] 刘良旭,乔少杰,刘宾,等.基于R-Tree 的高效异常轨迹检 测算法[J].软件学报,2009,20(9):2426-2435.
[15] Lee J G, Han J, Li X. Trajectory outlier detection: A partition-and-detect framework[C]. In: Proceedings of the 24th International Conference on Data Engineering. Washington: IEEE Computer Society, 2008:140-149.
[16] Liu L, Fan J, Qiao S, et al. Efficiently mining outliers from trajectories of unrestraint movement[C]. In: Proceedings of the 3rd International Conference on Advanced Computer Theory and Engineering, 2010,2:V2-261-V2-265.
[17] Isaksson C, Dunham M H. A comparative study of outlier detection algorithms[M]. In: Machine Learning and Data Mining in Pattern Recognition. Berlin: Springer Berlin Heidelberg, 2009:440-453.
[18] Armbrust M, Fox A, Griffith R, et al. A view of cloud computing[J]. Communications of the ACM, 2010,53(4): 50-58.
[19] Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters[J]. Communications of the ACM, 2008,51(1):107-113.
[20] Dean J, Ghemawat S. MapReduce: A flexible data processing tool[J]. Communications of the ACM, 2010,53(1): 72-77.
[21] Lee J G, Han J, Whang K Y. Trajectory clustering: A partition-and-group framework[C]. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, 2007:593-604.
[22] UNISYS. The Atlantic hurricane database[EB/OL]. [2014-9-10]. http://weather.unisys.com/hurricane/atlantic/.

Outlines

/