On the marriage of Lp-norms and edit distance (2004)

by L. Chen, R. T. Ng
Venue: In VLDB
Results 1 - 10 of 101

Querying and Mining of Time Series Data: Experimental Comparison of Representations and Distance Measures

by Hui Ding, Goce Trajcevski, Xiaoyue Wang, Eamonn Keogh
Abstract - Cited by 141 (24 self)
The last decade has witnessed a tremendous growth of interest in applications that deal with querying and mining of time series data. Numerous representation methods for dimensionality reduction and similarity measures geared towards time series have been introduced. Each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive set of time series experiments re-implementing 8 different representation methods and 9 similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. Our experiments have provided both a unified validation of some of the existing achievements, and in some cases, suggested that certain claims in the literature may be unduly optimistic.

Citation Context

...e series data in the literature, e.g., Euclidean distance (ED) [13], Dynamic Time Warping (DTW) [5, 26], distance based on Longest Common Subsequence (LCSS) [39], Edit Distance with Real Penalty (ERP) [8], Edit Distance on Real sequence (EDR) [9], DISSIM [14], Sequence Weighted Alignment model (Swale) [31], Spatial Assembling Distance (SpADe) [12] and similarity search based on Threshold Queries (TQuE...
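The measure cited here as [8], Edit Distance with Real Penalty (ERP), charges each gap (insertion or deletion) the distance to a fixed reference value g, which is what makes it a metric. A minimal sketch of the standard ERP recurrence for 1-D sequences (function name and the g default are illustrative):

```python
def erp(s, t, g=0.0):
    """Edit Distance with Real Penalty (ERP) between two 1-D sequences.

    A gap is charged the distance to a fixed reference value g,
    which preserves the triangle inequality and makes ERP a metric.
    """
    n, m = len(s), len(t)
    # dp[i][j] = ERP distance between s[:i] and t[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + abs(s[i - 1] - g)   # delete s[i-1]
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + abs(t[j - 1] - g)   # insert t[j-1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + abs(s[i - 1] - t[j - 1]),  # match
                dp[i - 1][j] + abs(s[i - 1] - g),             # gap in t
                dp[i][j - 1] + abs(t[j - 1] - g),             # gap in s
            )
    return dp[n][m]
```

Unlike DTW, which repeats elements, ERP aligns by inserting gaps costed against g, so distances between shifted sequences stay comparable.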

Discovery of Convoys in Trajectory Databases

by Hoyoung Jeung, Man Lung Yiu, Xiaofang Zhou, Christian S. Jensen, Heng Tao Shen
Abstract - Cited by 54 (3 self)
As mobile devices with positioning capabilities continue to proliferate, data management for so-called trajectory databases that capture the historical movements of populations of moving objects becomes important. This paper considers the querying of such databases for convoys, a convoy being a group of objects that have traveled together for some time. More specifically, this paper formalizes the concept of a convoy query using density-based notions, in order to capture groups of arbitrary extents and shapes. Convoy discovery is relevant for real-life applications in throughput planning of trucks and carpooling of vehicles. Although there has been extensive research on trajectories in the literature, none of it can be applied to correctly retrieve exact convoy result sets. Motivated by this, we develop three efficient algorithms for convoy discovery that adopt the well-known filter-refinement framework. In the filter step, we apply line-simplification techniques on the trajectories and establish distance bounds between the simplified trajectories. This permits efficient convoy discovery over the simplified trajectories without missing any actual convoys. In the refinement step, the candidate convoys are further processed to obtain the actual convoys. Our comprehensive empirical study offers insight into the properties of the paper’s proposals and demonstrates that the proposals are effective and efficient on real-world trajectory data.
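The filter step described above depends on line simplification of trajectories. The abstract does not name a specific algorithm, but classic Douglas-Peucker simplification is one common choice; the sketch below is an illustration under that assumption (all names are illustrative):

```python
def douglas_peucker(points, epsilon):
    """Simplify a 2-D polyline, keeping only points that deviate from
    the chord between the endpoints by more than epsilon."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]

    def dist(p):
        # perpendicular distance from p to the chord (x1,y1)-(x2,y2)
        x0, y0 = p
        num = abs((y2 - y1) * x0 - (x2 - x1) * y0 + x2 * y1 - y2 * x1)
        den = ((y2 - y1) ** 2 + (x2 - x1) ** 2) ** 0.5
        return num / den if den else ((x0 - x1) ** 2 + (y0 - y1) ** 2) ** 0.5

    # find the interior point farthest from the chord
    idx, dmax = max(((i, dist(p)) for i, p in enumerate(points[1:-1], 1)),
                    key=lambda t: t[1])
    if dmax <= epsilon:
        return [points[0], points[-1]]
    # recurse on both halves around the farthest point
    left = douglas_peucker(points[:idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right
```

The parameter epsilon bounds how far the simplified trajectory can stray from the original, which is exactly the kind of guarantee a filter step needs to avoid missing candidate convoys.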

Citation Context

...minimized. More recent proposals for trajectory distance functions include Longest Common Subsequence (LCSS) [15], Edit Distance on Real Sequence (EDR) [10], and Edit distance with Real Penalty (ERP) [9]. Lee et al. [20] point out that the above distance measures capture the global similarity between two trajectories, but not their local similarity during a short time interval. Thus, these measures c...

Swarm: Mining Relaxed Temporal Moving Object Clusters

by Zhenhui Li, Bolin Ding, Jiawei Han, Roland Kays
Abstract - Cited by 37 (11 self)
Recent improvements in positioning technology make massive moving object data widely available. One important analysis is to find the moving objects that travel together. Existing methods place a strong constraint when defining a moving object cluster: they require the moving objects to stick together for consecutive timestamps. Our key observation is that the moving objects in a cluster may actually diverge temporarily and congregate at certain timestamps. Motivated by this, we propose the concept of swarm, which captures the moving objects that move within arbitrary shapes of clusters for certain timestamps that are possibly non-consecutive. The goal of our paper is to find all discriminative swarms, namely closed swarms. While the search space for closed swarms is prohibitively huge, we design a method, ObjectGrowth, to efficiently retrieve the answer. In ObjectGrowth, two effective pruning strategies are proposed to greatly reduce the search space, and a novel closure checking rule is developed to report closed swarms on-the-fly. Empirical studies on real data as well as large synthetic data demonstrate the effectiveness and efficiency of our methods.

Citation Context

... Many methods have been proposed, such as Dynamic Time Warping (DTW) [24], Longest Common Subsequences (LCSS) [20], Edit Distance on Real Sequence (EDR) [6], and Edit distance with Real Penalty (ERP) [5]. Gaffney et al. [8] propose trajectory clustering methods based on probabilistic modeling of a set of trajectories. As pointed out in Lee et al. [17], distance measure established on whole trajectori...
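Edit Distance on Real Sequence (EDR), mentioned in the context above, counts edit operations like ordinary string edit distance but treats two real values as matching when they differ by at most a tolerance eps. A minimal sketch of the standard recurrence (function name and the eps default are illustrative):

```python
def edr(s, t, eps=0.5):
    """Edit Distance on Real sequence (EDR): number of edit operations
    needed to align s and t, where two elements 'match' (cost 0)
    when they differ by at most eps."""
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i                      # delete all of s[:i]
    for j in range(m + 1):
        dp[0][j] = j                      # insert all of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            subcost = 0 if abs(s[i - 1] - t[j - 1]) <= eps else 1
            dp[i][j] = min(dp[i - 1][j - 1] + subcost,  # match/substitute
                           dp[i - 1][j] + 1,            # delete
                           dp[i][j - 1] + 1)            # insert
    return dp[n][m]
```

Because every operation costs at most 1, a single noisy sample perturbs the distance by at most 1, which is the robustness property these trajectory papers rely on.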

Searching Trajectories by Locations – An Efficiency Study

by Zaiben Chen, Heng Tao Shen, Xiaofang Zhou, Yu Zheng, Xing Xie
Abstract - Cited by 29 (11 self)
Trajectory search has long been an attractive and challenging topic that has spawned various interesting applications in spatial-temporal databases. In this work, we study a new problem of searching trajectories by locations, in which context the query is only a small set of locations with or without an order specified, while the target is to find the k Best-Connected Trajectories (k-BCT) from a database such that the k-BCT best connect the designated locations geographically. Different from the conventional trajectory search that looks for similar trajectories w.r.t. shape or other criteria by using a sample query trajectory, we focus on the goodness of connection provided by a trajectory to the specified query locations. This new query can benefit users in many novel applications such as trip planning. In our work, we first define a new similarity function for measuring how well a trajectory connects the query locations, with both spatial distance and order constraint being considered. Upon the observation that the number of query locations is normally small (e.g. 10 or less), since it is impractical for a user to input too many locations, we analyze the feasibility of using a general-purpose spatial index to achieve efficient k-BCT search, based on a simple Incremental k-NN based Algorithm (IKNN). The IKNN effectively prunes and refines trajectories by using the devised lower bound and upper bound of similarity. Our contributions mainly lie in properly adapting the best-first and depth-first k-NN algorithms to the basic IKNN and, more importantly, ensuring efficiency in both search effort and memory usage. An in-depth study on the adaptation and its efficiency is provided. Further optimization is also presented to accelerate the IKNN algorithm. Finally, we verify the efficiency of the algorithm by extensive experiments.
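The "goodness of connection" idea can be illustrated with a toy scoring function in which each query location contributes an exponentially decayed distance to its closest trajectory point, so nearer connections count more. This is only a plausible sketch of the idea, not necessarily the paper's exact similarity function, and all names are illustrative:

```python
import math

def connectivity(query_locs, traj):
    """Illustrative k-BCT-style score: each query location contributes
    exp(-distance to its closest trajectory point). Higher is better;
    a trajectory passing through every query location scores len(query_locs)."""
    def d(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    return sum(math.exp(-min(d(q, p) for p in traj)) for q in query_locs)
```

Ranking candidate trajectories by such a score, with lower and upper bounds derived from partial k-NN results, is the shape of the IKNN pruning described in the abstract.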

Citation Context

... typical similarity functions for different applications include Euclidean Distance [2], Dynamic Time Warping (DTW) [22], Longest Common Subsequence (LCSS) [21], Edit Distance with Real Penalty (ERP) [7], Edit Distance on Real Sequences (EDR) [8], and enhanced techniques for evaluating the similarity of time series are also studied in [16, 20]. The pioneering work by Agrawal et al. in [2] utilizes Di...

Towards trajectory anonymization: a generalizationbased approach

by Mehmet Ercan Nergiz, Maurizio Atzori, Yücel Saygın - Transactions on Data Privacy
Abstract - Cited by 28 (2 self)
Abstract. Trajectory datasets are becoming popular due to the massive usage of GPS and location-based services. In this paper, we address privacy issues regarding the identification of individuals in static trajectory datasets. We first adopt the notion of k-anonymity to trajectories and propose a novel generalization-based approach for anonymization of trajectories. We further show that releasing anonymized trajectories may still have some privacy leaks. Therefore we propose a randomization based reconstruction algorithm for releasing anonymized trajectory data and also present how the underlying techniques can be adapted to other anonymity standards. The experimental results on real and synthetic trajectory datasets show the effectiveness of the proposed techniques.

Citation Context

...st the clusters obtained from the sanitized dataset (response partition). For the evaluation, we used a bottom-up complete-link agglomerative clustering algorithm, coupled with the ERP distance metric [11], which has been specifically developed for trajectories. As the algorithm requires the number of clusters to be specified as input, we experimented with a range of 2 to 60 clusters. Our hierarchical clust...

An Efficient and Accurate Method for Evaluating Time Series Similarity

by Michael Morse, Jignesh M. Patel, 2007
Abstract - Cited by 25 (1 self)
A variety of techniques currently exist for measuring the similarity between time series datasets. Of these techniques, the methods whose matching criterion is bounded by a specified ε threshold value, such as the LCSS and the EDR techniques, have been shown to be robust in the presence of noise, time shifts, and data scaling. Our work proposes a new algorithm, called the Fast Time Series Evaluation (FTSE) method, which can be used to evaluate such threshold value techniques, including LCSS and EDR. Using FTSE, we show that these techniques can be evaluated faster than using either traditional dynamic programming or even warp-restricting methods such as the Sakoe-Chiba band and the Itakura Parallelogram. We also show that FTSE can be used in a framework that can evaluate a richer range of ε threshold-based scoring techniques, of which EDR and LCSS are just two examples. This framework, called Swale, extends the ε threshold-based scoring techniques to include arbitrary match rewards and gap penalties. Through extensive empirical evaluation, we show that Swale can obtain greater accuracy than existing methods.
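The ε-threshold framework described above can be sketched as a generic scoring recurrence: pairs within ε of each other earn a match reward, unmatched elements pay a gap penalty, and LCSS falls out as the special case reward = 1, gap penalty = 0. This is a simplified illustration of the idea, and the parameter names are illustrative rather than Swale's actual API:

```python
def threshold_score(s, t, eps=0.5, reward=1, gap_penalty=0):
    """Swale-style epsilon-threshold score for two 1-D sequences:
    matched pairs (within eps) earn `reward`; skipped elements pay
    `gap_penalty`. With reward=1, gap_penalty=0 this is LCSS length."""
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap_penalty   # all of s[:i] unmatched
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap_penalty   # all of t[:j] unmatched
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(s[i - 1] - t[j - 1]) <= eps:
                dp[i][j] = dp[i - 1][j - 1] + reward        # match
            else:
                dp[i][j] = max(dp[i - 1][j] + gap_penalty,  # skip s[i-1]
                               dp[i][j - 1] + gap_penalty)  # skip t[j-1]
    return dp[n][m]
```

FTSE's contribution is evaluating exactly this family of recurrences faster than the quadratic dynamic program shown here.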

Approximate embedding-based subsequence matching of time series

by Vassilis Athitsos, Panagiotis Papapetrou, Michalis Potamias, George Kollios, Dimitrios Gunopulos - In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data , 2008
Abstract - Cited by 21 (6 self)
A method for approximate subsequence matching is introduced that significantly improves the efficiency of subsequence matching in large time series data sets under the dynamic time warping (DTW) distance measure. Our method is called EBSM, shorthand for Embedding-Based Subsequence Matching. The key idea is to convert subsequence matching to vector matching using an embedding. This embedding maps each database time series into a sequence of vectors, so that every step of every time series in the database is mapped to a vector. The embedding is computed by applying full dynamic time warping between reference objects and each database time series. At runtime, given a query object, an embedding of that object is computed in the same manner, by running dynamic time warping between the reference objects and the query. Comparing the embedding of the query with the database vectors is used to efficiently identify relatively few areas of interest in the database sequences. Those areas of interest are then fully explored using the exact DTW-based subsequence matching algorithm. Experiments on a large, public time series data set produce speedups of over one order of magnitude compared to brute-force search, with very small losses (< 1%) in retrieval accuracy.
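The full dynamic time warping computation that EBSM runs between reference objects and database sequences follows the textbook recurrence; a minimal sketch for 1-D sequences (function name illustrative):

```python
import math

def dtw(s, t):
    """Full dynamic time warping distance between two 1-D sequences:
    each step may advance in s, in t, or in both, accumulating the
    absolute difference of the aligned elements."""
    n, m = len(s), len(t)
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j - 1],  # advance both
                                  dp[i - 1][j],      # repeat t[j-1]
                                  dp[i][j - 1])      # repeat s[i-1]
    return dp[n][m]
```

Because elements may be repeated, DTW absorbs local time shifts and stretching, which is why it dominates exact time series matching despite its quadratic cost.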

Experimental comparison of representation methods and distance measures for time series data

by Xiaoyue Wang, Hui Ding, Goce Trajcevski, Peter Scheuermann, Eamonn Keogh - Data Mining and Knowledge Discovery
Abstract - Cited by 18 (4 self)
Abstract not found

Citation Context

...es presented in the literature, e.g., Euclidean distance (ED) [18], Dynamic Time Warping (DTW) [10, 35], distance based on Longest Common Subsequence (LCSS) [61], Edit Distance with Real Penalty (ERP) [13], Edit Distance on Real sequence (EDR) [14], DISSIM [20], Sequence Weighted Alignment model (Swale) [45], Spatial Assembling Distance (SpADe) [17] and similarity search based on Threshold Queries (TQu...

Shapes based Trajectory Queries for Moving Objects

by Bin Lin, Jianwen Su - Proceedings of ACM GIS, 2005
Abstract - Cited by 17 (0 self)
An interesting issue in moving objects databases is to find similar trajectories of moving objects. Previous work on this topic focuses on movement patterns (trajectories with time dimension) of moving objects, rather than spatial shapes (trajectories without time dimension) of their trajectories. In this paper we propose a simple and effective way to compare spatial shapes of moving object trajectories. We introduce a new distance function based on “one way distance” (OWD). Algorithms for evaluating OWD in both continuous (piecewise linear) and discrete (grid representation) cases are developed. An index structure for OWD in grid representation, which guarantees no false dismissals, is also given to improve the efficiency of similarity search. Empirical studies show that OWD outperforms existing methods not only in precision, but also in efficiency, and the results of OWD in the continuous case can be approximated efficiently by the discrete case.

Citation Context

...This paper aims at similarity search for trajectories. Similarity search is not a new topic and has been investigated in various contexts, e.g., motion tracking in videos [10, 8], time series analysis [6, 2, 15, 14, 7, 3, 12], and recently, trajectories [16, 14, 9, 18, 17, 4, 11]. Partly due to the difficulty in formalizing “similarity” and partly due to the diversity of applications, known results aren’t satisfactory in...

Outlier detection by sampling with accuracy guarantees

by Mingxi Wu, Christopher Jermaine - In: KDD (2006)
Abstract - Cited by 16 (0 self)
An effective approach to detect anomalous points in a data set is distance-based outlier detection. This paper describes a simple sampling algorithm to efficiently detect distance-based outliers in domains where each and every distance computation is very expensive. Unlike any existing algorithms, the sampling algorithm requires a fixed number of distance computations and can return good results with accuracy guarantees. The most computationally expensive aspect of estimating the accuracy of the result is sorting all of the distances computed by the sampling algorithm. This enables interactive-speed performance over the most expensive distance computations. The paper’s algorithms were tested over two domains that require expensive distance functions as well as ten additional real data sets. The experimental study demonstrates both the efficiency and effectiveness of the sampling algorithm in comparison with the state-of-the-art algorithm and the reliability of the accuracy guarantees.

Citation Context

...topher Jermaine, Department of Computer and Information Sciences and Engineering, University of Florida, Gainesville, FL, USA, 32611, cjermain@cise.ufl.edu [12], the ERP distance function for time series [3], the quadratic-form distance function for color histograms [5] and various scoring matrices for aligning bioinformatics sequences [4] all are computationally expensive. Given two input points that are...
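The fixed-budget sampling idea can be illustrated with a toy scorer that ranks points by their average distance to the k nearest neighbours within a single random sample, so each point costs only |sample| calls to the expensive distance function (such as ERP). This is only an illustration of the idea, not the paper's exact algorithm, and all parameter names are hypothetical:

```python
import random

def sample_outliers(data, dist, n_outliers=2, sample_size=10, k=3, seed=0):
    """Rank points by average distance to their k nearest neighbours
    inside one fixed random sample, then return the top `n_outliers`.
    Each point incurs at most `sample_size` distance computations."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))

    def score(p):
        # distances from p to the sample, excluding p itself
        ds = sorted(dist(p, q) for q in sample if q is not p)
        return sum(ds[:k]) / k

    return sorted(data, key=score, reverse=True)[:n_outliers]
```

With an expensive `dist` such as an ERP computation, the total cost is O(|data| * sample_size) distance evaluations instead of the O(|data|^2) a naive k-NN-based detector would need.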
