Results 1 - 10 of 101
Querying and Mining of Time Series Data: Experimental Comparison of Representations and Distance Measures
"... The last decade has witnessed a tremendous growths of interests in applications that deal with querying and mining of time series data. Numerous representation methods for dimensionality reduction and similarity measures geared towards time series have been introduced. Each individual work introduci ..."
Abstract
-
Cited by 141 (24 self)
- Add to MetaCart
(Show Context)
The last decade has witnessed a tremendous growth of interest in applications that deal with querying and mining of time series data. Numerous representation methods for dimensionality reduction and similarity measures geared towards time series have been introduced. Each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justification, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive set of time series experiments, re-implementing 8 different representation methods and 9 similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. Our experiments provide both a unified validation of some of the existing achievements and, in some cases, suggest that certain claims in the literature may be unduly optimistic.
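To ground the discussion, here is a minimal sketch of the kind of similarity measure this study compares: textbook dynamic time warping (DTW) over 1-D sequences. It is an illustrative implementation under assumed conventions (absolute-difference local cost), not the authors' re-implemented code.

    def dtw(a, b):
        """Classic O(len(a) * len(b)) dynamic-programming DTW distance
        between two 1-D sequences, using |x - y| as the local cost."""
        n, m = len(a), len(b)
        INF = float("inf")
        # cost[i][j] = DTW distance between prefixes a[:i] and b[:j]
        cost = [[INF] * (m + 1) for _ in range(n + 1)]
        cost[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = abs(a[i - 1] - b[j - 1])
                cost[i][j] = d + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])  # three warping moves
        return cost[n][m]

    # Identical shapes shifted in time still align: the extra leading 0
    # is absorbed by warping, so the distance is 0.0.
    print(dtw([0, 1, 2, 3, 2, 1], [0, 0, 1, 2, 3, 2, 1]))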
Discovery of Convoys in Trajectory Databases
"... As mobile devices with positioning capabilities continue to proliferate, data management for so-called trajectory databases that capture the historical movements of populations of moving objects becomes important. This paper considers the querying of such databases for convoys, a convoy being a grou ..."
Abstract
-
Cited by 54 (3 self)
- Add to MetaCart
(Show Context)
As mobile devices with positioning capabilities continue to proliferate, data management for so-called trajectory databases that capture the historical movements of populations of moving objects becomes important. This paper considers the querying of such databases for convoys, a convoy being a group of objects that have traveled together for some time. More specifically, this paper formalizes the concept of a convoy query using density-based notions, in order to capture groups of arbitrary extents and shapes. Convoy discovery is relevant for real-life applications such as throughput planning of trucks and carpooling of vehicles. Although there has been extensive research on trajectories in the literature, none of it can be applied to correctly retrieve exact convoy result sets. Motivated by this, we develop three efficient algorithms for convoy discovery that adopt the well-known filter-refinement framework. In the filter step, we apply line-simplification techniques to the trajectories and establish distance bounds between the simplified trajectories. This permits efficient convoy discovery over the simplified trajectories without missing any actual convoys. In the refinement step, the candidate convoys are further processed to obtain the actual convoys. Our comprehensive empirical study offers insight into the properties of the paper's proposals and demonstrates that the proposals are effective and efficient on real-world trajectory data.
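The paper's filter-refinement algorithms are more sophisticated than fits here; the naive baseline below only illustrates the density-based convoy notion, clustering each timestamp's positions with scikit-learn's DBSCAN and keeping object groups that stay density-connected for at least k consecutive timestamps. The eps, min_pts, and k parameters, the data layout, and all helper names are illustrative assumptions.

    from sklearn.cluster import DBSCAN

    def convoys(snapshots, eps, min_pts, k):
        """Naive convoy discovery.  `snapshots` is a list, one entry per
        timestamp, of {object_id: (x, y)} dicts.  Returns frozensets of
        objects that shared a density-based cluster for >= k consecutive
        timestamps."""
        candidates = []      # (object_set, consecutive_length) pairs
        results = set()
        for snap in snapshots:
            ids = list(snap)
            labels = DBSCAN(eps=eps, min_samples=min_pts).fit(
                [snap[i] for i in ids]).labels_
            clusters = {}
            for oid, lab in zip(ids, labels):
                if lab != -1:                     # -1 marks DBSCAN noise
                    clusters.setdefault(lab, set()).add(oid)
            nxt = []
            for cl in clusters.values():
                extended = False
                for objs, length in candidates:
                    common = objs & cl
                    if len(common) >= min_pts:    # group must stay dense
                        nxt.append((common, length + 1))
                        extended = True
                if not extended:
                    nxt.append((cl, 1))           # start a new candidate
            candidates = nxt
            results |= {frozenset(o) for o, n in candidates if n >= k}
        return results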
Swarm: Mining Relaxed Temporal Moving Object Clusters
"... Recent improvements in positioning technology make massive moving object data widely available. One important analysis is to find the moving objects that travel together. Existing methods put a strong constraint in defining moving object cluster, that they require the moving objects to stick togethe ..."
Abstract
-
Cited by 37 (11 self)
- Add to MetaCart
(Show Context)
Recent improvements in positioning technology make massive moving object data widely available. One important analysis is to find the moving objects that travel together. Existing methods place a strong constraint on the definition of a moving object cluster: they require the moving objects to stick together for consecutive timestamps. Our key observation is that the moving objects in a cluster may actually diverge temporarily and congregate at certain timestamps. Motivated by this, we propose the concept of a swarm, which captures the moving objects that move within arbitrary shapes of clusters for certain timestamps that are possibly non-consecutive. The goal of our paper is to find all discriminative swarms, namely closed swarms. While the search space for closed swarms is prohibitively huge, we design a method, ObjectGrowth, to efficiently retrieve the answer. In ObjectGrowth, two effective pruning strategies are proposed to greatly reduce the search space, and a novel closure checking rule is developed to report closed swarms on-the-fly. Empirical studies on real data as well as large synthetic data demonstrate the effectiveness and efficiency of our methods.
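ObjectGrowth itself, with its pruning strategies and closure checking, is beyond a short snippet, but the swarm notion is compact: a set of objects forms a swarm if all of them fall into one cluster at some minimum number of possibly non-consecutive timestamps. Below is a minimal checker under that reading; the per-timestamp clusters are assumed to be precomputed, and all names are illustrative.

    def is_swarm(objects, clusters_by_time, min_t):
        """`objects` is a set of object ids; `clusters_by_time` maps each
        timestamp to a list of clusters (sets of object ids).  The set is
        a swarm if it sits inside a single cluster at >= min_t timestamps,
        not necessarily consecutive ones."""
        hits = sum(
            1 for clusters in clusters_by_time.values()
            if any(objects <= cluster for cluster in clusters)
        )
        return hits >= min_t

    # {a, b} travel together at t=1 and t=3 but split up at t=2; with
    # min_t=2 they still qualify because the timestamps need not be
    # consecutive.
    clusters = {1: [{"a", "b", "c"}], 2: [{"a"}, {"b"}], 3: [{"a", "b"}]}
    print(is_swarm({"a", "b"}, clusters, min_t=2))  # True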
Searching Trajectories by Locations – An Efficiency Study
"... Trajectory search has long been an attractive and challenging topic which blooms various interesting applications in spatial-temporal databases. In this work, we study a new problem of searching trajectories by locations, in which context the query is only a small set of locations with or without an ..."
Abstract
-
Cited by 29 (11 self)
- Add to MetaCart
(Show Context)
Trajectory search has long been an attractive and challenging topic that has spawned various interesting applications in spatial-temporal databases. In this work, we study a new problem of searching trajectories by locations, where the query is only a small set of locations, with or without an order specified, and the target is to find the k Best-Connected Trajectories (k-BCT) from a database such that the k-BCT best connect the designated locations geographically. Unlike conventional trajectory search, which looks for trajectories similar to a sample query trajectory w.r.t. shape or other criteria, we focus on the goodness of connection provided by a trajectory to the specified query locations. This new query can benefit users in many novel applications such as trip planning. In our work, we first define a new similarity function for measuring how well a trajectory connects the query locations, considering both spatial distance and order constraints. Based on the observation that the number of query locations is normally small (e.g. 10 or less), since it is impractical for a user to input too many locations, we analyze the feasibility of using a general-purpose spatial index to achieve efficient k-BCT search, based on a simple Incremental k-NN based Algorithm (IKNN). The IKNN effectively prunes and refines trajectories using devised lower and upper bounds on similarity. Our contributions mainly lie in properly adapting the best-first and depth-first k-NN algorithms to the basic IKNN and, more importantly, ensuring efficiency in both search effort and memory usage. An in-depth study of the adaptation and its efficiency is provided. Further optimization is also presented to accelerate the IKNN algorithm. Finally, we verify the efficiency of the algorithm by extensive experiments.
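The paper defines its own similarity function; the sketch below only captures the flavor described above, rewarding a trajectory for passing close to each query location. The exponential decay exp(-d), the Euclidean point distance, and all names are assumptions for illustration, and no IKNN-style index pruning is attempted.

    import math

    def bct_similarity(query_locations, trajectory):
        """Score how well `trajectory` (a list of (x, y) points) connects
        the query locations: each location contributes more the closer
        the trajectory passes to it."""
        def dist(p, q):
            return math.hypot(p[0] - q[0], p[1] - q[1])

        return sum(
            math.exp(-min(dist(q, p) for p in trajectory))
            for q in query_locations
        )

    # Rank two trajectories against two places of interest.
    query = [(0.0, 0.0), (5.0, 5.0)]
    near = [(0.1, 0.0), (2.5, 2.4), (5.0, 4.9)]   # passes close to both
    far = [(9.0, 9.0), (12.0, 9.0)]               # misses both
    print(bct_similarity(query, near) > bct_similarity(query, far))  # True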
Towards trajectory anonymization: a generalization-based approach
- Transactions on Data Privacy
"... Abstract. Trajectory datasets are becoming popular due to the massive usage of GPS and locationbased services. In this paper, we address privacy issues regarding the identification of individuals in static trajectory datasets. We first adopt the notion of k-anonymity to trajectories and propose a no ..."
Abstract
-
Cited by 28 (2 self)
- Add to MetaCart
(Show Context)
Trajectory datasets are becoming popular due to the massive usage of GPS and location-based services. In this paper, we address privacy issues regarding the identification of individuals in static trajectory datasets. We first adapt the notion of k-anonymity to trajectories and propose a novel generalization-based approach for anonymization of trajectories. We further show that releasing anonymized trajectories may still leak private information. Therefore, we propose a randomization-based reconstruction algorithm for releasing anonymized trajectory data and also present how the underlying techniques can be adapted to other anonymity standards. The experimental results on real and synthetic trajectory datasets show the effectiveness of the proposed techniques.
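The paper's generalization and reconstruction algorithms are more involved; the toy sketch below only illustrates the underlying idea of generalization-based k-anonymity for trajectories: publish, for each group of at least k equal-length trajectories, a sequence of per-step bounding boxes instead of exact points. The naive grouping and all names are assumptions.

    def generalize_group(trajectories):
        """Replace a group of equal-length trajectories by one generalized
        trajectory of bounding boxes, so that no member of the group is
        distinguishable from the others."""
        boxes = []
        for pts in zip(*trajectories):   # all members' points per step
            xs = [p[0] for p in pts]
            ys = [p[1] for p in pts]
            boxes.append(((min(xs), min(ys)), (max(xs), max(ys))))
        return boxes

    def anonymize(trajectories, k):
        """Toy k-anonymization: cut the dataset into groups of size k (a
        real method would group *similar* trajectories to limit
        information loss; leftovers of fewer than k are suppressed) and
        generalize each group."""
        return [
            generalize_group(trajectories[i:i + k])
            for i in range(0, len(trajectories) - k + 1, k)
        ]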
An Efficient and Accurate Method for Evaluating Time Series Similarity
, 2007
"... A variety of techniques currently exist for measuring the similarity between time series datasets. Of these techniques, the methods whose matching criteria is bounded by a specified ǫ threshold value, such as the LCSS and the EDR techniques, have been shown to be robust in the presence of noise, tim ..."
Abstract
-
Cited by 25 (1 self)
- Add to MetaCart
A variety of techniques currently exist for measuring the similarity between time series datasets. Of these techniques, the methods whose matching criterion is bounded by a specified ε threshold value, such as the LCSS and EDR techniques, have been shown to be robust in the presence of noise, time shifts, and data scaling. Our work proposes a new algorithm, called the Fast Time Series Evaluation (FTSE) method, which can be used to evaluate such threshold-value techniques, including LCSS and EDR. Using FTSE, we show that these techniques can be evaluated faster than with either traditional dynamic programming or even warp-restricting methods such as the Sakoe-Chiba band and the Itakura Parallelogram. We also show that FTSE can be used in a framework that can evaluate a richer range of ε-threshold-based scoring techniques, of which EDR and LCSS are just two examples. This framework, called Swale, extends the ε-threshold-based scoring techniques to include arbitrary match rewards and gap penalties. Through extensive empirical evaluation, we show that Swale can obtain greater accuracy than existing methods.
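For reference, this is the standard quadratic dynamic program for ε-threshold LCSS that FTSE is designed to beat; a textbook formulation, not the paper's code.

    def lcss(a, b, eps):
        """Standard O(len(a) * len(b)) dynamic program for the Longest
        Common SubSequence under a matching threshold: a[i] and b[j]
        match when |a[i] - b[j]| <= eps."""
        n, m = len(a), len(b)
        L = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if abs(a[i - 1] - b[j - 1]) <= eps:
                    L[i][j] = L[i - 1][j - 1] + 1
                else:
                    L[i][j] = max(L[i - 1][j], L[i][j - 1])
        return L[n][m]

    # Noisy values still match inside the eps band: 1.1~1.0, 2.9~3.0,
    # 4.05~4.0 give a common subsequence of length 3.
    print(lcss([1.0, 2.0, 3.0, 4.0], [1.1, 2.9, 4.05], eps=0.2))  # 3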
Approximate embedding-based subsequence matching of time series
- In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data
, 2008
"... A method for approximate subsequence matching is introduced, that significantly improves the efficiency of subsequence matching in large time series data sets under the dynamic time warping (DTW) distance measure. Our method is called EBSM, shorthand for Embedding-Based Subsequence Matching. The key ..."
Abstract
-
Cited by 21 (6 self)
- Add to MetaCart
A method for approximate subsequence matching is introduced that significantly improves the efficiency of subsequence matching in large time series data sets under the dynamic time warping (DTW) distance measure. Our method is called EBSM, shorthand for Embedding-Based Subsequence Matching. The key idea is to convert subsequence matching to vector matching using an embedding. This embedding maps each database time series into a sequence of vectors, so that every step of every time series in the database is mapped to a vector. The embedding is computed by applying full dynamic time warping between reference objects and each database time series. At runtime, given a query object, an embedding of that object is computed in the same manner, by running dynamic time warping between the reference objects and the query. Comparing the embedding of the query with the database vectors efficiently identifies relatively few areas of interest in the database sequences. Those areas of interest are then fully explored using the exact DTW-based subsequence matching algorithm. Experiments on a large, public time series data set produce speedups of over one order of magnitude compared to brute-force search, with very small losses (< 1%) in retrieval accuracy.
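A rough sketch of the embedding step as described above: align the entire reference object against every prefix of a database sequence with a subsequence-DTW recurrence, so each position of the sequence yields one embedding coordinate. The 1-D series and absolute-difference cost are simplifying assumptions.

    def embed(reference, series):
        """For each position j of `series`, the cheapest cost of warping
        all of `reference` onto some subsequence of `series` that ends at
        j.  The zero top row lets a match start anywhere (the standard
        subsequence-DTW trick)."""
        INF = float("inf")
        prev = [0.0] * (len(series) + 1)   # empty-reference row
        for r in reference:
            cur = [INF]                    # non-empty reference, empty series
            for j in range(1, len(series) + 1):
                d = abs(r - series[j - 1])
                cur.append(d + min(prev[j], prev[j - 1], cur[j - 1]))
            prev = cur
        return prev[1:]    # one embedding coordinate per position

    # The pattern [1, 2, 3] ends exactly at position 3 (0-based), where
    # the cost drops to 0.
    print(embed([1, 2, 3], [5, 1, 2, 3, 5]))   # [9, 3, 1, 0, 2]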
Experimental comparison of representation methods and distance measures for time series data
- Data Mining and Knowledge Discovery
"... ar ..."
(Show Context)
Shapes based Trajectory Queries for Moving Objects
- Proceedings of ACM GIS
, 2005
"... An interesting issue in moving objects databases is to find similar trajectories of moving objects. Previous work on this topic focuses on movement patterns (trajectories with time dimension) of moving objects, rather than spatial shapes (trajectories without time dimension) of their trajectories. I ..."
Abstract
-
Cited by 17 (0 self)
- Add to MetaCart
(Show Context)
An interesting issue in moving objects databases is to find similar trajectories of moving objects. Previous work on this topic focuses on the movement patterns (trajectories with a time dimension) of moving objects rather than on the spatial shapes (trajectories without a time dimension) of their trajectories. In this paper we propose a simple and effective way to compare the spatial shapes of moving object trajectories. We introduce a new distance function based on “one way distance” (OWD). Algorithms for evaluating OWD in both the continuous (piecewise linear) and discrete (grid representation) cases are developed. An index structure for OWD in the grid representation, which guarantees no false dismissals, is also given to improve the efficiency of similarity search. Empirical studies show that OWD outperforms existing methods not only in precision but also in efficiency, and that the results of OWD in the continuous case can be approximated efficiently by the discrete case.
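A minimal discrete sketch of the one-way-distance idea: average, over one trajectory's points, the distance to the nearest point of the other, then symmetrize by averaging both directions. The paper's continuous definition integrates along the trajectory; the point-sequence simplification and names here are assumptions.

    import math

    def one_way(t1, t2):
        """Average, over the points of t1, of the distance from each point
        to its nearest point on t2 (a discrete stand-in for the integral
        along the trajectory)."""
        def dist(p, q):
            return math.hypot(p[0] - q[0], p[1] - q[1])
        return sum(min(dist(p, q) for q in t2) for p in t1) / len(t1)

    def owd(t1, t2):
        """Symmetrized one-way distance between two trajectories."""
        return (one_way(t1, t2) + one_way(t2, t1)) / 2.0

    # Two samplings of the same straight path stay close; a detour
    # through (1, 3) pushes the distance up.
    path = [(0, 0), (1, 0), (2, 0)]
    resampled = [(0.0, 0.1), (2.0, 0.1)]
    detour = [(0, 0), (1, 3), (2, 0)]
    print(owd(path, resampled) < owd(path, detour))  # True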
Outlier Detection by Sampling with Accuracy Guarantees
- In KDD, 2006
"... An effective approach to detect anomalous points in a data set is distance-based outlier detection. This paper describes a simple sampling algorithm to efficiently detect distancebased outliers in domains where each and every distance computation is very expensive. Unlike any existing algorithms, th ..."
Abstract
-
Cited by 16 (0 self)
- Add to MetaCart
(Show Context)
An effective approach to detecting anomalous points in a data set is distance-based outlier detection. This paper describes a simple sampling algorithm to efficiently detect distance-based outliers in domains where each and every distance computation is very expensive. Unlike existing algorithms, the sampling algorithm requires a fixed number of distance computations and can return good results with accuracy guarantees. The most computationally expensive aspect of estimating the accuracy of the result is sorting all of the distances computed by the sampling algorithm, which enables interactive-speed performance even over the most expensive distance computations. The paper's algorithms were tested on two domains that require expensive distance functions as well as on ten additional real data sets. The experimental study demonstrates both the efficiency and effectiveness of the sampling algorithm in comparison with the state-of-the-art algorithm, and the reliability of the accuracy guarantees.
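A toy sketch of the sampling idea, without the paper's accuracy-guarantee analysis: compare every point only against a fixed random sample, so the number of calls to an expensive distance function is bounded, and score each point by its distance to its k-th nearest sampled point. All parameters and names are illustrative.

    import random

    def sample_outliers(points, dist, sample_size, k, num_outliers, seed=0):
        """Distance-based outlier scores against a fixed random sample:
        score(p) = distance from p to its k-th nearest sampled point
        (larger = more outlying).  Only len(points) * sample_size calls
        to `dist` are made, regardless of the full pairwise count."""
        sample = random.Random(seed).sample(points, sample_size)

        def score(p):
            return sorted(dist(p, s) for s in sample)[k - 1]

        return sorted(points, key=score, reverse=True)[:num_outliers]

    # 1-D toy data with one obvious outlier; dist is plain |x - y|.
    data = [1.0, 1.1, 0.9, 1.2, 0.8, 50.0, 1.05]
    print(sample_outliers(data, lambda x, y: abs(x - y),
                          sample_size=4, k=2, num_outliers=1))  # [50.0]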