Querying and Mining of Time Series Data: Experimental Comparison of Representations and Distance Measures
Cited by 141 (24 self)
The last decade has witnessed tremendous growth of interest in applications that deal with querying and mining of time series data. Numerous representation methods for dimensionality reduction and similarity measures geared towards time series have been introduced. Each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive set of time series experiments re-implementing 8 different representation methods and 9 similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. Our experiments have provided both a unified validation of some of the existing achievements, and in some cases, suggested that certain claims in the literature may be unduly optimistic.
Searching Trajectories by Locations – An Efficiency Study
Cited by 29 (11 self)
Trajectory search has long been an attractive and challenging topic that gives rise to various interesting applications in spatial-temporal databases. In this work, we study a new problem of searching trajectories by locations, in which the query is only a small set of locations with or without an order specified, while the target is to find the k Best-Connected Trajectories (k-BCT) from a database such that the k-BCT best connect the designated locations geographically. Different from conventional trajectory search, which looks for similar trajectories w.r.t. shape or other criteria by using a sample query trajectory, we focus on the goodness of connection provided by a trajectory to the specified query locations. This new query can benefit users in many novel applications such as trip planning. In our work, we first define a new similarity function for measuring how well a trajectory connects the query locations, with both spatial distance and order constraint being considered. Based on the observation that the number of query locations is normally small (e.g. 10 or fewer), since it is impractical for a user to input too many locations, we analyze the feasibility of using a general-purpose spatial index to achieve efficient k-BCT search, based on a simple Incremental k-NN based Algorithm (IKNN). The IKNN effectively prunes and refines trajectories by using the devised lower and upper bounds of similarity. Our contributions mainly lie in properly adapting the best-first and depth-first k-NN algorithms to the basic IKNN, and more importantly in ensuring efficiency in both search effort and memory usage. An in-depth study of the adaptation and its efficiency is provided. Further optimization is also presented to accelerate the IKNN algorithm. Finally, we verify the efficiency of the algorithm by extensive experiments.
Approximate embedding-based subsequence matching of time series
- In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
, 2008
Cited by 21 (6 self)
A method for approximate subsequence matching is introduced that significantly improves the efficiency of subsequence matching in large time series data sets under the dynamic time warping (DTW) distance measure. Our method is called EBSM, shorthand for Embedding-Based Subsequence Matching. The key idea is to convert subsequence matching to vector matching using an embedding. This embedding maps each database time series into a sequence of vectors, so that every step of every time series in the database is mapped to a vector. The embedding is computed by applying full dynamic time warping between reference objects and each database time series. At runtime, given a query object, an embedding of that object is computed in the same manner, by running dynamic time warping between the reference objects and the query. Comparing the embedding of the query with the database vectors is used to efficiently identify relatively few areas of interest in the database sequences. Those areas of interest are then fully explored using the exact DTW-based subsequence matching algorithm. Experiments on a large, public time series data set produce speedups of over one order of magnitude compared to brute-force search, with very small losses (< 1%) in retrieval accuracy.
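The exact subsequence matching step mentioned in this abstract can be illustrated with a minimal Python sketch of textbook subsequence DTW, in which a match may start at any database position at no cost. This is not the EBSM embedding itself; the function names and the squared pointwise cost are assumptions chosen for illustration.

```python
import math


def subsequence_dtw_end_costs(query, series):
    """For every position j of `series`, return the cost of the best warping
    of the whole `query` onto some subsequence of `series` that ends at j.

    Assumption: squared difference as the pointwise cost.
    """
    n = len(series)
    # Row 0 is all zeros: a match may start anywhere in `series` for free.
    prev = [0.0] * (n + 1)
    for i in range(1, len(query) + 1):
        cur = [math.inf] * (n + 1)  # cur[0] stays inf: query cannot match nothing
        for j in range(1, n + 1):
            cost = (query[i - 1] - series[j - 1]) ** 2
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[1:]


def best_subsequence_match(query, series):
    """Return (cost, end_index) of the best match; end_index is 0-based."""
    costs = subsequence_dtw_end_costs(query, series)
    end = min(range(len(costs)), key=costs.__getitem__)
    return costs[end], end


if __name__ == "__main__":
    # The query occurs exactly at positions 2..4, so the best match ends at 4.
    print(best_subsequence_match([1, 2, 3], [0, 0, 1, 2, 3, 0, 0]))  # (0.0, 4)
```

The per-position list returned by `subsequence_dtw_end_costs` gives one number for every step of the database series. Computing such a list for each of several reference objects would give one vector per step, which is the general shape of the per-step embedding the abstract describes, although the exact construction should be taken from the paper.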
Experimental comparison of representation methods and distance measures for time series data
- Data Mining and Knowledge Discovery
Indexable PLA for Efficient Similarity Search
Cited by 18 (0 self)
Similarity-based search over time-series databases has long been an active research topic and is widely used in many applications, including multimedia retrieval, data mining, and web search and retrieval. However, due to the high dimensionality (i.e. length) of time series, similarity search over directly indexed time series usually encounters a serious problem, known as the “dimensionality curse”. Thus, many dimensionality reduction techniques have been proposed to break this curse by reducing the dimensionality of time series. Among all the proposed methods, only Piecewise Linear Approximation (PLA) does not have indexing mechanisms to support similarity queries, which prevents it from efficiently searching over very large time-series databases. Our initial studies on the effectiveness of different reduction methods, however, show that PLA performs no worse than others. Motivated by this, in this paper, we re-investigate PLA for approximating and indexing time series. Specifically, we propose a novel distance function in the reduced PLA-space, and prove that this function indeed results in a lower bound of the Euclidean distance between the original time series, which guarantees no false dismissals during the similarity search. As a second step, we develop an effective approach to index these lower bounds to improve the search efficiency. Our extensive experiments over a wide spectrum of real and synthetic data sets have demonstrated the efficiency and effectiveness of PLA together with the newly proposed lower bound distance, in terms of both pruning power and wall clock time, compared with two state-of-the-art reduction methods, Adaptive Piecewise Constant Approximation (APCA) and Chebyshev Polynomials (CP).
Faster Retrieval with a Two-Pass Dynamic-Time-Warping Lower Bound
, 2009
Cited by 13 (0 self)
Dynamic Time Warping (DTW) is a popular similarity measure between time series. DTW fails to satisfy the triangle inequality and its computation requires quadratic time. Hence, to find closest neighbors quickly, we use bounding techniques. We can avoid most DTW computations with an inexpensive lower bound (LB_Keogh). We compare LB_Keogh with a tighter lower bound (LB_Improved). We find that LB_Improved-based search is faster. As an example, our approach is 2–3 times faster over random-walk and shape time series.
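The pruning idea in this abstract can be sketched in Python as follows. The sketch covers only the envelope-based LB_Keogh bound, not the tighter two-pass LB_Improved bound. It assumes equal-length series, a warping window of half-width `w`, and squared pointwise costs; the function names and the `skipped` counter are illustrative choices, not the paper's code.

```python
import math


def dtw_band(a, b, w):
    """DTW cost (sum of squared differences along the best path) where
    cell (i, j) is allowed only if |i - j| <= w. Assumes len(a) == len(b)."""
    n = len(a)
    prev = [math.inf] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (n + 1)
        for j in range(max(1, i - w), min(n, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[n]


def lb_keogh(query, candidate, w):
    """Lower bound on dtw_band(query, candidate, w): penalize only the
    candidate points that fall outside the query's min/max envelope."""
    n = len(query)
    total = 0.0
    for i, c in enumerate(candidate):
        window = query[max(0, i - w):min(n, i + w + 1)]
        upper, lower = max(window), min(window)
        if c > upper:
            total += (c - upper) ** 2
        elif c < lower:
            total += (lower - c) ** 2
    return total


def nearest_neighbor(query, database, w):
    """Return (index, distance, skipped), where `skipped` counts the
    candidates rejected by the lower bound without running DTW."""
    best_idx, best_dist, skipped = -1, math.inf, 0
    for idx, cand in enumerate(database):
        if lb_keogh(query, cand, w) >= best_dist:
            skipped += 1  # the true DTW cost cannot beat the current best
            continue
        dist = dtw_band(query, cand, w)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist, skipped
```

Because the bound never exceeds the banded DTW cost, skipping a candidate whose bound already reaches the best distance found so far cannot change the answer. It only avoids quadratic-time work.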
On Nonmetric Similarity Search Problems in Complex Domains
, 2010
Cited by 10 (3 self)
The task of similarity search is widely used in various areas of computing, including multimedia databases, data mining, bioinformatics, social networks, etc. In fact, retrieval of semantically unstructured data entities requires a form of aggregated qualification that selects entities relevant to a query. A popular type of such a mechanism is similarity querying. For a long time, the database-oriented applications of similarity search employed a definition of similarity restricted to metric distances. Due to its topological properties, metric similarity can be effectively used to index a database, which can then be queried efficiently by so-called metric access methods. However, with the increasing complexity of data entities across various domains, many similarities have appeared in recent years that are not metrics; we call them nonmetric similarity functions. In this paper we survey domains employing nonmetric functions for effective similarity search, and methods for efficient nonmetric similarity search. First, we show that the ongoing research in many of these domains requires complex representations of data entities. At the same time, such complex representations also allow us to model complex and computationally expensive similarity functions (often represented by various matching algorithms). However, the more complex a similarity function is, the more likely it is to be nonmetric. Second, we review the state-of-the-art techniques for efficient (fast) nonmetric similarity search, concerning both exact and approximate search. Finally, we discuss some open problems and possible future research trends.
Weighted dynamic time warping for time series classification
- Pattern Recognition
, 2011
Cited by 9 (0 self)
Dynamic time warping (DTW), which finds the minimum path by providing non-linear alignments between two time series, has been widely used as a distance measure for time series classification and clustering. However, DTW does not account for the relative importance of the phase difference between a reference point and a testing point. This may lead to misclassification, especially in applications where the shape similarity between two sequences is a major consideration for accurate recognition. Therefore, we propose a novel distance measure, called weighted DTW (WDTW), which is a penalty-based DTW. Our approach penalizes points with higher phase difference between a reference point and a testing point in order to prevent minimum distance distortion caused by outliers. The rationale underlying the proposed distance measure is demonstrated with some illustrative examples. A new weight function, called the modified logistic weight function (MLWF), is also proposed to systematically assign weights as a function of the phase difference between a reference point and a testing point. By applying different weights to adjacent points, the proposed algorithm can enhance the detection of similarity between two time series. We show that some popular distance measures such as DTW and Euclidean distance are special cases of our proposed WDTW measure. We extend the proposed idea to other variants of DTW such as derivative dynamic time warping (DDTW) and propose the weighted version of DDTW. We have compared the performance of our proposed procedures with other popular approaches using public data sets available through the UCR Time Series Data Mining Archive for both time series classification and clustering problems. The experimental results indicate that the proposed approaches can achieve improved accuracy for time series classification and clustering problems.
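A minimal Python sketch of a penalty-based DTW of this kind is given below. The abstract does not state the weight formula, so the logistic form used here, its parameters `g` (steepness) and `w_max` (upper bound), the midpoint at half the sequence length, and the squared pointwise cost are all assumptions made for illustration rather than the paper's exact MLWF.

```python
import math


def logistic_weights(length, g=0.05, w_max=1.0):
    """Assumed logistic weight per phase difference d = |i - j|:
    w(d) = w_max / (1 + exp(-g * (d - length / 2)))."""
    mid = length / 2.0
    return [w_max / (1.0 + math.exp(-g * (d - mid))) for d in range(length)]


def weighted_dtw(a, b, g=0.05, w_max=1.0):
    """DTW in which the cost of aligning a[i] with b[j] is scaled by a
    weight that grows with the phase difference |i - j|."""
    n, m = len(a), len(b)
    weights = logistic_weights(max(n, m), g, w_max)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = weights[abs(i - j)] * (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[m]
```

Under these assumptions, setting `g=0` makes every weight equal to `w_max / 2`, so the measure reduces to a constant multiple of ordinary DTW. Increasing `g` penalizes alignments far from the diagonal more heavily relative to those near it.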
User oriented trajectory search for trip recommendation
- In Proc. ACM EDBT
, 2012
Cited by 4 (0 self)
Trajectory sharing and searching have received significant attention in recent years. In this paper, we propose and investigate a novel problem called User Oriented Trajectory Search (UOTS) for trip recommendation. In contrast to conventional trajectory search by locations (spatial domain only), we consider both spatial and textual domains in the new UOTS query. Given a trajectory data set, the query input contains a set of intended places given by the traveler and a set of textual attributes describing the traveler's preference. If a trajectory connects or is close to the specified query locations, and the textual attributes of the trajectory are similar to the traveler's preference, it will be recommended to the traveler for reference. This type of query can bring significant benefits to travelers in many popular applications such as trip planning and recommendation. There are two challenges in the UOTS problem: (i) how to constrain the searching range in two domains and (ii) how to schedule multiple query sources effectively. To overcome these challenges and answer the UOTS query efficiently, a novel collaborative searching approach is developed. Conceptually, the UOTS query processing is conducted in the spatial and textual domains alternately. A pair of upper and lower bounds is devised to constrain the searching range in two domains. In the meantime, a heuristic searching strategy based on priority ranking is adopted for scheduling the multiple query sources, which can further reduce the searching range and enhance the query efficiency notably. Furthermore, the devised collaborative searching approach can be extended to situations where the query locations are ordered. The performance of the proposed UOTS query is verified by extensive experiments based on real and synthetic trajectory data in road networks.
The Move-Split-Merge Metric for Time Series
Cited by 3 (0 self)
A novel metric for time series, called MSM (move-split-merge), is proposed. This metric uses as building blocks three fundamental operations: Move, Split, and Merge, which can be applied in sequence to transform any time series into any other time series. A Move operation changes the value of a single element, a Split operation converts a single element into two consecutive elements, and a Merge operation merges two consecutive elements into one. Each operation has an associated cost, and the MSM distance between two time series is defined to be the cost of the cheapest sequence of operations that transforms the first time series into the second one. An efficient, quadratic-time algorithm is provided for computing the MSM distance. MSM has the desirable properties of being a metric, in contrast to the dynamic time warping (DTW) distance, and invariant to the choice of origin, in contrast to the Edit Distance with Real Penalty (ERP) metric. At the same time, experiments with public time series datasets demonstrate that MSM is a meaningful distance measure that often leads to a lower nearest-neighbor classification error rate compared to DTW and ERP.
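The quadratic-time computation can be sketched in Python with the dynamic program commonly used for MSM. The abstract does not spell out the recurrence or the operation costs, so the following are assumptions for illustration: a Move costs the absolute change in value, every Split or Merge costs a constant `c` plus any Move needed to bring the element between its neighbours, and the names `msm_distance` and `_split_merge_cost` are illustrative.

```python
def _split_merge_cost(new, x, y, c):
    """Cost of creating or absorbing `new` between neighbours x and y:
    just c when `new` lies between them, otherwise c plus the Move
    needed to reach the nearer neighbour."""
    if x <= new <= y or y <= new <= x:
        return c
    return c + min(abs(new - x), abs(new - y))


def msm_distance(x, y, c=0.5):
    """Move-Split-Merge distance between two non-empty sequences, filled
    in as an O(len(x) * len(y)) table. `c` is the assumed Split/Merge cost."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    cost = [[0.0] * m for _ in range(n)]
    cost[0][0] = abs(x[0] - y[0])  # a single Move
    for i in range(1, n):  # first column: merge extra elements of x
        cost[i][0] = cost[i - 1][0] + _split_merge_cost(x[i], x[i - 1], y[0], c)
    for j in range(1, m):  # first row: split to create extra elements of y
        cost[0][j] = cost[0][j - 1] + _split_merge_cost(y[j], x[0], y[j - 1], c)
    for i in range(1, n):
        for j in range(1, m):
            cost[i][j] = min(
                cost[i - 1][j - 1] + abs(x[i] - y[j]),  # Move
                cost[i - 1][j] + _split_merge_cost(x[i], x[i - 1], y[j], c),  # Merge
                cost[i][j - 1] + _split_merge_cost(y[j], x[i], y[j - 1], c),  # Split
            )
    return cost[n - 1][m - 1]
```

For example, `msm_distance([1.0], [1.0, 1.0], c=0.5)` is a single Split with no Move and therefore costs `0.5`, while `msm_distance([1.0], [2.0])` is a single Move of size 1.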