Results 1  10
of
141
Exact Indexing of Dynamic Time Warping
, 2002
"... The problem of indexing time series has attracted much research interest in the database community. Most algorithms used to index time series utilize the Euclidean distance or some variation thereof. However is has been forcefully shown that the Euclidean distance is a very brittle distance me ..."
Abstract

Cited by 234 (30 self)
 Add to MetaCart
The problem of indexing time series has attracted much research interest in the database community. Most algorithms used to index time series utilize the Euclidean distance or some variation thereof. However is has been forcefully shown that the Euclidean distance is a very brittle distance measure. Dynamic Time Warping (DTW) is a much more robust distance measure for time series, allowing similar shapes to match even if they are out of phase in the time axis.
Probabilistic discovery of time series motifs
, 2003
"... Several important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalized the idea of approximately repeated subsequences by introducing the notion of time series motifs. Two limitations of thi ..."
Abstract

Cited by 119 (21 self)
 Add to MetaCart
Several important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalized the idea of approximately repeated subsequences by introducing the notion of time series motifs. Two limitations of this work were the poor scalability of the motif discovery algorithm, and the inability to discover motifs in the presence of noise. Here we address these limitations by introducing a novel algorithm inspired by recent advances in the problem of pattern discovery in biosequences. Our algorithm is probabilistic in nature, but as we show empirically and theoretically, it can find time series motifs with very high probability even in the presence of noise or “don’t care ” symbols. Not only is the algorithm fast, but it is an anytime algorithm, producing likely candidate motifs almost immediately, and gradually improving the quality of results over time.
Trajectory Clustering: A PartitionandGroup Framework
 In SIGMOD
, 2007
"... Existing trajectory clustering algorithms group similar trajectories as a whole, thus discovering common trajectories. Our key observation is that clustering trajectories as a whole could miss common subtrajectories. Discovering common subtrajectories is very useful in many applications, especiall ..."
Abstract

Cited by 85 (11 self)
 Add to MetaCart
Existing trajectory clustering algorithms group similar trajectories as a whole, thus discovering common trajectories. Our key observation is that clustering trajectories as a whole could miss common subtrajectories. Discovering common subtrajectories is very useful in many applications, especially if we have regions of special interest for analysis. In this paper, we propose a new partitionandgroup framework for clustering trajectories, which partitions a trajectory into a set of line segments, and then, groups similar line segments together into a cluster. The primary advantage of this framework is to discover common subtrajectories from a trajectory database. Based on this partitionandgroup framework, we develop a trajectory clustering algorithm TRACLUS. Our algorithm consists of two phases: partitioning and grouping. For the first phase, we present a formal trajectory partitioning algorithm using the minimum description length (MDL) principle. For the second phase, we present a densitybased linesegment clustering algorithm. Experimental results demonstrate that TRACLUS correctly discovers common subtrajectories from real trajectory data.
Robust and fast similarity search for moving object trajectories
 In SIGMOD
, 2005
"... An important consideration in similaritybased retrieval of moving object trajectories is the definition of a distance function. The existing distance functions are usually sensitive to noise, shifts and scaling of data that commonly occur due to sensor failures, errors in detection techniques, dist ..."
Abstract

Cited by 83 (12 self)
 Add to MetaCart
An important consideration in similaritybased retrieval of moving object trajectories is the definition of a distance function. The existing distance functions are usually sensitive to noise, shifts and scaling of data that commonly occur due to sensor failures, errors in detection techniques, disturbance signals, and different sampling rates. Cleaning data to eliminate these is not always possible. In this paper, we introduce a novel distance function, Edit Distance on Real sequence (EDR) which is robust against these data imperfections. Analysis and comparison of EDR with other popular distance
Finding Motifs in Time Series
, 2002
"... The problem of efficiently locating previously known patterns in a time series database (i.e., query by content) has received much attention and may now largely be regarded as a solved problem. However, from a knowledge discovery viewpoint, a more interesting problem is the enumeration of previously ..."
Abstract

Cited by 72 (15 self)
 Add to MetaCart
The problem of efficiently locating previously known patterns in a time series database (i.e., query by content) has received much attention and may now largely be regarded as a solved problem. However, from a knowledge discovery viewpoint, a more interesting problem is the enumeration of previously unknown, frequently occurring patterns. We call such patterns "motifs," because of their close analogy to their discrete counterparts in computation biology. An efficient motif discovery algorithm for time series would be useful as a tool for summarizing and visualizing massive time series databases. In addition, it could be used as a subroutine in various other data mining tasks, including the discovery of association rules, clustering and classification. In this work we carefully motivate, then introduce, a nontrivial definition of time series motifs. We propose an efficient algorithm to discover them, and we demonstrate the utility and efficiency of our approach on several real world datasets.
Querying and Mining of Time Series Data: Experimental Comparison of Representations and Distance Measures
"... The last decade has witnessed a tremendous growths of interests in applications that deal with querying and mining of time series data. Numerous representation methods for dimensionality reduction and similarity measures geared towards time series have been introduced. Each individual work introduci ..."
Abstract

Cited by 64 (19 self)
 Add to MetaCart
The last decade has witnessed a tremendous growths of interests in applications that deal with querying and mining of time series data. Numerous representation methods for dimensionality reduction and similarity measures geared towards time series have been introduced. Each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive set of time series experiments reimplementing 8 different representation methods and 9 similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. Our experiments have provided both a unified validation of some of the existing achievements, and in some cases, suggested that certain claims in the literature may be unduly optimistic. 1.
Making Timeseries Classification More Accurate Using Learned Constraints
 In proc. of SDM Int’l Conf
, 2004
"... It has long been known that Dynamic Time Warping (DTW) is superior to Euclidean distance for classification and clustering of time series. However, until lately, most research has utilized Euclidean distance because it is more efficiently calculated. A recently introduced technique that greatly miti ..."
Abstract

Cited by 62 (20 self)
 Add to MetaCart
It has long been known that Dynamic Time Warping (DTW) is superior to Euclidean distance for classification and clustering of time series. However, until lately, most research has utilized Euclidean distance because it is more efficiently calculated. A recently introduced technique that greatly mitigates DTWs demanding CPU time has sparked a flurry of research activity. However, the technique and its many extensions still only allow DTW to be applied to moderately large datasets. In addition, almost all of the research on DTW has focused exclusively on speeding up its calculation; there has been little work done on improving its accuracy. In this work, we target the accuracy aspect of DTW performance and introduce a new framework that learns arbitrary constraints on the warping path of the DTW calculation. Apart from improving the accuracy of classification, our technique as a side effect speeds up DTW by a wide margin as well. We show the utility of our approach on datasets from diverse domains and demonstrate significant gains in accuracy and efficiency.
On discovering moving clusters in spatiotemporal data
 In SSTD
, 2005
"... Abstract. A moving cluster is defined by a set of objects that move close to each other for a long time interval. Reallife examples are a group of migrating animals, a convoy of cars moving in a city, etc. We study the discovery of moving clusters in a database of object trajectories. The differenc ..."
Abstract

Cited by 52 (0 self)
 Add to MetaCart
Abstract. A moving cluster is defined by a set of objects that move close to each other for a long time interval. Reallife examples are a group of migrating animals, a convoy of cars moving in a city, etc. We study the discovery of moving clusters in a database of object trajectories. The difference of this problem compared to clustering trajectories and mining movement patterns is that the identity of a moving cluster remains unchanged while its location and content may change over time. For example, while a group of animals are migrating, some animals may leave the group or new animals may enter it. We provide a formal definition for moving clusters and describe three algorithms for their automatic discovery: (i) a straightforward method based on the definition, (ii) a more efficient method which avoids redundant checks and (iii) an approximate algorithm which trades accuracy for speed by borrowing ideas from the MPEG2 video encoding. The experimental results demonstrate the efficiency of our techniques and their applicability to large spatiotemporal datasets. 1
Experiencing SAX: A Novel Symbolic Representation of Time Series. Data Mining and Knowledge Discovery Journal
, 2007
"... Abstract Many high level representations of time series have been proposed for data mining, including Fourier transforms, wavelets, eigenwaves, piecewise polynomial models, etc. Many researchers have also considered symbolic representations of time series, noting that such representations would pote ..."
Abstract

Cited by 51 (13 self)
 Add to MetaCart
Abstract Many high level representations of time series have been proposed for data mining, including Fourier transforms, wavelets, eigenwaves, piecewise polynomial models, etc. Many researchers have also considered symbolic representations of time series, noting that such representations would potentiality allow researchers to avail of the wealth of data structures and algorithms from the text processing and bioinformatics communities. While many symbolic representations of time series have been introduced over the past decades, they all suffer from two fatal flaws. First, the dimensionality of the symbolic representation is the same as the original data, and virtually all data mining algorithms scale poorly with dimensionality. Second, although distance measures can be defined on the symbolic approaches, these distance measures have little correlation with distance measures defined on the original time series. In this work we formulate a new symbolic representation of time series. Our representation is unique in that it allows dimensionality/numerosity reduction,
Indexing SpatioTemporal Trajectories with Chebyshev Polynomials
 Proc. 2004 SIGMOD, toappear
"... In this thesis, we investigate the subject of indexing large collections of spatiotemporal trajectories for similarity matching. Our proposed technique is to first mitigate the dimensionality curse problem by approximating each trajectory with a low order polynomiallike curve, and then incorporate ..."
Abstract

Cited by 49 (0 self)
 Add to MetaCart
In this thesis, we investigate the subject of indexing large collections of spatiotemporal trajectories for similarity matching. Our proposed technique is to first mitigate the dimensionality curse problem by approximating each trajectory with a low order polynomiallike curve, and then incorporate a multidimensional index into the reduced space of polynomial coefficients. There are many possible ways to choose the polynomial, including Fourier transforms, splines, nonlinear regressions, etc. Some of these possibilities have indeed been studied before. We hypothesize that one of the best approaches is the polynomial that minimizes the maximum deviation from the true value, which is called the minimax polynomial. Minimax approximation is particularly meaningful for indexing because in a branchandbound search (i.e., for finding nearest neighbours), the smaller the maximum deviation, the more pruning opportunities there exist. In general, among all the polynomials of the same degree, the optimal minimax polynomial is very hard to compute. However, it has been shown that the Chebyshev approximation is almost identical to the optimal minimax polynomial, and is easy to compute [32]. Thus, we shall explore how to use