Probabilistic discovery of time series motifs
, 2003
Cited by 179 (24 self)
Several important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalized the idea of approximately repeated subsequences by introducing the notion of time series motifs. Two limitations of this work were the poor scalability of the motif discovery algorithm, and the inability to discover motifs in the presence of noise. Here we address these limitations by introducing a novel algorithm inspired by recent advances in the problem of pattern discovery in biosequences. Our algorithm is probabilistic in nature, but as we show empirically and theoretically, it can find time series motifs with very high probability even in the presence of noise or “don’t care” symbols. Not only is the algorithm fast, but it is an anytime algorithm, producing likely candidate motifs almost immediately, and gradually improving the quality of results over time.
Indexing Spatio-Temporal Trajectories with Chebyshev Polynomials
 Proc. 2004 SIGMOD, to appear
Cited by 79 (0 self)
In this thesis, we investigate the subject of indexing large collections of spatio-temporal trajectories for similarity matching. Our proposed technique is to first mitigate the dimensionality curse problem by approximating each trajectory with a low-order polynomial-like curve, and then incorporate a multidimensional index into the reduced space of polynomial coefficients. There are many possible ways to choose the polynomial, including Fourier transforms, splines, nonlinear regressions, etc. Some of these possibilities have indeed been studied before. We hypothesize that one of the best approaches is the polynomial that minimizes the maximum deviation from the true value, which is called the minimax polynomial. Minimax approximation is particularly meaningful for indexing because in a branch-and-bound search (i.e., for finding nearest neighbours), the smaller the maximum deviation, the more pruning opportunities there exist. In general, among all the polynomials of the same degree, the optimal minimax polynomial is very hard to compute. However, it has been shown that the Chebyshev approximation is almost identical to the optimal minimax polynomial, and is easy to compute [32]. Thus, we shall explore how to use
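As a rough illustration of the Chebyshev approximation mentioned in this abstract, the sketch below computes a few Chebyshev coefficients for one trajectory coordinate. It assumes the coordinate is available as a function of time rescaled to [-1, 1]; the function names, the node count, and the use of the standard cosine-node formula are illustrative choices, not details taken from the paper.

```python
import math

def chebyshev_coefficients(f, degree, num_nodes=64):
    """Return c_0..c_degree so that f(x) is approximated by sum(c_j * T_j(x)) on [-1, 1]."""
    coeffs = []
    for j in range(degree + 1):
        total = 0.0
        for k in range(num_nodes):
            theta = math.pi * (k + 0.5) / num_nodes  # Chebyshev node x_k = cos(theta)
            total += f(math.cos(theta)) * math.cos(j * theta)  # T_j(x_k) = cos(j * theta)
        scale = 1.0 if j == 0 else 2.0  # c_0 uses 1/N, the others 2/N
        coeffs.append(scale * total / num_nodes)
    return coeffs

def chebyshev_eval(coeffs, x):
    """Evaluate the truncated Chebyshev series at x in [-1, 1]."""
    theta = math.acos(x)
    return sum(c * math.cos(j * theta) for j, c in enumerate(coeffs))

def max_deviation(f, coeffs, samples=201):
    """Largest absolute error on an even grid (hypothetical helper for inspection)."""
    grid = [-1.0 + 2.0 * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - chebyshev_eval(coeffs, x)) for x in grid)
```

The short coefficient vector is the reduced representation that a multidimensional index could store; how the paper builds and searches that index is not shown here.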
A Complexity-Invariant Distance Measure for Time Series
 Gustavo E.A.P.A. Batista
Cited by 18 (3 self)
The ubiquity of time series data across almost all human endeavors has produced a great interest in time series data mining in the last decade. While there is a plethora of classification algorithms that can be applied to time series, all of the current empirical evidence suggests that simple nearest neighbor classification is exceptionally difficult to beat. The choice of distance measure used by the nearest neighbor algorithm depends on the invariances required by the domain. For example, motion capture data typically requires invariance to warping. In this work we make a surprising claim. There is an invariance that the community has missed: complexity invariance. Intuitively, the problem is that in many domains the different classes may have different complexities, and pairs of complex objects, even those which subjectively may seem very similar to the human eye, tend to be further apart under current distance measures than pairs of simple objects. This fact introduces errors in nearest neighbor classification, where complex objects are incorrectly assigned to a simpler class. We introduce the first complexity-invariant distance measure for time series, and show that it generally produces significant improvements in classification accuracy. We further show that this improvement does not compromise efficiency, since we can lower bound the measure and use a modification of the triangle inequality, thus making use of most existing indexing and data mining algorithms. We evaluate our ideas with the largest and most comprehensive set of time series classification experiments ever attempted, and show that complexity-invariant distance measures can produce improvements in accuracy in the vast majority of cases.
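The abstract does not give a formula. The sketch below assumes the form usually cited for this measure: Euclidean distance multiplied by a correction factor, where the factor is the ratio of the larger to the smaller complexity estimate and complexity is estimated from the length of the series when "stretched out" (the root of the summed squared successive differences). The function names and the epsilon guard for constant series are assumptions added for illustration.

```python
import math

def euclidean(q, c):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))

def complexity_estimate(series):
    """Root of the summed squared differences between consecutive points."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(series, series[1:])))

def complexity_invariant_distance(q, c, eps=1e-12):
    """Euclidean distance scaled up when the two series differ in complexity."""
    ce_q, ce_c = complexity_estimate(q), complexity_estimate(c)
    # Assumption: eps avoids division by zero when one series is constant.
    factor = max(ce_q, ce_c) / max(min(ce_q, ce_c), eps)
    return euclidean(q, c) * factor
```

Because the factor is never below 1, this distance is never smaller than the plain Euclidean distance, and the two coincide when both series have equal complexity estimates.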
Detecting Time Series Motifs Under Uniform Scaling
Cited by 17 (2 self)
Time series motifs are approximately repeated patterns found within the data. Such motifs have utility for many data mining algorithms, including rule discovery, novelty detection, summarization and clustering. Since the formalization of the problem and the introduction of efficient linear time algorithms, motif discovery has been successfully applied to many domains, including medicine, motion capture, robotics and meteorology. In this work we show that most previous applications of time series motifs have been severely limited by the definition’s brittleness to even slight changes of uniform scaling, the speed at which the patterns develop. We introduce a new algorithm that allows discovery of time series motifs with invariance to uniform scaling, and show that it produces objectively superior results in several important domains. Apart from being more general than all other motif discovery algorithms, a further contribution of our work is that it is simpler than previous approaches; in particular, we have drastically reduced the number of parameters that need to be specified.
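To make the notion of uniform scaling concrete, the sketch below compares a query against a candidate that may unfold at a different speed: a prefix of the candidate is resampled to the query length for each trial scaling factor, and the smallest Euclidean distance is kept. This illustrates only the invariance, not the paper's discovery algorithm; the nearest-index resampling, the list of factors, and all names are assumptions.

```python
import math

def rescale(series, length):
    """Stretch or shrink a series to `length` points by nearest-index resampling."""
    n = len(series)
    return [series[(i * n) // length] for i in range(length)]

def uniform_scaling_distance(query, candidate, factors):
    """Smallest Euclidean distance between query and rescaled prefixes of candidate."""
    best = math.inf
    for s in factors:
        # A factor s > 1 means the candidate develops s times more slowly than the query.
        prefix_len = max(1, min(len(candidate), round(s * len(query))))
        stretched = rescale(candidate[:prefix_len], len(query))
        best = min(best, math.dist(query, stretched))
    return best
```

For example, a candidate that repeats every query value twice matches the query exactly at factor 2.0 but not at factor 1.0.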
BTW: a web server for Boltzmann time warping of gene expression time series
, 2006
Visual Query Language: Finding patterns in and relationships among time series data
, 2004
Cited by 5 (0 self)
Many scientific datasets archive a large number of variables over time. These time-series data streams typically track many variables over relatively long periods of time, and therefore are often both wide and deep. In this paper, we describe the Visual Query Language (VQL) [3], a technology for locating time series patterns in historical or real-time data. The user interactively specifies a search pattern; VQL finds similar shapes and returns a ranked list of matches. VQL supports both univariate and multivariate queries, and allows the user to interactively specify the quality of the match, including temporal warping, amplitude warping, and temporal constraints between features.
Clustered alignments of gene-expression time series data
Cited by 3 (1 self)
Motivation: Characterizing and comparing temporal gene-expression responses is an important computational task for answering a variety of questions in biological studies. Algorithms for aligning time series represent a valuable approach for such analyses. However, previous approaches to aligning gene-expression time series have assumed that all genes should share the same alignment. Our work is motivated by the need for methods that identify sets of genes that differ in similar ways between two time series, even when their expression profiles are quite different. Results: We present a novel algorithm that calculates clustered alignments; the method finds clusters of genes such that the genes within a cluster share a common alignment, but each cluster is aligned independently of the others. We also present an efficient new segment-based alignment algorithm for time series called SCOW (shorting correlation-optimized warping). We evaluate our methods by assessing the accuracy of alignments computed with sparse time series from a toxicogenomics dataset. The results of our evaluation indicate that our clustered alignment approach and SCOW provide more accurate alignments than previous approaches. Additionally, we apply our clustered alignment approach to characterize the effects of a conditional Mop3 knockout in mouse liver. Availability: Source code is available at
Clustered
Cited by 1 (0 self)
Vol. 25 ISMB 2009, pages i119–i127 doi:10.1093/bioinformatics/btp206
Brigham and Women's Hospital
Time series motifs are pairs of individual time series, or subsequences of a longer time series, which are very similar to each other. As with their discrete analogues in computational biology, this similarity hints at structure which has been conserved for some reason and may therefore be of interest. Since the formalism of time series motifs in 2002, dozens of researchers have used them for diverse applications in many different domains. Because the obvious algorithm for computing motifs is quadratic in the number of items, more than a dozen approximate algorithms to discover motifs have been proposed in the literature. In this work, for the first time, we show a tractable exact algorithm to find time series motifs. As we shall show through extensive experiments, our algorithm is up to three orders of magnitude faster than brute-force search in large datasets. We further show that our algorithm is fast enough to be used as a subroutine in higher-level data mining algorithms for anytime classification, near-duplicate detection and summarization, and we consider detailed case studies in domains as diverse as electroencephalograph interpretation and entomological telemetry data mining.
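The "obvious" quadratic algorithm that the abstract uses as its baseline can be sketched as a closest-pair search over a collection of equal-length series. This is the brute-force baseline only, not the exact algorithm the paper proposes; the function name and the use of Euclidean distance are assumptions.

```python
import math
from itertools import combinations

def brute_force_motif(series_list):
    """Return (distance, i, j) for the closest pair of series, checking every pair."""
    best = (math.inf, None, None)
    for (i, a), (j, b) in combinations(enumerate(series_list), 2):
        d = math.dist(a, b)  # Euclidean distance between equal-length series
        if d < best[0]:
            best = (d, i, j)
    return best
```

With n series the loop performs n(n-1)/2 distance computations, which is the quadratic cost the abstract refers to. Applying the same idea to subsequences of one long series would also require excluding overlapping (trivial) matches, which this sketch does not handle.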
Less is More: Similarity of Time Series under Linear Transformations
When comparing time series, z-normalization preprocessing and dynamic time warping (DTW) distance have become an almost standard procedure. This paper makes a point against carelessly using this setup by discussing implications and alternatives. A (conceptually) simpler distance measure is proposed that allows for a linear transformation of amplitude and time only, but is also open to other normalizations (unachievable by z-normalization preprocessing). Lower bounding techniques are presented for this measure that apply directly to raw series.
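For reference, the standard setup that this abstract questions can be sketched as follows: each series is z-normalized (zero mean, unit standard deviation) and the pair is then compared with the textbook dynamic-programming DTW. The measure the paper proposes is not shown. The squared point cost, the unconstrained warping window, and the handling of constant series are assumptions.

```python
import math

def znormalize(series):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0.0:
        return [0.0] * n  # assumption: a constant series maps to all zeros
    return [(x - mean) / std for x in series]

def dtw_distance(a, b):
    """Unconstrained DTW with squared point costs; returns the root of the path cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])
```

Z-normalization removes any offset and amplitude scale before warping, so two series that differ only by such a linear change of amplitude end up at distance zero.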