Results 1-8 of 8
Probabilistic discovery of time series motifs
, 2003
Cited by 119 (21 self)
Several important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalized the idea of approximately repeated subsequences by introducing the notion of time series motifs. Two limitations of this work were the poor scalability of the motif discovery algorithm, and the inability to discover motifs in the presence of noise. Here we address these limitations by introducing a novel algorithm inspired by recent advances in the problem of pattern discovery in biosequences. Our algorithm is probabilistic in nature, but as we show empirically and theoretically, it can find time series motifs with very high probability even in the presence of noise or “don’t care” symbols. Not only is the algorithm fast, but it is an anytime algorithm, producing likely candidate motifs almost immediately, and gradually improving the quality of results over time.
Indexing Spatio-Temporal Trajectories with Chebyshev Polynomials
Proc. 2004 SIGMOD, to appear
Cited by 49 (0 self)
In this thesis, we investigate the subject of indexing large collections of spatiotemporal trajectories for similarity matching. Our proposed technique is to first mitigate the dimensionality curse problem by approximating each trajectory with a low order polynomial-like curve, and then incorporate a multidimensional index into the reduced space of polynomial coefficients. There are many possible ways to choose the polynomial, including Fourier transforms, splines, nonlinear regressions, etc. Some of these possibilities have indeed been studied before. We hypothesize that one of the best approaches is the polynomial that minimizes the maximum deviation from the true value, which is called the minimax polynomial. Minimax approximation is particularly meaningful for indexing because in a branch-and-bound search (i.e., for finding nearest neighbours), the smaller the maximum deviation, the more pruning opportunities there exist. In general, among all the polynomials of the same degree, the optimal minimax polynomial is very hard to compute. However, it has been shown that the Chebyshev approximation is almost identical to the optimal minimax polynomial, and is easy to compute [32]. Thus, we shall explore how to use
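To make the idea of reducing a trajectory to a few Chebyshev coefficients concrete, here is a minimal sketch. It is not the thesis's implementation: the function names are hypothetical, the signal is assumed to be a one-dimensional function already rescaled to the interval [-1, 1], and the coefficients are obtained by sampling at Chebyshev nodes.

```python
import math

def chebyshev_coefficients(f, degree):
    """Coefficients c_0..c_degree of the Chebyshev interpolant of f on [-1, 1].

    Assumption: f is a plain Python callable standing in for one coordinate
    of a trajectory whose time axis has been rescaled to [-1, 1].
    """
    n = degree + 1
    # Chebyshev nodes: the n roots of the Chebyshev polynomial T_n.
    angles = [math.pi * (k + 0.5) / n for k in range(n)]
    values = [f(math.cos(a)) for a in angles]
    coeffs = []
    for j in range(n):
        # T_j(cos(a)) = cos(j * a), so no polynomial evaluation is needed.
        s = sum(v * math.cos(j * a) for v, a in zip(values, angles))
        coeffs.append((1.0 if j == 0 else 2.0) * s / n)
    return coeffs

def chebyshev_eval(coeffs, x):
    """Evaluate sum_j c_j * T_j(x) for x in [-1, 1]."""
    theta = math.acos(max(-1.0, min(1.0, x)))
    return sum(c * math.cos(j * theta) for j, c in enumerate(coeffs))

if __name__ == "__main__":
    coeffs = chebyshev_coefficients(math.exp, 8)
    grid = [i / 50 - 1 for i in range(101)]
    worst = max(abs(chebyshev_eval(coeffs, x) - math.exp(x)) for x in grid)
    print(len(coeffs), "coefficients, max deviation on grid:", worst)
```

The short coefficient vector is what would be placed in a multidimensional index; the small maximum deviation is the property the abstract argues helps pruning.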
Detecting Time Series Motifs Under Uniform Scaling
Cited by 8 (1 self)
Time series motifs are approximately repeated patterns found within the data. Such motifs have utility for many data mining algorithms, including rule discovery, novelty detection, summarization and clustering. Since the formalization of the problem and the introduction of efficient linear time algorithms, motif discovery has been successfully applied to many domains, including medicine, motion capture, robotics and meteorology. In this work we show that most previous applications of time series motifs have been severely limited by the definition’s brittleness to even slight changes of uniform scaling, the speed at which the patterns develop. We introduce a new algorithm that allows discovery of time series motifs with invariance to uniform scaling, and show that it produces objectively superior results in several important domains. Apart from being more general than all other motif discovery algorithms, a further contribution of our work is that it is simpler than previous approaches; in particular, we have drastically reduced the number of parameters that need to be specified.
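The following sketch illustrates only what invariance to uniform scaling means, not the algorithm the paper introduces. It assumes a simple nearest-index resampling and a search over candidate prefix lengths; the function names and the choice of Euclidean distance are illustrative assumptions.

```python
import math

def uniform_scale(series, target_len):
    """Stretch or shrink a series to target_len by nearest-index resampling."""
    n = len(series)
    return [series[j * n // target_len] for j in range(target_len)]

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def scaling_invariant_distance(query, candidate, min_len, max_len):
    """Smallest distance between query and any rescaled prefix of candidate.

    Each prefix candidate[:length], for length in [min_len, max_len], is
    rescaled to len(query) before comparison. Returns (distance, length).
    """
    best = (math.inf, None)
    for length in range(min_len, min(max_len, len(candidate)) + 1):
        rescaled = uniform_scale(candidate[:length], len(query))
        d = euclidean(query, rescaled)
        if d < best[0]:
            best = (d, length)
    return best

if __name__ == "__main__":
    query = [0, 1, 2, 3]
    slow = [0, 0, 1, 1, 2, 2, 3, 3]  # same shape, developing at half speed
    print(euclidean(query, slow[:4]))                      # unscaled comparison
    print(scaling_invariant_distance(query, slow, 4, 8))   # best rescaled prefix
```

In this toy case the unscaled comparison sees two different shapes, while the rescaled comparison recovers an exact match at prefix length 8.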
Visual Query Language: Finding patterns in and relationships among time series data
, 2004
Cited by 5 (0 self)
Many scientific datasets archive a large number of variables over time. These time series data streams typically track many variables over relatively long periods of time, and therefore are often both wide and deep. In this paper, we describe the Visual Query Language (VQL) [3], a technology for locating time series patterns in historical or real time data. The user interactively specifies a search pattern, VQL finds similar shapes, and returns a ranked list of matches. VQL supports both univariate and multivariate queries, and allows the user to interactively specify the quality of the match, including temporal warping, amplitude warping, and temporal constraints between features.
A Complexity-Invariant Distance Measure for Time Series
Gustavo E.A.P.A. Batista
Cited by 3 (1 self)
The ubiquity of time series data across almost all human endeavors has produced a great interest in time series data mining in the last decade. While there is a plethora of classification algorithms that can be applied to time series, all of the current empirical evidence suggests that simple nearest neighbor classification is exceptionally difficult to beat. The choice of distance measure used by the nearest neighbor algorithm depends on the invariances required by the domain. For example, motion capture data typically requires invariance to warping. In this work we make a surprising claim. There is an invariance that the community has missed: complexity invariance. Intuitively, the problem is that in many domains the different classes may have different complexities, and pairs of complex objects, even those which subjectively may seem very similar to the human eye, tend to be further apart under current distance measures than pairs of simple objects. This fact introduces errors in nearest neighbor classification, where complex objects are incorrectly assigned to a simpler class. We introduce the first complexity-invariant distance measure for time series, and show that it generally produces significant improvements in classification accuracy. We further show that this improvement does not compromise efficiency, since we can lower bound the measure and use a modification of the triangular inequality, thus making use of most existing indexing and data mining algorithms. We evaluate our ideas with the largest and most comprehensive set of time series classification experiments ever attempted, and show that complexity-invariant distance measures can produce improvements in accuracy in the vast majority of cases.
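The abstract does not state the form of the measure, so the following is only a sketch of how a complexity correction could be applied to Euclidean distance. It assumes equal-length series, a complexity estimate based on the summed squared differences between consecutive points, and a correction factor equal to the ratio of the larger estimate to the smaller; all function names are hypothetical.

```python
import math

def euclidean(q, c):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))

def complexity_estimate(s):
    """Assumed complexity estimate: root of summed squared consecutive differences."""
    return math.sqrt(sum((s[i + 1] - s[i]) ** 2 for i in range(len(s) - 1)))

def complexity_corrected_distance(q, c):
    """Euclidean distance scaled up when the two series differ in complexity."""
    ce_q, ce_c = complexity_estimate(q), complexity_estimate(c)
    low, high = min(ce_q, ce_c), max(ce_q, ce_c)
    if high == 0.0:
        factor = 1.0          # both series are flat: no correction
    elif low == 0.0:
        factor = math.inf     # one flat, one not: maximally different complexity
    else:
        factor = high / low
    return euclidean(q, c) * factor

if __name__ == "__main__":
    smooth = [0, 1, 2, 3]
    steep = [0, 2, 4, 6]
    print(euclidean(smooth, steep))
    print(complexity_corrected_distance(smooth, steep))
```

When both series have the same estimated complexity the factor is 1 and the measure reduces to plain Euclidean distance, so the correction only penalizes pairs whose complexities differ.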
Brigham and Women's Hospital
Time series motifs are pairs of individual time series, or subsequences of a longer time series, which are very similar to each other. As with their discrete analogues in computational biology, this similarity hints at structure which has been conserved for some reason and may therefore be of interest. Since the formalism of time series motifs in 2002, dozens of researchers have used them for diverse applications in many different domains. Because the obvious algorithm for computing motifs is quadratic in the number of items, more than a dozen approximate algorithms to discover motifs have been proposed in the literature. In this work, for the first time, we show a tractable exact algorithm to find time series motifs. As we shall show through extensive experiments, our algorithm is up to three orders of magnitude faster than brute-force search in large datasets. We further show that our algorithm is fast enough to be used as a subroutine in higher level data mining algorithms for anytime classification, near-duplicate detection and summarization, and we consider detailed case studies in domains as diverse as electroencephalograph interpretation and entomological telemetry data mining.
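For reference, the quadratic baseline mentioned in the abstract can be sketched as below. This is the brute-force search, not the exact algorithm the paper proposes; it assumes the input is a list of equal-length time series compared with Euclidean distance, and the function name is hypothetical.

```python
import math

def brute_force_motif(series_list):
    """Return (distance, i, j) for the closest pair of series.

    Compares every pair once, so the work grows quadratically with the
    number of series. Assumes all series have the same length.
    """
    best = (math.inf, None, None)
    n = len(series_list)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(series_list[i], series_list[j])))
            if d < best[0]:
                best = (d, i, j)
    return best

if __name__ == "__main__":
    data = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [9.0, 9.0]]
    print(brute_force_motif(data))
```

When the items are subsequences of one long series, overlapping windows would also need to be excluded so that a subsequence is not matched with a slightly shifted copy of itself; that step is omitted here.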
Improving the Supervised Learning of Activity Classifiers for Human Motion Data
, 2011
The ability to accurately recognize human activities from motion data is an important stepping stone toward creating many types of intelligent user interfaces. Many supervised learning methods have been demonstrated for learning activity classifiers from data; however, these classifiers often fail due to noisy sensor data, lack of labeled training samples for rare actions and large individual differences in activity execution. In this chapter, we introduce two techniques for improving supervised learning of human activities from motion data: 1) an active learning framework to reduce the number of samples required to segment motion traces and 2) an intelligent feature selection technique that both improves classification performance and reduces training time. We demonstrate how these techniques can be used to improve the classification of human household activities, an area of particular research interest since it facilitates the development of eldercare assistance systems to monitor household occupants.