Results 1 - 3 of 3
A Complexity-Invariant Distance Measure for Time Series, by Gustavo E.A.P.A. Batista
Cited by 3 (1 self)
The ubiquity of time series data across almost all human endeavors has produced great interest in time series data mining over the last decade. While a plethora of classification algorithms can be applied to time series, all of the current empirical evidence suggests that simple nearest neighbor classification is exceptionally difficult to beat. The choice of distance measure used by the nearest neighbor algorithm depends on the invariances required by the domain. For example, motion capture data typically requires invariance to warping. In this work we make a surprising claim: there is an invariance that the community has missed, complexity invariance. Intuitively, the problem is that in many domains the different classes may have different complexities, and pairs of complex objects, even those that seem very similar to the human eye, tend to be further apart under current distance measures than pairs of simple objects. This introduces errors in nearest neighbor classification, where complex objects are incorrectly assigned to a simpler class. We introduce the first complexity-invariant distance measure for time series, and show that it generally produces significant improvements in classification accuracy. We further show that this improvement does not compromise efficiency, since we can lower bound the measure and use a modification of the triangular inequality, thus making use of most existing indexing and data mining algorithms. We evaluate our ideas with the largest and most comprehensive set of time series classification experiments ever attempted, and show that complexity-invariant distance measures can produce improvements in accuracy in the vast majority of cases.
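The abstract above does not state the measure itself. As a rough illustration only, the sketch below uses one commonly cited formulation of a complexity-invariant distance: the Euclidean distance scaled by the ratio of the two series' complexity estimates, where complexity is estimated as the length of the series when "stretched out" as a line. The function names (`euclidean`, `complexity_estimate`, `cid`) and the handling of perfectly flat series are assumptions made for this example, not details taken from the listing.

```python
import math


def euclidean(q, c):
    """Plain Euclidean distance between two equal-length series."""
    if len(q) != len(c):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


def complexity_estimate(t):
    """Complexity estimate: root of the summed squared successive differences."""
    return math.sqrt(sum((t[i + 1] - t[i]) ** 2 for i in range(len(t) - 1)))


def cid(q, c):
    """Euclidean distance multiplied by a complexity correction factor >= 1."""
    ce_q, ce_c = complexity_estimate(q), complexity_estimate(c)
    hi, lo = max(ce_q, ce_c), min(ce_q, ce_c)
    if hi == 0.0:
        # Both series are flat: equal complexity, so no correction is applied.
        return euclidean(q, c)
    if lo == 0.0:
        # Assumption for this sketch: a flat series is treated as infinitely
        # far from a non-flat one instead of dividing by zero.
        return math.inf
    return euclidean(q, c) * (hi / lo)
```

Because the correction factor is at least 1, two series of very different complexity are pushed further apart, while series of similar complexity keep roughly their Euclidean distance. This is the behavior the abstract describes as compensating for complex objects being assigned to simpler classes.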
Time Series Classification under More Realistic Assumptions
Cited by 2 (2 self)
Most literature on time series classification assumes that the beginning and ending points of the pattern of interest can be correctly identified, both during the training phase and in later deployment. In this work, we argue that this assumption is unjustified, and that it has in many cases led to unwarranted optimism about the performance of the proposed algorithms. As we shall show, the task of correctly extracting individual gait cycles, heartbeats, gestures, behaviors, etc., is generally much more difficult than the task of actually classifying those patterns. We propose to mitigate this problem by introducing an alignment-free time series classification framework. The framework requires only very weakly annotated data, such as “in this ten minutes of data, we see mostly normal heartbeats...,” and, by generalizing the classic machine learning idea of data editing to streaming/continuous data, allows us to build robust, fast and accurate classifiers. We demonstrate on several diverse real-world problems that, beyond removing unwarranted assumptions and requiring essentially no human intervention, our framework is both significantly faster and significantly more accurate than current state-of-the-art approaches.
A Novel Approximation to Dynamic Time Warping allows Anytime Clustering of Massive Time Series Datasets
Cited by 1 (0 self)
Given the ubiquity of time series data, the data mining community has spent significant time investigating the best time series similarity measure to use for various tasks and domains. After more than a decade of extensive efforts, there is increasing evidence that Dynamic Time Warping (DTW) is very difficult to beat. Given that, recent efforts have focused on making the intrinsically slow DTW algorithm faster. For the similarity-search task, an important subroutine in many data mining algorithms, significant progress has been made by replacing the vast majority of expensive DTW calculations with cheap-to-compute lower bound calculations. However, these lower-bound-based optimizations do not directly apply to clustering, and thus for some realistic problems, clustering with DTW can take days or weeks. In this work, we show that we can mitigate this untenable lethargy by casting DTW clustering as an anytime algorithm. At the heart of our algorithm is a novel data-adaptive approximation to DTW which can be quickly computed, and which produces approximations to DTW that are much better than the best currently known linear-time approximations. We demonstrate our ideas on real-world problems, showing that we can get virtually all the accuracy of a batch DTW clustering algorithm in a fraction of the time.
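To make the similarity-search optimization mentioned above concrete, here is a minimal sketch of lower-bound pruning. It is not the paper's data-adaptive approximation, which the abstract does not specify. It assumes equal-length series, DTW constrained to a Sakoe-Chiba band of half-width `r`, and the well-known LB_Keogh envelope bound as the cheap lower bound; the function names and the parameter `r` are choices made for this example.

```python
import math


def dtw(q, c, r):
    """DTW distance with a Sakoe-Chiba band of half-width r (O(n * r) cells)."""
    n = len(q)
    if len(c) != n:
        raise ValueError("series must have the same length")
    cost = [[math.inf] * (n + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][n])


def lb_keogh(q, c, r):
    """LB_Keogh: distance from c to the band-r envelope of q; never exceeds dtw(q, c, r)."""
    n = len(q)
    total = 0.0
    for i in range(n):
        window = q[max(0, i - r):min(n, i + r + 1)]
        upper, lower = max(window), min(window)
        if c[i] > upper:
            total += (c[i] - upper) ** 2
        elif c[i] < lower:
            total += (c[i] - lower) ** 2
    return math.sqrt(total)


def nearest_neighbor(query, candidates, r):
    """1-NN search that skips the full DTW whenever the lower bound rules it out."""
    best_dist, best_idx = math.inf, -1
    for idx, cand in enumerate(candidates):
        if lb_keogh(query, cand, r) >= best_dist:
            continue  # the true DTW distance cannot beat the best so far
        d = dtw(query, cand, r)
        if d < best_dist:
            best_dist, best_idx = d, idx
    return best_idx, best_dist
```

The pruning works because a search only needs the single best match, so any candidate whose lower bound already exceeds the best distance found can be discarded. Clustering typically needs many or all pairwise distances, which is one way to see why, as the abstract notes, these optimizations do not carry over directly.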