Results 1-10 of 39
On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration
SIGKDD'02, 2002
Cited by 220 (50 self)
... mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offers an amount of "improvement" that would have been completely dwarfed by the variance that would have been observed by testing on many real-world datasets, or the variance that would have been observed by changing minor (unstated) implementation details. To illustrate our point
Querying and Mining of Time Series Data: Experimental Comparison of Representations and Distance Measures
Cited by 64 (19 self)
The last decade has witnessed tremendous growth of interest in applications that deal with querying and mining of time series data. Numerous representation methods for dimensionality reduction and similarity measures geared towards time series have been introduced. Each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive set of time series experiments reimplementing 8 different representation methods and 9 similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. Our experiments have provided both a unified validation of some of the existing achievements, and in some cases, suggested that certain claims in the literature may be unduly optimistic.
On the Marriage of Lp-norms and Edit Distance
In VLDB, 2004
Cited by 57 (2 self)
Existing studies on time series are based on two categories of distance functions. The first category consists of the Lp-norms. They are metric distance functions but cannot support local time shifting. The second category consists of distance functions which are capable of handling local time shifting but are non-metric. The first
Experiencing SAX: A Novel Symbolic Representation of Time Series
Data Mining and Knowledge Discovery Journal, 2007
Cited by 51 (13 self)
Many high-level representations of time series have been proposed for data mining, including Fourier transforms, wavelets, eigenwaves, piecewise polynomial models, etc. Many researchers have also considered symbolic representations of time series, noting that such representations would potentially allow researchers to avail of the wealth of data structures and algorithms from the text processing and bioinformatics communities. While many symbolic representations of time series have been introduced over the past decades, they all suffer from two fatal flaws. First, the dimensionality of the symbolic representation is the same as the original data, and virtually all data mining algorithms scale poorly with dimensionality. Second, although distance measures can be defined on the symbolic approaches, these distance measures have little correlation with distance measures defined on the original time series. In this work we formulate a new symbolic representation of time series. Our representation is unique in that it allows dimensionality/numerosity reduction,
Temporal Classification: Extending the Classification Paradigm to Multivariate Time Series
2002
Cited by 32 (0 self)
Machine learning research has, to a great extent, ignored an important aspect of many real-world applications: time. Existing concept learners predominantly operate on a static set of attributes; for example, classifying flowers described by leaf size, petal colour and petal count. The values of these attributes are assumed to be unchanging: the flower never grows or loses leaves.
Time Series Shapelets: A New Primitive for Data Mining
Cited by 22 (7 self)
Classification of time series has been attracting great interest over the past decade. Recent empirical evidence has strongly suggested that the simple nearest neighbor algorithm is very difficult to beat for most time series problems. While this may be considered good news, given the simplicity of implementing the nearest neighbor algorithm, there are some negative consequences of this. First, the nearest neighbor algorithm requires storing and searching the entire dataset, resulting in a time and space complexity that limits its applicability, especially on resource-limited sensors. Second, beyond mere classification accuracy, we often wish to gain some insight into the data. In this work we introduce a new time series primitive, time series shapelets, which addresses these limitations. Informally, shapelets are time series subsequences which are in some sense maximally representative of a class. As we shall show with extensive empirical evaluations in diverse domains, algorithms based on the time series shapelet primitives can be interpretable, more accurate and significantly faster than state-of-the-art classifiers.
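The idea of matching a series against a short representative subsequence can be illustrated with a small sketch. It is not the authors' algorithm: it assumes the usual sliding-window Euclidean distance between a candidate shapelet and a series, and the names `subsequence_distance`, `classify`, and the threshold rule are hypothetical choices made for illustration.

```python
import math

def subsequence_distance(series, shapelet):
    """Smallest Euclidean distance between `shapelet` and any window of
    `series` that has the same length as the shapelet."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best

def classify(series, shapelet, threshold, near_label, far_label):
    """Assumed decision rule: a series that contains a close match to the
    shapelet gets `near_label`, otherwise `far_label`."""
    if subsequence_distance(series, shapelet) <= threshold:
        return near_label
    return far_label
```

A series that contains the bump `[1, 2, 1]` somewhere has distance zero to that shapelet regardless of where the bump occurs, which is what lets a single short subsequence stand in for the whole training set at classification time.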
Time series feature extraction for data mining using DWT and DFT
2003
Cited by 21 (1 self)
A new method of dimensionality reduction for time series data mining is proposed. Each time series is compressed with wavelet or Fourier decomposition. Instead of using only the first coefficients, a new method of choosing the best coefficients for a set of time series is presented. A criterion function is evaluated using all values of a coefficient position to determine a good set of coefficients. The optimal criterion function with respect to energy preservation is given. For many real life data sets much more energy can be preserved, which is advantageous for data mining tasks. All time series to be mined, or at least a representative subset, need to be available a priori.
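Selecting coefficient positions for a whole set of series, rather than always keeping the first few, might be sketched as follows. This is an illustration under assumptions: the criterion used here is the mean squared magnitude (energy) of each DFT position across the set, which is one plausible reading of "energy preservation", and the function names are hypothetical. All series are assumed to be real-valued and of equal length.

```python
import cmath

def dft(series):
    """Naive O(n^2) discrete Fourier transform of a real-valued series."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(series))
        for k in range(n)
    ]

def select_positions(dataset, k):
    """Return the k coefficient positions with the largest mean energy
    over all series in `dataset` (assumed criterion)."""
    spectra = [dft(s) for s in dataset]
    n = len(spectra[0])
    half = n // 2 + 1  # for real input the remaining positions are conjugate mirrors
    mean_energy = [
        sum(abs(sp[pos]) ** 2 for sp in spectra) / len(spectra)
        for pos in range(half)
    ]
    ranked = sorted(range(half), key=lambda pos: mean_energy[pos], reverse=True)
    return sorted(ranked[:k])

def compress(series, positions):
    """Keep only the coefficients at the selected positions."""
    spectrum = dft(series)
    return [spectrum[p] for p in positions]
```

Because the positions are chosen from the data, the whole set (or a representative subset) has to be available before any series is compressed, which matches the a priori requirement stated in the abstract.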
A Wavelet-Based Anytime Algorithm for K-Means Clustering of Time Series
In Proc. Workshop on Clustering High Dimensionality Data and Its Applications, 2003
Cited by 19 (3 self)
The emergence of the field of data mining in the last decade has sparked an increasing interest in clustering of time series. Although there has been much research on clustering in general, most classic machine learning and data mining algorithms do not work well for time series due to their unique structure. In particular, the high dimensionality, very high feature correlation, and the (typically) large amount of noise that characterize time series data present a difficult challenge. In this work we address these challenges by introducing a novel anytime version of the k-Means clustering algorithm for time series. The algorithm works by leveraging off the multi-resolution property of wavelets. In particular, an initial clustering is performed with a very coarse resolution representation of the data. The results obtained from this "quick and dirty" clustering are used to initialize a clustering at a slightly finer level of approximation. This process is repeated until the clustering results stabilize or until the "approximation" is the raw data. In addition to casting k-Means as an anytime algorithm, our approach has two other very unintuitive properties. The quality of the clustering is often better than the batch algorithm, and even if the algorithm is run to completion, the time taken is typically much less than the time taken by the original algorithm. We explain, and empirically demonstrate these surprising and desirable properties with comprehensive experiments on several publicly available real data sets.
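The coarse-to-fine loop described in this abstract can be sketched roughly as below. It is a simplified illustration, not the published algorithm: block averaging stands in for the Haar approximation, the coarsest level is seeded with the first k series, centers are carried to the next level by repeating each value, and the series length is assumed to be a power of two. All function names are hypothetical.

```python
def coarsen(series, length):
    """Average equal-sized blocks so the series has `length` values."""
    block = len(series) // length
    return [sum(series[i * block:(i + 1) * block]) / block for i in range(length)]

def assign(data, centers):
    """Label each row with the index of its nearest center."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centers)), key=lambda c: sqdist(row, centers[c])) for row in data]

def update(data, labels, old_centers):
    """Recompute each center as the mean of its members."""
    centers = []
    for c, old in enumerate(old_centers):
        members = [row for row, lab in zip(data, labels) if lab == c]
        if members:
            centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            centers.append(old)  # an empty cluster keeps its previous center
    return centers

def kmeans(data, centers, max_iter=100):
    """Plain k-means started from the given centers."""
    labels = assign(data, centers)
    for _ in range(max_iter):
        centers = update(data, labels, centers)
        new_labels = assign(data, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers

def multires_kmeans(dataset, k):
    """Yield (resolution, labels) from the coarsest level to the raw data."""
    n = len(dataset[0])
    length = 2
    centers = None
    while length <= n:
        data = [coarsen(s, length) for s in dataset]
        if centers is None:
            # Assumption: seed the coarsest level with the first k series.
            centers = [row[:] for row in data[:k]]
        else:
            # Carry centers to the finer level by repeating each value.
            centers = [[v for v in c for _ in (0, 1)] for c in centers]
        labels, centers = kmeans(data, centers)
        yield length, labels
        length *= 2
```

Writing the outer loop as a generator gives the anytime behaviour: a caller can stop consuming results after any level and still hold a usable clustering.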
Decision-tree induction from time-series data based on standard-example split test
In Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003
Cited by 11 (3 self)
This paper proposes a novel decision tree for a data set with time-series attributes. Our time-series tree has a value (i.e. a time sequence) of a time-series attribute in its internal node, and splits examples based on dissimilarity between a pair of time sequences. Our method selects, for a split test, a time sequence which exists in data by exhaustive search based on class and shape information. Experimental results confirm that our induction method constructs comprehensive and accurate decision trees. Moreover, a medical application shows that our time-series tree is promising for knowledge discovery.
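A split test of this kind, where an existing sequence serves as the reference and examples are divided by their dissimilarity to it, might look like the sketch below. The details are assumptions rather than the paper's method: Euclidean distance stands in for whatever dissimilarity measure the authors use, candidates are scored by plain information gain, and `best_split` is a hypothetical name.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Stand-in dissimilarity between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def best_split(series_list, labels, dissimilarity=euclidean):
    """Try every series as the standard example and every observed distance
    as the threshold; return (gain, standard_index, threshold) for the best."""
    base = entropy(labels)
    best = None
    for i, standard in enumerate(series_list):
        dists = [dissimilarity(standard, s) for s in series_list]
        for threshold in sorted(set(dists)):
            near = [lab for d, lab in zip(dists, labels) if d <= threshold]
            far = [lab for d, lab in zip(dists, labels) if d > threshold]
            if not far:
                continue  # a split that sends everything one way is useless
            remainder = (len(near) * entropy(near) + len(far) * entropy(far)) / len(labels)
            gain = base - remainder
            if best is None or gain > best[0]:
                best = (gain, i, threshold)
    return best
```

The internal node then stores the chosen sequence itself, so a reader of the tree can inspect the shape that each branch is being compared against.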
Segment and combine approach for non-parametric time-series classification
In Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), 2005
Cited by 7 (4 self)
This paper presents a novel, generic, scalable, autonomous, and flexible supervised learning algorithm for the classification of multivariate and variable length time series. The essential ingredients of the algorithm are randomization, segmentation of time-series, decision tree ensemble based learning of subseries classifiers, combination of subseries classification by voting, and cross-validation based temporal resolution adaptation. Experiments are carried out with this method on 10 synthetic and real-world datasets. They highlight the good behavior of the algorithm on a large diversity of problems. Our results are also highly competitive with existing approaches from the literature.
1 Learning to classify time-series
Time-series classification is an important problem from the viewpoint of its multitudinous applications. Specific applications concern the non-intrusive monitoring and diagnosis of processes and biological systems, for example to decide whether the system is in a healthy operating condition on the basis of measurements