## Iterative Incremental Clustering of Time Series (2004)

### Download Links

- [www.cs.gmu.edu]
- [www.cs.ucr.edu]
- DBLP

### Other Repositories/Bibliography

Venue: EDBT 2004

Citations: 29 (8 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Lin04iterativeincremental,
  author    = {Jessica Lin and Michail Vlachos and Eamonn Keogh and Dimitrios Gunopulos},
  title     = {Iterative Incremental Clustering of Time Series},
  booktitle = {EDBT},
  series    = {LNCS},
  volume    = {2992},
  year      = {2004},
  pages     = {106--122},
  publisher = {Springer}
}
```


### Abstract

We present a novel anytime version of partitional clustering algorithms, such as k-Means and EM, for time series. The algorithm works by leveraging the multi-resolution property of wavelets. The dilemma of choosing the initial centers is mitigated by initializing the centers at each approximation level with the final centers returned by the coarser representations. In addition to casting the clustering algorithms as anytime algorithms, this approach has two other very desirable properties. By working at lower dimensionalities we can efficiently avoid local minima, so the quality of the clustering is usually better than that of the batch algorithm. In addition, even if the algorithm is run to completion, our approach is much faster than its batch counterpart. We explain, and empirically demonstrate, these surprising and desirable properties with comprehensive experiments on several publicly available real data sets. We further demonstrate that our approach can be generalized to a framework covering a much broader range of algorithms and data mining problems.
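The incremental scheme sketched in the abstract can be illustrated in a few lines. The following is a simplified sketch, not the authors' implementation: `haar_level`, `i_kmeans`, and the farthest-point seeding are our own illustrative names and choices, and the paper's I-kMeans may differ in detail.

```python
import numpy as np

def haar_level(X, level):
    """Coarse Haar approximation: average adjacent pairs `level` times."""
    A = np.asarray(X, float)
    for _ in range(level):
        A = (A[:, ::2] + A[:, 1::2]) / 2.0
    return A

def farthest_first(A, k, rng):
    """Farthest-point seeding at the coarsest level (our choice, not the paper's)."""
    idx = [int(rng.integers(len(A)))]
    for _ in range(k - 1):
        d = np.min([((A - A[i]) ** 2).sum(1) for i in idx], axis=0)
        idx.append(int(d.argmax()))
    return A[idx].astype(float)

def kmeans(X, centers, iters=20):
    """Plain batch k-Means starting from the given centers."""
    for _ in range(iters):
        # assign each series to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # recompute centers (keep the old center if a cluster empties)
        for j in range(len(centers)):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return centers, labels

def i_kmeans(X, k, max_level, rng):
    """Anytime k-Means: cluster at the coarsest resolution first, then
    reuse the resulting centers to initialize each finer resolution."""
    centers = None
    for level in range(max_level, -1, -1):
        A = haar_level(X, level)
        if centers is None:
            centers = farthest_first(A, k, rng)
        else:
            # project coarse centers to the finer level by duplicating coordinates
            centers = np.repeat(centers, 2, axis=1)
        centers, labels = kmeans(A, centers)
    return labels
```

Each pass doubles the working dimensionality, so early (coarse) clusterings arrive quickly and can be returned to the user at any time; running down to level 0 completes the full-resolution clustering.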

### Citations

9033 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...ge of EM over k-Means is its ability to model a much richer set of cluster shapes. This generality has made EM (and its many variants and extensions) the clustering algorithm of choice in data mining [7] and bioinformatics [17]. 5.2 Experimental Results for I-EM: Similar to the application of k-Means, we apply EM for different resolutions of data, and compare the clustering quality a...

447 | Fast subsequence matching in time-series databases
- Faloutsos, Ranganathan, et al.
- 1994
Citation Context: ...t. Otherwise goto 3. In order to scale the various clustering methods to massive datasets, one can either reduce the number of objects, N, by sampling [2], or reduce the dimensionality of the objects [1, 3, 9, 12, 13, 16, 19, 25, 26]. For time-series, the objective is to find a representation at a lower dimensionality that preserves the original information and describes the original shape of the time-series data as closely as po...

443 | Efficient Similarity Search in Sequence Databases
- Agrawal, Faloutsos, et al.
- 1993
Citation Context: ...ciated with the choice of initial centers, in addition to providing the functionality of user-interaction. The algorithm works by leveraging off the multi-resolution property of wavelet decomposition [1, 6, 22]. In particular, an initial clustering is performed with a very coarse representation of the data. The results obtained from this “quick and dirty” ...

259 | Scaling clustering algorithms to large databases
- Bradley, Fayyad, et al.
- 1998
Citation Context: ...ts. Anytime algorithms are valuable for large databases, since results are produced progressively and are refined over time [11]. Their utility for data mining has been documented at length elsewhere [2, 21]. While partitional clustering algorithms and wavelet decomposition have both been studied extensively in the past, the major novelty of our approach is that it mitigates the problem associated with t...

252 | Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases
- Keogh, Chakrabarti, et al.
Citation Context: ...t. Otherwise goto 3. In order to scale the various clustering methods to massive datasets, one can either reduce the number of objects, N, by sampling [2], or reduce the dimensionality of the objects [1, 3, 9, 12, 13, 16, 19, 25, 26]. For time-series, the objective is to find a representation at a lower dimensionality that preserves the original information and describes the original shape of the time-series data as closely as po...

246 | Some Methods for Classification and Analysis of Multivariate Observations
- McQueen
- 1967
Citation Context: ...en if the algorithm is run to completion, the time taken is typically much less than the time taken by the batch algorithm. We initially focus our approach on the popular k-Means clustering algorithm [10, 18, 24] for time series. For simplicity we demonstrate how the algorithm works by utilizing the Haar wavelet decomposition. Then we extend the idea to another widely used clustering algorithm, EM, and anothe...
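The Haar decomposition mentioned in this context is simple to compute: each pass replaces the signal with pairwise averages plus the differences needed to undo them, so truncating the coefficient vector recovers exactly the coarse approximations the algorithm clusters on. A minimal sketch using the unnormalized averaging/differencing variant (function names are our own):

```python
import numpy as np

def haar(x):
    """Full Haar transform of a length-2^m signal: pairwise averages
    plus the per-level difference (detail) coefficients."""
    x = np.asarray(x, float)
    details = []
    while len(x) > 1:
        details.append((x[::2] - x[1::2]) / 2.0)  # detail coefficients
        x = (x[::2] + x[1::2]) / 2.0              # coarser approximation
    # layout: [overall average, coarsest details, ..., finest details]
    return np.concatenate([x] + details[::-1])

def inverse_haar(c):
    """Invert: avg + detail and avg - detail recover each pair."""
    x = np.asarray(c[:1], float)
    k = 1
    while k < len(c):
        d = c[k:2 * k]
        nxt = np.empty(2 * k)
        nxt[::2] = x + d
        nxt[1::2] = x - d
        x = nxt
        k *= 2
    return x
```

Zeroing all but the first d coefficients and inverting yields the piecewise-constant approximation at dimensionality d, which is the representation clustered at each resolution level.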

237 | On the need for time series data mining benchmarks: a survey and empirical demonstration
- Keogh, Kasetty
- 2002
Citation Context: ...itiveness of either method has been largely argued in the past, we apply DFT in the algorithm to demonstrate the generality of the framework. As a matter of fact, consistent with the results shown in [15], the superiority of either method is highly data-dependent. In general, however, DFT performs better for smooth signals or sequences that resemble random walks. 5.4 Experimental Results for I-kMeans ...

217 | Efficient time series matching by wavelets
- Chan, Fu
- 1999

157 | Fast time sequence indexing for arbitrary Lp norms
- Yi, Faloutsos
- 2000
Citation Context: ...t. Otherwise goto 3. In order to scale the various clustering methods to massive datasets, one can either reduce the number of objects, N, by sampling [2], or reduce the dimensionality of the objects [1, 3, 9, 12, 13, 16, 19, 25, 26]. For time-series, the objective is to find a representation at a lower dimensionality that preserves the original information and describes the original shape of the time-series data as closely as po...

140 | An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback
- Keogh, Pazzani
- 1998
Citation Context: ...ry background on these areas in this section. 2.1 Background on Clustering: One of the most widely used clustering approaches is hierarchical clustering, due to the great visualization power it offers [12]. Hierarchical clustering produces a nested hierarchy of similar groups of objects, according to a pairwise distance matrix of the objects. One of the advantages of this method is its generality, sinc...

108 | Efficiently supporting ad hoc queries in large datasets of time sequences
- Korn, Jagadish, et al.
- 1997

63 | Similarity search over time series data using wavelets
- Popivanov
- 2002

61 | UCR Time Series Data Mining Archive, University of California at Riverside (http://www.cs.ucr.edu/∼eamonn/TSDMA)
- Keogh, Folias
- 2002
Citation Context: ...random starting points of each sequence from the original data pool. • Heterogeneous: This dataset is generated from a mixture of 10 real time series data from the UCR Time Series Data Mining Archive [14] (see Fig 4). Using the 10 time-series as seeds, we produced variation of the original patterns by adding small time shifting (2-3% of the series length), and interpolated Gaussian noise. Gaussian noi...
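The generation recipe described in this excerpt (small time shifts plus Gaussian noise applied to each seed series) can be sketched as follows. `make_variations` and its parameters are our own illustrative names, and we use a circular shift for simplicity; the paper's exact shifting and interpolation procedure may differ.

```python
import numpy as np

def make_variations(seed, n, rng, max_shift=3, noise=0.1):
    """Generate n variants of a seed series by applying a small random
    circular time shift and additive Gaussian noise to each copy."""
    out = []
    for _ in range(n):
        shifted = np.roll(seed, int(rng.integers(-max_shift, max_shift + 1)))
        out.append(shifted + noise * rng.standard_normal(len(seed)))
    return np.array(out)
```

With a shift of 2-3% of the series length and modest noise, each variant stays strongly correlated with its seed, so the generated dataset has a known ground-truth clustering.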

60 | Adaptive dimension reduction for clustering high dimensional data
- Ding, He, et al.
- 2002
Citation Context: ...k-Means algorithm. We show that the objective functions obtained from the I-kMeans algorithm are better than those from the k-Means algorithm. The results are consistent with the work of Ding et al. [5], in which the authors show that dimensionality reduction reduces the chances of the algorithm being trapped in a local minimum. Furthermore, even w...

59 | Efficient retrieval of similar time sequences using dft
- Rafiei, Mendelzon
- 1998
Citation Context: ...e decomposition method is due to its efficiency and simplicity. In this section we extend the I-kMeans to utilize another equally well-known decomposition method, the Discrete Fourier Transform (DFT) [1, 20]. Similar to the wavelet decomposition, DFT approximates the signal with a linear combination of basis functions. The vital difference between the two decomposition methods is that the wavelets are lo...
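The DFT-based reduction described here keeps only the leading (low-frequency) Fourier coefficients. A minimal sketch using NumPy's real FFT; the function names are our own:

```python
import numpy as np

def dft_reduce(x, n_coef):
    """Keep the first n_coef low-frequency DFT coefficients."""
    return np.fft.rfft(x)[:n_coef]

def dft_restore(coef, n):
    """Invert back to length n, treating the dropped coefficients as zero."""
    full = np.zeros(n // 2 + 1, dtype=complex)
    full[:len(coef)] = coef
    return np.fft.irfft(full, n)
```

Because the Fourier basis functions are global, a handful of coefficients represents smooth or random-walk-like series very well, whereas the localized wavelet basis adapts better to bursty local structure; this matches the data-dependent trade-off noted in the excerpt.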

52 | A comparison of DFT and DWT based Similarity Search in Time-Series Databases
- Wu, Agrawal, et al.

34 | Iterative Deepening Dynamic Time Warping for Time Series
- Chu, Keogh, et al.
- 2002
Citation Context: ...ed in the literature, including the Discrete Fourier Transform (DFT) [1, 9], Singular Value Decomposition [16], Adaptive Piecewise Constant Approximation [13], Piecewise Aggregate Approximation (PAA) [4, 26], Piecewise Linear Approximation [12] and the Discrete Wavelet Transform (DWT) [3, 19]. While all these approaches have shared the ability to produce a high quality reduced-dimensionality approximatio...
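Of the representations listed in this excerpt, PAA is the simplest to state: split the series into equal-length frames and keep each frame's mean. A sketch, assuming the series length divides evenly into the number of segments (`paa` is an illustrative name):

```python
import numpy as np

def paa(x, segments):
    """Piecewise Aggregate Approximation: per-segment means."""
    x = np.asarray(x, float)
    assert len(x) % segments == 0, "series length must divide evenly"
    return x.reshape(segments, -1).mean(axis=1)
```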

24 | The Haar wavelet transform in the time series similarity paradigm
- STRUZIK, SIEBES
- 1999
Citation Context: ...or data mining problem. For future work, we plan to investigate the following: • Extending our algorithm to other data types. For example, image histograms can be successfully represented as wavelets [6, 23]. Our initial experiments on image histograms show great promise of applying the framework on image data. • For k-Means, examining the possibility of re-using the results (i.e. objective functions tha...

22 | Anytime algorithm development tools
- Grass, Zilberstein
- 1996
Citation Context: ... a novel anytime version of partitional clustering algorithm based on wavelets. Anytime algorithms are valuable for large databases, since results are produced progressively and are refined over time [11]. Their utility for data mining has been documented at length elsewhere [2, 21]. While partitional clustering algorithms and wavelet decomposition have both been studied extensively in the past, the m...

19 | A wavelet-based anytime algorithm for k-means clustering of time series
- Vlachos, Lin, et al.
- 2003
Citation Context: ...en if the algorithm is run to completion, the time taken is typically much less than the time taken by the batch algorithm. We initially focus our approach on the popular k-Means clustering algorithm [10, 18, 24] for time series. For simplicity we demonstrate how the algorithm works by utilizing the Haar wavelet decomposition. Then we extend the idea to another widely used clustering algorithm, EM, and anothe...

13 | Anytime exploratory data analysis for massive data sets
- Smyth, Wolpert
- 1997
Citation Context: ...ts. Anytime algorithms are valuable for large databases, since results are produced progressively and are refined over time [11]. Their utility for data mining has been documented at length elsewhere [2, 21]. While partitional clustering algorithms and wavelet decomposition have both been studied extensively in the past, the major novelty of our approach is that it mitigates the problem associated with t...

9 | Ten Lectures on Wavelets (CBMS-NSF Regional Conference Series, no. 61)
- Daubechies
- 1992
Citation Context: ...ciated with the choice of initial centers, in addition to providing the functionality of user-interaction. The algorithm works by leveraging off the multi-resolution property of wavelet decomposition [1, 6, 22]. In particular, an initial clustering is performed with a very coarse representation of the data. The results obtained from this “quick and dirty” ...

5 | Initialization of iterative refinement clustering algorithms
- unknown authors
- 1998
Citation Context: ...en if the algorithm is run to completion, the time taken is typically much less than the time taken by the batch algorithm. We initially focus our approach on the popular k-Means clustering algorithm [10, 18, 24] for time series. For simplicity we demonstrate how the algorithm works by utilizing the Haar wavelet decomposition. Then we extend the idea to another widely used clustering algorithm, EM, and anothe...

5 | An Expectation Maximization (EM) Algorithm for the Identification and ...
- Lawrence, Reilly
- 1990
Citation Context: ...s its ability to model a much richer set of cluster shapes. This generality has made EM (and its many variants and extensions) the clustering algorithm of choice in data mining [7] and bioinformatics [17]. 5.2 Experimental Results for I-EM: Similar to the application of k-Means, we apply EM for different resolutions of data, and compare the clustering quality and running time with EM ...

5 | TSA-tree: a wavelet based approach to improve the efficiency of multi-level surprise and trend queries
- Shahabi, Tian, et al.
- 2000
Citation Context: ...ciated with the choice of initial centers, in addition to providing the functionality of user-interaction. The algorithm works by leveraging off the multi-resolution property of wavelet decomposition [1, 6, 22]. In particular, an initial clustering is performed with a very coarse representation of the data. The results obtained from this “quick and dirty” ...