Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases
In Proceedings of the ACM SIGMOD Conference on Management of Data, 2002
Abstract

Cited by 235 (28 self)
Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data. The most promising solutions involve performing dimensionality reduction on the data, then indexing the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), and the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Adaptive Piecewise Constant Approximation (APCA). While previous techniques (e.g., SVD, DFT and DWT) choose a common representation for all the items in the database that minimizes the global reconstruction error, APCA approximates each time series by a set of constant-value segments of varying lengths such that their individual reconstruction errors are minimal. We show how APCA can be indexed using a multidimensional index structure. We propose two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching: a lower-bounding Euclidean distance approximation, and a non-lower-bounding but very tight Euclidean distance approximation. We show how they can support fast exact searching, and even faster approximate searching, on the same index structure. We theoretically and empirically compare APCA to all the other techniques and demonstrate its superiority.
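A minimal sketch of computing an APCA-style representation (constant segments of varying length), assuming a simple greedy bottom-up strategy: start with one segment per point and repeatedly merge the adjacent pair whose merge adds the least squared reconstruction error, until M segments remain. This is not necessarily the procedure used in the paper; the function name `apca` and the (segment mean, right endpoint) output format are illustrative choices.

```python
from typing import List, Sequence, Tuple


def _sse(prefix: List[float], prefix_sq: List[float], start: int, end: int) -> float:
    """Squared error of values[start:end] around their mean, via prefix sums."""
    n = end - start
    s = prefix[end] - prefix[start]
    sq = prefix_sq[end] - prefix_sq[start]
    return sq - s * s / n


def apca(values: Sequence[float], m: int) -> List[Tuple[float, int]]:
    """Approximate `values` by m constant segments of varying length.

    Returns (segment_mean, right_endpoint_index) pairs. Greedy bottom-up
    merging; an illustrative sketch, not the paper's algorithm.
    """
    n = len(values)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= len(values)")
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    # Segment k covers the half-open range [bounds[k], bounds[k + 1]).
    bounds = list(range(n + 1))
    while len(bounds) - 1 > m:
        best_k, best_cost = 1, float("inf")
        for k in range(1, len(bounds) - 1):
            a, b, c = bounds[k - 1], bounds[k], bounds[k + 1]
            # Extra error caused by merging [a, b) and [b, c) into one segment.
            cost = (_sse(prefix, prefix_sq, a, c)
                    - _sse(prefix, prefix_sq, a, b)
                    - _sse(prefix, prefix_sq, b, c))
            if cost < best_cost:
                best_k, best_cost = k, cost
        del bounds[best_k]  # drop the shared boundary, merging the two segments
    return [((prefix[e] - prefix[s]) / (e - s), e - 1)
            for s, e in zip(bounds, bounds[1:])]
```

For example, `apca([1, 1, 1, 5, 5, 5, 5, 2], 3)` keeps the three flat runs as segments with means 1, 5 and 2. The quadratic merge loop favors readability; a heap of merge costs would be the usual refinement.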
On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration
SIGKDD'02, 2002
Abstract

Cited by 220 (50 self)
... mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offers an amount of "improvement" that would have been completely dwarfed by the variance that would have been observed by testing on many real-world datasets, or the variance that would have been observed by changing minor (unstated) implementation details. To illustrate our point ...
Time-series similarity queries employing a feature-based approach
In 7th Hellenic Conference on Informatics, Ioannina, 1999
Abstract

Cited by 13 (0 self)
Time-series, or time-sequence, data show the value of a parameter over time. A common query with time-series data is to find all sequences which are similar to a given sequence. The most common technique for evaluating similarity between two sequences involves calculating the Euclidean distance between them. However, many examples can be given where two similar sequences are separated by a large Euclidean distance. In this paper, instead of calculating the Euclidean distance directly between two sequences, the sequences are transformed into a feature vector and the Euclidean distance between the feature vectors is then calculated. Results show that this approach is superior for finding similar sequences.
Indexing of Compressed Time Series
Data Mining in Time Series Databases
Abstract

Cited by 11 (3 self)
We describe a procedure for identifying major minima and maxima of a time series, and present two applications of this procedure. The first application is fast compression of a series, by selecting major extrema and discarding the other points. The compression algorithm runs in linear time and takes constant memory. The second application is indexing of compressed series by their major extrema, and retrieval of series similar to a given pattern. The retrieval procedure searches for the series whose compressed representation is similar to the compressed pattern. It allows the user to control the tradeoff between the speed and accuracy of retrieval. We show the effectiveness of the compression and retrieval for stock charts, meteorological data, and electroencephalograms. Keywords: time series, compression, fast retrieval, similarity measures.
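The abstract does not define "major" precisely. A minimal sketch, assuming an extremum counts as major once the series moves at least a hypothetical threshold `r` away from it (an absolute-difference criterion chosen here for illustration), shows how a single pass with constant working memory (apart from the output list) can select alternating major minima and maxima. The function name `major_extrema` is hypothetical, and the final, still-unconfirmed candidate is not reported.

```python
from typing import List, Sequence


def major_extrema(values: Sequence[float], r: float) -> List[int]:
    """Indices of alternating major minima/maxima (zigzag-style filter).

    Assumed criterion: an extremum is confirmed once the series has moved
    at least r away from it in the opposite direction.
    """
    if r <= 0:
        raise ValueError("threshold r must be positive")
    out: List[int] = []
    if len(values) == 0:
        return out
    lo = hi = 0      # running min/max indices while the first swing is unknown
    direction = 0    # 0: unknown, +1: rising (tracking a max), -1: falling (tracking a min)
    cand = 0         # index of the current, not yet confirmed extremum
    for i in range(1, len(values)):
        v = values[i]
        if direction == 0:
            if v < values[lo]:
                lo = i
            if v > values[hi]:
                hi = i
            if v - values[lo] >= r:        # rose r above the running min
                out.append(lo)
                direction, cand = 1, i
            elif values[hi] - v >= r:      # fell r below the running max
                out.append(hi)
                direction, cand = -1, i
        elif direction == 1:
            if v > values[cand]:
                cand = i                   # higher peak replaces the candidate
            elif values[cand] - v >= r:    # dropped r below the peak: confirm it
                out.append(cand)
                direction, cand = -1, i
        else:
            if v < values[cand]:
                cand = i                   # deeper trough replaces the candidate
            elif v - values[cand] >= r:    # rose r above the trough: confirm it
                out.append(cand)
                direction, cand = 1, i
    return out
```

With `r = 2.5`, the series `[0, 3, 1, 4, 0, 5]` yields `[0, 3, 4]`: the small dip from 3 to 1 is treated as noise and discarded.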
Compression of Time Series by Extracting Major Extrema
Abstract

Cited by 2 (0 self)
We formalize the notion of important extrema of a time series, that is, its major minima and maxima; analyze basic mathematical properties of important extrema; and apply these results to the problem of time-series compression. First, we define numeric importance levels of extrema in a series, and present algorithms for identifying major extrema and computing their importances. Then, we give a procedure for fast lossy compression of a time series at a given rate, by extracting its most important minima and maxima, and discarding the other points.
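A sketch of importance levels under an assumed difference-based definition: a minimum a_m is important at level R if there are indices i < m < j such that a_m is the smallest value in a_i..a_j and both a_i - a_m and a_j - a_m are at least R; maxima are handled by negating the series. The importance of a point is then the largest such R, and compressing to k points keeps the k most important extrema. The names `min_importance`, `importances`, and `compress_top_k` are hypothetical, and this quadratic-time version favors clarity over the fast algorithms the abstract mentions.

```python
from typing import Dict, List, Optional, Sequence


def min_importance(values: Sequence[float], m: int) -> Optional[float]:
    """Largest R at which values[m] is an important minimum (assumed
    difference-based definition), or None if no i < m < j qualifies."""
    def side_rise(indices: range) -> Optional[float]:
        best = None
        for k in indices:
            if values[k] < values[m]:
                break  # a lower point ends the window where values[m] is the minimum
            best = values[k] if best is None else max(best, values[k])
        return None if best is None else best - values[m]

    left = side_rise(range(m - 1, -1, -1))
    right = side_rise(range(m + 1, len(values)))
    if left is None or right is None:
        return None
    return min(left, right)


def importances(values: Sequence[float]) -> Dict[int, float]:
    """Map index -> importance for every extremum with positive importance."""
    neg = [-v for v in values]  # a maximum of values is a minimum of -values
    out: Dict[int, float] = {}
    for m in range(len(values)):
        for series in (values, neg):
            imp = min_importance(series, m)
            if imp is not None and imp > 0:
                out[m] = imp
    return out


def compress_top_k(values: Sequence[float], k: int) -> List[int]:
    """Indices (in time order) of the k most important extrema."""
    imp = importances(values)
    keep = sorted(imp, key=lambda i: imp[i], reverse=True)[:k]
    return sorted(keep)
```

For `[0, 3, 1, 4, 0, 5]` this assigns importance 2 to the small bump at indices 1 and 2 and importance 4 to the large swing at indices 3 and 4, so keeping two points retains indices 3 and 4.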
Dimensionality Reduction for Indexing Time Series Based on the Minimum Distance
Abstract

Cited by 1 (0 self)
We address the problem of efficient similarity search based on the minimum distance in large time series databases. To support minimum distance queries, most previous work has had to perform a preprocessing step of vertical shifting. However, vertical shifting adds overhead when building the index. In this paper, we propose a novel dimensionality reduction technique for indexing time series based on the minimum distance. We call our approach SSV-indexing (Segmented Sum of Variation Indexing). The proposed method can match time series of similar shape without vertical shifting and guarantees no false dismissals. Several experiments are performed on real data (stock price movements) to measure the performance of SSV-indexing.
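The abstract does not spell out the feature definition. A minimal sketch, assuming each feature is the sum of absolute consecutive differences |x[t+1] - x[t]| over one of a fixed number of contiguous segments, illustrates why such features need no vertical shifting: adding a constant to every value leaves every difference, and hence every feature, unchanged. The function name `ssv` and the even split of the differences across segments are assumptions, and the sketch does not reproduce the paper's no-false-dismissal guarantee.

```python
from typing import List, Sequence


def ssv(values: Sequence[float], num_segments: int) -> List[float]:
    """Segmented sum of variation (assumed definition): split the n - 1
    consecutive differences into num_segments contiguous groups and sum
    the absolute differences in each group."""
    n = len(values)
    if not 1 <= num_segments <= n - 1:
        raise ValueError("need 1 <= num_segments <= len(values) - 1")
    diffs = [abs(values[t + 1] - values[t]) for t in range(n - 1)]
    features = []
    for s in range(num_segments):
        # Distribute the differences as evenly as possible over the segments.
        start = s * len(diffs) // num_segments
        end = (s + 1) * len(diffs) // num_segments
        features.append(sum(diffs[start:end]))
    return features
```

For instance, `ssv([1, 2, 4, 3, 5], 2)` and `ssv([11, 12, 14, 13, 15], 2)` both give `[3, 3]`, since the second series is the first shifted up by 10.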