Results 1 - 9 of 9
Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases
In Proceedings of the ACM SIGMOD Conference on Management of Data, 2002
"... Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data.. The most promising solutions' involve performing dimensionality reduction on the data, then indexing the reduced d ..."
Abstract

Cited by 312 (32 self)
 Add to MetaCart
(Show Context)
Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data. The most promising solutions involve performing dimensionality reduction on the data, then indexing the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), and the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Adaptive Piecewise Constant Approximation (APCA). While previous techniques (e.g., SVD, DFT and DWT) choose a common representation for all the items in the database that minimizes the global reconstruction error, APCA approximates each time series by a set of constant value segments of varying lengths such that their individual reconstruction errors are minimal. We show how APCA can be indexed using a multidimensional index structure. We propose two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching: a lower bounding Euclidean distance approximation, and a non-lower bounding, but very tight, Euclidean distance approximation, and show how they can support fast exact searching and even faster approximate searching on the same index structure. We theoretically and empirically compare APCA to all the other techniques and demonstrate its superiority.
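The "constant value segments of varying lengths" idea can be illustrated with a toy sketch. This is not the paper's actual APCA algorithm (the paper derives the approximation far more efficiently); it is a naive bottom-up merge, written here only to convey what an adaptive piecewise constant representation looks like:

```python
def apca_sketch(series, n_segments):
    """Illustrative APCA-style approximation: greedily merge adjacent
    constant segments, always choosing the merge whose merged segment
    has the lowest squared reconstruction error."""
    segs = [[x] for x in map(float, series)]  # start with one segment per point
    while len(segs) > n_segments:
        costs = []
        for i in range(len(segs) - 1):
            merged = segs[i] + segs[i + 1]
            m = sum(merged) / len(merged)
            # Cost of a merge: squared deviation from the merged segment's mean.
            costs.append(sum((v - m) ** 2 for v in merged))
        i = costs.index(min(costs))
        segs[i:i + 2] = [segs[i] + segs[i + 1]]
    # Each segment is stored as (mean value, length): a variable-length
    # constant approximation of the original series.
    return [(sum(s) / len(s), len(s)) for s in segs]

print(apca_sketch([0, 0, 0, 10, 10, 10, 10, 5], 3))  # -> [(0.0, 3), (10.0, 4), (5.0, 1)]
```

Note how the flat runs get long segments while the lone outlier keeps its own short segment, which is exactly what a globally fixed segmentation cannot do.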
On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration
In SIGKDD '02, 2002
"... ... mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in ..."
Abstract

Cited by 312 (57 self)
 Add to MetaCart
(Show Context)
... mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offers an amount of "improvement" that would have been completely dwarfed by the variance that would have been observed by testing on many real world datasets, or the variance that would have been observed by changing minor (unstated) implementation details. To illustrate our point ...
Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases
2000
"... The problem of similarity search in large time series databases has attracted much attention recently. It is a nontrivial problem because of the inherent high dimensionality of the data. The most promising solutions involve first performing dimensionality reduction on the data, and then indexing th ..."
Abstract

Cited by 235 (21 self)
 Add to MetaCart
(Show Context)
The problem of similarity search in large time series databases has attracted much attention recently. It is a nontrivial problem because of the inherent high dimensionality of the data. The most promising solutions involve first performing dimensionality reduction on the data, and then indexing the reduced data with a spatial access method. Three major dimensionality reduction techniques have been proposed, Singular Value Decomposition (SVD), the Discrete Fourier transform (DFT), and more recently the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Piecewise Aggregate Approximation (PAA). We theoretically and empirically compare it to the other techniques and demonstrate its superiority. In addition to being competitive with or faster than the other methods, our approach has numerous other advantages. It is simple to understand and to implement, it allows more flexible distance measures, including weighted Euclidean queries, and the index can be built in linear time.
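PAA itself is simple enough to state in a few lines: split the series into equal-width frames and keep each frame's mean. A minimal sketch (the function name is ours, not the paper's):

```python
import numpy as np

def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each of
    n_segments (roughly) equal-width frames of the series."""
    series = np.asarray(series, dtype=float)
    # array_split gives equal-width frames when len(series) is divisible
    # by n_segments, and near-equal frames otherwise.
    return np.array([frame.mean() for frame in np.array_split(series, n_segments)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
print(paa(x, 4))  # -> [1.5 3.5 5.5 7.5]
```

The linear-time index build claimed in the abstract follows directly: each reduced coordinate is a single pass over one frame of the data.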
A Simple Dimensionality Reduction Technique for Fast Similarity Search in Large Time Series Databases
In 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2000
"... We address the problem of similarity search in large time series databases. We introduce ..."
Abstract

Cited by 61 (4 self)
 Add to MetaCart
We address the problem of similarity search in large time series databases. We introduce ...
An indexing scheme for fast similarity search in large time series databases
In Proceedings of the 11th International Conference on Scientific and Statistical Database Management, 1999
"... We address the problem of similarity search in large time series databases. We introduce a novel indexing algorithm that allows faster retrieval. The index is formed by creating bins that contain time series subsequences of approximately the same shape. For each bin, we can quickly calculate a lower ..."
Abstract

Cited by 16 (2 self)
 Add to MetaCart
(Show Context)
We address the problem of similarity search in large time series databases. We introduce a novel indexing algorithm that allows faster retrieval. The index is formed by creating bins that contain time series subsequences of approximately the same shape. For each bin, we can quickly calculate a lower bound on the distance between a given query and the most similar element of the bin. This bound allows us to search the bins in best-first order, and to prune some bins from the search space without having to examine their contents. Additional speedup is obtained by optimizing the data within the bins such that we can avoid having to compare the query to every item in the bin. We call our approach STB-indexing and experimentally validate it on space telemetry, medical and synthetic data, demonstrating approximately an order of magnitude speedup.
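The best-first, lower-bound-pruned search described above can be sketched as follows. The envelope-based lower bound here is our own illustrative stand-in (the abstract does not specify the paper's bound); it is valid because every member of a bin lies inside the bin's per-position min/max envelope:

```python
import heapq, math

def envelope(bin_items):
    """Per-position (min, max) envelope over all sequences in a bin."""
    lo = [min(col) for col in zip(*bin_items)]
    hi = [max(col) for col in zip(*bin_items)]
    return lo, hi

def lower_bound(query, env):
    """Distance from the query to the envelope: 0 where the query is inside,
    otherwise the gap to the nearest edge. This lower-bounds the Euclidean
    distance from the query to every sequence in the bin."""
    lo, hi = env
    s = sum((l - q) ** 2 if q < l else (q - h) ** 2 if q > h else 0.0
            for q, l, h in zip(query, lo, hi))
    return math.sqrt(s)

def search(query, bins):
    """Visit bins in best-first (smallest lower bound) order; stop as soon as
    no remaining bin can contain anything closer than the best match so far."""
    heap = [(lower_bound(query, envelope(b)), i) for i, b in enumerate(bins)]
    heapq.heapify(heap)
    best, best_d = None, float("inf")
    while heap:
        lb, i = heapq.heappop(heap)
        if lb >= best_d:  # prune: this bin (and all later ones) cannot win
            break
        for item in bins[i]:
            d = math.dist(query, item)
            if d < best_d:
                best, best_d = item, d
    return best, best_d
```

With well-separated bins, distant bins are never opened, which is where the order-of-magnitude speedup comes from.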
Time Series Data Analysis and Preprocess on
2002
"... In this paper we introduce a novel classification algorithm called MCC (Minimal Cover Classification), which works well for numerical data and categorical data. Given a new data tuple, it provides values for each class that measures the likelihood of the tuple belonging to that class. We then apply ..."
Abstract
 Add to MetaCart
In this paper we introduce a novel classification algorithm called MCC (Minimal Cover Classification), which works well for both numerical and categorical data. Given a new data tuple, it provides values for each class that measure the likelihood of the tuple belonging to that class. We then apply the MCC algorithm to real stock market data to predict the `upward' or `downward' trend of k-day stock returns. To improve the prediction accuracy we use the discrete Fourier transform and its inverse transform to filter noise whilst preserving the trend of global movement of the time series in the time domain. The experimental results show that the MCC algorithm is comparable to C4.5. Using MCC as a mining algorithm to predict the `upward' or `downward' trend of k-day stock returns, the average hit rate on preprocessed data is 20.55% higher than that on the original data. This means that the prediction accuracy is markedly improved by applying the proposed MCC algorithm to noise-filtered time series.
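The noise-filtering step described above (transform, discard high frequencies, invert back to the time domain) can be sketched with NumPy. The cutoff value is illustrative, not the paper's:

```python
import numpy as np

def dft_lowpass(series, keep):
    """Keep only the `keep` lowest DFT frequencies and invert back to the
    time domain, filtering high-frequency noise while preserving the
    global movement trend. Output is in the same time domain as the input."""
    coeffs = np.fft.rfft(series)
    coeffs[keep:] = 0  # zero out all high-frequency coefficients
    return np.fft.irfft(coeffs, n=len(series))

t = np.linspace(0, 1, 64, endpoint=False)
# One slow cycle (the "trend") plus a fast, small-amplitude disturbance.
noisy = np.sin(2 * np.pi * t) + 0.3 * np.sin(2 * np.pi * 25 * t)
smooth = dft_lowpass(noisy, keep=4)  # only the slow cycle survives
```

Because the filtered series stays in the time domain, it can be fed unchanged into any downstream classifier, which is the point the abstract makes.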
Data Reduction and Noise Filtering for Predicting
2002
"... In this paper we introduce a modification of the real discrete Fourier transform and its inverse transform to filter noise and perform reduction on the data whilst preserving the trend of global moving of time series. The transformed data is still in the same time domain as the original data, and ..."
Abstract
 Add to MetaCart
In this paper we introduce a modification of the real discrete Fourier transform and its inverse transform to filter noise and perform data reduction whilst preserving the global movement trend of the time series. The transformed data is still in the same time domain as the original data, and can therefore be directly used by any other mining algorithm.
EnsembleIndex: A New Approach to Indexing Large Databases
In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001
"... The problem of similarity search (querybycontent) has attracted much research interest. It is a difficult problem because of the inherently high dimensionality of the data. The most promising solutions involve performing dimensionality reduction on the data, then indexing the reduced data with a m ..."
Abstract
 Add to MetaCart
The problem of similarity search (query-by-content) has attracted much research interest. It is a difficult problem because of the inherently high dimensionality of the data. The most promising solutions involve performing dimensionality reduction on the data, then indexing the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), the Discrete Wavelet Transform (DWT) and Piecewise Polynomial Approximation. In this work, we introduce a novel framework for using ensembles of two or more representations for more efficient indexing. The basic idea is that instead of committing to a single representation for an entire dataset, different representations are chosen for indexing different parts of the database. The representations are chosen based upon a local view of the database. For example, sections of the data that can achieve a high fidelity representation with wavelets are indexed as wavelets, but highly spectral sections of the data are indexed using the Fourier transform. At query time, it is necessary to search several small heterogeneous indices, rather than one large homogeneous index. As we will theoretically and empirically demonstrate, this results in much faster query response times.
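The "pick the representation with the better local fit" idea can be sketched as follows, using a truncated DFT versus a piecewise-constant (PAA-style) reconstruction as the two candidate representations. This is our own illustrative pairing; the paper's framework compares wavelets against the Fourier transform with its own selection criteria:

```python
import numpy as np

def best_representation(section, n_coeffs):
    """Reconstruct one section of the database under two candidate
    representations with the same budget of n_coeffs coefficients, and
    return whichever has the lower squared reconstruction error."""
    section = np.asarray(section, dtype=float)
    # Candidate 1: truncated Fourier reconstruction.
    c = np.fft.rfft(section)
    c[n_coeffs:] = 0
    fourier = np.fft.irfft(c, n=len(section))
    # Candidate 2: piecewise-constant reconstruction over n_coeffs frames.
    frames = np.array_split(np.arange(len(section)), n_coeffs)
    piecewise = np.concatenate(
        [np.full(len(ix), section[ix].mean()) for ix in frames])
    err_f = np.sum((section - fourier) ** 2)
    err_p = np.sum((section - piecewise) ** 2)
    return ("dft", fourier) if err_f < err_p else ("piecewise", piecewise)
```

A smooth, highly spectral section (a sinusoid) is captured almost exactly by a few Fourier coefficients, while a step-like section is captured exactly by a few constant pieces; the per-section choice exploits exactly that asymmetry.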
Financial Time Series Indexing Based on Low Resolution Clustering
Tak-chung Fu, Fu-lai Chung
"... One of the major tasks in time series database application is time series query. Time series data is always exist in large data size and high dimensionality. However, different from traditional data, it is impossible to index the time series in traditional database system. Moreover, time series with ..."
Abstract
 Add to MetaCart
(Show Context)
One of the major tasks in time series database applications is the time series query. Time series data typically comes in large volumes and with high dimensionality. Unlike traditional data, however, time series cannot be indexed directly in a traditional database system. Moreover, time series of different lengths often coexist in the same database. The development of a time series indexing approach is therefore of fundamental importance for maintaining an acceptable speed for time series queries. By identifying the perceptually important points (PIPs) in the time domain, time series of different lengths can be compared and their dimensionality can be greatly reduced. In this paper, a time series indexing approach based on clustering the time series data at low resolution is proposed. The approach is customized for stock time series to cater for their unique behaviors. It follows the time domain approach to carry out the indexing process, which is intuitive to ordinary data analysts and may be particularly attractive in applications like stock data analysis. The proposed approach is both efficient and effective: as demonstrated by the experiments, it speeds up the time series query process while guaranteeing no false dismissals. In addition, it can handle updating the database with new entries without any difficulty.
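PIP extraction is commonly described in the literature as an iterative procedure: start with the two endpoints, then repeatedly add the point lying farthest from the line through its two neighbouring PIPs. A minimal sketch under that reading (vertical distance is one of several distance variants used in PIP papers; the abstract does not say which this paper uses):

```python
def pips(series, n_points):
    """Return the indices of n_points perceptually important points:
    endpoints first, then greedily the point with the largest vertical
    distance to the chord between its neighbouring PIPs."""
    keep = [0, len(series) - 1]
    while len(keep) < n_points:
        best_i, best_d = None, -1.0
        for a, b in zip(keep, keep[1:]):
            for i in range(a + 1, b):
                # Linear interpolation of the chord a -> b at position i.
                interp = series[a] + (series[b] - series[a]) * (i - a) / (b - a)
                d = abs(series[i] - interp)
                if d > best_d:
                    best_i, best_d = i, d
        keep.append(best_i)
        keep.sort()
    return keep

print(pips([0, 1, 5, 1, 0, -1, -4, -1, 0], 4))  # -> [0, 2, 6, 8]
```

Because the selected indices always include the series endpoints and are chosen by shape rather than by position, series of different lengths reduce to comparable fixed-size point sets, which is what makes the cross-length comparison in the abstract possible.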