Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases
In Proceedings of the ACM SIGMOD Conference on Management of Data, 2002
Abstract

Cited by 232 (28 self)
Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data. The most promising solutions involve performing dimensionality reduction on the data, then indexing the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), and the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Adaptive Piecewise Constant Approximation (APCA). While previous techniques (e.g., SVD, DFT and DWT) choose a common representation for all the items in the database that minimizes the global reconstruction error, APCA approximates each time series by a set of constant-value segments of varying lengths such that their individual reconstruction errors are minimal. We show how APCA can be indexed using a multidimensional index structure. We propose two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching: a lower-bounding Euclidean distance approximation, and a non-lower-bounding but very tight Euclidean distance approximation. We show how they can support fast exact searching and even faster approximate searching on the same index structure. We theoretically and empirically compare APCA to all the other techniques and demonstrate its superiority.
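The idea of constant-value segments of varying lengths can be illustrated with a short Python sketch. It does not reproduce the paper's construction algorithm. Instead it assumes a simple greedy bottom-up merge, which repeatedly joins the adjacent pair of segments whose merge adds the least squared error, and stores each segment as a (mean, right endpoint) pair. The helper names `apca`, `reconstruct`, `lower_bound` and `approx_distance` are hypothetical. The two distances mirror the two kinds of measure mentioned above, but they are not claimed to be the paper's exact definitions. `lower_bound` compares the query's mean over each segment with the stored segment mean. By Jensen's inequality, this never exceeds the true Euclidean distance to the original series. `approx_distance` measures the distance to the reconstructed series, which is often tight but is not guaranteed to be a lower bound.

```python
import math
from typing import List, Sequence, Tuple

# One segment: (mean value, exclusive right endpoint). Hypothetical layout.
Segment = Tuple[float, int]


def _sse(count: int, total: float, sq: float) -> float:
    # Sum of squared errors when a run of `count` values is replaced by its mean.
    return sq - total * total / count


def apca(series: Sequence[float], m: int) -> List[Segment]:
    """Greedy bottom-up merging into m constant segments (illustrative only)."""
    if not 1 <= m <= len(series):
        raise ValueError("m must be between 1 and len(series)")
    # Each run is [start, end, sum, sum_of_squares]; start with one run per point.
    runs = [[i, i + 1, float(x), float(x) * x] for i, x in enumerate(series)]

    def merge_cost(a, b) -> float:
        merged = _sse(b[1] - a[0], a[2] + b[2], a[3] + b[3])
        return merged - _sse(a[1] - a[0], a[2], a[3]) - _sse(b[1] - b[0], b[2], b[3])

    while len(runs) > m:
        # Merge the adjacent pair whose union adds the least reconstruction error.
        k = min(range(len(runs) - 1), key=lambda i: merge_cost(runs[i], runs[i + 1]))
        a, b = runs[k], runs[k + 1]
        runs[k:k + 2] = [[a[0], b[1], a[2] + b[2], a[3] + b[3]]]
    return [(total / (end - start), end) for start, end, total, _ in runs]


def reconstruct(rep: Sequence[Segment]) -> List[float]:
    # Expand each segment back into a run of identical values.
    out: List[float] = []
    start = 0
    for value, end in rep:
        out.extend([value] * (end - start))
        start = end
    return out


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def lower_bound(query: Sequence[float], rep: Sequence[Segment]) -> float:
    # Segment-length-weighted difference of means; by Jensen's inequality this
    # never exceeds the Euclidean distance between query and the original series.
    total, start = 0.0, 0
    for value, end in rep:
        n = end - start
        q_mean = sum(query[start:end]) / n
        total += n * (q_mean - value) ** 2
        start = end
    return math.sqrt(total)


def approx_distance(query: Sequence[float], rep: Sequence[Segment]) -> float:
    # Distance to the reconstruction: usually tight, but not a guaranteed lower bound.
    return euclidean(query, reconstruct(rep))
```

For example, `apca([1, 1, 1, 5, 5, 5, 2, 2], 3)` yields `[(1.0, 3), (5.0, 6), (2.0, 8)]`: three segments whose lengths (3, 3 and 2) adapt to the data. The greedy merge is quadratic in the series length, which is acceptable for illustration but not for a large database.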
Quantizing Time Series for Efficient Similarity Search under Time Warping
Abstract

Cited by 2 (0 self)
Indexing time series data is an interesting problem that has attracted much interest in the research community for the last decade. Traditional indexing methods organize the data space using different metrics. For time series, however, there are cases in which a metric is not suitable for properly assessing the similarity between sequences. For instance, to detect similarities between sequences that are locally out of phase, Dynamic Time Warping (DTW) must be used. DTW is not a metric, as it does not satisfy the triangle inequality. Therefore, traditional spatial access methods cannot be used without introducing false dismissals. In such cases, alternative methods for organizing and searching time series data must be proposed. In this paper we propose the use of quantization to generate small and homogeneous representations of time series. We compute upper and lower bounds on the DTW distance to a query sequence using this quantized representation to filter out sequences that cannot be a best match for the query. In the proposed approach, efficient search is achieved by organizing the quantized representation of data in a linear array that can be efficiently read from disk. The computational cost of processing the query is overshadowed by the IO cost required to scan the file containing the linear array, and it does not affect the total query cost.
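The bounding idea can be sketched in Python under stated assumptions. Each data value is quantized into one of `2**bits` equal-width cells over a known range `[lo, hi]`. This grid is hypothetical, and the paper's quantization scheme may differ. DTW uses squared differences as its local cost. Replacing each data value by its cell gives two DTW computations: one charges the smallest possible difference to the cell, and the other charges the largest. Every warping path's cost is sandwiched between those two per-step costs, so the resulting values bound the exact DTW distance from below and above. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def quantize(series: Sequence[float], lo: float, hi: float, bits: int) -> List[int]:
    # Hypothetical equal-width grid: 2**bits cells covering [lo, hi].
    cells = 1 << bits
    width = (hi - lo) / cells
    return [min(max(int((x - lo) / width), 0), cells - 1) for x in series]


def cell_range(code: int, lo: float, hi: float, bits: int) -> Tuple[float, float]:
    width = (hi - lo) / (1 << bits)
    return lo + code * width, lo + (code + 1) * width


def dtw(n: int, m: int, cost: Callable[[int, int], float]) -> float:
    # Classic O(n*m) dynamic-programming DTW with a pluggable local cost.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = cost(i - 1, j - 1) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def dtw_bounds(query: Sequence[float], codes: Sequence[int],
               lo: float, hi: float, bits: int) -> Tuple[float, float]:
    ranges = [cell_range(c, lo, hi, bits) for c in codes]

    def lb_cost(i: int, j: int) -> float:
        a, b = ranges[j]
        q = query[i]
        gap = a - q if q < a else (q - b if q > b else 0.0)
        return gap * gap  # smallest squared difference to any value in the cell

    def ub_cost(i: int, j: int) -> float:
        a, b = ranges[j]
        q = query[i]
        return max((q - a) ** 2, (q - b) ** 2)  # largest squared difference

    n, m = len(query), len(codes)
    return dtw(n, m, lb_cost), dtw(n, m, ub_cost)


def filter_candidates(query: Sequence[float], database: Sequence[Sequence[int]],
                      lo: float, hi: float, bits: int) -> List[int]:
    bounds = [dtw_bounds(query, codes, lo, hi, bits) for codes in database]
    best_upper = min(ub for _, ub in bounds)
    # A series whose lower bound exceeds the best upper bound cannot be the best match.
    return [k for k, (lb, _) in enumerate(bounds) if lb <= best_upper]
```

A sequence whose lower bound exceeds the smallest upper bound seen so far can be discarded without reading its raw values. The survivors still need an exact DTW computation against the original data.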
DDR: An Index Method for Large Time Series Datasets
Abstract

Cited by 1 (0 self)
In this paper, we propose a two-phase (filtering and refinement) method for searching a time series dataset. In the filtering step, quantized time series are used to construct a compact file, which is scanned to filter out irrelevant data. A small set of candidates is passed to the second step to be refined. In this step, we introduce an effective compression method named DDR (grid-based Datawise Dimensionality Reduction), which attempts to preserve the characteristics of the time series. An experimental comparison with existing techniques demonstrates the utility of our approach.
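The abstract does not describe how DDR compresses the candidates, so the sketch below covers only the generic two-phase structure. It is written in Python for a 1-nearest-neighbour query under Euclidean distance. It assumes a hypothetical equal-width quantization, in the style of VA-file approaches rather than DDR itself. Phase 1 scans the compact file of quantized codes and computes a lower bound for every series. Phase 2 visits candidates in ascending lower-bound order and fetches full series only until the next lower bound can no longer beat the best exact distance found. The names `quantize`, `lower_bound` and `nearest_neighbour` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def quantize(series: Sequence[float], lo: float, hi: float, bits: int) -> List[int]:
    # Hypothetical equal-width grid: 2**bits cells covering [lo, hi].
    cells = 1 << bits
    width = (hi - lo) / cells
    return [min(max(int((x - lo) / width), 0), cells - 1) for x in series]


def lower_bound(query: Sequence[float], codes: Sequence[int],
                lo: float, hi: float, bits: int) -> float:
    # Distance from each query value to its cell; never exceeds the exact distance.
    width = (hi - lo) / (1 << bits)
    total = 0.0
    for q, c in zip(query, codes):
        a, b = lo + c * width, lo + (c + 1) * width
        gap = a - q if q < a else (q - b if q > b else 0.0)
        total += gap * gap
    return math.sqrt(total)


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def nearest_neighbour(query: Sequence[float], compact_file: Sequence[Sequence[int]],
                      raw_data: Sequence[Sequence[float]],
                      lo: float, hi: float, bits: int) -> Tuple[int, float]:
    # Phase 1 (filtering): one sequential pass over the compact quantized file.
    bounds = [lower_bound(query, codes, lo, hi, bits) for codes in compact_file]
    best_index, best_dist = -1, math.inf
    # Phase 2 (refinement): read raw series in ascending lower-bound order.
    for k in sorted(range(len(bounds)), key=bounds.__getitem__):
        if bounds[k] >= best_dist:
            break  # no remaining candidate can beat the current best
        dist = euclidean(query, raw_data[k])
        if dist < best_dist:
            best_index, best_dist = k, dist
    return best_index, best_dist
```

The early `break` is what keeps the candidate set small. The tighter the phase-1 bounds, the sooner refinement stops reading raw series.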
Quantizing Time Series for Efficient Subsequence Matching
Abstract

Cited by 1 (0 self)
Indexing time series data is an interesting problem that has attracted much interest in the research community for the last decade. Traditional indexing methods organize the data space using different metrics. However, searching high-dimensional spaces using a hierarchical index is not always efficient, because a large portion of the index might need to be accessed during search. We have revisited this problem of matching subsequences in light of new technological advances. In particular, we have paid close attention to the increasing ratio of CPU to disk performance. We recognize that this problem is heavily bound by IO operations and address this issue in a twofold manner. First, we propose the use of quantization to generate small and homogeneous representations of time series. Quantization provides tight upper and lower bounds on the measure of similarity to a query sequence. This allows us to drastically reduce the number of false alarms during search. Second, we organize the quantized representation of data in a linear array that can be efficiently read from disk. By reducing the number of false alarms and by reading the index sequentially, we are able to significantly reduce the IO cost of query processing. As a result, we improve the overall search performance by up to a factor of 3 with respect to state-of-the-art techniques for subsequence matching.
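The following Python sketch shows a subsequence range query over a quantized linear array. It assumes equal-width cells over `[lo, hi]` with at most 8 bits per value, so each code fits in one byte; the paper's actual encoding may differ. The whole series is quantized once into a flat `bytes` array, and each sliding window gets a lower and an upper bound on its squared Euclidean distance to the query. Windows whose lower bound exceeds the threshold are pruned, and windows whose upper bound is within it are accepted outright. Only the remaining windows touch the raw data. Tighter lower bounds leave fewer false alarms to verify, and the upper bound lets some true matches be accepted without any raw-data access. The function names are hypothetical.

```python
from typing import List, Sequence


def quantize(series: Sequence[float], lo: float, hi: float, bits: int) -> bytes:
    # Hypothetical equal-width grid; one byte per code, so bits must be <= 8.
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    cells = 1 << bits
    width = (hi - lo) / cells
    return bytes(min(max(int((x - lo) / width), 0), cells - 1) for x in series)


def range_search(query: Sequence[float], series: Sequence[float],
                 lo: float, hi: float, bits: int, eps: float) -> List[int]:
    codes = quantize(series, lo, hi, bits)  # flat linear array, scanned front to back
    width = (hi - lo) / (1 << bits)
    w, limit = len(query), eps * eps
    matches: List[int] = []
    for start in range(len(codes) - w + 1):
        lb = ub = 0.0
        for q, c in zip(query, codes[start:start + w]):
            a, b = lo + c * width, lo + (c + 1) * width
            gap = a - q if q < a else (q - b if q > b else 0.0)
            lb += gap * gap                        # smallest possible squared difference
            ub += max((q - a) ** 2, (q - b) ** 2)  # largest possible squared difference
        if lb > limit:
            continue                # pruned: raw data never read
        if ub <= limit:
            matches.append(start)   # guaranteed match: raw data never read
            continue
        exact = sum((q - x) ** 2 for q, x in zip(query, series[start:start + w]))
        if exact <= limit:
            matches.append(start)   # candidate verified against raw data
    return matches
```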
A Quantization Approach for Efficient Similarity Search on Time Series Data
Abstract
In recent years, we have observed a growing interest in similarity search on large collections of time series data. The research community has provided ingenious approaches for solving this problem. Most of the proposals advocate transforming time series data into smaller objects that can be indexed by a spatial access method. Unfortunately, these techniques are not always effective and, in some cases, they can be outperformed by a simple linear scan over the data set. The major problems affecting the performance of these techniques are accessing a large portion of the index structure, retrieving a large number of data objects to guarantee the correctness of the result, or a combination of both. A successful mechanism for efficient similarity search must therefore minimize both index and data accesses during search. In this paper, we propose a new encoding strategy for time series data. Our proposed approach, Self COntained Bit Encoding (SCoBE), produces a compact representation of time series data. SCoBE allows us to take advantage of fast sequential disk accesses and to define tight upper and lower bounds on the distance between a query and time series data objects. We provide experimental evidence that our approach consistently outperforms previous methods on a variety of data sets.
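The abstract does not specify SCoBE's bit layout. The following Python sketch therefore shows only generic fixed-width bit packing, the kind of step that turns quantized codes into a compact, sequentially readable byte string; it is not SCoBE itself. `pack_codes` and `unpack_codes` are hypothetical names. The sketch assumes that codes are non-negative integers below `2**bits`, written most-significant bit first.

```python
from typing import List, Sequence


def pack_codes(codes: Sequence[int], bits: int) -> bytes:
    """Pack fixed-width codes into contiguous bytes, most significant bit first."""
    if bits < 1:
        raise ValueError("bits must be positive")
    out = bytearray()
    buffer, pending = 0, 0  # pending = number of bits in buffer not yet written
    for code in codes:
        if not 0 <= code < (1 << bits):
            raise ValueError(f"code {code} does not fit in {bits} bits")
        buffer = (buffer << bits) | code
        pending += bits
        while pending >= 8:
            pending -= 8
            out.append((buffer >> pending) & 0xFF)  # emit the oldest 8 bits
        buffer &= (1 << pending) - 1  # keep only the bits not yet written
    if pending:
        out.append((buffer << (8 - pending)) & 0xFF)  # zero-pad the final byte
    return bytes(out)


def unpack_codes(data: bytes, bits: int, count: int) -> List[int]:
    """Read back `count` fixed-width codes from a byte string made by pack_codes."""
    total_bits = len(data) * 8
    if count * bits > total_bits:
        raise ValueError("not enough data for the requested number of codes")
    value = int.from_bytes(data, "big")
    mask = (1 << bits) - 1
    return [(value >> (total_bits - (i + 1) * bits)) & mask for i in range(count)]
```

With `b` bits per value, a series of `n` values occupies `ceil(n * b / 8)` bytes. For example, 256 values at 4 bits take 128 bytes, compared with 2048 bytes when stored as 64-bit floats.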