Results 1 - 10
of
109
Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases
- In proceedings of ACM SIGMOD Conference on Management of Data
, 2002
"... Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data.. The most promising solutions' involve performing dimensionality reduction on the data, then indexing the reduced data w ..."
Abstract
-
Cited by 185 (22 self)
- Add to MetaCart
Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data.. The most promising solutions' involve performing dimensionality reduction on the data, then indexing the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier transform (DFT), and the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Adaptive Piecewise Constant Approximation (APCA). While previous techniques (e.g., SVD, DFT and DWT) choose a common representation for all the items in the database that minimizes the global reconstruction error, APCA approximates each time series by a set of constant value segments' of varying lengths' such that their individual reconstruction errors' are minimal. We show how APCA can be indexed using a multidimensional index structure. We propose two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching: a lower bounding Euclidean distance approximation, and a non-lower bounding, but very tight Euclidean distance approximation and show how they can support fast exact searchin& and even faster approximate searching on the same index structure. We theoretically and empirically compare APCA to all the other techniques and demonstrate its' superiority.
On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration
- SIGKDD'02
, 2002
"... ... mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in ..."
Abstract
-
Cited by 169 (41 self)
- Add to MetaCart
... mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offer an amount of "improvement" that would have been completely dwarfed by the variance that would have been observed by testing on many real world datasets, or the variance that would have been observed by changing minor (unstated) implementation details. To illustrate our point
Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases
, 2000
"... The problem of similarity search in large time series databases has attracted much attention recently. It is a non-trivial problem because of the inherent high dimensionality of the data. The most promising solutions involve first performing dimensionality reduction on the data, and then indexing th ..."
Abstract
-
Cited by 115 (13 self)
- Add to MetaCart
The problem of similarity search in large time series databases has attracted much attention recently. It is a non-trivial problem because of the inherent high dimensionality of the data. The most promising solutions involve first performing dimensionality reduction on the data, and then indexing the reduced data with a spatial access method. Three major dimensionality reduction techniques have been proposed, Singular Value Decomposition (SVD), the Discrete Fourier transform (DFT), and more recently the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Piecewise Aggregate Approximation (PAA). We theoretically and empirically compare it to the other techniques and demonstrate its superiority. In addition to being competitive with or faster than the other methods, our approach has numerous other advantages. It is simple to understand and to implement, it allows more flexible distance measures, including weighted Euclidean queries, and the index can be built in linear time.
Probabilistic discovery of time series motifs
, 2003
"... Several important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalized the idea of approximately repeated subsequences by introducing the notion of time series motifs. Two limitations of thi ..."
Abstract
-
Cited by 92 (19 self)
- Add to MetaCart
Several important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalized the idea of approximately repeated subsequences by introducing the notion of time series motifs. Two limitations of this work were the poor scalability of the motif discovery algorithm, and the inability to discover motifs in the presence of noise. Here we address these limitations by introducing a novel algorithm inspired by recent advances in the problem of pattern discovery in biosequences. Our algorithm is probabilistic in nature, but as we show empirically and theoretically, it can find time series motifs with very high probability even in the presence of noise or “don’t care ” symbols. Not only is the algorithm fast, but it is an anytime algorithm, producing likely candidate motifs almost immediately, and gradually improving the quality of results over time.
Landmarks: a new model for similarity-based pattern querying in time series databases
- In ICDE
, 2000
"... In this paper we present the Landmark Model, a model for time series that yields new techniques for similarity-based time series pattern querying. The Landmark Model does not follow traditional similarity models that rely on pointwise Euclidean distance. Instead, it leads to Landmark Similarity, a g ..."
Abstract
-
Cited by 69 (5 self)
- Add to MetaCart
In this paper we present the Landmark Model, a model for time series that yields new techniques for similarity-based time series pattern querying. The Landmark Model does not follow traditional similarity models that rely on pointwise Euclidean distance. Instead, it leads to Landmark Similarity, a general model of similarity that is consistent with human intuition and episodic memory. By tracking different specific subsets of features of landmarks, we can efficiently compute different Landmark Similarity measures that are invariant under corresponding subsets of six transformations; namely, Shifting, Uniform
Robust and fast similarity search for moving object trajectories
- In SIGMOD
, 2005
"... An important consideration in similarity-based retrieval of moving object trajectories is the definition of a distance function. The existing distance functions are usually sensitive to noise, shifts and scaling of data that commonly occur due to sensor failures, errors in detection techniques, dist ..."
Abstract
-
Cited by 61 (10 self)
- Add to MetaCart
An important consideration in similarity-based retrieval of moving object trajectories is the definition of a distance function. The existing distance functions are usually sensitive to noise, shifts and scaling of data that commonly occur due to sensor failures, errors in detection techniques, disturbance signals, and different sampling rates. Cleaning data to eliminate these is not always possible. In this paper, we introduce a novel distance function, Edit Distance on Real sequence (EDR) which is robust against these data imperfections. Analysis and comparison of EDR with other popular distance
Similarity search over time series data using wavelets
- In ICDE
, 2002
"... We consider the use of wavelet transformations as a dimensionality reduction technique to permit efficient similarity search over high-dimensional time-series data. While numerous transformations have been proposed and studied, the only wavelet that has been shown to be effective for this applicatio ..."
Abstract
-
Cited by 50 (0 self)
- Add to MetaCart
We consider the use of wavelet transformations as a dimensionality reduction technique to permit efficient similarity search over high-dimensional time-series data. While numerous transformations have been proposed and studied, the only wavelet that has been shown to be effective for this application is the Haar wavelet. In this work, we observe that a large class of wavelet transformations (not only orthonormal wavelets but also bi-orthonormal wavelets)can be used to support similarity search. This class includes the most popular and most effective wavelets being used in image compression. We present a detailed performance study of the effects of using different wavelets on the performance of similarity search for time-series data. We include several wavelets that outperform both the Haar wavelet and the best known non-wavelet transformations for this application. To ensure our results are usable by an application engineer, we also show how to configure an indexing strategy for the best performing transformations. Finally, we identify classes of data that can be indexed efficiently using these wavelet transformations. 1.
Efficient Retrieval of Similar Time Sequences Using DFT
- In Proc. FODO Conference, Kobe
, 1998
"... We propose an improvement of the known DFTbased indexing technique for fast retrieval of similar time sequences. We use the last few Fourier coefficients in the distance computation without storing them in the index since every coefficient at the end is the complex conjugate of a coefficient at the ..."
Abstract
-
Cited by 48 (2 self)
- Add to MetaCart
We propose an improvement of the known DFTbased indexing technique for fast retrieval of similar time sequences. We use the last few Fourier coefficients in the distance computation without storing them in the index since every coefficient at the end is the complex conjugate of a coefficient at the beginning and as strong as its counterpart. We show analytically that this observation can accelerate the search time of the index by more than a factor of two. This result was confirmed by our experiments, which were carried out on real stock prices and synthetic data. Keywords similarity retrieval, time series indexing 1 Introduction Time sequences constitute a large amount of data stored in computers. Examples include stock prices, exchange rates, weather data and biomedical measurements. We are often interested in similarity queries on time-series data [APWZ95, ALSS95]. For example, we may want to find stocks that behave in approximately the same way; or years when the temperature pat...
Variable Length Queries for Time Series Data
- IN ICDE
, 2000
"... Finding similar patterns in a time sequence is a well-known problem that has been addressed by many authors. Most of the current techniques work well for queries of a prespecified length, but fail for variable length queries. We propose a new indexing technique that works well for variable length ..."
Abstract
-
Cited by 45 (7 self)
- Add to MetaCart
Finding similar patterns in a time sequence is a well-known problem that has been addressed by many authors. Most of the current techniques work well for queries of a prespecified length, but fail for variable length queries. We propose a new indexing technique that works well for variable length queries. Our idea is to store index structures at different resolutions for a given dataset. The resolutions are based on wavelets. A number of subqueries at different resolutions are generated for each variable length query. The ranges of the subqueries are progressively refined based on results from previous subqueries. Our experiments show that the total cost for our method is 4 to 20 times less than the current techniques including Linear Scan. Because of the need to store information at multiple resolution levels, the storage requirement of our method could potentially be large. In the second part of the paper, we show how the index information can be compressed with minimal information loss. According to our experimental results, even after compressing the size of the index to one fifth, the total cost of our method is 3 to 15 times less than the current techniques.
Similarity Search for Multidimensional Data Sequences
- In ICDE
, 2000
"... Time-series data, which are a series of one-dimensional real numbers, have been studied in various database applications. In this paper, we extend the traditional similarity search methods on time-series data to support a multidimensional data sequence, such as a video stream. We investigate the pro ..."
Abstract
-
Cited by 44 (1 self)
- Add to MetaCart
Time-series data, which are a series of one-dimensional real numbers, have been studied in various database applications. In this paper, we extend the traditional similarity search methods on time-series data to support a multidimensional data sequence, such as a video stream. We investigate the problem of retrieving similar multidimensional data sequences from a large database. To prune irrelevant sequences in a database, we introduce correct and efficient similarity functions. Both data sequences and query sequences are partitioned into subsequences, and each of them is represented by a Minimum Bounding Rectangle (MBR). The query processing is based upon these MBRs, instead of scanning data elements of entire sequences. Our method is designed (1) to select candidate sequences in a database, and (2) to find the subsequences of a selected sequence, each of which falls under the given threshold. The latter is of special importance in the case of retrieving subsequences from large and c...

