Efficient Time Series Matching by Wavelets
- In ICDE, 1999
- Cited by 170 (1 self)
Time series stored as feature vectors can be indexed by multidimensional index trees like R-Trees for fast retrieval. Due to the dimensionality curse problem, transformations are applied to time series to reduce the number of dimensions of the feature vectors. Different transformations like Discrete Fourier Transform (DFT), Discrete Wavelet Transform (DWT), Karhunen-Loeve (K-L) transform or Singular Value Decomposition (SVD) can be applied. While the use of DFT and K-L transform or SVD has been studied in the literature, to our knowledge, there is no in-depth study on the application of DWT. In this paper, we propose to use Haar Wavelet Transform for time series indexing. The major contributions are: (1) we show that Euclidean distance is preserved in the Haar transformed domain and no false dismissal will occur, (2) we show that Haar transform can outperform DFT through experiments, (3) a new similarity model is suggested to accommodate vertical shift of time series, and (4) a two-pha...
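The distance-preservation claim in the abstract above can be illustrated with a minimal sketch. It assumes an orthonormal Haar transform (each averaging and differencing step scaled by 1/sqrt(2)) and input lengths that are powers of two; the function names `haar_transform` and `euclidean` are illustrative and not taken from the paper.

```python
import math


def haar_transform(x):
    """Orthonormal Haar transform of a sequence whose length is a power of two.

    Output order: overall average first, then detail coefficients from
    coarsest to finest.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    approx = [float(v) for v in x]
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        avg = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        diff = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        details = diff + details  # coarser details are placed in front
        approx = avg
    return approx + details


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


x = [2.0, 4.0, 6.0, 8.0, 7.0, 5.0, 3.0, 1.0]
y = [1.0, 5.0, 6.0, 9.0, 6.0, 5.0, 2.0, 2.0]
hx, hy = haar_transform(x), haar_transform(y)

print("distance in time domain:", euclidean(x, y))
print("distance in Haar domain:", euclidean(hx, hy))
# Keeping only the first k coefficients can only shrink the distance,
# which is why an index built on them produces no false dismissals.
print("distance on first 2 coefficients:", euclidean(hx[:2], hy[:2]))
```

Because the transform is orthonormal, the distance computed on any prefix of the coefficients is a lower bound on the true distance, so a range query on the reduced vectors may return false alarms but never misses a true match.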
On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration
- SIGKDD'02, 2002
- Cited by 169 (41 self)
... mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offers an amount of "improvement" that would have been completely dwarfed by the variance that would have been observed by testing on many real-world datasets, or the variance that would have been observed by changing minor (unstated) implementation details. To illustrate our point
Efficient Retrieval of Similar Time Sequences Under Time Warping
- 1997
- Cited by 156 (3 self)
Fast similarity searching in large time-sequence databases has attracted a lot of research interest [1, 5, 2, 6, 3, 10]. All of them use the Euclidean distance (L_2), or some variation of L_p metrics. L_p metrics lead to efficient indexing, thanks to feature extraction (e.g., by keeping the first few DFT coefficients) and subsequent use of fast spatial access methods for the points in feature space. In this work we examine a popular, field-tested dissimilarity function, the "time warping" distance function, which permits local accelerations and decelerations in the rate of the signals or sequences. This function is natural and suitable for several applications, like matching of voice, audio and medical signals (e.g., electrocardiograms). However, from the indexing viewpoint it presents two major challenges: (a) it does not lead to any natural "features", precluding the use of spatial access methods; (b) it is quadratic, O(len_1 * len_2), in the length of the sequences involved. Here we ...
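The quadratic cost mentioned in the abstract above comes from the dynamic program that fills one cell per pair of positions. The following is a minimal sketch of the textbook time warping recurrence, assuming the absolute difference as the local cost; the paper's exact base distance may differ, and `dtw_distance` is an illustrative name.

```python
import math


def dtw_distance(a, b):
    """Time warping distance via dynamic programming.

    Runs in O(len(a) * len(b)) time and keeps only two rows of the table.
    The local cost |a[i] - b[j]| is an assumption made for this sketch.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, advance both.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


slow = [1.0, 2.0, 2.0, 3.0]
fast = [1.0, 2.0, 3.0]
print("warping distance:", dtw_distance(slow, fast))
```

The two sequences in the usage lines differ only by a repeated sample, which warping absorbs, whereas a pointwise Euclidean comparison is not even defined for sequences of different lengths.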
StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time
- In VLDB, 2002
- Cited by 133 (8 self)
Consider the problem of monitoring tens of thousands of time series data streams in an online fashion and making decisions based on them. In addition to single stream statistics such as average and standard deviation, we also want to find high correlations among all pairs of streams. A stock market trader might use such a tool to spot arbitrage opportunities.
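To make the monitored quantities concrete, here is a naive sketch that maintains the mean, variance and Pearson correlation of one pair of streams over a sliding window using running sums. It is only a baseline for illustration: the abstract does not describe the paper's actual method, and the class name `WindowedPairStats` and its interface are hypothetical.

```python
import math
from collections import deque


class WindowedPairStats:
    """Running statistics for two aligned streams over the last `window` points."""

    def __init__(self, window):
        if window < 2:
            raise ValueError("window must be at least 2")
        self.window = window
        self.pairs = deque()
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def push(self, x, y):
        """Add one new observation per stream and evict the oldest if needed."""
        self.pairs.append((x, y))
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y
        if len(self.pairs) > self.window:
            ox, oy = self.pairs.popleft()
            self.sx -= ox
            self.sy -= oy
            self.sxx -= ox * ox
            self.syy -= oy * oy
            self.sxy -= ox * oy

    def correlation(self):
        """Pearson correlation over the window, or None if it is undefined."""
        n = len(self.pairs)
        if n < 2:
            return None
        mean_x, mean_y = self.sx / n, self.sy / n
        var_x = self.sxx / n - mean_x ** 2
        var_y = self.syy / n - mean_y ** 2
        if var_x <= 0.0 or var_y <= 0.0:
            return None  # a constant stream has no defined correlation
        cov = self.sxy / n - mean_x * mean_y
        return cov / math.sqrt(var_x * var_y)


stats = WindowedPairStats(window=4)
for t in range(10):
    stats.push(float(t), 2.0 * t + 1.0)
print("correlation over the last 4 points:", stats.correlation())
```

Each update costs constant time per pair, but tracking all pairs among tens of thousands of streams this way is quadratic in the number of streams, which is the scale problem the abstract points at.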
Rule Discovery From Time Series
- 1998
- Cited by 120 (0 self)
We consider the problem of finding rules relating patterns in a time series to other patterns in that series, or patterns in one series to patterns in another series. A simple example is a rule such as "a period of low telephone call activity is usually followed by a sharp rise in call volume". Examples of rules relating two or more time series are "if the Microsoft stock price goes up and Intel falls, then IBM goes up the next day," and "if Microsoft goes up strongly for one day, then declines strongly on the next day, and on the same days Intel stays about level, then IBM stays about level." Our emphasis is on the discovery of local patterns in multivariate time series, in contrast to traditional time series analysis, which largely focuses on global models. Thus, we search for rules whose conditions refer to patterns in time series. However, we do not want to define beforehand which patterns are to be used; rather, we want the patterns to be formed from the data in t...
Landmarks: a new model for similarity-based pattern querying in time series databases
- In ICDE, 2000
- Cited by 69 (5 self)
In this paper we present the Landmark Model, a model for time series that yields new techniques for similarity-based time series pattern querying. The Landmark Model does not follow traditional similarity models that rely on pointwise Euclidean distance. Instead, it leads to Landmark Similarity, a general model of similarity that is consistent with human intuition and episodic memory. By tracking different specific subsets of features of landmarks, we can efficiently compute different Landmark Similarity measures that are invariant under corresponding subsets of six transformations; namely, Shifting, Uniform
Mining Asynchronous Periodic Patterns in Time Series Data
- Proc. ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD), 2000
- Cited by 50 (8 self)
Periodicity detection in time series data is a challenging problem of great importance in many applications.
Similarity search over time series data using wavelets
- In ICDE, 2002
- Cited by 50 (0 self)
We consider the use of wavelet transformations as a dimensionality reduction technique to permit efficient similarity search over high-dimensional time-series data. While numerous transformations have been proposed and studied, the only wavelet that has been shown to be effective for this application is the Haar wavelet. In this work, we observe that a large class of wavelet transformations (not only orthonormal wavelets but also bi-orthonormal wavelets) can be used to support similarity search. This class includes the most popular and most effective wavelets being used in image compression. We present a detailed performance study of the effects of using different wavelets on the performance of similarity search for time-series data. We include several wavelets that outperform both the Haar wavelet and the best known non-wavelet transformations for this application. To ensure our results are usable by an application engineer, we also show how to configure an indexing strategy for the best performing transformations. Finally, we identify classes of data that can be indexed efficiently using these wavelet transformations.
Efficient Retrieval of Similar Time Sequences Using DFT
- In Proc. FODO Conference, Kobe, 1998
- Cited by 48 (2 self)
We propose an improvement of the known DFT-based indexing technique for fast retrieval of similar time sequences. We use the last few Fourier coefficients in the distance computation without storing them in the index since every coefficient at the end is the complex conjugate of a coefficient at the beginning and as strong as its counterpart. We show analytically that this observation can accelerate the search time of the index by more than a factor of two. This result was confirmed by our experiments, which were carried out on real stock prices and synthetic data. Keywords: similarity retrieval, time series indexing. 1 Introduction. Time sequences constitute a large amount of data stored in computers. Examples include stock prices, exchange rates, weather data and biomedical measurements. We are often interested in similarity queries on time-series data [APWZ95, ALSS95]. For example, we may want to find stocks that behave in approximately the same way; or years when the temperature pat...
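The conjugate-symmetry observation in the abstract above can be checked numerically. This sketch assumes real-valued sequences and a DFT normalized by 1/sqrt(n), so that Euclidean distance is the same in both domains; `dft` and `distance_bounds` are illustrative names, and the naive O(n^2) transform is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive DFT of a real sequence, scaled by 1/sqrt(n) (unitary form)."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def distance_bounds(x, y, k):
    """Lower bounds on the Euclidean distance from the first k coefficients.

    Returns (plain, mirrored): `plain` uses coefficients 0..k-1 only, while
    `mirrored` also counts the conjugate partners n-1, ..., n-k+1, which
    contribute the same energy and do not need to be stored.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("sequences must have equal length")
    if k < 1 or 2 * (k - 1) >= n:
        raise ValueError("k too large for mirrored coefficients to be distinct")
    diff = [a - b for a, b in zip(dft(x), dft(y))]
    dc = abs(diff[0]) ** 2
    head = sum(abs(d) ** 2 for d in diff[1:k])
    return math.sqrt(dc + head), math.sqrt(dc + 2.0 * head)


x = [3.0, 4.0, 6.0, 5.0, 7.0, 9.0, 8.0, 6.0]
y = [2.0, 5.0, 5.0, 6.0, 8.0, 7.0, 9.0, 5.0]
true_dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
plain, mirrored = distance_bounds(x, y, k=3)
print("true distance:", true_dist)
print("bound from first 3 coefficients:", plain)
print("bound including mirrored coefficients:", mirrored)
```

The mirrored bound is never smaller than the plain one and never exceeds the true distance, so it filters more candidates at the index level without introducing false dismissals.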
Variable Length Queries for Time Series Data
- In ICDE, 2000
- Cited by 45 (7 self)
Finding similar patterns in a time sequence is a well-known problem that has been addressed by many authors. Most of the current techniques work well for queries of a prespecified length, but fail for variable length queries. We propose a new indexing technique that works well for variable length queries. Our idea is to store index structures at different resolutions for a given dataset. The resolutions are based on wavelets. A number of subqueries at different resolutions are generated for each variable length query. The ranges of the subqueries are progressively refined based on results from previous subqueries. Our experiments show that the total cost of our method is 4 to 20 times lower than that of the current techniques, including Linear Scan. Because of the need to store information at multiple resolution levels, the storage requirement of our method could potentially be large. In the second part of the paper, we show how the index information can be compressed with minimal information loss. According to our experimental results, even after compressing the index to one fifth of its size, the total cost of our method is 3 to 15 times lower than that of the current techniques.

