Results 1-10 of 110
On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration
 SIGKDD'02
, 2002
Abstract

Cited by 219 (51 self)
... mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offers an amount of "improvement" that would have been completely dwarfed by the variance that would have been observed by testing on many real-world datasets, or the variance that would have been observed by changing minor (unstated) implementation details. To illustrate our point
Efficient time series matching by wavelets
 Proc. of 15th Int'l Conf. on Data Engineering
, 1999
Abstract

Cited by 203 (1 self)
Time series stored as feature vectors can be indexed by multidimensional index trees like R-Trees for fast retrieval. Due to the dimensionality curse problem, transformations are applied to time series to reduce the number of dimensions of the feature vectors. Different transformations like Discrete Fourier Transform (DFT), Discrete Wavelet Transform (DWT), Karhunen-Loeve (KL) transform or Singular Value Decomposition (SVD) can be applied. While the use of DFT and KL transform or SVD has been studied in the literature, to our knowledge, there is no in-depth study on the application of DWT. In this paper, we propose to use Haar Wavelet Transform for time series indexing. The major contributions are: (1) we show that Euclidean distance is preserved in the Haar transformed domain and no false dismissal will occur, (2) we show that Haar transform can outperform DFT through experiments, (3) a new similarity model is suggested to accommodate vertical shift of time series, and (4) a two-phase method is proposed for efficient nearest neighbor query in time series databases.
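The distance-preservation claim in contribution (1) can be illustrated with a minimal sketch. It assumes an orthonormal scaling of the Haar transform (dividing by the square root of 2 at each level), which may differ from the normalization used in the paper; the function names `haar_transform` and `euclidean` are hypothetical.

```python
import math


def haar_transform(x):
    """Orthonormal Haar transform of a sequence whose length is a power of two.

    Assumption: each averaging/differencing step is scaled by 1/sqrt(2),
    which makes the transform orthonormal and therefore distance-preserving.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = [float(v) for v in x]
    details = []
    s = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Prepend this level's details so coarser coefficients come first.
        details = [(a - b) / s for a, b in pairs] + details
        approx = [(a + b) / s for a, b in pairs]
    return approx + details


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Because the transform is orthonormal, the distance between two transformed sequences equals the distance between the originals, and a distance computed on only the first few coefficients can never exceed the true distance. That lower-bound property is what rules out false dismissals when an index stores a truncated coefficient vector.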
Efficient Retrieval of Similar Time Sequences Under Time Warping
, 1997
Abstract

Cited by 174 (3 self)
Fast similarity searching in large time-sequence databases has attracted a lot of research interest [1, 5, 2, 6, 3, 10]. All of them use the Euclidean distance (L2), or some variation of Lp metrics. Lp metrics lead to efficient indexing, thanks to feature extraction (e.g., by keeping the first few DFT coefficients) and subsequent use of fast spatial access methods for the points in feature space. In this work we examine a popular, field-tested dissimilarity function, the "time warping" distance function which permits local accelerations and decelerations in the rate of the signals or sequences. This function is natural and suitable for several applications, like matching of voice, audio and medical signals (e.g., electrocardiograms). However, from the indexing viewpoint it presents two major challenges: (a) it does not lead to any natural "features", precluding the use of spatial access methods; (b) it is quadratic (O(len1 × len2)) on the length of the sequences involved. Here we ...
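The quadratic cost mentioned in challenge (b) is visible in the textbook dynamic-programming formulation of the time warping distance. The sketch below is that standard formulation, not the retrieval method of the paper; the absolute difference as local cost and the name `dtw_distance` are assumptions.

```python
def dtw_distance(a, b):
    """Time warping distance between two numeric sequences.

    Assumption: the local cost is |a[i] - b[j]|. The two nested loops
    fill a len(a) x len(b) table, which is the quadratic cost.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # aligning two empty prefixes costs nothing
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Repeat an element of b, repeat an element of a, or advance both.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]
```

Two sequences that differ only by a local stretch, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, have warping distance zero even though their lengths differ, which is the "local accelerations and decelerations" behavior described above.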
StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time
 In VLDB
, 2002
Abstract

Cited by 167 (10 self)
Consider the problem of monitoring tens of thousands of time series data streams in an online fashion and making decisions based on them. In addition to single stream statistics such as average and standard deviation, we also want to find high correlations among all pairs of streams. A stock market trader might use such a tool to spot arbitrage opportunities.
Rule discovery from time series
 In Proceedings of the 1997 ACM SIGKDD International Conference, ACM SIGKDD
, 1997
Abstract

Cited by 143 (0 self)
We consider the problem of finding rules relating patterns in a time series to other patterns in that series, or patterns in one series to patterns in another series. A simple example is a rule such as "a period of low telephone call activity is usually followed by a sharp rise in call volume". Examples of rules relating two or more time series are "if the Microsoft stock price goes up and Intel falls, then IBM goes up the next day," and "if Microsoft goes up strongly for one day, then declines strongly on the next day, and on the same days Intel stays about level, then IBM stays about level." Our emphasis is in the discovery of local patterns in multivariate time series, in contrast to traditional time series analysis which largely focuses on global models. Thus, we search for rules whose conditions refer to patterns in time series. However, we do not want to define beforehand which patterns are to be used; rather, we want the patterns to be formed from the data in the context of rule discovery. We describe adaptive methods for finding rules of the above type from time-series data. The methods are based on discretizing the sequence by methods resembling vector quantization. We first form subsequences by sliding a window through the time series, and then cluster these subsequences by using a suitable measure of time-series similarity. The discretized version of the time series is obtained by taking the cluster identifiers corresponding to the subsequence. Once the time series is discretized, we use simple rule finding methods to obtain rules from the sequence. We present empirical results on the behavior of the method.
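The pipeline described in this abstract (sliding window, clustering, cluster identifiers as symbols, then rule finding) can be sketched as follows. The abstract does not name a clustering algorithm or a rule measure, so the greedy radius-based clustering, the Euclidean similarity, and the one-step confidence measure here are all assumptions, and every function name is hypothetical.

```python
import math


def sliding_windows(series, width):
    """All contiguous subsequences of the given width."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def discretize(series, width, radius):
    """Replace each window by the identifier of its cluster.

    Assumption: greedy clustering. A window joins the first cluster whose
    center lies within `radius`; otherwise it starts a new cluster.
    """
    centers, symbols = [], []
    for w in sliding_windows(series, width):
        for k, c in enumerate(centers):
            if euclidean(w, c) <= radius:
                symbols.append(k)
                break
        else:
            centers.append(w)
            symbols.append(len(centers) - 1)
    return symbols, centers


def rule_confidence(symbols, a, b):
    """Fraction of occurrences of symbol a that are immediately followed by b."""
    followers = [nxt for cur, nxt in zip(symbols, symbols[1:]) if cur == a]
    return followers.count(b) / len(followers) if followers else 0.0
```

On a strictly periodic input such as `[0, 0, 1, 0, 0, 1, 0, 0]` with `width=2` and `radius=0.5`, the windows fall into three clusters and the symbol sequence repeats, so a rule of the form "symbol 0 is followed by symbol 1" has confidence 1.0.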
Landmarks: a new model for similarity-based pattern querying in time series databases
 In ICDE
, 2000
Abstract

Cited by 73 (5 self)
In this paper we present the Landmark Model, a model for time series that yields new techniques for similarity-based time series pattern querying. The Landmark Model does not follow traditional similarity models that rely on pointwise Euclidean distance. Instead, it leads to Landmark Similarity, a general model of similarity that is consistent with human intuition and episodic memory. By tracking different specific subsets of features of landmarks, we can efficiently compute different Landmark Similarity measures that are invariant under corresponding subsets of six transformations; namely, Shifting, Uniform
Similarity search over time series data using wavelets
 In ICDE
, 2002
Abstract

Cited by 62 (0 self)
We consider the use of wavelet transformations as a dimensionality reduction technique to permit efficient similarity search over high-dimensional time-series data. While numerous transformations have been proposed and studied, the only wavelet that has been shown to be effective for this application is the Haar wavelet. In this work, we observe that a large class of wavelet transformations (not only orthonormal wavelets but also biorthonormal wavelets) can be used to support similarity search. This class includes the most popular and most effective wavelets being used in image compression. We present a detailed performance study of the effects of using different wavelets on the performance of similarity search for time-series data. We include several wavelets that outperform both the Haar wavelet and the best known non-wavelet transformations for this application. To ensure our results are usable by an application engineer, we also show how to configure an indexing strategy for the best performing transformations. Finally, we identify classes of data that can be indexed efficiently using these wavelet transformations.
Mining Asynchronous Periodic Patterns in Time Series Data
 Proc. ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD)
, 2000
Abstract

Cited by 58 (8 self)
Periodicity detection in time series data is a challenging problem of great importance in many applications.
Efficient Retrieval of Similar Time Sequences Using DFT
 Proc. Int’l Conf. Foundations of Data Organizations and Algorithms
, 1998
Abstract

Cited by 54 (2 self)
We propose an improvement of the known DFT-based indexing technique for fast retrieval of similar time sequences. We use the last few Fourier coefficients in the distance computation without storing them in the index since every coefficient at the end is the complex conjugate of a coefficient at the beginning and as strong as its counterpart. We show analytically that this observation can accelerate the search time of the index by more than a factor of two. This result was confirmed by our experiments, which were carried out on real stock prices and synthetic data. Keywords: similarity retrieval, time series indexing
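The conjugate-symmetry observation can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's index structure: it uses a naive DFT with 1/sqrt(n) normalization (so that Parseval's relation holds without extra factors), and the names `dft` and `symmetric_lower_bound` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) DFT with 1/sqrt(n) normalization (an assumption)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        / math.sqrt(n)
        for k in range(n)
    ]


def symmetric_lower_bound(x, y, m):
    """Lower bound on the Euclidean distance between real sequences x and y.

    Only coefficients 0..m are used, but each of 1..m is counted twice,
    because coefficient n-k is the complex conjugate of coefficient k.
    """
    n = len(x)
    if len(y) != n or not 0 <= 2 * m < n:
        raise ValueError("need equal lengths and 0 <= 2*m < n")
    fx, fy = dft(x), dft(y)
    total = abs(fx[0] - fy[0]) ** 2
    total += 2 * sum(abs(fx[k] - fy[k]) ** 2 for k in range(1, m + 1))
    return math.sqrt(total)
```

Counting the mirrored coefficients gives a bound that is at least as tight as the one from the stored coefficients alone while storing nothing extra, and it still never exceeds the true distance, so filtering with it produces no false dismissals.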
Indexing Spatio-Temporal Trajectories with Chebyshev Polynomials
 Proc. 2004 SIGMOD, to appear
Abstract

Cited by 50 (0 self)
In this thesis, we investigate the subject of indexing large collections of spatiotemporal trajectories for similarity matching. Our proposed technique is to first mitigate the dimensionality curse problem by approximating each trajectory with a low order polynomial-like curve, and then incorporate a multidimensional index into the reduced space of polynomial coefficients. There are many possible ways to choose the polynomial, including Fourier transforms, splines, nonlinear regressions, etc. Some of these possibilities have indeed been studied before. We hypothesize that one of the best approaches is the polynomial that minimizes the maximum deviation from the true value, which is called the minimax polynomial. Minimax approximation is particularly meaningful for indexing because in a branch-and-bound search (i.e., for finding nearest neighbours), the smaller the maximum deviation, the more pruning opportunities there exist. In general, among all the polynomials of the same degree, the optimal minimax polynomial is very hard to compute. However, it has been shown that the Chebyshev approximation is almost identical to the optimal minimax polynomial, and is easy to compute [32]. Thus, we shall explore how to use
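To show why the Chebyshev approximation is considered easy to compute, here is a minimal sketch of the standard coefficient formula for a scalar function on [-1, 1], sampled at Chebyshev nodes. How the thesis maps trajectories onto this interval is not stated in the abstract, so this covers only the generic approximation step; the function names are hypothetical.

```python
import math


def chebyshev_coefficients(f, degree, nodes=None):
    """Chebyshev coefficients c_0..c_degree of f on [-1, 1].

    Uses c_j = (2/N) * sum_k f(cos(theta_k)) * cos(j * theta_k),
    with theta_k = pi * (k + 0.5) / N for k = 0..N-1.
    """
    n = nodes if nodes is not None else degree + 1
    if n <= degree:
        raise ValueError("need more sample nodes than the polynomial degree")
    thetas = [math.pi * (k + 0.5) / n for k in range(n)]
    samples = [f(math.cos(t)) for t in thetas]
    return [
        2.0 / n * sum(s * math.cos(j * t) for s, t in zip(samples, thetas))
        for j in range(degree + 1)
    ]


def chebyshev_eval(coeffs, x):
    """Evaluate c_0/2 + sum_{j>=1} c_j * T_j(x), with T_j(x) = cos(j * acos(x))."""
    theta = math.acos(x)
    return coeffs[0] / 2.0 + sum(
        c * math.cos(j * theta) for j, c in enumerate(coeffs) if j > 0
    )
```

Each coefficient is a single weighted sum over the samples, with no optimization loop, which is the contrast with the exact minimax polynomial drawn above. The resulting short coefficient vector is the kind of reduced representation that a multidimensional index could store.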