On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration
 SIGKDD'02
, 2002
Cited by 237
... mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offers an amount of "improvement" that would have been completely dwarfed by the variance that would have been observed by testing on many real-world datasets, or the variance that would have been observed by changing minor (unstated) implementation details. To illustrate our point
Efficient retrieval of similar time sequences under time warping
 In Proc. ICDE
, 1998
StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time
 In VLDB
, 2002
Cited by 178
Consider the problem of monitoring tens of thousands of time series data streams in an online fashion and making decisions based on them. In addition to single stream statistics such as average and standard deviation, we also want to find high correlations among all pairs of streams. A stock market trader might use such a tool to spot arbitrage opportunities.
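The "high correlations among all pairs of streams" task can be illustrated with a naive baseline. The sketch below is not the StatStream method; it recomputes the Pearson correlation of every pair over the most recent window. The function names, the dictionary layout, and the `window` and `threshold` parameters are assumptions made for illustration, and all streams are assumed to have the same length.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlated_pairs(streams, window, threshold):
    """Return (name_a, name_b, r) for pairs whose |r| over the last `window` points >= threshold."""
    result = []
    for (name_a, a), (name_b, b) in combinations(streams.items(), 2):
        r = pearson(a[-window:], b[-window:])
        if abs(r) >= threshold:
            result.append((name_a, name_b, r))
    return result


streams = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [1, -1, 1, -1],
}
pairs = correlated_pairs(streams, window=4, threshold=0.9)
print(pairs)
```

This brute-force version costs time quadratic in the number of streams for every window, which is the cost an online method for tens of thousands of streams has to avoid.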
Similarity-Based Queries for Time Series Data
 Proc. 1997 ACM SIGMOD Conf
, 1997
Cited by 140
We study a set of linear transformations on the Fourier series representation of a sequence that can be used as the basis for similarity queries on time-series data. We show that our set of transformations is rich enough to formulate operations such as moving average and time warping. We present a query processing algorithm that uses the underlying R-tree index of a multidimensional data set to answer similarity queries efficiently. Our experiments show that the performance of this algorithm is competitive with that of processing ordinary (exact match) queries using the index, and much faster than sequential scanning. We relate our transformations to the general framework for similarity queries of Jagadish et al.
Robust and fast similarity search for moving object trajectories
 In Proc. ACM SIGMOD Int. Conf. on Management of Data
, 2005
Cited by 93
An important consideration in similarity-based retrieval of moving object trajectories is the definition of a distance function. The existing distance functions are usually sensitive to noise, shifts and scaling of data that commonly occur due to sensor failures, errors in detection techniques, disturbance signals, and different sampling rates. Cleaning data to eliminate these is not always possible. In this paper, we introduce a novel distance function, Edit Distance on Real sequence (EDR), which is robust against these data imperfections. Analysis and comparison of EDR with other popular distance functions, such as Euclidean distance, Dynamic Time Warping (DTW), Edit distance with Real Penalty (ERP), and Longest Common Subsequences (LCSS), indicate that EDR is more robust than Euclidean distance, DTW and ERP, and that it is on average 50% more accurate than LCSS. We also develop three pruning techniques to improve the retrieval efficiency of EDR and show that these techniques can be combined effectively in a search, increasing the pruning power significantly. The experimental results confirm the superior efficiency of the combined methods.
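As a rough illustration of the distance function named in this abstract, the following is a minimal sketch of EDR computed by dynamic programming. It assumes the commonly stated definition: two points match when both coordinates differ by at most a tolerance, a match costs 0, and a substitution, insertion or deletion costs 1. The function names and the `eps` parameter name are assumptions, and the pruning techniques mentioned in the abstract are not shown.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def points_match(a: Point, b: Point, eps: float) -> bool:
    """Two points match when both coordinates differ by at most eps."""
    return abs(a[0] - b[0]) <= eps and abs(a[1] - b[1]) <= eps


def edr(r: Sequence[Point], s: Sequence[Point], eps: float) -> int:
    """Edit distance between two trajectories, using eps-matching instead of equality."""
    n, m = len(r), len(s)
    # prev[j] holds the distance between the first i-1 points of r and the first j points of s.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            subcost = 0 if points_match(r[i - 1], s[j - 1], eps) else 1
            curr[j] = min(
                prev[j - 1] + subcost,  # match or substitute
                prev[j] + 1,            # skip a point of r
                curr[j - 1] + 1,        # skip a point of s
            )
        prev = curr
    return prev[m]


clean = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
with_outlier = [(0.0, 0.0), (1.0, 1.0), (100.0, 100.0), (2.0, 2.0)]
print(edr(clean, with_outlier, eps=0.1))
```

Because every mismatch costs exactly 1 regardless of magnitude, a single outlier raises the distance by at most one edit, which is the intuition behind the robustness claim.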
A Survey of Temporal Knowledge Discovery Paradigms and Methods
 IEEE Transactions on Knowledge and Data Engineering
, 2002
Cited by 87
Abstract: With the increase in the size of data sets, data mining has recently become an important research topic and is receiving substantial interest from both academia and industry. At the same time, interest in temporal databases has been increasing and a growing number of both prototype and implemented systems are using an enhanced temporal understanding to explain aspects of behavior associated with the implicit time-varying nature of the universe. This paper investigates the confluence of these two areas, surveys the work to date, and explores the issues involved and the outstanding problems in temporal data mining. Index Terms: Temporal data mining, time sequence mining, trend analysis, temporal rules, semantics of mined rules.
Landmarks: a new model for similarity-based pattern querying in time series databases
 In ICDE
, 2000
Cited by 78
In this paper we present the Landmark Model, a model for time series that yields new techniques for similarity-based time series pattern querying. The Landmark Model does not follow traditional similarity models that rely on pointwise Euclidean distance. Instead, it leads to Landmark Similarity, a general model of similarity that is consistent with human intuition and episodic memory. By tracking different specific subsets of features of landmarks, we can efficiently compute different Landmark Similarity measures that are invariant under corresponding subsets of six transformations; namely, Shifting, Uniform
Online Data Mining for Co-Evolving Time Sequences
 In Proceedings of the 16th International Conference on Data Engineering
, 2000
Cited by 66
In many applications, the data of interest comprises multiple sequences that evolve over time. Examples include currency exchange rates and network traffic data. We develop a fast method to analyze such co-evolving time sequences jointly to allow (a) estimation/forecasting of missing/delayed/future values, (b) quantitative data mining, and (c) outlier detection. Our method, MUSCLES, adapts to changing correlations among time sequences. It can handle indefinitely long sequences efficiently using an incremental algorithm and requires only a small amount of storage and fewer I/O operations. To make it scale for a large number of sequences, we present a variation, the Selective MUSCLES method, and propose an efficient algorithm to reduce the problem size. Experiments on real datasets show that MUSCLES outperforms popular competitors in prediction accuracy by up to 10 times, and discovers interesting correlations. Moreover, Selective MUSCLES scales up very well for large numbers of sequences, reducing response time by up to 110 times over MUSCLES, and sometimes even improves the prediction quality.
On the Marriage of Lp-norms and Edit Distance
 In VLDB
, 2004
Cited by 61
Existing studies on time series are based on two categories of distance functions. The first category consists of the Lp-norms. They are metric distance functions but cannot support local time shifting. The second category consists of distance functions which are capable of handling local time shifting but are non-metric. The first