## On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration (2002)

Venue: | SIGKDD'02 |

Citations: | 326 - 59 self |

### Citations

533 | Fast sub-sequence matching in time-series databases. - Faloutsos, Ranganathan, et al. - 1994 |

516 | Efficient similarity search in sequence databases. - Agrawal, Faloutsos, et al. - 1993 |

351 | Exact indexing of dynamic time warping. - Keogh, Ratanamahatana - 2005 |

316 | Locally adaptive dimensionality reduction for indexing large time series databases.
- Keogh, Chakrabarti, et al.
- 2001
Citation Context ...joys a tenfold speed up when performed on disk because any indexing technique must perform costly random access, whereas sequential scan can take advantage of an optimized linear traverse of the disk [32]. The limited number of rival methods is particularly troubling for papers that introduce a novel similarity measure. Although 29 of the papers surveyed introduce a novel similarity measure, only 12 of ... |

287 | Efficient Time Series Matching by Wavelets. - Chan, Fu - 1999 |

236 | Fast Similarity Search in the Presence of Noise, Scaling and Translation in Time-Series Databases.
- Agrawal, Lin, et al.
- 1995
Citation Context ...e series is of little utility unless some strawman comparison is used. Many papers ask us to consider the quality of their proposed similarity measure without a single comparison to another technique [2, 4, 8, 24, 31, 38, 39, 41, 42, 46, 57]. This is particularly surprising since the most obvious strawman, Euclidean distance, is trivial to implement (for example, in the Matlab programming language it requires only 19 characters: sqrt(sum(... |

215 | Efficient retrieval of similar time sequences under time warping. - Yi, Jagadish, et al. - 1998 |

183 | Fast Time Sequence Indexing for Arbitrary Lp Norms. - Yi, Faloutsos - 2000 |

182 | Rule discovery from time series. - Das, Lin, et al. - 1998 |

165 | An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback.
- Keogh, Pazzani
- 1998
Citation Context ...at created the time series has changed [19, 20], or segmentation may simply be performed to create a high-level representation of the time series that supports indexing, clustering and classification [20, 30, 31, 37, 39, 42, 44, 46, 48, 52, 57]. As mentioned above, our experiments were conducted on 50 real-world, highly diverse datasets. Space limitations prevent us from describing all 50 datasets in detail, so we simply note the following.... |

158 | Similarity-Based Queries for Time Series Data. - Rafiei, Mendelzon - 1997 |

145 | Event detection from time series data.
- Guralnik, Srivastava
- 1999
Citation Context ...ion queries [57] and relevance feedback [30]. • Support concurrent mining of text and time series [37]. • Support novel clustering and classification algorithms [30]. • Support change point detection [20, 23]. Surprisingly, in spite of the ubiquity of this representation, with the exception of [52], there has been little attempt to understand and compare the algorithms that produce it. Although appearing ... |

129 | Finding Patterns in Time Series: A Dynamic Programming Approach. - Berndt, Clifford - 1996 |

129 | Efficiently supporting ad hoc queries in large datasets of time sequences.
- Korn, Jagadish, et al.
- 1997
Citation Context ...ction stage. The proposed representations include the Discrete Fourier Transform (DFT) [1, 11, 16, 28, 49, 50], several kinds of Wavelets (DWT) [10, 27, 45, 51, 57, 60], Singular Value Decomposition [32, 35], Adaptive Piecewise Constant Approximation [32], Inner Products [18] and Piecewise Aggregate Approximation (PAA) [61]. The majority of work has focused solely on performance issues, however some auth... |

122 | Querying shapes of histories. - Agrawal, Psaila, et al. - 1995 |

121 | On similarity queries for time-series data: constraint specification and implementation.
- Goldin, Kanellakis
- 1995
Citation Context ...dissimilar. Surprisingly, many of the papers included in the survey, whose main contribution was to introduce a new similarity measure, fail to show even one example of a matching pair of time series [4, 8, 19, 22, 24, 26, 34, 36, 38, 42, 43, 48, 57]. Moreover, showing some examples of matching time series is of little utility unless some strawman comparison is used. Many papers ask us to consider the quality of their proposed similarity measure ... |

116 | A probabilistic approach to fast pattern matching in time series databases.
- Keogh, Smyth
- 1997
Citation Context ...at created the time series has changed [19, 20], or segmentation may simply be performed to create a high-level representation of the time series that supports indexing, clustering and classification [20, 30, 31, 37, 39, 42, 44, 46, 48, 52, 57]. As mentioned above, our experiments were conducted on 50 real-world, highly diverse datasets. Space limitations prevent us from describing all 50 datasets in detail, so we simply note the following.... |

100 | Finding Similar Time Series. - Das, Gunopulos, et al. - 1997 |

90 | Approximate queries and representations for large data sequences. - Shatkay, Zdonik - 1996 |

83 | Similarity search over time series data using wavelets.
- Popivanov, Miller
- 2002
Citation Context ...suggest a different approach to the dimensionality reduction stage. The proposed representations include the Discrete Fourier Transform (DFT) [1, 11, 16, 28, 49, 50], several kinds of Wavelets (DWT) [10, 27, 45, 51, 57, 60], Singular Value Decomposition [32, 35], Adaptive Piecewise Constant Approximation [32], Inner Products [18] and Piecewise Aggregate Approximation (PAA) [61]. The majority of work has focused solely o... |

79 | Identifying Representative Trends in Massive Time Series Data Sets Using Sketches. - Indyk, Koudas, et al. - 2000 |

73 | Distance Measures for Effective Clustering of ARIMA Time-Series.
- Kalpakis, Gada, et al.
- 2001
Citation Context ...gle linkage method, with 4 different distance measures. Euclidean distance and Dynamic Time Warping are decade-old strawmen. The other two approaches have recently been proposed in data mining papers [57, 29]. Dendrograms are particularly attractive since a clustering of M objects summarizes O(M) measurements, however other possibilities of visualizing the quality of a similarity measure included projectin... |

72 | Deformable Markov model templates for time-series pattern matching. In KDD.
- Smyth
- 2000
Citation Context ...K << n) such that Q closely approximates Q. Note that segmentation has two major uses. It may be performed in order to determine when the underlying model that created the time series has changed [19, 20], or segmentation may simply be performed to create a high-level representation of the time series that supports indexing, clustering and classification [20, 30, 31, 37, 39, 42, 44, 46, 48, 52, 57]. A... |

72 | Pattern extraction for time series classification. - Geurts - 2001 |

68 | Machine Learning as an Experimental Science.
- Kibler, Langley
- 1988
Citation Context ...s tested on 1.85 datasets (1.26 real and 0.59 synthetic). These numbers are astonishingly low when you consider that new machine learning algorithms are typically evaluated on at least a dozen datasets [12, 33]. In fact, we feel that the numbers above are optimistic. Of the 30 papers that use two or more datasets, a very significant fraction (64%) use both stock market data and random walk data. However, w... |

67 | Efficient retrieval of similar time sequences using DFT. - Rafiei, Mendelzon - 1998 |

65 | FALCON: Feedback Adaptive Loop for Content-Based Retrieval. - Wu, Faloutsos, et al. - 2000 |

60 | Mining of concurrent text and time series.
- Lavrenko, Schmill, et al.
- 2000
Citation Context ...ata to be the same, each paper in the survey is tested on average on only 1.28 different datasets. This number might be reasonable if the contribution had been claimed for only a single type of data [19, 37], or it had been shown that the choice of dataset has little influence on the outcome. However, the choice of dataset has a huge effect on the performance of time series algorithms. We will demonstrat... |

60 | Similarity Search for Multidimensional Data Sequences. - Lee, Chun, et al. - 2000 |

58 | A comparison of DFT and DWT based similarity search in time-series databases.
- Wu, Agrawal, et al.
- 2000
Citation Context ...suggest a different approach to the dimensionality reduction stage. The proposed representations include the Discrete Fourier Transform (DFT) [1, 11, 16, 28, 49, 50], several kinds of Wavelets (DWT) [10, 27, 45, 51, 57, 60], Singular Value Decomposition [32, 35], Adaptive Piecewise Constant Approximation [32], Inner Products [18] and Piecewise Aggregate Approximation (PAA) [61]. The majority of work has focused solely o... |

58 | An index-based approach for similarity search supporting time warping in large sequence databases. - Kim, Park, et al. - 2001 |

57 | Efficient pruning methods for separate-and-conquer rule learning systems.
- Cohen
- 1993
Citation Context ...s tested on 1.85 datasets (1.26 real and 0.59 synthetic). These numbers are astonishingly low when you consider that new machine learning algorithms are typically evaluated on at least a dozen datasets [12, 33]. In fact, we feel that the numbers above are optimistic. Of the 30 papers that use two or more datasets, a very significant fraction (64%) use both stock market data and random walk data. However, w... |

57 | Mining the stock market: Which measure is best? (extended abstract) - Gavrilov, Anguelov, et al. |

57 | Variable length queries for time series data.
- Kahveci, Singh
- 2001
Citation Context ...suggest a different approach to the dimensionality reduction stage. The proposed representations include the Discrete Fourier Transform (DFT) [1, 11, 16, 28, 49, 50], several kinds of Wavelets (DWT) [10, 27, 45, 51, 57, 60], Singular Value Decomposition [32, 35], Adaptive Piecewise Constant Approximation [32], Inner Products [18] and Piecewise Aggregate Approximation (PAA) [61]. The majority of work has focused solely o... |

56 | Fast Time-Series Searching with Scaling and Shifting. - Chu, Wong - 1999 |

53 | Approximate nearest neighbor searching in multimedia databases. - Ferhatosmanoglu, Tuncel, et al. - 2001 |

49 | Efficient Searches for Similar Subsequences of Different Lengths in Sequence Databases. - Park, Chu, et al. - 2000 |

48 | Twelve ways to fool the masses when giving performance results on parallel computers. Supercomputing Review, - Bailey - 1991 |

43 | Matching and Indexing Sequences of Different Lengths. - Bozkaya, Yazdani, et al. - 1997 |

42 | Supporting content-based searches in time series via approximation.
- Wang, Wang
- 2000
Citation Context ...at created the time series has changed [19, 20], or segmentation may simply be performed to create a high-level representation of the time series that supports indexing, clustering and classification [20, 30, 31, 37, 39, 42, 44, 46, 48, 52, 57]. As mentioned above, our experiments were conducted on 50 real-world, highly diverse datasets. Space limitations prevent us from describing all 50 datasets in detail, so we simply note the following.... |

38 | Supporting Fast Search in Time Series for Movement Patterns in Multiple Scales. - Qu, Wang, et al. - 1998 |

36 | Signature technique for similarity-based queries. - Faloutsos, Jagadish, et al. - 1997 |

32 | Adaptive query processing for time-series data - Huang, Yu - 1999 |

28 | The Haar wavelet transform in the time series similarity paradigm - Struzik, Siebes - 1999 |

27 | MALM: a framework for mining sequence database at multiple abstraction levels - Li, Yu, et al. - 1998 |

25 | Interactive interpretation of Kohonen maps applied to curves - Debregeas, Hebrail - 1998 |

25 | Fast retrieval of similar subsequences in long sequence databases.
- Park, Lee, et al.
- 1999
Citation Context ...dissimilar. Surprisingly, many of the papers included in the survey, whose main contribution was to introduce a new similarity measure, fail to show even one example of a matching pair of time series [4, 8, 19, 22, 24, 26, 34, 36, 38, 42, 43, 48, 57]. Moreover, showing some examples of matching time series is of little utility unless some strawman comparison is used. Many papers ask us to consider the quality of their proposed similarity measure ... |

23 | Mining for similarities in aligned time series using wavelets
- Huhtala, Karkkainen, et al.
Citation Context ...cfons [57] 0.380 0.116; Cepstrum [29] 0.570 0.458; String (Suffix Tree) [24] 0.206 0.578; Important Points [46] 0.387 0.478; Edit Distance [8] 0.603 0.622; String Signature [4] 0.444 0.695; Cosine Wavelets [25] 0.130 0.371; Hölder [54] 0.331 0.593; Piecewise Probabilistic [31] 0.202 0.321. The results are quite surprising. None of the proposed techniques can beat the simple strawman. Their error rates are an o... |

23 | Segment-based approach for subsequence searches in sequence databases. - Park, Kim, et al. - 2001 |

19 | Hotbits – genuine random numbers, generated by radioactive decay. Available at http://www.fourmilab.ch/hotbits.
- Walker
- 2006
Citation Context ...aset. For fairness we used the same 100,000 subsequences for each approach. To ensure randomness in our sampling technique we used true random numbers that were created by a quantum mechanical process [55]. 3.3.1 Demonstration of data bias. The three papers listed above experimented on a maximum of 3 datasets. If we use that number of datasets we can demonstrate essentially any finding we wish. For exampl... |

11 | Discovering similar patterns in time series. - Caraca-Valente, Lopez-Chavarrias - 2000 |

11 | Multilevel filtering for high dimensional nearest neighbor search - Wang, Wang - 2000 |

9 | On similarity-based queries for time-series data
- Rafiei
- 1999
Citation Context ...gate Approximation (PAA) [61]. The majority of work has focused solely on performance issues, however some authors have also considered other issues such as supporting non-Euclidean distance measures [32, 50, 61] and allowing queries of arbitrary length [32, 40, 61]. 3.1 Implementation Bias. Since most time series indexing techniques use the same indexing framework, and achieve the claimed speedup solely with the choice of representation, it is important to co... |

7 | Using Signature Files for Querying Time-Series Data - unknown authors - 1997 |

7 | A fast projection algorithm for sequence data searching
- Lam, Wong
- 1998
Citation Context ...dissimilar. Surprisingly, many of the papers included in the survey, whose main contribution was to introduce a new similarity measure, fail to show even one example of a matching pair of time series [4, 8, 19, 22, 24, 26, 34, 36, 38, 42, 43, 48, 57]. Moreover, showing some examples of matching time series is of little utility unless some strawman comparison is used. Many papers ask us to consider the quality of their proposed similarity measure ... |

7 | Index interpolation: an approach for subsequence matching supporting normalization transform in time-series databases.
- Loh, Kim, et al.
- 2000
Citation Context ...s focused solely on performance issues, however some authors have also considered other issues such as supporting non-Euclidean distance measures [32, 50, 61] and allowing queries of arbitrary length [32, 40, 61]. 3.1 Implementation Bias. Since most time series indexing techniques use the same indexing framework, and achieve the claimed speedup solely with the choice of representation, it is important to co... |

7 | TSA-tree: a wavelet based approach to improve the efficiency of multilevel surprise and trend queries on time-series data - Shahabi, Tian, et al. - 2000 |

5 | AIM: approximate intelligent matching for time series data - Kim, Lam, et al. - 2000 |

5 | A Similarity Search Method of Time Series Data with Combination of Fourier and Wavelet Transforms. In TIME. - Kawagoe, Ueda - 2002 |

4 | UCI Repository of KDD databases - Bay - 1999 |

4 | What Some Puzzling Problems Teach About the Theory of Simulation and the Use of Resampling. The American Statistician
- Simon
- 1994
Citation Context ...counted as the same dataset. It is well known that random walk data can perfectly model stock market data in terms of all statistical properties, including variance, autocorrelation, stationarity etc. [17, 53]. Work by the late Julian L. Simon suggested that humans find it impossible to differentiate between the two [53]. To confirm this finding we asked 12 professors at UCR's Anderson Graduate School of Ma... |

4 | UCI Repository of KDD databases (http://kdd.ics.uci.edu) - Bay - 1999 |

2 | An Efficient Index Structure for Shift and Scale Invariant Search of Multi-Attribute Time Sequences
- Kahveci, Singh, et al.
Citation Context ... Implementing fairly complex indexing techniques allows many opportunities for implementation bias. For example, suppose you hope to demonstrate that DWT is superior to DFT. With shift-normalized data [11, 28] the first DWT coefficient is zero so you could take advantage of that fact by indexing the 2nd to (N+1)th coefficients, rather than the 1st to Nth coefficients. However, you might neglect doing a sim... |

2 | Efficient and robust feature extraction and pattern matching of time series by a lattice structure. - Polly, Wong - 2001 |

2 | Search for patterns in compressed time series. Int'l Journal of Image and Graphics - Pratt, Fink - 2002 |

2 | Fast time-series searching with scaling and shifting
- Chu, Wong
- 1999
Citation Context ... Implementing fairly complex indexing techniques allows many opportunities for implementation bias. For example, suppose you hope to demonstrate that DWT is superior to DFT. With shift-normalized data [11, 28] the first DWT coefficient is zero so you could take advantage of that fact by indexing the 2nd to (N+1)th coefficients, rather than the 1st to Nth coefficients. However, you might neglect doing a sim... |

1 | A quantitative study of neural network learning algorithm evaluation practices - unknown authors - 1995 |

1 | Supporting subseries nearest neighbor search via approximation - Wang |

1 | Querying shapes of histories. In Proceedings of the 21st Int'l Conference on Very Large Databases - Agrawal, Psaila, et al. - 1995 |

1 | Finding patterns in time series: a dynamic programming approach - Irvine - 1996 |

1 | Matching and indexing sequences of different lengths - Bozkaya, Yazdani, et al. - 1997 |

1 | Discovering similar patterns in time series - Caraca-Valente, Lopez-Chavarrias - 2000 |

1 | Efficient time series matching by wavelets - Chan, Fu - 1999 |

1 | Finding similar time series - Das, Gunopulos, et al. - 1997 |

1 | Mining the stock market: which measure is best - Gavrilov, Anguelov, et al. - 2000 |

1 | Adaptive query processing for time-series data - Huang, Yu - 1999 |

1 | Identifying representative trends in massive time series data sets using sketches - Indyk, Koudas, et al. - 2000 |

1 | Similarity search for multidimensional data sequences - Lee, Chun, et al. - 2000 |

1 | A quantitative study of neural network learning algorithm evaluation practices
- unknown authors
- 1995
Citation Context ... of the survey, we echo the caution of Prechelt, that "while high numbers resulting from such counting cannot prove that the evaluation has high quality, low numbers (suggest) that the quality is low" [47]. 2.1 Size of Test Datasets. We recorded the size of the test dataset for each paper. Where two or more datasets are used, we considered only the size of the largest. The results are quite surprising; the... |

1 | Approximate queries and representations for large data sequences - Shatkay, Zdonik - 1996 |

1 | The Haar wavelet transform in the time series similarity paradigm
- Struzik, Siebes
- 1999
Citation Context ...epstrum [29] 0.570 0.458; String (Suffix Tree) [24] 0.206 0.578; Important Points [46] 0.387 0.478; Edit Distance [8] 0.603 0.622; String Signature [4] 0.444 0.695; Cosine Wavelets [25] 0.130 0.371; Hölder [54] 0.331 0.593; Piecewise Probabilistic [31] 0.202 0.321. The results are quite surprising. None of the proposed techniques can beat the simple strawman. Their error rates are an order of magnitude worse ... |

1 | Supporting subseries nearest neighbor search via approximation - Wang, Wang - 2000 |