## HOT SAX: Efficiently finding the most unusual time series subsequence (2005)

Citations: 62 (4 self)

### BibTeX

@INPROCEEDINGS{Keogh05hotsax:,

author = {Eamonn Keogh and Jessica Lin},

title = {{HOT SAX}: Efficiently finding the most unusual time series subsequence},

booktitle = {Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM'05)},

year = {2005},

pages = {226--233}

}

### Abstract

In this work, we introduce the new problem of finding time series discords. Time series discords are subsequences of a longer time series that are maximally different to all the rest of the time series subsequences. They thus capture the sense of the most unusual subsequence within a time series. Time series discords have many uses for data mining, including improving the quality of clustering, data cleaning, summarization, and anomaly detection. As we will show, discords are particularly attractive as anomaly detectors because they only require one intuitive parameter (the length of the subsequence) unlike most anomaly detection algorithms that typically require many parameters. We evaluate our work with a comprehensive set of experiments. In particular, we demonstrate the utility of discords with objective experiments on domains as diverse as Space Shuttle telemetry monitoring, medicine, surveillance, and industry, and we demonstrate the effectiveness of our discord discovery algorithm with more than one million experiments, on 82 different datasets from diverse domains.
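
As a rough illustration of the discord definition in the abstract, the following is a minimal brute-force sketch, not the HOT SAX algorithm itself. It assumes Euclidean distance on raw (un-normalized) values and treats two windows as non-self matches only when their start positions differ by at least the subsequence length `n`; the function names are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_discord(series, n):
    """Return (start, distance) for the length-n subsequence whose nearest
    non-self match is farthest away. Quadratic in the series length."""
    m = len(series)
    best_start, best_dist = -1, -1.0
    for p in range(m - n + 1):
        candidate = series[p:p + n]
        nearest = math.inf
        for q in range(m - n + 1):
            # Assumption: a non-self match must not overlap the candidate.
            if abs(p - q) >= n:
                nearest = min(nearest, euclidean(candidate, series[q:q + n]))
        # Keep the candidate whose nearest neighbour is the most distant.
        if nearest != math.inf and nearest > best_dist:
            best_start, best_dist = p, nearest
    return best_start, best_dist


# A repeating 0/1 pattern with a single spike at index 8.
series = [0, 1, 0, 1, 0, 1, 0, 1, 5, 1, 0, 1, 0, 1, 0, 1]
start, dist = brute_force_discord(series, 2)
```

The only parameter is the subsequence length `n`, which matches the abstract's point that discords need a single intuitive parameter. The nested loops are the cost that a smarter search order would aim to reduce.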

### Citations

8513 | Introduction to algorithms
- Cormen, Leiserson, et al.
- 1990
Citation Context ...o smaller sub-problems, which can be solved and admissibly recombined. Depending on the exact definitions, such techniques are variously called dynamic programming, divide and conquer, bottom-up, etc. [3]. Unfortunately, as we show below, such ideas are unlikely to help us efficiently find discords. Imagine that we break a time series T into two sections, A and B, and that we find the discords for bot...

217 | On the need for time series data mining benchmarks: A survey and empirical demonstration
- Keogh, Kasetty
Citation Context ...ering. 1. INTRODUCTION The previous decade has seen hundreds of papers on time series similarity search, which is the task of finding a time series that is most similar to a particular query sequence [5]. In this work, we pose the new problem of finding the sequence that is least similar to all other sequences. We call such sequences time series discords. Figure 1 gives a visual intuition of a time s...

179 | A Symbolic Representation of Time Series, with Implications for Streaming Algorithms
- Lin, Keogh, et al.
Citation Context ...ues for approximating the perfect ordering returned by the hypothetical Magic heuristics, we must briefly review the Symbolic Aggregate ApproXimation (SAX) representation of time series introduced in [10]. While there are at least 200 different symbolic approximations of time series in the literature, SAX is unique in that it is the only one that allows both dimensionality reduction and lower bounding ...

146 | Fast algorithms for sorting and searching strings
- Bentley, Sedgewick
- 1997
Citation Context ...ntly visiting). However, if we want to know the location of the other occurrence, we must visit the trie. Surprisingly, both data structures can be created in time and space linear in the length of T [1]. In fact, if we take advantage of the fact that we only need $\lceil \log_2(a) \rceil$ bits for each SAX symbol, then both data structures are significantly smaller than the raw time series data they were derived fr...
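
The storage remark in the context above can be checked with a little arithmetic. The sketch below assumes the garbled expression in the scraped text is the ceiling of log2 of the alphabet size, and that raw values are stored as 64-bit floats; both the function name and the 64-bit figure are assumptions for illustration.

```python
def bits_per_symbol(alphabet_size):
    """Smallest number of bits that can distinguish `alphabet_size` symbols,
    i.e. ceil(log2(alphabet_size)), computed with integer arithmetic."""
    if alphabet_size < 1:
        raise ValueError("alphabet_size must be positive")
    return (alphabet_size - 1).bit_length()


RAW_BITS = 64  # assumed width of one raw time series value

for a in (3, 4, 5, 8):
    bits = bits_per_symbol(a)
    print(f"alphabet size {a}: {bits} bits per symbol, "
          f"{RAW_BITS // bits}x smaller than a {RAW_BITS}-bit value")
```

Using `bit_length` avoids floating-point rounding that `math.log2` could introduce for large alphabet sizes.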

122 | Distance-based outliers: algorithms and applications
- Knorr, Ng, et al.
- 2000
Citation Context ... [5], so one might imagine that such a representation would be useful for the task at hand. We could simply project our time series into n-dimensional space and use existing outlier detection methods [7]. The problem with this idea is the unintuitive fact that discords do not necessarily live in sparse areas of n-dimensional space (Conversely, repeated patterns do not necessarily live in dense parts ...

118 | Towards parameter-free data mining
- Keogh, Lonardi, et al.
- 2004
Citation Context ...vely recent introduction, SAX has become an important tool in the time series data mining toolbox. It has been used to find time series motifs [2], to mine rules in health data, for anomaly detection [6], to extract features from a hepatitis database, for visualization [8][11], and a host of other data mining tasks. 4.1 A Brief Review of SAX A time series C of length n can be represented in a w-dimens...

116 | Exact Discovery of Time Series Motifs
- Mueen, Keogh, et al.
- 2009
Citation Context ...r future work. 2. RELATED WORK AND BACKGROUND Our review of related work is exceptionally brief because we are considering a new problem. Most real-valued time series problems such as motif discovery [2], longest common subsequence matching, sequence averaging, segmentation, indexing [5], etc. have approximate or exact analogues in the discrete world, and have been addressed by the text processing or...

59 | Compressed text databases with efficient query algorithms based on the compressed suffix array
- Sadakane
- 2000

34 | Visually Mining and Monitoring Massive Time Series
- Lin, Keogh, et al.
Citation Context ...ries data mining toolbox. It has been used to find time series motifs [2], to mine rules in health data, for anomaly detection [6], to extract features from a hepatitis database, for visualization [8][11], and a host of other data mining tasks. 4.1 A Brief Review of SAX A time series C of length n can be represented in a w-dimensional space by a vector $\bar{C} = \bar{c}_1, \ldots, \bar{c}_w$. The $i$th element of $\bar{C}$ is calcula...
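
The w-dimensional reduction mentioned in the context above is commonly known as Piecewise Aggregate Approximation (PAA), where each of the w output values is the mean of one equal-width segment of the input. The following is a minimal sketch under the assumption that w divides n evenly; the quoted text is truncated before its formula, so the function name `paa` and this simplification are illustrative only.

```python
def paa(series, w):
    """Reduce `series` (length n) to w values by averaging n/w-point segments."""
    n = len(series)
    if w <= 0 or n % w != 0:
        # Simplifying assumption for this sketch: w must divide n exactly.
        raise ValueError("w must be positive and divide len(series)")
    seg = n // w
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(w)]


# Six points reduced to three segment means.
reduced = paa([1, 3, 5, 7, 2, 4], 3)
```

SAX then maps each segment mean to a symbol from a small alphabet; that discretization step is omitted here.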

17 | Time-series Bitmaps: A Practical Visualization Tool for working with Large Time Series Databases
- Kumar, Lolla, et al.
- 2005
Citation Context ... series data mining toolbox. It has been used to find time series motifs [2], to mine rules in health data, for anomaly detection [6], to extract features from a hepatitis database, for visualization [8][11], and a host of other data mining tasks. 4.1 A Brief Review of SAX A time series C of length n can be represented in a w-dimensional space by a vector $\bar{C} = \bar{c}_1, \ldots, \bar{c}_w$. The $i$th element of $\bar{C}$ is cal...

11 | Motif discovery algorithm from motion data
- Tanaka, Uehara
- 2004
Citation Context ...gs of the Fifth IEEE International Conference on Data Mining (ICDM’05) 1550-4786/05 $20.00 © 2005 IEEE bioinformatics community and increasingly understood in the time series data mining community [2][13]. We will therefore use the definition of non-self matches to define time series discords: Definition 6. Time Series Discord: Given a time series T, the subsequence D of length n beginning at position...

3 | Distinguishing string selection problems. Information and Computation 185, pp. 41–55
- Wang, S, et al.
- 2003
Citation Context ...er, time series discords do not appear to have a discrete version. Note that the superficially similar sounding Furthest (Sub)String Problem requires us to build a string, not to find one in the data [9]. 2.1 Notation For concreteness, we begin with a definition of our data type of interest, time series: Definition 1. Time Series: A time series $T = t_1, \ldots, t_m$ is an ordered set of m real-valued variabl...