## Probabilistic discovery of time series motifs (2003)

Citations: 124 (22 self)

### BibTeX

    @INPROCEEDINGS{Chiu03probabilisticdiscovery,
      author = {Bill Chiu and Eamonn Keogh and Stefano Lonardi},
      title = {Probabilistic discovery of time series motifs},
      booktitle = {},
      year = {2003},
      pages = {493--498}
    }

### Abstract

Several important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalized the idea of approximately repeated subsequences by introducing the notion of time series motifs. Two limitations of this work were the poor scalability of the motif discovery algorithm, and the inability to discover motifs in the presence of noise. Here we address these limitations by introducing a novel algorithm inspired by recent advances in the problem of pattern discovery in biosequences. Our algorithm is probabilistic in nature, but as we show empirically and theoretically, it can find time series motifs with very high probability even in the presence of noise or “don’t care” symbols. Not only is the algorithm fast, but it is an anytime algorithm, producing likely candidate motifs almost immediately, and gradually improving the quality of results over time.
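The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a probabilistic, projection-style motif search of the kind it describes might be organized. It assumes that each sliding window is z-normalized and discretized into a short word, that each iteration hashes the words on a random subset of positions (the ignored positions act as "don't care" symbols), and that pairs of windows that collide often are treated as likely motif candidates. The function names, the four-letter alphabet, the breakpoints, and all parameter defaults are hypothetical choices for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict
from itertools import combinations
from statistics import mean, pstdev


def znorm(window):
    """Rescale a window to zero mean and unit variance (flat windows map to zeros)."""
    mu, sd = mean(window), pstdev(window)
    if sd == 0:
        return [0.0] * len(window)
    return [(x - mu) / sd for x in window]


def to_word(window, word_len, breakpoints=(-0.67, 0.0, 0.67)):
    """Discretize a window: average equal-width segments, then map each to a letter.

    The breakpoints are approximate quartiles of a standard normal (an assumption).
    Trailing points are ignored if len(window) is not a multiple of word_len.
    """
    z = znorm(window)
    seg = len(z) // word_len
    letters = []
    for i in range(word_len):
        avg = mean(z[i * seg:(i + 1) * seg])
        letters.append("abcd"[sum(avg > b for b in breakpoints)])
    return "".join(letters)


def candidate_motifs(series, win, word_len=4, mask_size=2, iterations=10, seed=0):
    """Rank pairs of window start positions by how often they collide.

    Each iteration keeps `mask_size` randomly chosen word positions and buckets
    windows by the letters at those positions. More iterations give more evidence,
    so the ranking can be inspected early and refined later (anytime behaviour).
    """
    rng = random.Random(seed)
    words = [to_word(series[i:i + win], word_len)
             for i in range(len(series) - win + 1)]
    collisions = defaultdict(int)
    for _ in range(iterations):
        keep = sorted(rng.sample(range(word_len), mask_size))
        buckets = defaultdict(list)
        for pos, word in enumerate(words):
            buckets[tuple(word[k] for k in keep)].append(pos)
        for positions in buckets.values():
            for i, j in combinations(positions, 2):
                if j - i >= win:  # skip overlapping windows (trivial matches)
                    collisions[(i, j)] += 1
    # Highest collision counts first: these pairs are the most promising candidates.
    return sorted(collisions.items(), key=lambda kv: -kv[1])
```

In such a sketch the top-ranked pairs would still need to be checked against the raw data with a real distance measure; the collision counts only suggest where to look first.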

### Citations

867 | Biological sequence analysis. Probabilistic models of proteins and nucleic acids (Cambridge - Durbin, Eddy, et al. - 1998 |

514 | Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment - Lawrence, Altschul, et al. - 1993 |
Citation Context: ...uitive results to be obtained. We note that the utility of allowing don’t care sections in time series has been documented before [1, 22], and it is a cornerstone of text and Biosequences data mining [3, 24, 25, 28, 30, 34]. The previous example illustrates the dangers of mining in the presence of noise. Indeed, this single spike might be best taken care of with a simple smoothing algorithm. More generally, however, we ...

448 | Similarity search in high dimensions via hashing - Gionis, Indyk, et al. - 1999 |

330 | Introduction to algorithms, 2nd edition - Cormen, Leiserson, et al. - 2002 |

282 | Identifying DNA and protein patterns with statistically significant alignments of multiple sequences - Hertz, Stormo - 1999 |

232 | On the need for time series data mining benchmarks: A survey and empirical demonstration - Keogh, Kasetty - 2002 |
Citation Context: ...ints would not make much difference. However even small amounts of noise can dominate distance measures, including the most commonly used data mining distance measures, such as the Euclidean distance [6, 7, 8, 21, 36]. Figure 3 shows that the spike can cause one of our candidate motifs to appear to be much more similar to an artificial sequence which just happens to have a spike in the same place. ...

213 | Finding motifs using random projections - Buhler, Tompa - 2001 |
Citation Context: ...uitive results to be obtained. We note that the utility of allowing don’t care sections in time series has been documented before [1, 22], and it is a cornerstone of text and Biosequences data mining [3, 24, 25, 28, 30, 34]. The previous example illustrates the dangers of mining in the presence of noise. Indeed, this single spike might be best taken care of with a simple smoothing algorithm. More generally, however, we ...

212 | Efficient time series matching by Wavelets - Chan, Fu - 1999 |
Citation Context: ... and noisy industrial dataset. Below) a zoom-in reveals just how similar the three occurrences are to each other. There exists a vast body of work on efficiently locating known patterns in time series [1, 6, 12, 23, 35, 36, 37]. Here, however, we must be able to discover motifs without any prior knowledge about the regularities of the data under study. The obvious, nested-loop, brute force approach to motif discovery would ...

209 | Unsupervised learning of multiple motifs in biopolymers using expectation maximization - Bailey, Elkan - 1995 |
Citation Context: ...uitive results to be obtained. We note that the utility of allowing don’t care sections in time series has been documented before [1, 22], and it is a cornerstone of text and Biosequences data mining [3, 24, 25, 28, 30, 34]. The previous example illustrates the dangers of mining in the presence of noise. Indeed, this single spike might be best taken care of with a simple smoothing algorithm. More generally, however, we ...

204 | Fast similarity search in the presence of noise, scaling, and translation - Agrawal, Lin, et al. - 1995 |
Citation Context: ... and noisy industrial dataset. Below) a zoom-in reveals just how similar the three occurrences are to each other. There exists a vast body of work on efficiently locating known patterns in time series [1, 6, 12, 23, 35, 36, 37]. Here, however, we must be able to discover motifs without any prior knowledge about the regularities of the data under study. The obvious, nested-loop, brute force approach to motif discovery would ...

193 | Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies - Helden, Andre, et al. - 1998 |

188 | Combinatorial Approaches to Finding Subtle Signals in DNA Sequences - Pevzner, Sze - 2000 |

182 | Discovering Similar Multidimensional Trajectories - Vlachos, Kollios, et al. - 2002 |

182 | Efficient retrieval of similar time sequences under time warping - Yi, Jagadish, et al. - 1998 |

170 | Combinatorial pattern discovery in biological sequences: The TEIRESIAS algorithm - Rigoutsos, Floratos - 1998 |

165 | Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases. Knowledge and Information Systems - Keogh, Chakrabarti, et al. - 2001 |
Citation Context: ... and noisy industrial dataset. Below) a zoom-in reveals just how similar the three occurrences are to each other. There exists a vast body of work on efficiently locating known patterns in time series [1, 6, 12, 23, 35, 36, 37]. Here, however, we must be able to discover motifs without any prior knowledge about the regularities of the data under study. The obvious, nested-loop, brute force approach to motif discovery would ...

151 | Fast time sequence indexing for arbitrary Lp norms - Yi, Faloutsos - 2000 |

148 | Rule Discovery from Time Series - Das, Lin, et al. - 1998 |
Citation Context: ...rnia - Riverside Riverside, CA 92521 {bill, eamonn, stelo}@cs.ucr.edu • Mining association rules in time series requires the discovery of motifs. These are referred to as primitive shapes in [7] and frequent patterns in [18]. • Several time series classification algorithms work by constructing typical prototypes of each class [22, 15]. These prototypes may be considered motifs. • Many time s...

140 | Reduction techniques for instance-based learning algorithms - Wilson, Martinez |

134 | Efficient mining of partial periodic patterns in time series database - Han, Dong, et al. - 1999 |
Citation Context: ...nsors. We see these “experiences” as motifs. • Much of the work on finding approximate periodic patterns in time series can be viewed as an attempt to discover motifs that occur at constrained intervals [14]. For example, the astute reader may have noticed that the motif in Figure 1 appears at approximately equal intervals, suggesting an unexpected regularity. In addition to the application domains menti...

134 | An Enhanced Representation of Time Series which Allows Fast and Accurate Classification, Clustering and Relevance Feedback - Keogh, Pazzani - 1998 |

129 | An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences - Lawrence, Reilly - 1990 |

104 | Novelty Detection in Time Series Data Using Ideas from Immunology - Dasgupta, Forrest - 1999 |
Citation Context: ...ess detection algorithms essentially consist of modeling normal behavior with a set of typical shapes (which we see as motifs), and detecting future patterns that are dissimilar to all typical shapes [8]. • In robotics, Oates et al. [27], have introduced a method to allow an autonomous agent to generalize from a set of qualitatively different experiences gleaned from sensors. We see these “experience...

82 | Probabilistic and statistical properties of words: an overview - Reinert - 2000 |

78 | Finding motifs in time series - Lin, Keogh, et al. - 2002 |
Citation Context: ... finding approximately repeated subsequences in a longer time series. In an earlier work, we formalized the idea of approximately repeated subsequences by introducing the notion of time series motifs [26]. We will define motifs more formally later in this work. In the meantime a simple graphic example will serve to develop the reader’s intuition. Figure 1 illustrates an example of a motif discovered i...

77 | Efficient large-scale sequence comparison by locality-sensitive hashing - Buhler |

64 | Deformable Markov model templates for time-series pattern matching - Ge, Smyth - 2000 |

64 | Identifying representative trends in massive time series data sets using sketches - Indyk, Koudas, et al. - 2000 |
Citation Context: ...spond to an upward trend or a downward trend of arbitrary angles. These “degenerate motifs” are unlikely to be of interest to anyone, and in any case, are trivial to enumerate with a simple algorithm [19]. We will therefore exclude them from further consideration. This can easily be achieved at the feature extraction stage, when using sliding windows to extract the subsequences. As the window is moved...

49 | Discovery of Temporal Patterns – Learning Rules about the Qualitative Behaviour of Time Series - Hoppner |
Citation Context: ... • Mining association rules in time series requires the discovery of motifs. These are referred to as primitive shapes in [7] and frequent patterns in [18]. • Several time series classification algorithms work by constructing typical prototypes of each class [22, 15]. These prototypes may be considered motifs. • Many time series anomaly/interestingness ...

41 | LB_Keogh Supports Exact Indexing of Shapes under Rotation Invariance with Arbitrary Representations and Distance - Keogh, Wei, et al. - 2006 |

36 | Monotony of surprise and large-scale quest for unusual words - Apostolico, Bock, et al. - 2002 |
Citation Context: ... the challenge by Pevzner and Sze [28] (see below). We mention, in no particular order and without pretending to be exhaustive, TEIRESIAS [30], GIBBSSAMPLER [24], MEME [3], WINNOWER [28], VERBUMCULUS [2], PROJECTION [34], among others. Of particular interest is the PROJECTION algorithm by Buhler and Tompa [34]. They applied random projection in their paper to find motifs in nucleotide sequences. The m...

35 | Methods for discovering novel motifs in nucleic acid sequences - Staden - 1989 |

33 | A method for clustering the experiences of a mobile robot that accords with human judgements - Oates, Schmill, et al. - 2000 |
Citation Context: ...lly consist of modeling normal behavior with a set of typical shapes (which we see as motifs), and detecting future patterns that are dissimilar to all typical shapes [8]. • In robotics, Oates et al. [27], have introduced a method to allow an autonomous agent to generalize from a set of qualitatively different experiences gleaned from sensors. We see these “experiences” as motifs. • Much of the work o...

33 | Discovery of time-series motif from multi-dimensional data based on MDL principle - Tanaka, Iwamoto, et al. |

31 | 80 million tiny images: a large database for non-parametric object and scene recognition - Torralba, Fergus, et al. - 2008 |

28 | iSAX: indexing and mining terabyte sized time series - Shieh, Keogh - 2008 |

25 | A Bibliography of Temporal - Roddick, Spiliopoulou - 1999 |
Citation Context: ...roper context we will briefly consider related work. To date the majority of work in time series data mining has focused on indexing time series, the efficient discovery of known patterns in time series [1, 6, 12, 21, 22, 23, 31, 35, 36, 37]. The innovative work of Oates et al. considers the problem of learning “qualitatively different experiences” (which we see as motifs), but the authors are working with relatively small datasets, and ...

23 | Effective proximity retrieval by ordering permutations - Chávez, Figueroa, et al. - 2009 |

20 | Discovering Multivariate Motifs using Subsequence Density Estimation and Greedy - Minnen, Isbell, et al. |

18 | Anytime classification using the nearest neighbor algorithm with applications to stream mining - Ueno, Xi, et al. - 2006 |

15 | High performance data mining using the nearest neighbor join - Bohm, Krebs - 2002 |

15 | Efficient color histogram indexing for quadratic form distance functions - Hafner, et al. - 1995 |

11 | Symbolic analysis of experimental data, Review of Scientific Instruments - Daw, Finney, et al. - 2001 |

11 | Animated People Textures - Celly, Zordan - 2004 |

11 | Knowledge construction from time series data using a collaborative exploration approach - Guyet, Garbay, et al. |

11 | Efficiently Finding Arbitrarily Scaled Patterns in Massive Time Series Databases - Keogh - 2003 |

9 | Discovering representative models in large time series databases - Rombo, Terracina - 2004 |

8 | Unsupervised activity discovery and characterization from event-streams - Hamid, Maddi, et al. - 2005 |

8 | Mining Motifs from Human Motion - Meng, Yuan, et al. |

6 | Declarative querying for biological sequences - Tata, Patel, et al. |