Probabilistic discovery of time series motifs
, 2003
Cited by 180 (24 self)
Several important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalized the idea of approximately repeated subsequences by introducing the notion of time series motifs. Two limitations of this work were the poor scalability of the motif discovery algorithm, and the inability to discover motifs in the presence of noise. Here we address these limitations by introducing a novel algorithm inspired by recent advances in the problem of pattern discovery in biosequences. Our algorithm is probabilistic in nature, but as we show empirically and theoretically, it can find time series motifs with very high probability even in the presence of noise or “don’t care” symbols. Not only is the algorithm fast, but it is an anytime algorithm, producing likely candidate motifs almost immediately, and gradually improving the quality of results over time.
Detecting Time Series Motifs Under Uniform Scaling
Cited by 17 (2 self)
Time series motifs are approximately repeated patterns found within the data. Such motifs have utility for many data mining algorithms, including rule discovery, novelty detection, summarization and clustering. Since the formalization of the problem and the introduction of efficient linear-time algorithms, motif discovery has been successfully applied to many domains, including medicine, motion capture, robotics and meteorology. In this work we show that most previous applications of time series motifs have been severely limited by the definition’s brittleness to even slight changes of uniform scaling, the speed at which the patterns develop. We introduce a new algorithm that allows discovery of time series motifs with invariance to uniform scaling, and show that it produces objectively superior results in several important domains. Apart from being more general than all other motif discovery algorithms, a further contribution of our work is that it is simpler than previous approaches; in particular, we have drastically reduced the number of parameters that need to be specified.
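To make the idea of uniform scaling concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes that a pattern developing at a different speed can be compared by resampling a prefix of it to the reference length (nearest-index resampling) and taking the smallest Euclidean distance over a set of candidate prefix lengths. The names `rescale` and `scaled_distance` are hypothetical.

```python
import math

def rescale(seq, new_len):
    """Stretch or shrink seq to new_len points by nearest-index resampling."""
    m = len(seq)
    return [seq[(j * m) // new_len] for j in range(new_len)]

def scaled_distance(a, b, prefix_lengths):
    """Smallest Euclidean distance between a and prefixes of b rescaled to len(a).

    Returns (distance, prefix_length) for the best-matching prefix length.
    """
    best = (math.inf, None)
    for p in prefix_lengths:
        stretched = rescale(b[:p], len(a))
        d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, stretched)))
        if d < best[0]:
            best = (d, p)
    return best

# The same ramp recorded at half speed matches exactly once rescaled.
fast = [0, 1, 2, 3]
slow = [0, 0, 1, 1, 2, 2, 3, 3]
print(scaled_distance(fast, slow, [4, 6, 8]))
```

Without rescaling (prefix length 4), the two ramps look different even though they trace the same shape, which is the brittleness the abstract describes.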
Finding Motifs in Database of Shapes
 In Proc. of SIAM International Conference on Data Mining (SDM’07)
, 2007
Cited by 8 (2 self)
The problem of efficiently finding images that are similar to a target image has attracted much attention in the image processing community and is rightly considered an information retrieval task. However, the problem of finding structure and regularities in large image datasets is an area in which data mining is beginning to make fundamental contributions. In this work, we consider the new problem of discovering shape motifs, which are approximately repeated shapes within (or between) image collections. As we shall show, shape motifs can have applications in tasks as diverse as anthropology, law enforcement, and historical manuscript mining. Brute-force discovery of shape motifs could be untenably slow, especially as many domains may require an expensive rotation-invariant distance measure. We introduce an algorithm that is two to three orders of magnitude faster than brute-force search, and demonstrate the utility of our approach with several real-world datasets from diverse domains.
HOT SAX: Finding the Most Unusual Time Series Subsequence: Algorithms and Applications
Cited by 8 (0 self)
In this work, we introduce the new problem of finding time series discords. Time series discords are subsequences of a longer time series that are maximally different to all the rest of the time series subsequences. They thus capture the sense of the most unusual subsequence within a time series. Time series discords have many uses for data mining, including improving the quality of clustering, data cleaning, summarization, and anomaly detection. As we will show, discords are particularly attractive as anomaly detectors because they only require one intuitive parameter (the length of the subsequence) unlike most anomaly detection algorithms that typically require many parameters. While the brute force algorithm to discover time series discords is quadratic in the length of the time series, we show a simple algorithm that is 3 to 4 orders of magnitude faster than brute force, while guaranteed to produce identical results. We evaluate our work with a comprehensive set of experiments. In particular, we demonstrate the utility of discords with objective experiments on domains as diverse as Space Shuttle telemetry monitoring, medicine, surveillance, and industry, and we demonstrate the effectiveness of our discord discovery algorithm with more than one million experiments, on 82 different datasets from diverse domains.
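The quadratic brute-force baseline mentioned in the abstract can be sketched as follows. This is an illustrative sketch only, not the faster algorithm the paper introduces. It assumes plain Euclidean distance on raw values and treats overlapping subsequences as trivial matches to be skipped. The function name `brute_force_discord` is hypothetical.

```python
import math

def brute_force_discord(series, m):
    """Return (start_index, nn_distance) of the length-m discord.

    The discord is the subsequence whose nearest non-overlapping
    neighbour is farthest away. Runs in O(n^2) distance computations.
    """
    n_subs = len(series) - m + 1
    best_idx, best_dist = -1, -1.0
    for i in range(n_subs):
        nearest = math.inf
        for j in range(n_subs):
            if abs(i - j) < m:  # overlapping windows are trivial matches
                continue
            d = math.sqrt(sum((series[i + k] - series[j + k]) ** 2
                              for k in range(m)))
            nearest = min(nearest, d)
        if nearest != math.inf and nearest > best_dist:
            best_idx, best_dist = i, nearest
    return best_idx, best_dist

# A periodic signal with a single spike planted at index 17.
signal = [0, 1, 0, -1] * 8
signal[17] = 5
print(brute_force_discord(signal, 4))
```

The only parameter is the subsequence length `m`, matching the abstract's point that discords need a single intuitive parameter.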
Visualizing variable-length time series motifs
 In SDM
, 2012
Cited by 6 (3 self)
The problem of time series motif discovery has received a lot of attention from researchers in the past decade. Most existing work on finding time series motifs requires that the length of the motifs be known in advance. However, such information is not always available. In addition, motifs of different lengths may coexist in a time series dataset. In this work, we develop a motif visualization system based on grammar induction. We demonstrate that grammar induction in time series can effectively identify repeated patterns without prior knowledge of their lengths. The motifs discovered by the visualization system are variable-length in two ways. Not only can the inter-motif subsequences have variable lengths, but the intra-motif subsequences also are not restricted to have identical length, a unique property that is desirable but has not been seen in the literature.
Approximate Variable-Length Time Series Motif Discovery Using Grammar Inference
 In Proceedings of the Tenth International Workshop on Multimedia Data Mining
, 2010
Cited by 5 (1 self)
The problem of identifying frequently occurring patterns, or motifs, in time series data has received a lot of attention in the past few years. Most existing work on finding time series motifs requires that the length of the patterns be known in advance. However, such information is not always available. In addition, motifs of different lengths may coexist in a time series dataset. In this work, we propose a novel approach, based on grammar induction, for approximate variable-length time series motif discovery. Our algorithm offers the advantage of discovering hierarchical structure, regularity and grammar from the data. The preliminary results are promising. They show that the grammar-based approach is able to find some important motifs, and suggest that the new direction of using grammar-based algorithms for time series pattern discovery might be worth exploring.
... human life. Some examples of such data include speech, electrocardiogram (ECG) signals, radar signals, seismic activities, etc. In addition to the conventional definition of time series, i.e., measurements taken over time, recently, it has been shown that certain other multimedia data, e.g., images and shapes [48, 49], and XML [19], can be converted to time series and mined with promising results. Figure 1 shows an example of how shapes can be converted to time series.
Finding Approximate Frequent Patterns in Streaming Medical Data
Cited by 3 (0 self)
Time series data is ubiquitous and plays an important role in virtually every domain. For example, in medicine, the advancement of computer technology has enabled more sophisticated patient monitoring, either onsite or remotely. Such monitoring produces massive amounts of time series data, which contain valuable information for pattern learning and knowledge discovery. In this paper, we explore the problem of identifying frequently occurring patterns, or motifs, in streaming medical data. The problem of frequent pattern mining has many potential applications, including compression, summarization, and event prediction. We propose a novel approach based on grammar induction that allows the discovery of approximate, variable-length motifs in streaming data. The preliminary results show that the grammar-based approach is able to find some important motifs in some medical data, and suggest that using grammar-based algorithms for time series pattern discovery might be worth exploring.
... attack prediction [38]. In bioinformatics, it is well understood that overrepresented DNA sequences often have biological significance [9, 11, 12, 28, 32]. A substantial body of literature has been devoted to techniques to discover such patterns [2, 3]. In a previous work, we defined the related concept of “time series motif” [18], which are frequently occurring patterns in time series data. Since then, a great deal of work has been proposed for the discovery of time series motifs
Finding the unusual medical time series: Algorithms and applications
 IEEE Trans. on Information Technology
Cited by 2 (0 self)
In this work we introduce the new problem of finding time series discords. Time series discords are subsequences of longer time series that are maximally different to all the rest of the time series subsequences. They thus capture the sense of the most unusual subsequence within a time series. While discords have many uses for data mining, they are particularly attractive as anomaly detectors because they only require one intuitive parameter (the length of the subsequence) unlike most anomaly detection algorithms that typically require many parameters. While the brute force algorithm to discover time series discords is quadratic in the length of the time series, we show a simple algorithm that is 3 to 4 orders of magnitude faster than brute force, while guaranteed to produce identical results. We evaluate our work with a comprehensive set of experiments on electrocardiograms and other medical datasets.
Brigham and Women's Hospital
Time series motifs are pairs of individual time series, or subsequences of a longer time series, which are very similar to each other. As with their discrete analogues in computational biology, this similarity hints at structure which has been conserved for some reason and may therefore be of interest. Since the formalism of time series motifs in 2002, dozens of researchers have used them for diverse applications in many different domains. Because the obvious algorithm for computing motifs is quadratic in the number of items, more than a dozen approximate algorithms to discover motifs have been proposed in the literature. In this work, for the first time, we show a tractable exact algorithm to find time series motifs. As we shall show through extensive experiments, our algorithm is up to three orders of magnitude faster than brute-force search in large datasets. We further show that our algorithm is fast enough to be used as a subroutine in higher-level data mining algorithms for anytime classification, near-duplicate detection and summarization, and we consider detailed case studies in domains as diverse as electroencephalograph interpretation and entomological telemetry data mining.
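The "obvious" quadratic algorithm the abstract refers to can be sketched as an exhaustive search for the closest pair of subsequences. This is only the brute-force baseline, not the exact algorithm the paper proposes. It assumes Euclidean distance on raw values and excludes overlapping pairs. The function name `brute_force_motif` is hypothetical.

```python
import math

def brute_force_motif(series, m):
    """Return (distance, i, j) for the closest pair of non-overlapping
    length-m subsequences, found by comparing every pair."""
    n_subs = len(series) - m + 1
    best = (math.inf, -1, -1)
    for i in range(n_subs):
        # Starting j at i + m excludes windows that overlap window i.
        for j in range(i + m, n_subs):
            d = math.sqrt(sum((series[i + k] - series[j + k]) ** 2
                              for k in range(m)))
            if d < best[0]:
                best = (d, i, j)
    return best

# The shape [0, 1, 3, 1] appears at index 1 and, slightly perturbed, at index 9.
data = [5, 0, 1, 3, 1, 9, 7, 8, 6, 0, 1, 3, 1.1, 4, 2]
print(brute_force_motif(data, 4))
```

Every pair of windows is compared once, so the cost grows with the square of the number of subsequences, which is why faster exact or approximate methods matter on large datasets.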