Results 1 
5 of
5
Time Series Motifs Statistical Significance
"... Time series motif discovery is the task of extracting previously unknown recurrent patterns from time series data. It is an important problem within applications that range from finance to health. Many algorithms have been proposed for the task of efficiently finding motifs. Surprisingly, most of th ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
Time series motif discovery is the task of extracting previously unknown recurrent patterns from time series data. It is an important problem within applications that range from finance to health. Many algorithms have been proposed for the task of efficiently finding motifs. Surprisingly, most of these proposals do not focus on how to evaluate the discovered motifs. They are typically evaluated by human experts. This is unfeasible even for moderately sized datasets, since the number of discovered motifs tends to be prohibitively large. Statistical significance tests are widely used in bioinformatics and association rules mining communities to evaluate the extracted patterns. In this work we present an approach to calculate time series motifs statistical significance. Our proposal leverages work from the bioinformatics community by using a symbolic definition of time series motifs to derive each motif’s pvalue. We estimate the expected frequency of a motif by using Markov Chain models. The pvalue is then assessed by comparing the actual frequency to the estimated one using statistical hypothesis tests. Our contribution gives means to the application of a powerful technique statistical tests to a time series setting. This provides researchers and practitioners with an important tool to evaluate automatically the degree of relevance of each extracted motif.
A Word Counting Graph
, 2009
"... Abstract. We study methods for counting occurrences of words from a given set H over an alphabet V in a given text. All words have the same length m. Our goal is the computation of the probability to find p occurrences of words from a set H in a random text of size n, assuming that the text is gener ..."
Abstract
 Add to MetaCart
Abstract. We study methods for counting occurrences of words from a given set H over an alphabet V in a given text. All words have the same length m. Our goal is the computation of the probability to find p occurrences of words from a set H in a random text of size n, assuming that the text is generated by a Bernoulli or Markov model. We have designed an algorithm solving the problem; the algorithm relies on traversals of a graph, whose set of vertices is associated with the overlaps of words from H. Edges define two oriented subgraphs that can be interpreted as equivalence relations on words of H. Let P(H) be the set of equivalence classes and S be the set of other vertices. The run time for the Bernoulli model is O(np(P(H)+S)) time and the space complexity is O(pmS+P(H)). In a Markov model of order K, additional space complexity is O(pmV  K) and additional time complexity is O(npmV  K). Our preprocessing uses a variant of AhoCorasick automaton and achieves O(mH) time complexity. Our algorithm is implemented and provides a significant space improvement in practice. We compare its complexity to the additional improvement due to AhoCorasick minimization. 1