PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth
, 2001
Cited by 334 (29 self)
Sequential pattern mining is an important data mining problem with broad applications. It is challenging since one may need to examine a combinatorially explosive number of possible subsequence patterns. Most of the previously developed sequential pattern mining methods follow the methodology of Apriori, which may substantially reduce the number of combinations to be examined. However, Apriori still encounters problems when a sequence database is large and/or when sequential patterns to be mined are numerous and/or long.
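The abstract as listed stops before the method itself is described, so the following is only a minimal sketch of the prefix-projected pattern-growth idea named in the title, not the authors' implementation. It assumes every sequence element is a single item (no itemsets) and that support is the number of sequences containing the pattern as a subsequence. The function name `prefixspan` and the toy database are illustrative.

```python
from collections import Counter


def prefixspan(db, min_sup):
    """Return {pattern: support} for a database of single-item sequences.

    Simplified sketch: each element is one item, and support counts the
    sequences that contain the pattern as a (non-contiguous) subsequence.
    """
    results = {}

    def grow(prefix, projected):
        # Count each item at most once per projected suffix.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, sup in sorted(counts.items()):
            if sup < min_sup:
                continue
            pattern = prefix + (item,)
            results[pattern] = sup
            # Project: keep only what follows the first occurrence of item.
            next_projected = [
                s[s.index(item) + 1:] for s in projected if item in s
            ]
            grow(pattern, next_projected)

    grow((), [tuple(s) for s in db])
    return results


db = ["abcb", "acb", "bca"]  # toy data, one character per item
patterns = prefixspan(db, min_sup=2)
for pattern, support in patterns.items():
    print("".join(pattern), support)
```

Each recursive call only scans the suffixes that follow the current prefix, so no candidate patterns are generated and then tested against the full database.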
CloSpan: Mining Closed Sequential Patterns in Large Datasets
 In SDM
, 2003
Cited by 215 (18 self)
Previous sequential pattern mining algorithms mine the full set of frequent subsequences satisfying a min_sup threshold in a sequence database. However, since a frequent long sequence contains a combinatorial number of frequent subsequences, such mining will generate an explosive number of frequent subsequences for long patterns, which is prohibitively expensive in both time and space.
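To make the notion of a closed pattern concrete, here is a small sketch that post-filters an already mined pattern set: a pattern is kept only if no longer pattern containing it has the same support. This illustrates the definition only. It is not how CloSpan works internally, and the `frequent` dictionary is made-up toy data.

```python
def is_subsequence(small, big):
    """True if `small` occurs in `big` in order, not necessarily contiguously."""
    it = iter(big)
    return all(x in it for x in small)


def closed_patterns(patterns):
    """Keep patterns that no longer pattern with equal support contains."""
    closed = {}
    for p, sup in patterns.items():
        absorbed = any(
            len(q) > len(p) and sup_q == sup and is_subsequence(p, q)
            for q, sup_q in patterns.items()
        )
        if not absorbed:
            closed[p] = sup
    return closed


# Toy input: frequent patterns with their supports (illustrative values).
frequent = {
    ("a",): 3, ("b",): 3, ("c",): 3,
    ("a", "b"): 2, ("a", "c"): 2, ("b", "c"): 2, ("c", "b"): 2,
    ("a", "c", "b"): 2,
}
closed = closed_patterns(frequent)
print(closed)
```

Here `("a", "b")`, `("a", "c")` and `("c", "b")` are absorbed by `("a", "c", "b")`, which has the same support, so the closed set is smaller while still determining the support of every frequent pattern.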
BIDE: Efficient Mining of Frequent Closed Sequences
Cited by 150 (12 self)
Previous studies have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent patterns but only the closed ones because the latter leads to not only more compact yet complete result set but also better efficiency. However, most of the previously developed closed pattern mining algorithms work under the candidate maintenance-and-test paradigm, which is inherently costly in both runtime and space usage when the support threshold is low or the patterns become long.
Sequence Mining in Categorical Domains: Incorporating Constraints
 Proceedings of the 9th International Conference on Information and Knowledge Management, Washington D.C
, 2000
Cited by 75 (5 self)
We present cSPADE, an efficient algorithm for mining frequent sequences considering a variety of syntactic constraints. These take the form of length or width limitations on the sequences, minimum or maximum gap constraints on consecutive sequence elements, applying a time window on allowable sequences, incorporating item constraints, and finding sequences predictive of one or more classes, even rare ones. Our method is efficient and scalable. Experiments on a number of synthetic and real databases show the utility and performance of considering such constraints on the set of mined sequences.
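As an illustration of the minimum and maximum gap constraints mentioned above, the sketch below checks whether a pattern occurs in one timestamped sequence. It is not cSPADE itself. It assumes a gap is the time difference between consecutive matched events and that both bounds are inclusive; the function name and the event data are hypothetical.

```python
def occurs_with_gaps(events, pattern, min_gap=1, max_gap=None):
    """Check whether `pattern` occurs in `events` under gap constraints.

    events  : list of (time, item) pairs sorted by time
    pattern : list of items to match in order
    Assumption: min_gap <= time difference <= max_gap (both inclusive)
    between consecutive matched events.
    """
    def search(start, k, prev_time):
        if k == len(pattern):
            return True
        for i in range(start, len(events)):
            t, item = events[i]
            if prev_time is not None:
                gap = t - prev_time
                if gap < min_gap:
                    continue
                if max_gap is not None and gap > max_gap:
                    break  # time-sorted: later events are even farther away
            # Backtrack over every admissible match, not just the first one.
            if item == pattern[k] and search(i + 1, k + 1, t):
                return True
        return False

    return search(0, 0, None)


events = [(1, "a"), (2, "b"), (6, "a"), (7, "c"), (9, "b")]
print(occurs_with_gaps(events, ["a", "c"], max_gap=2))             # True
print(occurs_with_gaps(events, ["a", "b"], min_gap=4, max_gap=5))  # False
```

The first query succeeds only because the search backtracks to the second `a` at time 6. Matching greedily from the first `a` would miss the occurrence, which is why a maximum gap constraint needs more than a left-to-right scan.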
Mining Sequential Patterns with Constraints in Large Databases
, 2002
Cited by 72 (3 self)
Constraints are essential for many sequential pattern mining applications. However, there is no systematic study on constraint-based sequential pattern mining. In this paper, we investigate this issue and point out that the framework developed for constrained frequent-pattern mining does not fit our mission well. An extended framework is developed based on a sequential pattern growth methodology. Our study shows that constraints can be effectively and efficiently pushed deep into sequential pattern mining under this new framework. Moreover, this framework can be extended to constraint-based structured pattern mining as well.
Mining Long Sequential Patterns in a Noisy Environment
, 2002
Cited by 68 (12 self)
many applications including computational biology study, consumer behavior analysis, system performance analysis, etc. In a noisy environment, an observed sequence may not accurately reflect the underlying behavior. For example, in a protein sequence, the amino acid N is likely to mutate to D with little impact on the biological function of the protein. It would be desirable if the occurrence of D in the observation can be related to a possible mutation from N in an appropriate manner. Unfortunately, the support measure (i.e., the number of occurrences) of a pattern does not serve this purpose. In this paper, we introduce the concept of compatibility matrix as the means to provide a probabilistic connection from the observation to the underlying true value. A new metric match is also proposed to capture the "real support" of a pattern which would be expected if a noise-free environment is assumed. In addition, in the context we address, a pattern could be very long. The standard pruning technique developed for the market basket problem may not work efficiently. As a result, a novel algorithm that combines statistical sampling and a new technique (namely border collapsing) is devised to discover long patterns in a minimal number of scans of the sequence database with sufficiently high confidence. Empirical results demonstrate the robustness of the match model (with respect to the noise) and the efficiency of the probabilistic algorithm.
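The abstract does not spell out how match is computed, so the sketch below rests on stated assumptions rather than the paper's exact definitions: the compatibility matrix stores P(true symbol | observed symbol), the match of a pattern against a window is the product of those probabilities, a sequence contributes its best window, and the database match is the average over sequences. All probabilities and sequences are invented for illustration.

```python
def window_match(pattern, window, compat):
    """Product of P(true | observed) over aligned positions (assumed definition)."""
    m = 1.0
    for true_sym, observed in zip(pattern, window):
        m *= compat.get((true_sym, observed), 0.0)
    return m


def sequence_match(pattern, seq, compat):
    """Best match over all contiguous windows of the pattern's length."""
    k = len(pattern)
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return max((window_match(pattern, w, compat) for w in windows), default=0.0)


def database_match(pattern, db, compat):
    """Average per-sequence match, playing the role of a noise-aware support."""
    return sum(sequence_match(pattern, s, compat) for s in db) / len(db)


# compat[(true, observed)] = P(true | observed); hypothetical values.
compat = {
    ("N", "N"): 0.9, ("D", "N"): 0.1,
    ("N", "D"): 0.3, ("D", "D"): 0.7,
    ("A", "A"): 1.0,
}
db = ["AN", "AD", "DD"]
print(database_match("AN", db, compat))
```

Plain support would credit the pattern AN with one of three sequences. Under these assumptions the observed AD also contributes 0.3, because D may be a mutated N, giving an overall match of about 0.4.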
Constraint-based sequential pattern mining: the pattern-growth methods
, 2005
Cited by 55 (12 self)
Constraints are essential for many sequential pattern mining applications. However, there is no systematic study on constraint-based sequential pattern mining. In this paper, we investigate this issue and point out that the framework developed for constrained frequent-pattern mining does not fit our mission well. An extended framework is developed based on a sequential pattern growth methodology. Our study shows that constraints can be effectively and efficiently pushed deep into the sequential pattern mining under this new framework. Moreover, this framework can be extended to constraint-based structured pattern mining as well.
Periodicity detection in time series databases
 IEEE TRANS. KNOWL. DATA ENG
, 2005
Cited by 39 (3 self)
Periodicity mining is used for predicting trends in time series data. Discovering the rate at which the time series is periodic has always been an obstacle for fully automated periodicity mining. Existing periodicity mining algorithms assume that the periodicity rate (or simply the period) is user-specified. This assumption is a considerable limitation, especially in time series data where the period is not known a priori. In this paper, we address the problem of detecting the periodicity rate of a time series database. Two types of periodicities are defined, and a scalable, computationally efficient algorithm is proposed for each type. The algorithms perform in O(n log n) time for a time series of length n. Moreover, the proposed algorithms are extended in order to discover the periodic patterns of unknown periods at the same time without affecting the time complexity. Experimental results show that the proposed algorithms are highly accurate with respect to the discovered periodicity rates and periodic patterns. Real-data experiments demonstrate the practicality of the discovered periodic patterns.
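To show what detecting an unknown period means in practice, here is a deliberately naive sketch that compares a symbolic series with shifted copies of itself. It runs in O(n^2) time and is not the O(n log n) algorithm the abstract refers to. The confidence measure, the threshold, and the example series are assumptions chosen for illustration.

```python
def period_confidence(series, p):
    """Fraction of positions i where series[i] == series[i + p]."""
    pairs = len(series) - p
    matches = sum(series[i] == series[i + p] for i in range(pairs))
    return matches / pairs


def candidate_periods(series, threshold=0.8):
    """Periods up to n/2 whose shifted-match fraction reaches the threshold."""
    return [
        p for p in range(1, len(series) // 2 + 1)
        if period_confidence(series, p) >= threshold
    ]


series = "abcabcabcabd"  # period 3 with one noisy symbol at the end
print(candidate_periods(series))
```

Multiples of the true period also pass the threshold, so a practical detector would typically report the smallest qualifying period or rank candidates by confidence.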
An Algorithm for Segmenting Categorical Time Series into Meaningful Episodes
, 2001
Cited by 34 (6 self)
This paper describes an unsupervised algorithm for segmenting categorical time series. The algorithm first collects statistics about the frequency and boundary entropy of n-grams, then passes a window over the series and has two expert methods decide where in the window boundaries should be drawn. The algorithm segments text into words successfully, and has also been tested with a data set of mobile robot activities. We claim that the algorithm finds