Mining Partially Periodic Event Patterns With Unknown Periods
Proc. ICDE, 2000
Cited by 43 (1 self)
Periodic behavior is common in real-world applications. However, in many cases, periodicities are partial in that they are present only intermittently. Herein, we study such intermittent patterns, which we refer to as p-patterns. Our formulation of p-patterns takes into account imprecise time information (e.g., due to unsynchronized clocks in distributed environments), noisy data (e.g., due to extraneous events), and shifts in phase and/or periods. We structure mining for p-patterns as two sub-tasks: (1) finding the periods of p-patterns and (2) mining temporal associations. For (2), a level-wise algorithm is used. For (1), we develop a novel approach based on a chi-squared test, and study its performance in the presence of noise.
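The period-finding sub-task can be illustrated with a minimal sketch. It assumes integer timestamps and a null model in which gaps between consecutive events follow a geometric distribution; both are assumptions of this example, not details given in the abstract. The function name `candidate_periods` and its parameters are hypothetical. The value 3.84 is the 95% critical value of a chi-squared distribution with one degree of freedom.

```python
from collections import Counter


def candidate_periods(timestamps, tolerance=0, threshold=3.84):
    """Return {gap: chi2} for inter-arrival gaps seen more often than chance.

    Assumption (not from the abstract): events arrive at integer times and,
    under the null hypothesis, gaps are geometrically distributed.
    """
    ts = sorted(set(timestamps))
    if len(ts) < 2:
        return {}
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    n = len(gaps)
    span = ts[-1] - ts[0]
    rate = n / span  # assumed chance of an event in any one time unit
    counts = Counter(gaps)
    flagged = {}
    for tau in sorted(counts):
        # The tolerance window absorbs imprecise timestamps.
        window = range(max(1, tau - tolerance), tau + tolerance + 1)
        observed = sum(counts.get(g, 0) for g in window)
        p = sum(rate * (1 - rate) ** (g - 1) for g in window)
        expected = n * p
        variance = n * p * (1 - p)
        if variance <= 0 or observed <= expected:
            continue  # only over-represented gaps are of interest
        chi2 = (observed - expected) ** 2 / variance
        if chi2 > threshold:
            flagged[tau] = chi2
    return flagged


# A period-5 event stream with three extraneous (noise) events.
events = list(range(0, 45, 5)) + [3, 17, 31]
print(sorted(candidate_periods(events)))
```

Raising `tolerance` lets nearby gaps count toward the same candidate period, which is one simple way to model unsynchronized clocks.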
Mining Long Sequential Patterns in a Noisy Environment
2002
Cited by 41 (9 self)
many applications including computational biology study, consumer behavior analysis, system performance analysis, etc. In a noisy environment, an observed sequence may not accurately reflect the underlying behavior. For example, in a protein sequence, the amino acid N is likely to mutate to D with little impact on the biological function of the protein. It would be desirable if the occurrence of D in the observation can be related to a possible mutation from N in an appropriate manner. Unfortunately, the support measure (i.e., the number of occurrences) of a pattern does not serve this purpose. In this paper, we introduce the concept of compatibility matrix as the means to provide a probabilistic connection from the observation to the underlying true value. A new metric, match, is also proposed to capture the "real support" of a pattern which would be expected if a noise-free environment is assumed. In addition, in the context we address, a pattern could be very long. The standard pruning technique developed for the market basket problem may not work efficiently. As a result, a novel algorithm that combines statistical sampling and a new technique (namely border collapsing) is devised to discover long patterns in a minimal number of scans of the sequence database with sufficiently high confidence. Empirical results demonstrate the robustness of the match model (with respect to the noise) and the efficiency of the probabilistic algorithm.
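A small sketch may help make the compatibility matrix and the match metric concrete. It rests on several assumptions that the abstract does not spell out: a matrix entry is read as the probability that the true symbol is `t` given that `o` was observed, the match of a pattern against a window is the product of those entries, a sequence scores the best window, and a database scores the average over its sequences. The function names and the toy probabilities are hypothetical.

```python
from math import prod


def match_in_sequence(pattern, sequence, compat):
    """Best product of compatibilities over all contiguous windows.

    compat[(t, o)] is assumed to mean P(true symbol = t | observed = o).
    """
    length = len(pattern)
    best = 0.0
    for start in range(len(sequence) - length + 1):
        window = sequence[start:start + length]
        score = prod(compat.get((t, o), 0.0) for t, o in zip(pattern, window))
        best = max(best, score)
    return best


def match_in_database(pattern, sequences, compat):
    """Average per-sequence match; plays the role that support plays without noise."""
    return sum(match_in_sequence(pattern, s, compat) for s in sequences) / len(sequences)


# Toy matrix: an observed D is assumed to hide a true N 30% of the time.
compat = {
    ("N", "N"): 0.9, ("D", "N"): 0.1,
    ("N", "D"): 0.3, ("D", "D"): 0.7,
}
print(match_in_database("NN", ["NN", "ND"], compat))
```

With plain support, the sequence `ND` would contribute nothing to the pattern `NN`; under this sketch it still contributes a partial match because D may be a mutated N.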
Mining periodic patterns with gap requirement from sequences
In SIGMOD, 2005
Cited by 13 (0 self)
We study a problem of mining frequently occurring periodic patterns with a gap requirement from sequences. Given a character sequence S of length L and a pattern P of length l, we consider P a frequently occurring pattern in S if the probability of observing P given a randomly picked length-l subsequence of S exceeds a certain threshold. In many applications, particularly those related to bioinformatics, interesting patterns are periodic with a gap requirement. That is to say, the characters in P should match subsequences of S in such a way that the matching characters in S are separated by gaps of more or less the same size. We show the complexity of the mining problem and discuss why traditional mining algorithms are computationally infeasible. We propose practical algorithms for solving the problem, and study their characteristics. We also present a case study in which we apply our algorithms to some DNA sequences. We discuss some interesting patterns obtained from the case study.
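The gap requirement can be sketched as a counting problem. The example below assumes the gap is the number of characters strictly between two consecutive matched positions and that it must lie in a range `[min_gap, max_gap]`; these names and the dynamic-programming formulation are illustrative assumptions, not the algorithms proposed in the paper.

```python
def count_gapped_matches(seq, pattern, min_gap, max_gap):
    """Count the ways pattern matches seq with every gap in [min_gap, max_gap].

    A gap is the number of characters strictly between two consecutive
    matched positions (an assumption of this sketch).
    """
    if not pattern:
        return 0
    # ways[i] = number of matches of the pattern prefix that end at position i
    ways = [1 if c == pattern[0] else 0 for c in seq]
    for ch in pattern[1:]:
        nxt = [0] * len(seq)
        for i, c in enumerate(seq):
            if c != ch:
                continue
            lo = max(i - max_gap - 1, 0)
            hi = i - min_gap - 1
            for j in range(lo, hi + 1):
                nxt[i] += ways[j]
        ways = nxt
    return sum(ways)


print(count_gapped_matches("ACAGAC", "AA", 1, 3))
```

Dividing such a count by the number of position sequences that satisfy the same gap bounds would give a frequency in the spirit of the abstract's probability threshold, although the exact normalization is not stated there.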
TAR: Temporal Association Rules on Evolving Numerical Attributes
Proc. ICDE, 2001
Cited by 13 (1 self)
Data mining has been an area of increasing interest during recent years. The association rule discovery problem in particular has been widely studied. However, there are still some unresolved problems. For example, research on mining patterns in the evolution of numerical attributes is still lacking. This is both a challenging problem and one with significant practical application in business, science, and medicine. In this paper, we present a temporal association rule model for evolving numerical attributes. Metrics for qualifying a temporal association rule include the familiar measures of support and strength used in the traditional association rule mining and a new metric called density. The density metric not only gives us a way to extract the rules that best represent the data, but also provides an effective mechanism to prune the search space. An efficient algorithm is devised for mining temporal association rules, which utilizes all three thresholds (especially the strength) to prune the search space drastically. Moreover, the resulting rules are represented in a concise manner via rule sets to reduce the output size. Experimental results on real and synthetic data sets demonstrate the efficiency of our algorithm.
InfoMiner+: Mining Partial Periodic Patterns with Gap Penalties
In Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM’02), 2002
Cited by 11 (1 self)
In this paper, we focus on mining periodic patterns allowing some degree of imperfection in the form of random replacement from a perfect periodic pattern. Information gain was proposed to identify patterns with events of vastly different occurrence frequencies and adjust for the deviation from a pattern. However, it imposes no penalty when there are gaps between the pattern occurrences. In many applications, e.g., bioinformatics, it is important to identify subsequences in which a pattern repeats perfectly (or near perfectly). As a solution, we extend the information gain measure to include a penalty for gaps between pattern occurrences. We call this measure generalized information gain. Furthermore, we want to find a subsequence S such that for a pattern P, the generalized information gain of P in S is high. This is particularly useful in locating repeats in DNA sequences. In this paper, we developed an effective mining algorithm, InfoMiner+, to simultaneously mine significant patterns and the associated subsequences.
Using Convolution to Mine Obscure Periodic Patterns In One Pass
Proceedings of the 9th International Conference on Extending Database Technology (EDBT’04), 2004
Cited by 11 (0 self)
The mining of periodic patterns in time series databases is an interesting data mining problem that can be envisioned as a tool for forecasting and predicting the future behavior of time series data. Existing periodic patterns mining algorithms either assume that the periodic rate (or simply the period) is user-specified, or try to detect potential values for the period in a separate phase. The former assumption is a considerable disadvantage, especially in time series databases where the period is not known a priori. The latter approach results in a multi-pass algorithm, which on the other hand is to be avoided in online environments (e.g., data streams). In this paper, we develop an algorithm that mines periodic patterns in time series databases with unknown or obscure periods such that discovering the period is part of the mining process. Based on
Meta-Patterns: Revealing Hidden Periodic Patterns
IBM Research Report, 2001
Cited by 10 (1 self)
Discovery of periodic patterns in time series data has become an active research area with many applications. These patterns can be hierarchical in nature, where a higher level pattern may consist of repetitions of lower level patterns. Unfortunately, the presence of noise may prevent these higher level patterns from being recognized in the sense that two portions (of a data sequence) that support the same (high level) pattern may have different layouts of occurrences of basic symbols. There may not exist any common representation in terms of raw symbol combinations; and hence such a (high level) pattern may not be expressed by any previous model (defined on raw symbols or symbol combinations) and would not be properly recognized by any existing method. In this paper, we propose a novel model, namely meta-pattern, to capture these high level patterns. As a more flexible model, the number of potential meta-patterns could be very large. A substantial difficulty lies in how to identify the proper pattern candidates. However, the well-known Apriori property is not able to provide sufficient pruning power. A new property, namely component location property, is identified and used to conduct the candidate generation so that an efficient computation-based mining algorithm can be developed. Last but not least, we apply our algorithm to some real and synthetic sequences and some interesting patterns are discovered.
Mining and reasoning on workflows
IEEE Transactions on Knowledge and Data Engineering, 2005
Cited by 10 (2 self)
Workflow management systems represent today a key technological infrastructure for advanced applications which is attracting a growing body of research, mainly focused on developing tools for workflow management, that allow the users both to specify the “static” aspects, like preconditions, precedences among activities, rules for exception handling, and to control its execution, by scheduling the activities on the available resources. This paper deals with an aspect of workflows which has so far not received much attention even though it is crucial for the forthcoming scenarios of large scale applications on the web: providing facilities for the human system administrator for identifying the choices performed more frequently in the past that had led to a desired final configuration. In this context, we formalize the problem of discovering the most frequent patterns of executions, i.e., the workflow substructures that have been scheduled more frequently by the system. We attacked the problem by developing two data mining algorithms, on the basis of an intuitive and original graph formalization of a workflow schema and its occurrences. The model is used both to prove some intractability results, that strongly motivate the use of data
Business process impact visualization and anomaly detection
Information Visualization, 2006
Cited by 8 (1 self)
Business operations involve many factors and relationships and are modeled as complex business process workflows. The execution of these business processes generates vast volumes of complex data. The operational data are instances of the process flow, taking different paths through the process. The goal is to use the complex information to analyze and improve operations and to optimize the process flow. In this paper, we introduce a new visualization technique, called VisImpact, that turns raw operational business data into valuable information. VisImpact reduces data complexity by analyzing operational data and abstracting the most critical factors, called impact factors, which influence business operations. The analysis may identify single nodes of the business flow graph as important factors but it may also determine aggregations of nodes to be important. Moreover, the analysis may find that single nodes have certain data values associated with them which have an influence on some business metrics or resource usage parameters. The impact factors are presented
The citiKey website. http://www.e-street.com
IEEE Transactions on Knowledge and Data Engineering (TKDE)
Cited by 6 (0 self)
In many applications that track and analyze spatiotemporal data, movements obey periodic patterns; the objects follow the same routes (approximately) over regular time intervals. For example, people wake up at the same time and follow more or less the same route to their work every day. The discovery of hidden periodic patterns in spatiotemporal data could unveil important information to the data analyst. Existing approaches to discovering periodic patterns focus on symbol sequences. However, these methods cannot directly be applied to a spatiotemporal sequence because of the fuzziness of spatial locations in the sequence. In this paper, we define the problem of mining periodic patterns in spatiotemporal data and propose an effective and efficient algorithm for retrieving maximal periodic patterns. In addition, we study two interesting variants of the problem. The first is the retrieval of periodic patterns that are not frequent in the whole history, but during a continuous subinterval of it. The second problem is the discovery of periodic patterns, some instances of which may be shifted or distorted. We demonstrate how our mining technique can be adapted for these variants. Finally, we present a comprehensive experimental evaluation, where we show the effectiveness and efficiency of the proposed techniques.

