## Constraint relaxations for discovering unknown sequential patterns (2005)

Venue: | In Proceedings of the Third International Workshop on Knowledge Discovery in Inductive Databases |

Citations: | 5 - 2 self |

### BibTeX

@INPROCEEDINGS{Antunes05constraintrelaxations,

author = {Cláudia Antunes and Arlindo L. Oliveira},

title = {Constraint relaxations for discovering unknown sequential patterns},

booktitle = {Proceedings of the Third International Workshop on Knowledge Discovery in Inductive Databases},

year = {2005},

pages = {11--32},

publisher = {Springer}

}

### Abstract

The main drawbacks of sequential pattern mining have been its lack of focus on user expectations and the high number of discovered patterns. However, the commonly accepted solution, the use of constraints, reduces the mining process to verifying which of the specified patterns are frequent, instead of discovering unknown and unexpected patterns. In this paper, we propose a new methodology to mine sequential patterns that keeps the focus on user expectations without compromising the discovery of unknown patterns. Our methodology is based on constraint relaxations, which are used to filter accepted patterns during the mining process. We propose a hierarchy of relaxations, applied to constraints expressed as context-free languages, classifying the existing relaxations (legal, valid and naïve, proposed in SPIRIT [3]) and proposing several new classes of relaxations, ranging from the approx and non-accepted relaxations to compositions of different types of relaxations, such as the approx-legal or the non-prefix-valid relaxations. Finally, we present a case study that shows the results achieved by applying this methodology to the analysis of the curricular sequences of computer science students.
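As a rough illustration of how a relaxation can act as a filter, the sketch below follows the descriptions quoted in the citation contexts of this record: an approx relaxation accepts a sequence whose generation cost (treated here as plain edit distance with unit-cost insertion, deletion and replacement) does not exceed a maximum error ε, and an approx-naïve relaxation tolerates ε items outside the language's alphabet. It is not the authors' algorithm. The function names are hypothetical, the constraint is assumed to be a small finite set of accepted sequences rather than a context-free language, and "ε items" is read as "at most ε items".

```python
from typing import Iterable, Sequence, Set


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance with unit-cost insertion, deletion and replacement."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty sequence costs i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # replace x by y (free when equal)
            ))
        prev = curr
    return prev[-1]


def approx_accepted(seq: Sequence[str],
                    language: Iterable[Sequence[str]],
                    eps: int) -> bool:
    """Assumed approx relaxation: seq is within eps edits of some accepted sequence."""
    return any(edit_distance(seq, word) <= eps for word in language)


def approx_naive_accepted(seq: Sequence[str],
                          alphabet: Set[str],
                          eps: int) -> bool:
    """Assumed approx-naive relaxation: at most eps items fall outside the alphabet."""
    return sum(item not in alphabet for item in seq) <= eps


if __name__ == "__main__":
    # Hypothetical finite constraint standing in for a context-free language.
    language = [("a", "b", "c"), ("a", "c")]
    print(approx_accepted(("a", "b", "d"), language, eps=1))
    print(approx_naive_accepted(("a", "x", "c"), {"a", "b", "c"}, eps=0))
```

With `eps=0` both functions reduce to exact checks (membership in the finite language and the plain naïve test, respectively), which is one way to see the relaxations as a widening of the original constraint.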

### Citations

3825 | Introduction to Automata Theory, Languages, and Computation
- Hopcroft, Ullman
- 1979
Citation Context ... the stack alphabet; δ is a mapping from Q×(Σ∪{ε})×Γ to finite subsets of Q×Γ*; q0∈Q is the initial state; Z0∈Γ is a particular stack symbol called the start symbol, and F⊆Q is the set of final states [6]. The language accepted by a pushdown automaton is the set of all inputs for which some sequence of moves causes the pushdown automaton to empty its stack and... |

549 | Mining Sequential Patterns: Generalizations and Performance Improvements
- Agrawal, Srikant
- 1996
Citation Context ...t with applications in other areas like the medical domain. The problem was first introduced by Agrawal and Srikant, and, in the last years, several sequential pattern mining algorithms were proposed [11], [13], [9]. Despite the reasonable efficiency of those algorithms, the lack of focus and user control has hampered the generalized use of sequential pattern ... |

234 | Mining association rules with item constraints
- Srikant, Vu, et al.
- 1997
Citation Context ...t over the sequence content is to impose that only some items are of interest – item constraints. An example of such constraint is the use of Boolean expressions over the presence or absence of items [12]. When applied to sequential pattern mining, constraints over the content can be just a constraint over the items to consider, or a constraint over the sequence of items. More recently, regular langua... |

158 | SPIRIT: sequential pattern mining with regular expression constraints
- Garofalakis, Rastogi, et al.
- 1999
Citation Context ...the mining process. We propose a hierarchy of relaxations, applied to constraints expressed as context-free languages, classifying the existing relaxations (legal, valid and naïve, proposed in SPIRIT [3]), and proposing several new classes of relaxations, ranging from the approx and non-accepted, to the composition of different types of relaxations, like the approx-legal or the non-prefix-valid relax... |

86 | Binary codes capable of correcting spurious insertions and deletions of ones
- Levenshtein
- 1965
Citation Context ...e. This cost of operations will be called the generation cost, and is similar to the edit distance between two sequences, and the operations to consider can be the Insertion, Deletion and Replacement [8]. Given a constraint C, expressed as a context-free language, and a real number ε which represents the maximum error allowed, a sequence s is said to be approximate-accepted by C, if its generatio... |

75 | Efficient enumeration of frequent sequences
- Zaki
- 1998
Citation Context ... applications in other areas like the medical domain. The problem was first introduced by Agrawal and Srikant, and, in the last years, several sequential pattern mining algorithms were proposed [11], [13], [9]. Despite the reasonable efficiency of those algorithms, the lack of focus and user control has hampered the generalized use of sequential pattern mining... |

58 | Mining Sequential Patterns with Constraints in Large Databases
- Pei, Han, et al.
- 2002
Citation Context ...[(a,b),S]→pushX" represents the transition from state q1 to state q2, when the stack has the symbol S in the top and we are in the presence of (a,b)). Consider for example that algorithm PrefixGrowth [10] is applied and it finds a, b and c as frequent. Then it will have to proceed to discover which items are frequent after a. At this point, there is already one problem: given that it has found a, whic... |

48 | Knowledge discovery and interestingness measures: A survey
- Hilderman, Hamilton
- 1999
Citation Context ... knowledge acquisition system, the same is not true in the reference frame of the final user. Indeed, several interestingness measures have been proposed for the evaluation of the discovered patterns [4]. Moreover, this issue is more critical with the introduction of constraints in the mining process. In fact, in the presence of constraints the concept of novel patterns becomes unclear even in the re... |

31 | ApproxMAP: Approximate Mining of Consensus Sequential Patterns
- Pei, Wang, et al.
- 2003
Citation Context ...ignored in most of the approaches to pattern mining. It considers two sequences similar if they are at an edit distance below a given threshold. An exception to this generalized frame is the ApproxMAP [7], which uses this distance to count the support for each potential pattern. However, to our knowledge, edit distance has not been applied to constrain the pattern mining process. To address the need t... |

27 | PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth
- Pei, Han, et al.
- 2001
Citation Context ...cations in other areas like the medical domain. The problem was first introduced by Agrawal and Srikant, and, in the last years, several sequential pattern mining algorithms were proposed [11], [13], [9]. Despite the reasonable efficiency of those algorithms, the lack of focus and user control has hampered the generalized use of sequential pattern mining. In ... |

14 | Is pushing constraints deeply into the mining algorithms really what we want?: an alternative approach for association rule mining
- Hipp, Güntzer
Citation Context ...ee language. As expected, by using the constraint itself we only discover two patterns, which satisfy the context-free language. Therefore, these results are not enough to invalidate Hipp's arguments [5] about constraints. Nevertheless, with Legal and Valid-prefixes, it is possible to discover some other intermediate patterns, which are potentially accepted by the complete constraint. Finally, with a... |

6 | Sequential Pattern Mining with Approximated Constraints
- Antunes, Oliveira
- 2004
Citation Context ...lar way. Finally, Approx-Naïve accepts sequences that have ε items (with ε the maximum error allowed) that do not belong to the language's alphabet. Recent work has proposed a new algorithm ε–accepts [2] to verify if a sequence was approximately generated by a given deterministic finite automaton (DFA). Fortunately, the extension to deal with context-free languages is simply achieved by replacing the ... |

5 | Inference of Sequential Association Rules Guided by Context-Free Grammars
- Antunes, Oliveira
- 2002
Citation Context ...ered patterns and the processing times using each relaxation. Section 5 concludes the paper with a discussion and ideas for future work. 2 Context-free Languages for Sequences of Itemsets Recent work [1] has shown that regular expressions can be substituted by context-free languages, without compromising the practical performance of algorithms, when dealing with strings of items. This is useful becau... |

1 | Inference of Sequential Association Rules Guided by Context-Free Grammars
- Antunes, Oliveira
- 2002
Citation Context ...ered patterns and the processing times using each relaxation. Section 5 concludes the paper with a discussion and ideas for future work. 2 Context-free Languages for Sequences of Itemsets Recent work [1] has shown that regular expressions can be substituted by context-free languages, without compromising the performance of algorithms, when dealing with strings of items. This is useful because context... |