A Practical Algorithm to Find the Best Subsequence Patterns
In Proc. of the Third International Conference on Discovery Science, volume 1967 of Lecture Notes in Artificial Intelligence, 2000
Abstract

Cited by 15 (10 self)
Given two sets of strings, consider the problem of finding a subsequence that is common to one set but never appears in the other set. The problem is known to be NP-complete. We generalize the problem to an optimization problem and give a practical algorithm that solves it exactly. Our algorithm uses a pruning heuristic and subsequence automata, and can find the best subsequence. We present experiments which convinced us that the approach is quite promising.
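The optimization view can be illustrated with a small exhaustive search. The sketch below is a hypothetical example, not the authors' algorithm: it scores a pattern p by the number of strings in the first set that contain p as a subsequence minus the number of strings in the second set that do, and it prunes a branch once the positive count can no longer beat the best score, since extending p can only shrink both counts. The names `is_subsequence` and `best_pattern`, the scoring function, and the `max_len` cutoff are assumptions; containment is tested naively instead of with subsequence automata.

```python
from typing import List, Tuple


def is_subsequence(p: str, s: str) -> bool:
    """Return True if p can be obtained from s by deleting characters."""
    it = iter(s)
    return all(ch in it for ch in p)  # each `in` consumes the iterator


def best_pattern(pos: List[str], neg: List[str], max_len: int) -> Tuple[str, int]:
    """Branch-and-bound search for the subsequence pattern maximizing
    (#positive strings containing it) - (#negative strings containing it).

    Hypothetical scoring function and length cutoff, chosen for illustration.
    """
    alphabet = sorted(set("".join(pos) + "".join(neg)))
    best = ("", len(pos) - len(neg))  # the empty pattern occurs in every string

    def search(p: str, pos_hit: List[str], neg_hit: List[str]) -> None:
        nonlocal best
        score = len(pos_hit) - len(neg_hit)
        if score > best[1]:
            best = (p, score)
        # Any extension of p matches a subset of pos_hit and of neg_hit,
        # so its score is at most len(pos_hit); prune if that cannot win.
        if len(p) == max_len or len(pos_hit) <= best[1]:
            return
        for ch in alphabet:
            q = p + ch
            search(q,
                   [s for s in pos_hit if is_subsequence(q, s)],
                   [t for t in neg_hit if is_subsequence(q, t)])

    search("", list(pos), list(neg))
    return best
```

For example, with the first set `["aab", "ab"]` and the second set `["ba"]`, the pattern `"ab"` occurs in both positive strings and in no negative one, so it reaches the maximum possible score of 2 and the search stops expanding further.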
Finding Best Patterns Practically
In: Progress in Discovery Science, volume 2281 of LNAI, Springer-Verlag, 2002
Abstract

Cited by 11 (7 self)
Finding a pattern which separates two sets is a critical task in discovery. Given two sets of strings, consider the problem of finding a subsequence that is common to one set but never appears in the other set. The problem is known to be NP-complete. An episode pattern is a generalization of a subsequence pattern in which the length of the substring containing the subsequence is bounded. We generalize these problems to optimization problems and give practical algorithms that solve them exactly. Our algorithms use pruning heuristics based on the combinatorial properties of strings, together with efficient data structures which recognize subsequence and episode patterns.
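The episode condition can be checked directly from its definition. In this hypothetical sketch (`is_episode` and the window length `k` are illustrative names), p is an episode of s if some substring of s of length at most k contains p as a subsequence. For each position where p[0] occurs, greedy left-to-right matching finds the earliest end of a match starting there, so comparing that span with k suffices. It takes time quadratic in |s| in the worst case, whereas the paper relies on dedicated data structures that recognize episode patterns.

```python
def is_episode(p: str, s: str, k: int) -> bool:
    """Return True if some substring of s of length <= k contains p as a subsequence."""
    if not p:
        return True
    for start, ch in enumerate(s):
        if ch != p[0]:
            continue
        # Greedily match the rest of p as early as possible, staying inside
        # the window s[start : start + k].
        matched = 1
        end = start
        j = start + 1
        while matched < len(p) and j < len(s) and j - start < k:
            if s[j] == p[matched]:
                matched += 1
                end = j
            j += 1
        if matched == len(p) and end - start + 1 <= k:
            return True
    return False
```

With s = `"abc"`, the pattern `"ac"` is an episode for k = 3 but not for k = 2, because its shortest containing substring is the whole string.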
Online Construction of Subsequence Automata for Multiple Texts
2000
Abstract

Cited by 9 (2 self)
We consider a deterministic finite automaton, called a subsequence automaton, which accepts all subsequences of a set of texts. We show an online algorithm for constructing the subsequence automaton for a set of texts. It runs in O(|Σ|(m + k) + N) time using O(|Σ|m) space, where |Σ| is the size of the alphabet, m is the size of the resulting subsequence automaton, k is the number of texts, and N is the total length of the texts. It can be used to preprocess a given set S of texts in such a way that, for any subsequent query w ∈ Σ*, it returns in O(|w|) time the number of texts in S which contain w as a subsequence. We also show an upper bound on the size of the automaton compared to the minimum automaton.
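The counting query can be sketched with a simple offline construction. This is not the online algorithm from the paper: it explores the reachable tuples of positions (one position per text) breadth-first, using `None` for a text that no longer contains the prefix read so far as a subsequence, and records in each state how many texts are still alive. A query then follows |w| transitions. The class `SubsequenceAutomaton` and its methods are hypothetical, and the number of reachable tuples can grow quickly with the number of texts.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

State = Tuple[Optional[int], ...]


class SubsequenceAutomaton:
    """Deterministic automaton over tuples of text positions (offline sketch)."""

    def __init__(self, texts: List[str]) -> None:
        self.texts = texts
        self.alphabet = sorted(set("".join(texts)))
        self.start: State = tuple(0 for _ in texts)
        self.delta: Dict[State, Dict[str, State]] = {}
        self.count: Dict[State, int] = {}
        self._build()

    @staticmethod
    def _advance(text: str, pos: Optional[int], ch: str) -> Optional[int]:
        # Position just after the first occurrence of ch at or after pos.
        if pos is None:
            return None
        i = text.find(ch, pos)
        return None if i == -1 else i + 1

    def _build(self) -> None:
        queue = deque([self.start])
        self.delta[self.start] = {}
        while queue:
            state = queue.popleft()
            # Number of texts in which the prefix read so far still occurs.
            self.count[state] = sum(p is not None for p in state)
            for ch in self.alphabet:
                nxt = tuple(self._advance(t, p, ch) for t, p in zip(self.texts, state))
                if all(p is None for p in nxt):
                    continue  # no text contains this extension; omit the transition
                self.delta[state][ch] = nxt
                if nxt not in self.delta:
                    self.delta[nxt] = {}
                    queue.append(nxt)

    def query(self, w: str) -> int:
        """Number of texts that contain w as a subsequence, using |w| transitions."""
        state = self.start
        for ch in w:
            state = self.delta[state].get(ch)
            if state is None:
                return 0
        return self.count[state]
```

For the texts `["abc", "acb", "ba"]`, the query `"ab"` returns 2 and the query `"ba"` returns 1, since only `"ba"` has an `a` after a `b`.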
Common Subsequence Automaton
2002
Abstract

Cited by 2 (0 self)
Given a set of strings, a common subsequence of this set is a string that is a subsequence of each string in the set. We describe an online algorithm for building the finite automaton which accepts all common subsequences of the given set of strings.
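To make the object concrete, the following hypothetical sketch (not the paper's online algorithm) builds the product of the per-string subsequence automata lazily: a state is a tuple of positions, one per string, and a transition on c exists only if every string still has a c at or after its position. Because this automaton is deterministic and acyclic, every common subsequence corresponds to exactly one path from the initial state, so memoized path counting returns the number of distinct common subsequences, including the empty string. The function name `count_common_subsequences` is illustrative, and it expects a non-empty list of strings.

```python
from functools import lru_cache
from typing import List, Tuple


def count_common_subsequences(strings: List[str]) -> int:
    """Count distinct common subsequences (including the empty one) of all strings."""
    alphabet = sorted(set.intersection(*(set(s) for s in strings)))

    @lru_cache(maxsize=None)
    def paths_from(state: Tuple[int, ...]) -> int:
        total = 1  # the word spelled so far is itself a common subsequence
        for ch in alphabet:
            nxt = []
            for s, pos in zip(strings, state):
                i = s.find(ch, pos)
                if i == -1:
                    break  # ch cannot extend the common subsequence in s
                nxt.append(i + 1)
            else:
                total += paths_from(tuple(nxt))
        return total

    return paths_from(tuple(0 for _ in strings))
```

For `["ab", "ba"]` the common subsequences are the empty string, `"a"`, and `"b"`, so the count is 3.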
The minimum DAWG for all suffixes of a string and its applications
In Proc. 13th Annual Symposium on Combinatorial Pattern Matching (CPM’02), volume 2373 of Lecture Notes in Computer Science, 2002
Abstract

Cited by 1 (1 self)
For a string w over an alphabet Σ, we consider a composite data structure called the all-suffixes directed acyclic word graph (ASDAWG). ASDAWG(w) has |w| + 1 initial nodes, and the DAG induced by all nodes reachable from the k-th initial node conforms with DAWG(w[k:]), where w[k:] denotes the k-th suffix of w. We prove that the size of the minimum ASDAWG(w) (MASDAWG(w)) is Θ(|w|) for |Σ| = 1 and Θ(|w|²) for |Σ| ≥ 2. Moreover, we introduce an online algorithm which directly constructs MASDAWG(w) for a given w, and whose running time is linear with respect to its size. We also demonstrate application problems for which ASDAWGs are useful: beginning-sensitive pattern matching, region-sensitive pattern matching, and VLDC-pattern matching.
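A naive baseline helps show what ASDAWG(w) represents. The hypothetical sketch below builds an independent DAWG for every suffix w[k:] with the standard online suffix-automaton construction, without the node sharing or minimization that MASDAWG(w) performs, so its total size is always at least quadratic in |w|. A beginning-sensitive query, "does p occur in w at a position ≥ k?", then walks the transitions of DAWG(w[k:]), which spell exactly the substrings of w[k:]. The function names `build_dawg`, `build_naive_asdawg`, and `occurs_from` are illustrative.

```python
from typing import Dict, List


def build_dawg(s: str) -> List[Dict[str, int]]:
    """Suffix automaton (DAWG) of s; returns the transition table, state 0 is initial."""
    trans: List[Dict[str, int]] = [{}]
    link = [-1]
    length = [0]
    last = 0
    for ch in s:
        cur = len(trans)
        trans.append({})
        link.append(-1)
        length.append(length[last] + 1)
        p = last
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q by cloning it with the shorter length.
                clone = len(trans)
                trans.append(dict(trans[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return trans


def build_naive_asdawg(w: str) -> List[List[Dict[str, int]]]:
    """One independent DAWG per suffix w[k:], k = 0..|w| (no node sharing)."""
    return [build_dawg(w[k:]) for k in range(len(w) + 1)]


def occurs_from(asdawg: List[List[Dict[str, int]]], k: int, p: str) -> bool:
    """Beginning-sensitive matching: does p occur in w starting at a position >= k?"""
    trans = asdawg[k]
    state = 0
    for ch in p:
        if ch not in trans[state]:
            return False
        state = trans[state][ch]
    return True
```

For w = `"abcab"`, `"ca"` occurs from position 0 but not from position 3, where only the suffix `"ab"` remains.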
The Size of Subsequence Automaton
Abstract
Given a set of strings, the subsequence automaton accepts all subsequences of these strings. We derive a lower bound for the maximum number of states of this automaton. We prove that the size of the subsequence automaton for a set of k strings of length n is Ω(n^k) for any k ≥ 1. This solves an open problem posed by Crochemore and Troníček [2] in 1999, in which only the case k ≤ 2 was shown.
Words distinguished by their subwords (Extended Abstract)
2003
Abstract
Dedicated to the memory of my brother, Janos Albert Simon.
We present an O(|A|(|x1| + |x2|)) algorithm to find a shortest subword which divides x1 iff it does not divide x2, where x1 and x2 are words over the alphabet A. Based on the length of such a word, an ultrametric distance on A* can be defined. The algorithm can be transformed to work in time and space O(|A| + |x1| + |x2|).
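Reading "subword" as a scattered subword (a subsequence) and "divides" as "is a subsequence of", a shortest such word can be found by a breadth-first search over pairs of states of the two subsequence automata. This hypothetical sketch is much slower than the algorithm described above, since it may visit all (|x1| + 1)(|x2| + 1) position pairs: a state is a pair of positions, and the search stops at the first letter that can be read from the current pair in exactly one of the two words. Because BFS explores words in order of length, the word it returns is a shortest one. The names `next_table` and `shortest_distinguishing` are assumptions.

```python
from collections import deque
from typing import Dict, List, Optional


def next_table(x: str) -> List[Dict[str, int]]:
    """table[i][c] = position just after the first c in x[i:] (missing if none)."""
    table: List[Dict[str, int]] = [{} for _ in range(len(x) + 1)]
    for i in range(len(x) - 1, -1, -1):
        table[i] = dict(table[i + 1])
        table[i][x[i]] = i + 1
    return table


def shortest_distinguishing(x1: str, x2: str) -> Optional[str]:
    """Shortest word that is a subsequence of exactly one of x1, x2 (None if none exists)."""
    t1, t2 = next_table(x1), next_table(x2)
    alphabet = sorted(set(x1) | set(x2))
    queue = deque([(0, 0, "")])
    seen = {(0, 0)}
    while queue:
        i, j, word = queue.popleft()
        for ch in alphabet:
            a, b = t1[i].get(ch), t2[j].get(ch)
            if a is None and b is None:
                continue  # word + ch, and every extension, occurs in neither word
            if a is None or b is None:
                return word + ch  # occurs in exactly one of x1, x2
            if (a, b) not in seen:
                seen.add((a, b))
                queue.append((a, b, word + ch))
    return None
```

For x1 = `"ab"` and x2 = `"ba"`, both letters occur in both words, so the search returns the length-2 word `"ab"`.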
Searching Subsequences
Abstract
The thesis deals with the subsequence matching problem. We describe two approaches: preprocessing the pattern and preprocessing the text. Preprocessing the pattern involves building a searching automaton that searches for all non-overlapping occurrences of the given pattern. We consider the case of multiple patterns and present an algorithm for building the automaton that searches for all non-overlapping occurrences of each pattern from the set of patterns. Preprocessing the text consists of building the automaton which accepts all subsequences of the given text. Such an automaton is called a Directed Acyclic Subsequence Graph (DASG). We present an incremental algorithm for building the DASG and use encoding to reduce the number of its transitions. We show how to modify the DASG when the text is changed. Further, we consider the case of multiple texts and present two algorithms for building the DASG for a set of texts. The subsequence matching problem for a finite regular language is also mentioned. We describe a modification of the DASG for episode matching. We also show a correspondence between the searching automaton and the DASG.
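An incremental DASG construction can be sketched as follows. This is a hypothetical illustration consistent with the definition above, not necessarily the thesis's exact algorithm: states 0..n correspond to positions in the text, and appending a character c adds state n + 1 plus a c-transition into it from every state that had no c-transition yet, namely every state from the one entered by the previous c onward (or all states if c is new). Every state is accepting, so a pattern is a subsequence of the text exactly when all of its transitions exist, which takes O(|p|) steps. The class name `DASG` and its methods are illustrative.

```python
from typing import Dict, List


class DASG:
    """Directed Acyclic Subsequence Graph built incrementally, one character at a time."""

    def __init__(self) -> None:
        self.delta: List[Dict[str, int]] = [{}]  # delta[i][c]: next state on c from i
        self.last: Dict[str, int] = {}           # state entered by the latest c

    def append(self, c: str) -> None:
        new = len(self.delta)
        self.delta.append({})
        # States before self.last[c] already have a c-transition; the rest get one now.
        for i in range(self.last.get(c, 0), new):
            self.delta[i][c] = new
        self.last[c] = new

    def accepts(self, pattern: str) -> bool:
        """True iff pattern is a subsequence of the text appended so far."""
        state = 0
        for c in pattern:
            if c not in self.delta[state]:
                return False
            state = self.delta[state][c]
        return True
```

After appending the characters of `"abcab"`, the DASG accepts `"acb"` but rejects `"cba"`, because no `a` follows the last `b`.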