Results 1 - 4 of 4
Subsequence Automata with Default Transitions
Abstract. Let S be a string of length n with characters from an alphabet of size σ. The subsequence automaton (often called the directed acyclic subsequence graph) is the minimal deterministic finite automaton accepting all subsequences of S. A straightforward construction shows that the size (number of states and transitions) of the subsequence automaton is O(nσ) and that this bound is asymptotically optimal. In this paper, we consider subsequence automata with default transitions, that is, special transitions to be taken only if none of the regular transitions match the current character, and which do not consume the current character. We show that with default transitions, much smaller subsequence automata are possible, and provide a full trade-off between the size of the automaton and the delay, i.e., the maximum number of default transitions followed before consuming a character. Specifically, given any integer parameter k, 1 < k ≤ σ, we present a subsequence automaton with default transitions of size O(nk log_k σ) and delay O(log_k σ). Hence, with k = 2 we obtain an automaton of size O(n log σ) and delay O(log σ). On the other extreme, with k = σ, we obtain an automaton of size O(nσ) and delay O(1), thus matching the bound for the standard subsequence automaton construction. The key component of our result is a novel hierarchical automata construction of independent interest.
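As a point of reference for the O(nσ) bound mentioned in the abstract, the following is a minimal sketch of the standard subsequence automaton without default transitions. It is not the hierarchical construction from the paper. The names `build_subsequence_automaton`, `accepts`, and `nxt` are illustrative assumptions, as is the convention that state i means "the first i characters of S have been consumed".

```python
def build_subsequence_automaton(s):
    """Standard construction (assumed layout): nxt[i][c] is the state
    reached from state i on character c, i.e. one past the first
    occurrence of c at position >= i. Every state is accepting."""
    n = len(s)
    nxt = [None] * (n + 1)
    nxt[n] = {}  # no characters left, so no outgoing transitions
    for i in range(n - 1, -1, -1):
        row = dict(nxt[i + 1])  # copy costs O(sigma) per state
        row[s[i]] = i + 1       # nearest occurrence of s[i] wins
        nxt[i] = row
    return nxt


def accepts(nxt, p):
    """Return True if p is a subsequence of the string nxt was built from."""
    state = 0
    for c in p:
        state = nxt[state].get(c)
        if state is None:  # missing transition: reject
            return False
    return True


nxt = build_subsequence_automaton("abcab")
print(accepts(nxt, "acb"))   # subsequence of "abcab"
print(accepts(nxt, "cc"))    # not a subsequence
print(sum(len(row) for row in nxt))  # total number of transitions
```

Each of the n + 1 states stores at most σ transitions, which is where the O(nσ) size comes from; default transitions trade some of that space for extra steps per consumed character.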
Words distinguished by their subwords* Extended Abstract
2003
Dedicated to the memory of my brother, Janos Albert Simon.
Abstract. We present an O(|A|(|x1| + |x2|)) algorithm to find a shortest subword which divides x1 iff it does not divide x2, where x1 and x2 are words over the alphabet A. Based on the length of such a word an ultra-metric distance on A* can be defined. The algorithm can be transformed to work in time and space O(|A| + |x1| + |x2|).
Finding Frequent Subsequences in a Set of Texts. [version 1.8.2.9]
2008
Abstract. Given a set of strings, the Common Subsequence Automaton accepts all common subsequences of these strings. Such an automaton can be deduced from other automata like the Directed Acyclic Subsequence Graph or the Subsequence Automaton. In this paper, we introduce some new issues in text algorithms on the basis of problems related to common subsequences. Firstly, we give an overview of the different existing automata, focusing on their similarities and differences. Secondly, we present a new automaton, the Constrained Subsequence Automaton, which extends the Common Subsequence Automaton by adding an integer q called the quorum.
To cite this version: Alban Mancheron, Jean-Émile Symphor. Finding Frequent Subsequences in a Set of Texts.
2008
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.