Results 1–10 of 58
Automata and coinduction (an exercise in coalgebra)
 LNCS
, 1998
Abstract

Cited by 66 (16 self)
The classical theory of deterministic automata is presented in terms of the notions of homomorphism and bisimulation, which are the cornerstones of the theory of (universal) coalgebra. This leads to a transparent and uniform presentation of automata theory and yields some new insights, amongst which coinduction proof methods for language equality and language inclusion. At the same time, the present treatment of automata theory may serve as an introduction to coalgebra.
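The coinduction proof method for language equality can be made concrete: two DFA states accept the same language iff they are related by a bisimulation, which a simple joint search can try to construct. The sketch below is ours (the paper gives the theory, not this code); DFAs are plain transition dicts.

```python
from collections import deque

def languages_equal(s1, s2, accept1, accept2, delta1, delta2, alphabet):
    """Coinductive language-equality check: try to build a bisimulation
    containing the pair of start states. If some reachable pair disagrees
    on acceptance, the languages differ; otherwise the visited pairs form
    a bisimulation, hence the languages are equal."""
    seen = set()
    queue = deque([(s1, s2)])
    while queue:
        p, q = queue.popleft()
        if (p, q) in seen:
            continue
        seen.add((p, q))
        if (p in accept1) != (q in accept2):
            return False          # acceptance mismatch: not bisimilar
        for a in alphabet:
            queue.append((delta1[p][a], delta2[q][a]))
    return True

# Two DFAs over {a, b}, both accepting strings with an even number of a's.
d1 = {0: {'a': 1, 'b': 0}, 1: {'a': 0, 'b': 1}}
d2 = {'e': {'a': 'o', 'b': 'e'}, 'o': {'a': 'e', 'b': 'o'}}
print(languages_equal(0, 'e', {0}, {'e'}, d1, d2, 'ab'))  # True
```

Termination is guaranteed because only finitely many state pairs exist, which is exactly why coinduction yields a decision procedure here.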
Incremental Construction of Minimal Acyclic Finite State Automata and Transducers
, 1998
Abstract

Cited by 46 (5 self)
In this paper, we describe a new method for constructing minimal, deterministic, acyclic finite-state automata and transducers. Traditional methods consist of two steps: the first is to construct a trie, the second to perform minimization. Our approach is to construct an automaton in a single step by adding new strings one by one and minimizing the resulting automaton on the fly. We present a general algorithm as well as a specialization that relies upon the lexicographical sorting of the input strings.
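The single-step construction for sorted input can be sketched as follows. This is an illustrative reconstruction of the sorted-strings specialization, with all identifiers ours, not the authors' code: a register of unique states replaces each suffix that can no longer grow with an equivalent, already-registered state.

```python
class State:
    __slots__ = ('children', 'final')
    def __init__(self):
        self.children = {}   # char -> State
        self.final = False
    def key(self):
        # Right-language signature: two states with equal keys accept the
        # same suffixes, provided their children are already canonical.
        return (self.final,
                tuple(sorted((c, id(s)) for c, s in self.children.items())))

def build_minimal_dfa(words):
    """Build a minimal acyclic DFA from lexicographically sorted, distinct
    words, minimizing the no-longer-shared suffix of the previous word
    before each insertion (sketch of the sorted-input variant)."""
    register = {}   # signature -> canonical State
    root = State()

    def replace_or_register(state):
        if not state.children:
            return
        last = max(state.children)       # most recently added edge (sorted input)
        child = state.children[last]
        replace_or_register(child)
        k = child.key()
        if k in register:
            state.children[last] = register[k]   # reuse an equivalent state
        else:
            register[k] = child

    prev = ''
    for word in words:
        common = 0                       # longest common prefix with prev
        while common < min(len(word), len(prev)) and word[common] == prev[common]:
            common += 1
        node = root
        for c in word[:common]:
            node = node.children[c]
        replace_or_register(node)        # freeze the diverging suffix of prev
        for c in word[common:]:          # append the new suffix as a chain
            nxt = State()
            node.children[c] = nxt
            node = nxt
        node.final = True
        prev = word
    replace_or_register(root)
    return root

def accepts(root, word):
    node = root
    for c in word:
        if c not in node.children:
            return False
        node = node.children[c]
    return node.final

def count_states(root):
    seen, stack = set(), [root]
    while stack:
        s = stack.pop()
        if id(s) not in seen:
            seen.add(id(s))
            stack.extend(s.children.values())
    return len(seen)
```

For example, `build_minimal_dfa(['abc', 'abd', 'b'])` yields a four-state automaton in which the accepting suffixes of "abc", "abd" and "b" share a single final state.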
State Complexity of Basic Operations on Finite Languages
Abstract

Cited by 30 (11 self)
The state complexity of basic operations on regular languages has been studied in [9–11]. Here we focus on finite languages. We show that the catenation of two finite languages accepted by an m-state and an n-state DFA, respectively, with m > n, is accepted by a DFA of (m − n + 3)·2^(n−2) − 1 states in the two-letter alphabet case, and this bound is shown to be reachable. We also show that the tight upper bound for the number of states of a DFA that accepts the star of an n-state finite language is 2^(n−3) + 2^(n−4) in the two-letter alphabet case. The corresponding bound for reversal is 3·2^(p−1) − 1 when n is even and 2^p − 1 when n is odd. Results for alphabets of an arbitrary size are also obtained. These upper bounds for finite languages are strictly lower than the corresponding ones for general regular languages.
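The catenation bound quoted in the abstract, read as (m − n + 3)·2^(n−2) − 1, can be compared numerically with the classical tight bound m·2^n − 2^(n−1) for catenation of general regular languages; the latter figure is taken from the regular-language state-complexity literature, not from this abstract, so treat it as our assumption.

```python
def catenation_bound_finite(m, n):
    # Bound from the abstract (two-letter alphabet, m > n):
    # (m - n + 3) * 2^(n-2) - 1 states, shown to be reachable.
    return (m - n + 3) * 2 ** (n - 2) - 1

def catenation_bound_regular(m, n):
    # Classical tight bound for general regular languages (assumed here
    # from the state-complexity literature): m * 2^n - 2^(n-1).
    return m * 2 ** n - 2 ** (n - 1)

for m, n in [(6, 4), (8, 5), (10, 6)]:
    print(m, n, catenation_bound_finite(m, n), catenation_bound_regular(m, n))
```

Already for m = 6, n = 4 the finite-language bound (19 states) is far below the general bound (88 states), illustrating the abstract's closing claim.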
Sorting Morphisms
 3rd International Summer School on Advanced Functional Programming, volume 1608 of LNCS
, 1998
Abstract

Cited by 30 (0 self)
Sorting algorithms can be classified in many different ways. The way presented here is to express the algorithms as functional programs and to classify them by means of their recursion patterns. These patterns can in turn be classified as the natural recursion patterns that destruct or construct a given datatype, the so-called cata- and anamorphisms respectively. We show that the selection of the recursion pattern can be seen as the major design decision, in most cases leaving no more room for further decisions in the design of the sorting algorithm. It is also shown that the use of alternative data structures may lead to new sorting algorithms. This presentation also serves as a gentle, lightweight introduction to the various morphisms.

1 Introduction

In this paper we present several well-known sorting algorithms, namely insertion sort, straight selection sort, bubble sort, quick sort, heap sort and merge sort (see e.g. [Knu73, Wir76]) in a non-standard way. We express...
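Two of the sorting morphisms mentioned can be rendered directly: insertion sort as a catamorphism (a fold that destructs the input list one constructor at a time) and straight selection sort as an anamorphism (an unfold that produces the output one element at a time). This is our Python sketch of the idea, not the paper's functional-programming presentation.

```python
from functools import reduce

def insert(x, sorted_list):
    """Insert x into an already sorted list (the algebra of the fold)."""
    for i, y in enumerate(sorted_list):
        if x <= y:
            return sorted_list[:i] + [x] + sorted_list[i:]
    return sorted_list + [x]

def insertion_sort(xs):
    # Catamorphism: fold the input, consuming one cons cell per step.
    return reduce(lambda acc, x: insert(x, acc), xs, [])

def selection_sort(xs):
    # Anamorphism: unfold the output, emitting the minimum of the
    # remaining elements at each step.
    xs = list(xs)
    out = []
    while xs:
        m = min(xs)
        xs.remove(m)
        out.append(m)
    return out

print(insertion_sort([3, 1, 2]))  # [1, 2, 3]
```

The design decision the abstract speaks of is visible here: once "fold over the input" is chosen, insertion sort is essentially forced; once "unfold the output" is chosen, selection sort is.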
Fast and simple character classes and bounded gaps pattern matching, with application to protein searching
 Journal of Computational Biology
, 2001
Abstract

Cited by 24 (4 self)
The problem of fast exact and approximate searching for a pattern that contains classes of characters and bounded-size gaps (CBG) in a text has a wide range of applications, among which a very important one is protein pattern matching (for instance, one PROSITE protein site is associated with the CBG [RK]-x(2,3)-[DE]-x(2,3)-Y, where the brackets match any of the letters inside, and x(2,3) matches a gap of length between 2 and 3). Currently, the only way to search for a CBG in a text is to convert it into a full regular expression (RE). However, an RE is more sophisticated than a CBG, and searching for it with an RE pattern matching algorithm complicates the search and makes it slow. This is the reason why we design in this article two new practical CBG matching algorithms that are much simpler and faster than all the RE search techniques. The first one looks exactly once at each text character. The second one does not need to consider all the text characters, and hence it is usually faster than the first one, but in bad cases may have to read the same text character more than once. We then propose a criterion, based on the form of the CBG, to choose a priori the faster of the two. We also show how to search permitting a few mistakes in the occurrences. We performed many practical experiments using the PROSITE database, and all of them show that our algorithms are the fastest in virtually all cases.
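The CBG-to-RE conversion the abstract mentions is easy to see on its own example: character classes stay classes, and the gaps x(2,3) become bounded wildcard repetitions. The sketch below uses Python's re engine purely to show the pattern's semantics; the paper's point is precisely that such general RE matching is slower than dedicated CBG algorithms.

```python
import re

# The PROSITE-style CBG from the abstract, [RK]-x(2,3)-[DE]-x(2,3)-Y,
# written as a standard regular expression.
cbg = re.compile(r'[RK].{2,3}[DE].{2,3}Y')

text = 'GGRAADCCYTT'   # toy protein-like sequence (ours, for illustration)
m = cbg.search(text)
print(m.start(), m.group())  # 2 RAADCCY
```

Here R satisfies [RK], "AA" fills the first gap, D satisfies [DE], and "CC" fills the second gap before the final Y.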
Incremental Construction of Finite-State Automata and Transducers, and their Use in Natural Language Processing
, 1998
Abstract

Cited by 18 (1 self)
This dissertation states that it is possible to construct minimal deterministic finite-state automata quickly and using little memory. Two new construction algorithms are presented, and an implementation is discussed. Compared to a similar algorithm by Dominique Revuz, those presented here use far less memory. The thesis also states that it is possible to construct automata that guess canonical forms and categories of unknown words much faster than other algorithms do; a new algorithm is given and discussed. An overview of the use of finite-state automata in natural language processing (NLP) is given. A new type of automaton is introduced, and a method for spelling correction is enhanced so that it can handle Polish words.
Profile of Tries
, 2006
Abstract

Cited by 17 (7 self)
Tries (from retrieval) are one of the most popular data structures on words. They reflect the (internal) structure of the stored words and several splitting procedures used in diverse contexts. The profile of a trie is a parameter that represents the number of nodes (either internal or external) at the same distance from the root; it is a function of the number of strings stored in the trie and of the distance from the root. Several, if not all, trie parameters, such as height, size, depth, shortest path, and fillup level, can be uniformly analyzed through the (external and internal) profiles. Although profiles are among the most fundamental parameters of tries, they have hardly been studied in the past. The analysis of profiles is surprisingly arduous, but once it is carried out it reveals unusually intriguing and interesting behavior. We present a detailed study of the distribution of the profiles in a trie built over random strings generated by a memoryless source. We first derive recurrences satisfied by the expected profiles and solve them asymptotically for all possible ranges of the distance from the root. It appears that profiles of tries exhibit several fascinating phenomena. When moving from the root to the leaves of a trie, the growth of the expected profiles varies. Near the root, the external profiles tend to zero at an exponential rate; the rate then gradually rises to logarithmic; the external profiles then abruptly tend to infinity, first logarithmically...
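The profile itself is straightforward to compute on a concrete trie; it is the asymptotic analysis, not the definition, that is hard. A minimal sketch under our own representation: a binary trie over distinct strings, none a prefix of another, where each external node holds a single string.

```python
from collections import defaultdict

def trie_profiles(strings):
    """Build a binary trie and return its internal and external profiles:
    maps from distance-from-root to the number of internal / external
    nodes at that distance."""
    internal = defaultdict(int)
    external = defaultdict(int)

    def build(items, depth):
        if len(items) == 1:       # a single string ends up in a leaf
            external[depth] += 1
            return
        internal[depth] += 1      # split on the bit at this depth
        for bit in '01':
            part = [s for s in items if s[depth] == bit]
            if part:
                build(part, depth + 1)

    build(strings, 0)
    return dict(internal), dict(external)

internal, external = trie_profiles(['00', '01', '1'])
print(internal, external)
```

For the three strings above there is one internal node at depths 0 and 1, one external node at depth 1 (holding "1"), and two external nodes at depth 2.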
Fast Regular Expression Search
, 1999
Abstract

Cited by 14 (10 self)
We present a new algorithm to search regular expressions, which is able to skip text characters. The idea is to determine the minimum length ℓ of a string matching the regular expression, manipulate the original automaton so that it recognizes all the reverse prefixes, of length up to ℓ, of the strings accepted, and use it to skip text characters as done for exact string matching in previous work. As we show experimentally, the resulting algorithm is fast, the fastest one in many cases of interest.

1 Introduction

The need to search for regular expressions arises in many text-based applications, such as text retrieval, text editing and computational biology, to name a few. A regular expression is a generalized pattern composed of (i) basic strings, (ii) union, concatenation and Kleene closure of other regular expressions. Readers unfamiliar with the concepts and terminology related to regular expressions are referred to a classical book such as [1]. The traditional technique [16] to sea...
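The first step, computing the minimum length ℓ of a matching string, is a simple bottom-up pass over the regular expression's syntax tree. A sketch using a hypothetical tuple-based AST encoding of ours (the paper operates on automata, not on this representation):

```python
def min_len(node):
    """Minimum length of a string matching a regex AST, with nodes
    tagged 'lit' (literal string), 'cat' (concatenation), 'alt' (union)
    and 'star' (Kleene closure)."""
    tag = node[0]
    if tag == 'lit':
        return len(node[1])
    if tag == 'cat':       # lengths of the parts add up
        return sum(min_len(child) for child in node[1:])
    if tag == 'alt':       # the shortest branch wins
        return min(min_len(child) for child in node[1:])
    if tag == 'star':      # Kleene closure matches the empty string
        return 0
    raise ValueError(f'unknown node tag: {tag}')

# (ab|c) d* e  ->  minimum length 1 + 0 + 1 = 2
ast = ('cat', ('alt', ('lit', 'ab'), ('lit', 'c')),
              ('star', ('lit', 'd')),
              ('lit', 'e'))
print(min_len(ast))  # 2
```

With ℓ in hand, no window of fewer than ℓ text characters needs to be examined, which is what enables the character-skipping search.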
Typographical nearest-neighbor search in a finite-state lexicon and its application to spelling correction
 Lecture Notes in Computer Science
, 2001
Abstract

Cited by 10 (0 self)
Abstract. A method of error-tolerant lookup in a finite-state lexicon is described, as well as its application to automatic spelling correction. We compare our method to the algorithm by K. Oflazer [14]. While Oflazer's algorithm searches for all possible corrections of a misspelled word that are within a given similarity threshold, our approach is to retain only the most similar corrections (nearest neighbours), dynamically reducing the search space in the lexicon, and to reach the first correction as soon as possible.
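The nearest-neighbour idea can be sketched as a branch-and-bound walk over a trie-shaped lexicon, carrying Levenshtein rows and pruning any branch whose best achievable distance already matches the best correction found so far. This is our illustration of the general approach, not the authors' algorithm or Oflazer's.

```python
def nearest_in_lexicon(word, lexicon):
    """Return (best_word, distance): a lexicon entry with minimum edit
    distance to `word`, found by a pruned walk over a dict-based trie."""
    # Build a plain dict trie; the None key marks an end of word.
    trie = {}
    for w in lexicon:
        node = trie
        for c in w:
            node = node.setdefault(c, {})
        node[None] = True

    best = [None, float('inf')]            # (word, distance) so far
    first_row = list(range(len(word) + 1)) # Levenshtein row for ''

    def walk(node, prefix, row):
        if None in node and row[-1] < best[1]:
            best[0], best[1] = prefix, row[-1]
        if min(row) >= best[1]:
            return                         # prune: no extension can win
        for c, child in node.items():
            if c is None:
                continue
            # Extend the dynamic-programming row by one trie character.
            new_row = [row[0] + 1]
            for j in range(1, len(word) + 1):
                cost = 0 if word[j - 1] == c else 1
                new_row.append(min(new_row[j - 1] + 1,  # insertion
                                   row[j] + 1,          # deletion
                                   row[j - 1] + cost))  # substitution
            walk(child, prefix + c, new_row)

    walk(trie, '', first_row)
    return best[0], best[1]
```

Because row values never decrease as the prefix grows, `min(row)` is a valid lower bound, so the pruning is safe; tightening `best` as matches are found is what dynamically shrinks the search space.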
Compact DFA Representation for Fast Regular Expression Search
, 2001
Abstract

Cited by 10 (6 self)
We present a new technique to encode a deterministic finite automaton (DFA). Based on the specific properties of Glushkov's nondeterministic finite automaton (NFA) construction algorithm, we are able to encode the DFA using (m + 1)(2^(m+1) + |Σ|) bits, where m is the number of characters (excluding operator symbols) in the regular expression and Σ is the alphabet. This compares favorably against the worst case of (m + 1)·2^(m+1)·|Σ| bits needed by a classical DFA representation and m(2^(2m+1) + |Σ|) bits needed by the Wu and Manber approach implemented in Agrep. Our approach is practical and simple to implement, and it permits searching regular expressions of moderate size (which include most cases of interest) faster than with any previously existing algorithm, as we show experimentally.
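The bit-parallel flavour of such automaton encodings is easiest to see in the simplest case, a plain string pattern, where the classic Shift-And algorithm keeps the entire active-state set of the Glushkov-style NFA in one machine word. This standard technique (shown for context; it is not the paper's DFA encoding) works as follows.

```python
def shift_and_search(pattern, text):
    """Bit-parallel Shift-And search: bit i of D is set iff NFA state i
    (i.e. pattern[:i+1] matched) is active. B[c] has bit i set iff
    pattern[i] == c. One shift and one AND per text character update all
    states at once. Assumes a non-empty pattern shorter than a word."""
    m = len(pattern)
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << i)
    D = 0                       # no active states initially
    accept = 1 << (m - 1)       # bit of the final state
    hits = []
    for pos, c in enumerate(text):
        D = ((D << 1) | 1) & B.get(c, 0)
        if D & accept:
            hits.append(pos - m + 1)   # start position of a match
    return hits

print(shift_and_search('ana', 'banana'))  # [1, 3]
```

Representing the whole state set as one integer is the same idea that lets the paper's encoding pack DFA transitions into a small number of bits per state set.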