Results 11-20 of 68
Compact DFA Representation for Fast Regular Expression Search
, 2001
Cited by 12 (6 self)
We present a new technique to encode a deterministic finite automaton (DFA). Based on the specific properties of Glushkov's nondeterministic finite automaton (NFA) construction algorithm, we are able to encode the DFA using $(m + 1)(2^{m+1} + |\Sigma|)$ bits, where $m$ is the number of characters (excluding operator symbols) in the regular expression and $\Sigma$ is the alphabet. This compares favorably against the worst case of $(m + 1)\,2^{m+1}|\Sigma|$ bits needed by a classical DFA representation and $m(2^{2m+1} + |\Sigma|)$ bits needed by the Wu and Manber approach implemented in Agrep. Our approach is practical and simple to implement, and it permits searching regular expressions of moderate size (which include most cases of interest) faster than with any previously existing algorithm, as we show experimentally.
Brzozowski’s algorithm (co)algebraically
Cited by 11 (3 self)
Abstract. We give a new presentation of Brzozowski's algorithm to minimize finite automata, using elementary facts from universal algebra and coalgebra, and building on earlier work by Arbib and Manes on the duality between reachability and observability. This leads to a simple proof of its correctness and opens the door to further generalizations.
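For readers unfamiliar with the algorithm itself: Brzozowski's method minimizes an automaton by reversing it, determinizing it (keeping only reachable subsets), and then repeating both steps. The sketch below is the standard textbook formulation, not the (co)algebraic presentation of the paper. The tuple encoding of automata and all function names are assumptions made for illustration.

```python
from itertools import chain
from typing import Dict, Hashable, Set, Tuple

# Hypothetical encoding: (states, alphabet, delta, initials, finals), where
# delta maps (state, symbol) to the set of successor states.
Automaton = Tuple[Set, str, Dict[Tuple[Hashable, str], Set], Set, Set]


def reverse(aut: Automaton) -> Automaton:
    """Flip every transition and swap initial and final states."""
    states, alphabet, delta, initials, finals = aut
    rdelta: Dict = {}
    for (p, a), targets in delta.items():
        for q in targets:
            rdelta.setdefault((q, a), set()).add(p)
    return states, alphabet, rdelta, set(finals), set(initials)


def determinize(aut: Automaton) -> Automaton:
    """Subset construction that only builds subsets reachable from the start."""
    _, alphabet, delta, initials, finals = aut
    start = frozenset(initials)
    seen, todo, ddelta = {start}, [start], {}
    while todo:
        s = todo.pop()
        for a in alphabet:
            t = frozenset(chain.from_iterable(delta.get((q, a), ()) for q in s))
            ddelta[(s, a)] = {t}  # deterministic: exactly one successor
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen, alphabet, ddelta, {start}, {s for s in seen if s & finals}


def brzozowski(aut: Automaton) -> Automaton:
    """Minimal complete DFA for the language of aut."""
    return determinize(reverse(determinize(reverse(aut))))


def accepts(dfa: Automaton, word: str) -> bool:
    _, _, delta, initials, finals = dfa
    (s,) = initials
    for a in word:
        (s,) = delta[(s, a)]
    return s in finals


if __name__ == "__main__":
    # NFA over {a, b} for "words ending in a"
    nfa = ({0, 1}, "ab", {(0, "a"): {0, 1}, (0, "b"): {0}}, {0}, {1})
    dfa = brzozowski(nfa)
    print(len(dfa[0]), accepts(dfa, "ba"), accepts(dfa, "ab"))  # 2 True False
```

The first reverse-and-determinize pass yields a DFA in which every state is reachable, which is the precondition under which the second pass is guaranteed to produce a minimal DFA.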
Typographical nearest-neighbor search in a finite-state lexicon and its application to spelling correction
 Lecture Notes in Computer Science
, 2001
Cited by 10 (0 self)
Abstract. A method of error-tolerant lookup in a finite-state lexicon is described, as well as its application to automatic spelling correction. We compare our method to the algorithm by K. Oflazer [14]. While Oflazer's algorithm searches for all possible corrections of a misspelled word that are within a given similarity threshold, our approach is to retain only the most similar corrections (nearest neighbours), dynamically reducing the search space in the lexicon, and to reach the first correction as soon as possible.
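The idea of keeping only the nearest neighbours, and shrinking the search space whenever a closer candidate is found, can be sketched as a branch-and-bound walk over a lexicon. This sketch makes two explicit assumptions: it uses a plain trie rather than a minimal finite-state lexicon, and ordinary Levenshtein distance rather than the paper's typographical similarity measure. The names `build_trie` and `nearest` are hypothetical.

```python
from typing import List, Tuple

END = None  # trie key marking end-of-word; its value is the stored word


def build_trie(words: List[str]) -> dict:
    root: dict = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = w
    return root


def nearest(trie: dict, query: str) -> Tuple[float, List[str]]:
    """Return (distance, words) for all lexicon words at minimal edit distance."""
    best_dist = float("inf")
    best_words: List[str] = []

    def visit(node: dict, prev_row: List[int]) -> None:
        nonlocal best_dist, best_words
        word = node.get(END)
        if word is not None:
            d = prev_row[-1]
            if d < best_dist:
                best_dist, best_words = d, [word]
            elif d == best_dist:
                best_words.append(word)
        # min(prev_row) lower-bounds the distance of every word below this node,
        # so once it exceeds the best distance so far the subtree is skipped.
        if min(prev_row) > best_dist:
            return
        for ch, child in node.items():
            if ch is END:
                continue
            row = [prev_row[0] + 1]
            for j in range(1, len(query) + 1):
                cost = 0 if query[j - 1] == ch else 1
                row.append(min(row[j - 1] + 1,             # skip a query character
                               prev_row[j] + 1,            # skip a lexicon character
                               prev_row[j - 1] + cost))    # match or substitute
            visit(child, row)

    visit(trie, list(range(len(query) + 1)))
    return best_dist, sorted(best_words)


if __name__ == "__main__":
    lexicon = build_trie(["hell", "hello", "help", "world", "yellow"])
    print(nearest(lexicon, "helo"))  # (1, ['hell', 'hello', 'help'])
```

Because the pruning bound tightens every time a closer word is found, later branches are cut earlier, which mirrors the dynamic reduction of the search space described above.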
New techniques for regular expression searching
 Algorithmica
, 2005
Cited by 10 (0 self)
We present two new techniques for regular expression searching and use them to derive faster practical algorithms. Based on the specific properties of Glushkov's nondeterministic finite automaton construction algorithm, we show how to encode a deterministic finite automaton (DFA) using $O(m\,2^m)$ bits, where $m$ is the number of characters, excluding operator symbols, in the regular expression. This compares favorably against the worst case of $O(m\,2^m|\Sigma|)$ bits needed by a classical DFA representation (where $\Sigma$ is the alphabet) and $O(m\,2^{2m})$ bits needed by the Wu and Manber approach implemented in Agrep. We also present a new way to search for regular expressions, which is able to skip text characters. The idea is to determine the minimum length ℓ of a string matching the regular expression, manipulate the original automaton so that it recognizes all the reverse prefixes of length up to ℓ of the strings originally accepted, and use it to skip text characters as done for exact string matching in previous work. We combine these techniques into two algorithms, one able and one unable to skip text characters. The algorithms are simple to implement, and our experiments show that they permit fast searching for regular expressions, normally faster than any existing algorithm.
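The first step of the character-skipping technique, computing the minimum length ℓ of a string matched by the regular expression, is a simple recursion over the expression's syntax tree. The tuple-based tree encoding below is a hypothetical choice for illustration, and the sketch covers only this step, not the reverse-prefix automaton or the search itself.

```python
# Hypothetical regex syntax-tree encoding:
#   ("char", c) | ("eps",) | ("cat", l, r) | ("alt", l, r)
#   ("star", e) | ("plus", e) | ("opt", e)

def min_len(node: tuple) -> int:
    """Length of the shortest string matched by the expression rooted at node."""
    kind = node[0]
    if kind == "char":
        return 1
    if kind == "eps":
        return 0
    if kind == "cat":
        return min_len(node[1]) + min_len(node[2])
    if kind == "alt":
        return min(min_len(node[1]), min_len(node[2]))
    if kind in ("star", "opt"):
        return 0  # zero occurrences are allowed
    if kind == "plus":
        return min_len(node[1])  # at least one occurrence
    raise ValueError(f"unknown node kind: {kind!r}")


def lit(ch: str) -> tuple:
    return ("char", ch)


if __name__ == "__main__":
    # (ab|c)* d (ef)+
    expr = ("cat",
            ("star", ("alt", ("cat", lit("a"), lit("b")), lit("c"))),
            ("cat", lit("d"), ("plus", ("cat", lit("e"), lit("f")))))
    print(min_len(expr))  # 3, e.g. "def"
```

In the scheme described in the abstract, this ℓ then bounds the length of the reversed prefixes the manipulated automaton has to recognize.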
A Taxonomy of Algorithms for Constructing Minimal Acyclic Deterministic Finite Automata
 Proc. Workshop on Implementing Automata
, 1999
Cited by 8 (1 self)
In this paper, we present a taxonomy of algorithms for constructing minimal acyclic deterministic finite automata (MADFAs). MADFAs represent finite languages and are therefore useful in applications such as storing words for spell-checking, computer and biological virus searching, text indexing and XML tag lookup. In such applications, the automata can grow extremely large (with more than 10
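As an illustration of one kind of algorithm such a taxonomy covers, the sketch below builds a MADFA incrementally from a lexicographically sorted word list, in the style of the well-known sorted-input algorithm of Daciuk, Mihov, Watson and Watson. We do not claim this is how the taxonomy presents it. The class and function names are hypothetical, and equivalent states are merged through a register keyed by finality and outgoing edges.

```python
from typing import Dict, List, Tuple


class State:
    def __init__(self) -> None:
        self.edges: Dict[str, "State"] = {}  # symbol -> target state
        self.final = False

    def signature(self) -> Tuple:
        # Only used once all children are registered (canonical) states,
        # so comparing child identities is enough to detect equivalence.
        return (self.final,
                tuple(sorted((sym, id(t)) for sym, t in self.edges.items())))


def _replace_or_register(state: State, register: Dict[Tuple, State]) -> None:
    """Minimize the most recently added branch below state."""
    sym = max(state.edges)  # last branch added, since input is sorted
    child = state.edges[sym]
    if child.edges:
        _replace_or_register(child, register)
    key = child.signature()
    if key in register:
        state.edges[sym] = register[key]  # reuse an equivalent state
    else:
        register[key] = child


def build_madfa(words: List[str]) -> State:
    root, register, prev = State(), {}, None
    for word in words:
        if prev is not None and word <= prev:
            raise ValueError("input must be sorted and free of duplicates")
        # Walk the longest common prefix with the previous word.
        i, node = 0, root
        while prev is not None and i < min(len(word), len(prev)) and word[i] == prev[i]:
            node = node.edges[word[i]]
            i += 1
        if node.edges:
            _replace_or_register(node, register)
        for sym in word[i:]:  # append the remaining suffix as fresh states
            nxt = State()
            node.edges[sym] = nxt
            node = nxt
        node.final = True
        prev = word
    if root.edges:
        _replace_or_register(root, register)
    return root


def accepts(root: State, word: str) -> bool:
    node = root
    for sym in word:
        node = node.edges.get(sym)
        if node is None:
            return False
    return node.final


def count_states(root: State) -> int:
    seen, stack = set(), [root]
    while stack:
        s = stack.pop()
        if id(s) not in seen:
            seen.add(id(s))
            stack.extend(s.edges.values())
    return len(seen)


if __name__ == "__main__":
    madfa = build_madfa(["tap", "taps", "top", "tops"])
    print(count_states(madfa))  # 5: the "ap"/"op" and "s" suffixes are shared
```

Sorted input is what keeps this simple: once a word diverges from its predecessor, the predecessor's unshared suffix can never change again, so it can be minimized immediately and the automaton never grows much beyond its minimal size.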
Implementing WS1S via Finite Automata
Cited by 8 (0 self)
It has long been known that WS1S is decidable through the use of finite automata. However, since the worst-case running time has been proven to grow extremely quickly, few have explored the implementation of the algorithm. In this paper we describe some of the points of interest that have come up while coding and running the algorithm. These points include the data structures used as well as the special properties of the automata, which we can exploit to perform minimization very quickly in certain cases. We also present some data that enable us to gain insight into how the algorithm performs in the average case, both on random inputs and on inputs that come from the use of Presburger Arithmetic (which can be converted to WS1S) in compiler optimization.
Stretching and Jamming of Automata
 In Proceedings of the 2003 annual conference of the South African institute of
, 2003
Cited by 7 (1 self)
monologue in the field of automata theory. In this thesis we present two new transformation operations on finite-state automata, called stretching and jamming. These transformations are intended to increase the performance of the automata. Readers who are mainly interested in the theoretical side of the transformations are referred to chapters 2 and 3. An overview of the abstract algorithms that model the transformations is given in chapters 4 and 5. Implementation details can be found in chapter 6. Readers interested in the practical results, and in the cases where stretching and jamming are useful, are referred to chapter 7. I had the pleasure of working on this thesis at the University of Pretoria in South Africa. I would like to thank Prof. Dr. Derrick Kourie of the University of Pretoria for his supervision and inspiration during my stay in South Africa. I would also like to thank Prof. Dr. Bruce Watson for his supervision and inspiration. Lastly, I want to thank Dr. Ir. Alex Telea for his advice and Ir. Loek Cleophas, who helped me with several revisions of this thesis.
Towards SPARE Time: A New Taxonomy and Toolkit of Keyword Pattern Matching Algorithms
, 2003
Cited by 7 (5 self)
We present a new taxonomy and toolkit of keyword pattern matching algorithms. The new taxonomy is an extension of a prior taxonomy of such algorithms. It includes a number of algorithms (including factor- and factor-oracle-based and bit-parallel prefix-based pattern matching algorithms) that have been published or received a lot of attention in the last decade. Based on the new taxonomy, we developed a pattern matching toolkit. This toolkit is a revision and extension of the SPARE Parts toolkit that had been developed based on the original taxonomy. We present the architecture of the new toolkit, which is named SPARE Time.
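To illustrate what a bit-parallel prefix-based matcher does, here is a minimal sketch of the classic Shift-And algorithm for a single keyword: after reading a text character, bit i of the state is set exactly when the pattern prefix of length i + 1 ends at that position. It is a generic textbook version, not code from the SPARE Time toolkit, and the function name is hypothetical. Python's unbounded integers hide the usual requirement that the pattern fit in one machine word.

```python
from typing import Dict, List


def shift_and(pattern: str, text: str) -> List[int]:
    """Start positions of all occurrences of pattern in text (Shift-And)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # masks[c] has bit i set iff pattern[i] == c
    masks: Dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (len(pattern) - 1)  # bit for the full pattern
    state, hits = 0, []
    for pos, ch in enumerate(text):
        # Extend every live prefix by one character, start a new one at bit 0,
        # and keep only those prefixes whose next pattern character is ch.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(pos - len(pattern) + 1)
    return hits


if __name__ == "__main__":
    print(shift_and("aba", "ababa"))  # [0, 2]: overlapping matches are found
```

All prefixes are updated with one shift, one OR and one AND per text character, which is the source of the speed of the bit-parallel family.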
Current Issues in Software Engineering for Natural Language Processing
 Proc. of the Workshop on Software Engineering and Architecture of Language Technology Systems (SEALTS), the Joint Conf. for Human Language Technology and the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT
, 2003
Cited by 7 (0 self)
In Natural Language Processing (NLP), research results from software engineering and software technology have often been neglected.
On the performance of automata minimization algorithms
 DCC-FC & LIACC, Universidade do Porto
, 2007
Cited by 7 (3 self)
Apart from the theoretical worst-case running time analysis, not much is known about the average-case analysis or practical performance of finite automata minimization algorithms. In this paper we compare the running time of four minimization algorithms based on experimental results. We applied these algorithms to both deterministic and nondeterministic random automata.
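Moore's partition-refinement algorithm is a common baseline in such comparisons. The abstract does not name the four algorithms studied, so the sketch below should not be read as one of them. It assumes a complete DFA whose states are all reachable, given as a transition dictionary, and the function name and encoding are hypothetical. Starting from the final/non-final split, each round refines the blocks by the blocks their successors fall into, and stops when no block splits.

```python
from typing import Dict, Hashable, Sequence, Set, Tuple


def moore_minimize(states: Sequence[Hashable],
                   alphabet: Sequence[str],
                   delta: Dict[Tuple[Hashable, str], Hashable],
                   finals: Set[Hashable]) -> Dict[Hashable, int]:
    """Map each state of a complete DFA to the index of its equivalence class."""
    block = {q: int(q in finals) for q in states}  # split final / non-final
    count = len(set(block.values()))
    while True:
        ids: Dict[Tuple, int] = {}
        new_block = {}
        for q in states:
            # A state's signature: its own block plus its successors' blocks.
            sig = (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
            new_block[q] = ids.setdefault(sig, len(ids))
        if len(ids) == count:  # no block was split: partition is stable
            return new_block
        block, count = new_block, len(ids)


if __name__ == "__main__":
    # 4-state DFA over {a, b} accepting words ending in a; states 0/2 and 1/3 merge
    delta = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 3, (1, "b"): 2,
             (2, "a"): 3, (2, "b"): 0, (3, "a"): 1, (3, "b"): 0}
    print(moore_minimize([0, 1, 2, 3], "ab", delta, {1, 3}))  # {0: 0, 1: 1, 2: 0, 3: 1}
```

Because each signature includes the state's current block, every round refines the previous partition, so an unchanged block count means the partition itself is unchanged and the loop can stop.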