Results 1 - 10 of 28
Deterministic Part-of-Speech Tagging with Finite-State Transducers
 Computational Linguistics
, 1995
Cited by 86 (0 self)
Stochastic approaches to natural language processing have often been preferred to rule-based approaches because of their robustness and their automatic training capabilities. This was the case for part-of-speech tagging until Brill showed how state-of-the-art part-of-speech tagging can be achieved with a rule-based tagger by inferring rules from a training corpus. However, current implementations of the rule-based tagger run more slowly than previous approaches. In this paper, we present a finite-state tagger, inspired by the rule-based tagger, that operates in optimal time in the sense that the time to assign tags to a sentence corresponds to the time required to follow a single path in a deterministic finite-state machine. This result is achieved by encoding the application of the rules found in the tagger as a nondeterministic finite-state transducer and then turning it into a deterministic transducer. The resulting deterministic transducer yields a part-of-speech tagger whose speed is dominated by the access time of mass storage devices. We then generalize the techniques to the class of transformation-based systems.
Minimization Algorithms for Sequential Transducers
, 2000
Cited by 58 (12 self)
We present general algorithms for minimizing sequential finite-state transducers that output strings or numbers. The algorithms are shown to be efficient since in the case of acyclic transducers and for output strings they operate in $O(S + |E| + |V| + (|E| - |V| + |F|) \cdot (|P_{max}| + 1))$ steps, where S is the sum of the lengths of all output labels of the resulting transducer, E the set of transitions of the given transducer, V the set of its states, F the set of final states, and $P_{max}$ one of the longest of the longest common prefixes of the output paths leaving each state of the transducer. The algorithms apply to a larger class of transducers which includes subsequential transducers.
Incremental Construction of Minimal Acyclic Finite State Automata and Transducers
, 1998
Cited by 46 (5 self)
In this paper, we describe a new method for constructing minimal, deterministic, acyclic finite-state automata and transducers. Traditional methods consist of two steps. The first one is to construct a trie, the second one is to perform minimization. Our approach is to construct an automaton in a single step by adding new strings one by one and minimizing the resulting automaton on-the-fly. We present a general algorithm as well as a specialization that relies upon the lexicographical sorting of the input strings.
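The sorted-input specialization mentioned in this abstract can be illustrated with a small sketch. This is not the authors' code: the names `State`, `build_minimal_dafsa`, `accepts`, and `count_states` are hypothetical, and the sketch assumes the words arrive in lexicographical order. It shows the general idea of merging equivalent states through a register as soon as the part of the previous word that is not shared with the next word is complete.

```python
class State:
    """One automaton state (hypothetical helper for this sketch)."""

    def __init__(self):
        self.final = False
        self.edges = {}  # character -> State

    def signature(self):
        # Two states are treated as equivalent when they agree on finality
        # and on every (label, target) pair. Targets are compared by
        # identity, which is safe here because children are always made
        # canonical before their parent is looked up in the register.
        return (self.final,
                tuple(sorted((c, id(t)) for c, t in self.edges.items())))


def build_minimal_dafsa(sorted_words):
    """Build a minimal acyclic automaton from lexicographically sorted words."""
    root = State()
    register = {}      # signature -> canonical state
    previous = ""
    path = [root]      # path[i] is the state reached after previous[:i]

    def minimize_down_to(depth):
        # Replace states deeper than `depth` on the current path with
        # registered equivalents, deepest state first.
        while len(path) - 1 > depth:
            child = path.pop()
            parent = path[-1]
            label = previous[len(path) - 1]
            parent.edges[label] = register.setdefault(child.signature(), child)

    for word in sorted_words:
        if word < previous:
            raise ValueError("input must be sorted lexicographically")
        common = 0
        while (common < min(len(word), len(previous))
               and word[common] == previous[common]):
            common += 1
        minimize_down_to(common)   # the old suffix can no longer change
        node = path[-1]
        for ch in word[common:]:   # append the new suffix as fresh states
            nxt = State()
            node.edges[ch] = nxt
            path.append(nxt)
            node = nxt
        node.final = True
        previous = word
    minimize_down_to(0)
    return root


def accepts(root, word):
    node = root
    for ch in word:
        node = node.edges.get(ch)
        if node is None:
            return False
    return node.final


def count_states(root):
    seen, stack = set(), [root]
    while stack:
        state = stack.pop()
        if id(state) not in seen:
            seen.add(id(state))
            stack.extend(state.edges.values())
    return len(seen)
```

For the sorted list `tap, taps, top, tops`, the branches after `ta` and `to` are recognized as equivalent and shared, so no full trie is ever held in memory.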
Minimization of Sequential Transducers
 Lecture Notes in Computer Science
Cited by 23 (9 self)
We present an algorithm for minimizing sequential transducers. This algorithm is shown to be efficient, since in the case of acyclic transducers it operates in $O(|E| + |V| + (|E| - |V| + |F|) \cdot (|P_{max}| + 1))$ steps, where E is the set of edges of the given transducer, V the set of its vertices, F the set of final states, and $P_{max}$ the longest of the greatest common prefixes of the output paths leaving each state of the transducer. It can be applied to a larger class of transducers which includes subsequential transducers.
1 Introduction
Finite automata and transducers are used in many efficient programs. They make it very easy to produce lexical analyzers for complex languages. In some applications, such as Natural Language Processing, the finite-state machines involved can contain several hundreds of thousands of states. Reducing the size of these graphs without losing their recognition properties is then crucial. This problem has been solved in the case of deterministic autom...
Direct Building of Minimal Automaton for a Given List
Cited by 12 (3 self)
This paper presents a method for directly building a minimal acyclic finite-state automaton that recognizes a given finite list of words in lexicographical order. The size of the temporary automata which are necessary for the construction is less than the size of the resulting minimal automaton plus the length of one of the longest words in the list. This property is the main advantage of our method.
Comparison of Construction Algorithms for Minimal, Acyclic, Deterministic, Finite-State Automata from Sets of Strings
Cited by 7 (0 self)
This paper compares various methods for constructing minimal, deterministic, acyclic, finite-state automata from sets of words.
Experiments with Automata Compression
, 2000
Cited by 5 (3 self)
Several compression methods of finite-state automata are presented and evaluated. Most compression methods used here are already described in the literature. However, their impact on the size of automata has not been described yet. We fill that gap, presenting results of experiments carried out on automata representing German and Dutch morphological dictionaries.
Finite Automata for Compact Representation of Language Models in NLP
Cited by 2 (0 self)
A technique for compact representation of language models in Natural Language Processing is presented. After a brief review of the motivations for a more compact representation of such language models, it is shown how finite-state automata can be used to compactly represent such language models. The technique can be seen as an application and extension of perfect hashing by means of finite-state automata. Preliminary practical experiments indicate that the technique yields considerable and important space savings of up to 90% in practice.
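Perfect hashing by means of an automaton, which this abstract builds on, can be sketched as follows. This is a generic illustration and not the paper's technique: it uses a plain trie rather than a minimized automaton, and the names `build_counted_trie` and `word_to_index` are hypothetical. Each state stores how many words can be completed from it, so a word maps to its rank in the sorted word list without storing the words themselves.

```python
def _new_state():
    return {"final": False, "edges": {}, "count": 0}


def _annotate(state):
    # count = number of words that can be completed starting at this state
    state["count"] = int(state["final"]) + sum(
        _annotate(child) for child in state["edges"].values())
    return state["count"]


def build_counted_trie(words):
    """Build a trie whose states carry word counts (illustrative only)."""
    root = _new_state()
    for word in words:
        state = root
        for ch in word:
            state = state["edges"].setdefault(ch, _new_state())
        state["final"] = True
    _annotate(root)
    return root


def word_to_index(root, word):
    """Return the rank of `word` in sorted order, or None if it is absent."""
    index = 0
    state = root
    for ch in word:
        if ch not in state["edges"]:
            return None
        if state["final"]:
            index += 1  # a shorter word ending here sorts before `word`
        for label, child in state["edges"].items():
            if label < ch:
                index += child["count"]  # skip every word in smaller branches
        state = state["edges"][ch]
    return index if state["final"] else None
```

The returned index can then address a dense array of per-word data, which is where the space saving comes from in schemes of this kind.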
Outilex, a Linguistic Platform for Text Processing
Cited by 1 (0 self)
We present Outilex, a generalist linguistic platform for text processing. The platform includes several modules implementing the main operations for text processing and is designed to use large-coverage Language Resources. These resources (dictionaries, grammars, annotated texts) are formatted into XML, in accordance with current standards. Evaluations on efficiency are given.
Parsing Natural Language Idioms with Bidirectional Finite-State Machines
, 2001
Cited by 1 (0 self)
In this paper, we introduce the notion of bidirectional finite-state automata (BFSA). A BFSA is defined by the following sequence: $A_{left} . \omega . A_{right}$, where $\omega$ is a word called pivot, $A_{right}$ a FSA that should be read from the left to the right and $A_{left}$ a FSA that should be read from the right to the left. $\omega$ is an edge linking the initial state of $A_{left}$ to the initial state of $A_{right}$. We present the use of such devices for natural language processing. In this context, BFSA have to be enriched with notions of proximity, optionality and contextual information. Some concrete examples are examined.