Deterministic Part-of-Speech Tagging with Finite-State Transducers
Computational Linguistics, 1995
Cited by 75 (0 self)
Abstract
Stochastic approaches to natural language processing have often been preferred to rule-based approaches because of their robustness and their automatic training capabilities. This was the case for part-of-speech tagging until Brill showed how state-of-the-art part-of-speech tagging can be achieved with a rule-based tagger by inferring rules from a training corpus. However, current implementations of the rule-based tagger run more slowly than previous approaches. In this paper, we present a finite-state tagger, inspired by the rule-based tagger, that operates in optimal time in the sense that the time to assign tags to a sentence corresponds to the time required to follow a single path in a deterministic finite-state machine. This result is achieved by encoding the application of the rules found in the tagger as a nondeterministic finite-state transducer and then turning it into a deterministic transducer. The resulting deterministic transducer yields a part-of-speech tagger whose speed is dominated by the access time of mass storage devices. We then generalize the techniques to the class of transformation-based systems.
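To see why a direct implementation of a rule-based tagger is slow, here is a minimal sketch of the naive baseline: each contextual rule rescans the whole sentence, so the cost grows with the number of rules. The `Rule` template (retag `from_tag` as `to_tag` when the previous tag is `prev_tag`) is a hypothetical simplification of Brill's richer templates, and reading each rule's context from the tags as they stood before that rule ran is only one possible convention. The finite-state approach described above compiles such a rule sequence into a single deterministic transducer; this sketch does not do that.

```python
from typing import List, NamedTuple

class Rule(NamedTuple):
    """Hypothetical simplified template: retag from_tag as to_tag
    when the preceding word carried prev_tag before this rule ran."""
    from_tag: str
    to_tag: str
    prev_tag: str

def apply_rules(initial_tags: List[str], rules: List[Rule]) -> List[str]:
    tags = list(initial_tags)
    for rule in rules:            # one full scan per rule: O(len(rules) * len(tags))
        old = list(tags)          # contexts are read from the tags before this rule ran
        for i in range(1, len(old)):
            if old[i] == rule.from_tag and old[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag
    return tags

words = ["to", "race", "tomorrow"]
initial = ["TO", "NN", "NN"]  # invented most-likely tags for illustration
print(list(zip(words, apply_rules(initial, [Rule("NN", "VB", "TO")]))))
# [('to', 'TO'), ('race', 'VB'), ('tomorrow', 'NN')]
```

A deterministic transducer replaces the outer loop over rules with one left-to-right pass whose length depends only on the sentence.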
Minimization Algorithms for Sequential Transducers
2000
Cited by 47 (12 self)
Abstract
We present general algorithms for minimizing sequential finite-state transducers that output strings or numbers. The algorithms are shown to be efficient: in the case of acyclic transducers with string outputs, they operate in $O(S + |E| + |V| + (|E| - |V| + |F|) \cdot (|P_{\max}| + 1))$ steps, where $S$ is the sum of the lengths of all output labels of the resulting transducer, $E$ the set of transitions of the given transducer, $V$ the set of its states, $F$ the set of final states, and $P_{\max}$ a longest one among the longest common prefixes of the output paths leaving each state of the transducer. The algorithms apply to a larger class of transducers, which includes subsequential transducers.
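The push-then-merge idea behind this kind of minimization can be illustrated on the acyclic, string-output case. The sketch below is a minimal illustration under stated assumptions: the dictionary layout and the names `minimize_acyclic` and `transduce` are hypothetical. It computes, for each state q, the longest common prefix P(q) of the outputs readable from q, moves those prefixes toward the start state, and then merges states with identical signatures bottom-up, which suffices only because the transducer is acyclic. It does not aim to match the stated complexity bound, and a cyclic transducer would need a general automaton-minimization step after pushing.

```python
import os
from typing import Dict, Optional, Tuple

# Hypothetical layout: state -> input symbol -> (output string, next state)
Trans = Dict[int, Dict[str, Tuple[str, int]]]

def minimize_acyclic(trans: Trans, final_out: Dict[int, str], start: int = 0):
    """Push outputs toward the start state, then merge equivalent states.
    Assumes an acyclic, input-deterministic transducer in which every state
    can reach a final state (final_out maps final state -> final output)."""
    prefix: Dict[int, str] = {}

    def lcp(q: int) -> str:
        # P(q): longest common prefix of all outputs read from q to a final state
        if q not in prefix:
            cands = [w + lcp(nxt) for w, nxt in trans.get(q, {}).values()]
            if q in final_out:
                cands.append(final_out[q])
            prefix[q] = os.path.commonprefix(cands)  # compares character by character
        return prefix[q]

    lcp(start)
    register: Dict[tuple, int] = {}  # signature -> class id
    cls: Dict[int, int] = {}         # original state -> class id
    new_trans: Trans = {}
    new_final: Dict[int, str] = {}

    def merge(q: int) -> int:
        # Children are merged first, so signatures refer to class ids (bottom-up).
        if q in cls:
            return cls[q]
        arcs = []
        for sym, (w, nxt) in sorted(trans.get(q, {}).items()):
            pushed = (w + prefix[nxt])[len(prefix[q]):]  # strip P(q) from w + P(nxt)
            arcs.append((sym, pushed, merge(nxt)))
        fin = final_out[q][len(prefix[q]):] if q in final_out else None
        sig = (fin, tuple(arcs))
        if sig not in register:
            cid = len(register)
            register[sig] = cid
            new_trans[cid] = {sym: (out, tgt) for sym, out, tgt in arcs}
            if fin is not None:
                new_final[cid] = fin
        cls[q] = register[sig]
        return cls[q]

    new_start = merge(start)
    return prefix[start], new_trans, new_final, new_start

def transduce(init_out: str, trans: Trans, finals: Dict[int, str],
              start: int, word: str) -> Optional[str]:
    out, q = init_out, start
    for sym in word:
        if sym not in trans.get(q, {}):
            return None
        w, q = trans[q][sym]
        out += w
    return out + finals[q] if q in finals else None

trans = {0: {"a": ("xy", 1), "c": ("x", 2)}, 1: {"b": ("", 3)}, 2: {"b": ("y", 4)}}
init, t, f, s = minimize_acyclic(trans, {3: "", 4: ""})
print(init, len(t))  # xy 3
```

In the example, the inputs `ab` and `cb` both produce `xy` but emit it at different points; after pushing, states 1 and 2 become identical, and the five states collapse to three.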
Incremental Construction of Minimal Acyclic Finite State Automata and Transducers
1998
Cited by 35 (3 self)
Abstract
In this paper, we describe a new method for constructing minimal, deterministic, acyclic finite-state automata and transducers. Traditional methods consist of two steps: the first is to construct a trie, the second is to minimize it. Our approach is to construct an automaton in a single step by adding new strings one by one and minimizing the resulting automaton on the fly. We present a general algorithm as well as a specialization that relies upon the lexicographical sorting of the input strings.
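The sorted-input specialization can be sketched as follows. This is a minimal illustration of the widely known incremental scheme, not a reproduction of the paper's code. It keeps a register of already-minimized states, and whenever a new word diverges from the previous one, it minimizes the previous word's now-complete suffix. The names `State`, `replace_or_register`, `build_minimal_dawg`, `count_states`, and `accepts` are hypothetical. Python's code-point string ordering stands in for "lexicographical" order, and neither the general unsorted algorithm nor transducer outputs are covered.

```python
from typing import Dict, Iterable, Tuple

class State:
    def __init__(self) -> None:
        self.final = False
        self.edges: Dict[str, "State"] = {}

    def signature(self) -> Tuple:
        # Children are already canonical (registered), so their identity names their class.
        return (self.final, tuple(sorted((c, id(s)) for c, s in self.edges.items())))

def replace_or_register(state: State, register: Dict[Tuple, State]) -> None:
    """Minimize the most recently added branch hanging from `state`."""
    if not state.edges:
        return
    sym = max(state.edges)  # with sorted input, the newest branch has the largest symbol
    child = state.edges[sym]
    replace_or_register(child, register)
    sig = child.signature()
    if sig in register:
        state.edges[sym] = register[sig]  # reuse an equivalent state; child is discarded
    else:
        register[sig] = child

def build_minimal_dawg(words: Iterable[str]) -> State:
    root, register, prev = State(), {}, ""
    for i, word in enumerate(words):
        if i > 0 and word <= prev:
            raise ValueError("words must be sorted and free of duplicates")
        cp = 0  # length of the common prefix with the previous word
        while cp < min(len(word), len(prev)) and word[cp] == prev[cp]:
            cp += 1
        node = root
        for c in word[:cp]:
            node = node.edges[c]
        replace_or_register(node, register)  # the previous word's suffix is now complete
        for c in word[cp:]:
            node.edges[c] = State()
            node = node.edges[c]
        node.final = True
        prev = word
    replace_or_register(root, register)
    return root

def count_states(root: State) -> int:
    seen, stack = set(), [root]
    while stack:
        s = stack.pop()
        if id(s) not in seen:
            seen.add(id(s))
            stack.extend(s.edges.values())
    return len(seen)

def accepts(root: State, word: str) -> bool:
    s = root
    for c in word:
        s = s.edges.get(c)
        if s is None:
            return False
    return s.final

print(count_states(build_minimal_dawg(["tap", "taps", "top", "tops"])))  # 5
```

For these four words a trie would need eight states; the incremental construction ends with the five states of the minimal automaton without ever building the full trie.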
Minimization of Sequential Transducers
Lecture Notes in Computer Science
Cited by 22 (9 self)
Abstract
We present an algorithm for minimizing sequential transducers. This algorithm is shown to be efficient, since in the case of acyclic transducers it operates in $O(|E| + |V| + (|E| - |V| + |F|) \cdot (|P_{\max}| + 1))$ steps, where $E$ is the set of edges of the given transducer, $V$ the set of its vertices, $F$ the set of final states, and $P_{\max}$ the longest of the greatest common prefixes of the output paths leaving each state of the transducer. It can be applied to a larger class of transducers, which includes subsequential transducers.
1 Introduction
Finite automata and transducers are used in many efficient programs. They make it easy to produce lexical analyzers for complex languages. In some applications, such as natural language processing, the finite-state machines involved can contain hundreds of thousands of states. Reducing the size of these graphs without losing their recognition properties is therefore crucial. This problem has been solved in the case of deterministic autom...
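The problem mentioned as already solved, minimizing deterministic automata, is classically handled by partition refinement. Below is a minimal sketch of Moore-style refinement (Hopcroft's algorithm is the usual faster alternative). The function name `moore_minimize` and the dictionary encoding of the transition function are assumptions, and the DFA is assumed to be complete or trimmed, since a missing transition is treated as a move to an implicit dead block.

```python
from typing import Dict, Set, Tuple

def moore_minimize(states: Set[int], alphabet: Set[str],
                   delta: Dict[Tuple[int, str], int], finals: Set[int]) -> Dict[int, int]:
    """Map each state to a block id; states sharing a block are equivalent.
    Assumes the DFA is complete or trimmed (no explicit dead state), because a
    missing transition is treated as a move to an implicit dead block (-1)."""
    block = {q: int(q in finals) for q in states}  # start: final vs non-final
    symbols = sorted(alphabet)
    while True:
        # Refine: two states stay together only if they were together and
        # every symbol leads them into the same current block.
        sigs = {q: (block[q],) + tuple(block.get(delta.get((q, a)), -1) for a in symbols)
                for q in states}
        ids: Dict[Tuple, int] = {}
        new_block = {q: ids.setdefault(sigs[q], len(ids)) for q in sorted(states)}
        if len(ids) == len(set(block.values())):
            return new_block  # no block was split: the partition is stable
        block = new_block

delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 2, (1, "b"): 0, (2, "a"): 2, (2, "b"): 0}
print(moore_minimize({0, 1, 2}, {"a", "b"}, delta, {1, 2}))  # {0: 0, 1: 1, 2: 1}
```

Here states 1 and 2 both accept exactly the strings that keep reading `a` until the end, so they end up in one block. The transducer algorithm in the abstract builds on this kind of automaton minimization after normalizing the outputs.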
Direct Building of Minimal Automaton for a Given List
Cited by 9 (3 self)
Abstract
This paper presents a method for directly building the minimal acyclic finite-state automaton that recognizes a given finite list of words in lexicographical order. The size of the temporary automata needed during the construction is less than the size of the resulting minimal automaton plus the length of one of the longest words in the list. This property is the main advantage of our method.
Experiments with Automata Compression
2000
Cited by 4 (3 self)
Abstract
Several compression methods for finite-state automata are presented and evaluated. Most of the methods used here have already been described in the literature; however, their impact on the size of automata has not. We fill that gap, presenting results of experiments carried out on automata representing German and Dutch morphological dictionaries.
Comparison of Construction Algorithms for Minimal, Acyclic, Deterministic, Finite-State Automata from Sets of Strings
Cited by 3 (0 self)
Abstract
This paper compares various methods for constructing minimal, deterministic, acyclic, finite-state automata from sets of words.
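A common baseline in such comparisons is the traditional two-step construction: build a trie for the word set, then minimize it. The sketch below is a hypothetical illustration of that baseline only. It is not taken from the paper and reports no measurements, and the names `build_trie` and `minimize_trie` are assumptions. Because a trie is acyclic, merging states bottom-up by signature (finality plus outgoing labels and target classes) is enough to reach the minimal automaton.

```python
from typing import Dict, List, Tuple

def build_trie(words: List[str]) -> Tuple[List[Dict[str, int]], List[bool]]:
    edges: List[Dict[str, int]] = [{}]  # state 0 is the root
    final: List[bool] = [False]
    for w in words:
        q = 0
        for c in w:
            if c not in edges[q]:
                edges.append({})
                final.append(False)
                edges[q][c] = len(edges) - 1
            q = edges[q][c]
        final[q] = True
    return edges, final

def minimize_trie(edges: List[Dict[str, int]], final: List[bool]) -> int:
    """Merge equivalent trie states bottom-up; return the number of states left."""
    register: Dict[tuple, int] = {}
    cls: Dict[int, int] = {}
    # A child is always created after its parent, so it has a larger id;
    # visiting ids in decreasing order therefore handles children first.
    for q in range(len(edges) - 1, -1, -1):
        sig = (final[q], tuple(sorted((c, cls[t]) for c, t in edges[q].items())))
        cls[q] = register.setdefault(sig, len(register))
    return len(register)

edges, final = build_trie(["tap", "taps", "top", "tops"])
print(len(edges), minimize_trie(edges, final))  # 8 5
```

The eight-state trie shrinks to five states. Incremental methods aim to reach the same minimal automaton without materializing the whole trie first.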
Parsing Natural Language Idioms with Bidirectional Finite-State Machines
2001
Cited by 1 (0 self)
Abstract
In this paper, we introduce the notion of bidirectional finite-state automata (BFSA). A BFSA is defined by the sequence $A_{\mathrm{left}} : \omega : A_{\mathrm{right}}$, where $\omega$ is a word called the pivot, $A_{\mathrm{right}}$ is an FSA to be read from left to right, and $A_{\mathrm{left}}$ is an FSA to be read from right to left. $\omega$ is an edge linking the initial state of $A_{\mathrm{left}}$ to the initial state of $A_{\mathrm{right}}$. We present the use of such devices for natural language processing. In this context, BFSA have to be enriched with notions of proximity, optionality, and contextual information. Some concrete examples are examined.
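A minimal sketch of how such a device might be matched against a tokenized sentence follows, under several assumptions that do not come from the abstract. The pivot is a single token. Both automata are DFAs given as hypothetical `(state, token) -> state` dictionaries. A match requires each side to accept some stretch of tokens adjacent to the pivot, and the longest such stretch is taken. $A_{\mathrm{left}}$ is run backward from the pivot toward the start of the sentence, and $A_{\mathrm{right}}$ forward toward the end. The names `DFA` and `match_bfsa` and the idiom example are illustrative, and the proximity, optionality, and context enrichments mentioned above are not modeled.

```python
from typing import Dict, List, Optional, Set, Tuple

class DFA:
    def __init__(self, delta: Dict[Tuple[int, str], int], finals: Set[int], start: int = 0):
        self.delta, self.finals, self.start = delta, finals, start

    def longest_accepted_prefix(self, tokens: List[str]) -> Optional[int]:
        """Length of the longest accepted prefix of tokens, or None if there is none."""
        q: Optional[int] = self.start
        best = 0 if self.start in self.finals else None
        for i, tok in enumerate(tokens):
            q = self.delta.get((q, tok))
            if q is None:
                break
            if q in self.finals:
                best = i + 1
        return best

def match_bfsa(tokens: List[str], left: DFA, pivot: str, right: DFA) -> List[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, of matches anchored on the pivot."""
    spans = []
    for i, tok in enumerate(tokens):
        if tok != pivot:
            continue
        # A_left reads from the pivot toward the start of the sentence (right to left)
        n_left = left.longest_accepted_prefix(tokens[:i][::-1])
        # A_right reads from the pivot toward the end of the sentence (left to right)
        n_right = right.longest_accepted_prefix(tokens[i + 1:])
        if n_left is not None and n_right is not None:
            spans.append((i - n_left, i + 1 + n_right))
    return spans

left = DFA({(0, "the"): 1, (1, "kicked"): 2, (1, "kick"): 2}, {2})
right = DFA({}, {0})  # accepts only the empty continuation
toks = "he finally kicked the bucket yesterday".split()
print(match_bfsa(toks, left, "bucket", right))  # [(2, 5)]
```

Starting from a rare pivot such as "bucket" means the automata only run at the few positions where the pivot occurs, which is one plausible motivation for reading outward from it in both directions.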
Outilex, a Linguistic Platform for Text Processing
Cited by 1 (0 self)
Abstract
We present Outilex, a general-purpose linguistic platform for text processing. The platform includes several modules implementing the main text-processing operations and is designed to use large-coverage language resources. These resources (dictionaries, grammars, annotated texts) are encoded in XML, in accordance with current standards. Efficiency evaluations are given.
Finite Automata for Compact Representation of Language Models in NLP
Cited by 1 (0 self)
Abstract
A technique for the compact representation of language models in natural language processing is presented. After a brief review of the motivations for a more compact representation, it is shown how finite-state automata can be used to represent such language models compactly. The technique can be seen as an application and extension of perfect hashing by means of finite-state automata. Preliminary experiments indicate space savings of up to 90%.
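Perfect hashing with an acyclic automaton can be sketched as follows. Annotate each state with the number of words accepted from it. The hash of a word is then the number of accepted words that sort before it, computed in a single walk. The sketch below builds a plain trie for brevity; the same counting works unchanged on a minimized automaton, which is where the space savings would come from. The names `Node`, `build`, and `word_index` are hypothetical. Treating n-grams as character strings and storing one value per n-gram in a flat array indexed by the hash are illustrative assumptions, not the paper's exact layout.

```python
from typing import Dict, List, Optional

class Node:
    def __init__(self) -> None:
        self.edges: Dict[str, "Node"] = {}
        self.final = False
        self.count = 0  # number of words accepted from this state

def build(words: List[str]) -> Node:
    """Build a trie (stand-in for a minimized automaton) and annotate word counts."""
    root = Node()
    for w in words:
        node = root
        for c in w:
            nxt = node.edges.get(c)
            if nxt is None:
                nxt = node.edges[c] = Node()
            node = nxt
        node.final = True

    def annotate(node: Node) -> int:
        node.count = int(node.final) + sum(annotate(ch) for ch in node.edges.values())
        return node.count

    annotate(root)
    return root

def word_index(root: Node, word: str) -> Optional[int]:
    """Rank of `word` among the accepted words in code-point order, or None if absent."""
    node, index = root, 0
    for c in word:
        if node.final:
            index += 1  # the word ending here is a prefix of `word`, so it sorts first
        index += sum(ch.count for sym, ch in node.edges.items() if sym < c)
        node = node.edges.get(c)
        if node is None:
            return None
    return index if node.final else None

# Hypothetical use: one value per n-gram in a flat array indexed by the hash.
ngrams = sorted(["a cat", "a dog", "the cat", "the dog"])
probs = [0.1, 0.2, 0.3, 0.4]  # invented placeholder values, aligned with sorted ngrams
root = build(ngrams)
print(probs[word_index(root, "the cat")])  # 0.3
```

Because the hash is a dense rank in the range 0 to n-1, the value array needs no stored keys; the automaton itself plays the role of the key set.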

