Results 1-10 of 15
Finite-State Transducers in Language and Speech Processing
Computational Linguistics, 1997
Cited by 308 (41 self)
Finite-state machines have been used in various domains of natural language processing. We consider here the use of a type of transducers that supports very efficient programs: sequential transducers. We recall classical theorems and give new ones characterizing sequential string-to-string transducers. Transducers that output weights also play an important role in language and speech processing. We give a specific study of string-to-weight transducers, including algorithms for determinizing and minimizing these transducers very efficiently, and characterizations of the transducers admitting determinization and the corresponding algorithms. Some applications of these algorithms in speech recognition are described and illustrated.
Partial parsing via finite-state cascades
Natural Language Engineering, 1996
Cited by 293 (4 self)
Finite-state cascades represent an attractive architecture for parsing unrestricted text. Deterministic parsers specified by finite-state cascades are fast and reliable. They can be extended at modest cost to construct parse trees with finite feature structures. Finally, such deterministic parsers do not necessarily involve trading off accuracy against speed: they may in fact be more accurate than exhaustive-search stochastic context-free parsers.
1 Finite-State Cascades
Of current interest in corpus-oriented computational linguistics are techniques for bootstrapping broad-coverage parsers from text corpora. The work described here is a step along the way toward a bootstrapping scheme that involves inducing a tagger from word distributions, a low-level “chunk” parser from a tagged corpus, and lexical dependencies from a chunked corpus. In particular, I describe a chunk parsing technique based on what I will call a finite-state cascade. Though I shall not address the question of inducing such a parser from a corpus, the parsing technique has been implemented and is being used in a project for inducing lexical dependencies from corpora in English and German. The resulting parsers are robust and very fast. A finite-state cascade consists of a sequence of levels. Phrases at one level are built on phrases at the previous level, and there is no recursion: phrases never contain same-level or higher-level phrases. Two levels of special importance are the level of chunks and the level of simplex clauses [2, 1]. Chunks are the non-recursive cores of “major” phrases, i.e., NP, VP, PP, AP, AdvP. Simplex clauses are clauses in which embedded clauses have been turned into siblings: tail recursion has been replaced with iteration, so to speak. To illustrate, (1) shows a parse tree represented as a sequence of levels.
The Design Principles of a Weighted Finite-State Transducer Library
Theoretical Computer Science, 2000
Cited by 99 (23 self)
We describe the algorithmic and software design principles of an object-oriented library for weighted finite-state transducers. By taking advantage of the theory of rational power series, we were able to achieve high degrees of generality, modularity and irredundancy, while attaining competitive efficiency in demanding speech processing applications involving weighted automata of more than 10^7 states and transitions. Besides its mathematical foundation, the design also draws from important ideas in algorithm design and programming languages: dynamic programming and shortest-paths algorithms over general semirings, object-oriented programming, lazy evaluation and memoization.
Incremental Finite-State Parsing
In Proceedings of the Fifth Conference on Applied Natural Language Processing, 1997
Cited by 45 (1 self)
This paper describes a new finite-state shallow parser. It merges constructive and reductionist approaches within a highly modular architecture. Syntactic information is added at the sentence level in an incremental way, depending on the contextual information available at a given stage. This approach overcomes the inefficiency of previous fully reductionist constraint-based systems, while maintaining broad coverage and linguistic granularity. The implementation relies on a sequence of networks built with the replace operator. Given the high level of modularity, the core grammar is easily augmented with corpus-specific subgrammars. The current system is implemented for French and is being expanded to new languages.
1 Background
Previous work in finite-state parsing at sentence level falls into two categories: the constructive approach or the reductionist approach. The origins of the constructive approach go back to the parser developed by Joshi (Joshi, 1996). It is based on a lexical ...
FTAG: current status and parsing scheme
Proc. Vextal '99, 1999
Cited by 9 (4 self)
Introduction
As far as electronic syntactic resources go, one can distinguish rule-based versus statistics-based grammars, as well as program-dependent versus reusable grammars. Lexicalized Tree Adjoining Grammars (LTAGs) have been used to develop reusable wide-coverage rule-based grammars for different languages (cf. Doran et al. 1994, 1998 for English, Abeillé 1991 and Candito 1999 for French). We describe here the current status and organization of the French LTAG (FTAG), developed over the past 10 years. The grammar is intended to model speaker competence, and is both application and domain independent. It can be used for syntactic tagging, for parsing and for generation. We present a parsing scheme, including a POS tagger, a parser and a parse ranker.
1. Organization of the French LTAG
A LTAG comprises morphological and syntactic lexicons and a vast repository of elementary trees. We present here the general principles on elementary trees, their factorizat
A Robust Finite-State Parser For French
In ESSLLI'96 Robust Parsing Workshop, 1997
Cited by 5 (1 self)
This paper describes a robust finite-state parser implemented for French. The parser attaches morpho-syntactic tags to each word and determines clause boundaries. It is a reductionist parser based on finite-state networks and their intersection. We describe essential elements of the rule writing system, and show how it is actually applied to solve various phenomena, such as argument uniqueness, agreement or apposition. We show some results which indicate that the parser can parse technical manuals with high accuracy (in a test sample 95% of part-of-speech and functional tags were correct). The average number of parses per sentence is very low: more than 92% of sentences produce fewer than 4 parses, including the correct one. A test on very long sentences from newspaper corpora and a discussion of errors provide more insight into the parser.
1 Introduction
We introduce a parser that uses finite-state networks from the tokenisation of the text to syntactic analysis. See [1, 12, 14, 16,...
Context-Free Parsing With Finite-State Transducers
In String Processing Colloquium, 1996
Cited by 5 (0 self)
This article is a study of an algorithm designed and implemented by Roche for parsing natural language sentences according to a context-free grammar. This algorithm is based on the construction and use of a finite-state transducer. Roche successfully applied it to a context-free grammar with very numerous rules. We explain why a context-free grammar with a correct lexical and grammatical coverage is bound to have a very large number of rules. We exemplify the principle of the algorithm and provide a formal specification. We prove that the parser can be built for a large class of context-free grammars, and that it outputs the set of parsing trees of the input sequence.
Local Grammars for the Description of Multi-Word Lexemes and their Automatic Recognition in Texts
1996
Cited by 5 (0 self)
Most multi-word lexemes (MWLs) allow certain types of variation. This has to be taken into account in their description so that they can be recognized in texts. We suggest describing their syntactic restrictions and their idiosyncratic peculiarities with local grammar rules, which at the same time make it possible to express regularities valid for a whole class of MWLs, such as word order variation in German. The local grammars can be written in a very convenient and compact way as regular expressions in the formalism IDAREX, which uses a two-level morphology. IDAREX allows various types of variables to be defined, and canonical and inflected word forms to be mixed in the regular expressions. The finite-state-based dictionary lookup system locolex/compass uses such local grammars to recognize MWLs in English, German and French online texts.
Use of weighted finite state transducers in part of speech tagging
1997
Cited by 3 (1 self)
This paper addresses issues in part of speech disambiguation using finite-state transducers and presents two main contributions to the field. One of them is the use of finite-state machines for part of speech tagging. Linguistic and statistical information is represented in terms of weights on transitions in weighted finite-state transducers. Another contribution is the successful combination of techniques – linguistic and statistical – for word disambiguation, compounded with the notion of word classes.
IDAREX: Formal Description of Multi-Word Lexemes with Regular Expressions
1995
Most multi-word lexemes (MWLs) allow certain types of variation. This has to be taken into account for their description and their recognition in texts.