Automatic extraction of subcategorization from corpora
- In Proceedings of the 5th ACL Conference on Applied Natural Language Processing
, 1997
Cited by 176 (7 self)
We describe a novel technique and implemented system for constructing a subcategorization dictionary from textual corpora. Each dictionary entry encodes the relative frequency of occurrence of a comprehensive set of subcategorization classes for English. An initial experiment, on a sample of 14 verbs which exhibit multiple complementation patterns, demonstrates that the technique achieves accuracy comparable to previous approaches, which are all limited to a highly restricted set of subcategorization classes. We also demonstrate that a subcategorization dictionary built with the system improves the accuracy of a parser by an appreciable amount.
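As a rough illustration of what "each dictionary entry encodes the relative frequency of occurrence" of subcategorization classes could look like, here is a minimal sketch. It assumes the (verb, frame) observations have already been extracted by some parser; the function name `build_subcat_dictionary`, the frame labels, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def build_subcat_dictionary(observations):
    """Map each verb to {frame: relative frequency} from (verb, frame) pairs."""
    counts = defaultdict(Counter)
    for verb, frame in observations:
        counts[verb][frame] += 1

    dictionary = {}
    for verb, frame_counts in counts.items():
        total = sum(frame_counts.values())
        # Normalise raw counts so the frequencies for each verb sum to 1.
        dictionary[verb] = {f: c / total for f, c in frame_counts.items()}
    return dictionary


# Hypothetical toy observations; the frame labels are illustrative only.
observations = [
    ("give", "NP_NP"),
    ("give", "NP_PP"),
    ("give", "NP_NP"),
    ("give", "NP"),
    ("sleep", "INTRANS"),
]
subcat = build_subcat_dictionary(observations)
```

A real system would also need to filter out noisy frames before normalising; that step is omitted here.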
XTAG System -- A Wide Coverage Grammar for English
, 1994
Cited by 68 (17 self)
This paper presents the XTAG system, a grammar development tool based on the Tree Adjoining Grammar (TAG) formalism that includes a wide-coverage syntactic grammar for English. The various components of the system are discussed and preliminary evaluation results from the parsing of various corpora are given. Results from the comparison of XTAG against the IBM statistical parser and the Alvey Natural Language Tool parser are also given.
Subcategorization Acquisition
, 2002
Cited by 64 (13 self)
Manual development of large subcategorised lexicons has proved difficult because predicates change behaviour between sublanguages, domains and over time. Yet access to a comprehensive subcategorization lexicon is vital for successful parsing capable of recovering predicate-argument relations, and probabilistic parsers would greatly benefit from accurate information concerning the relative likelihood of different subcategorisation frames (SCFs) of a given predicate. Acquisition of subcategorization lexicons from textual corpora has recently become increasingly popular. Although this work has met with some success, resulting lexicons indicate a need for greater accuracy. One significant source of error lies in the statistical filtering used for hypothesis selection, i.e. for removing noise from automatically acquired SCFs. This thesis builds on earlier work in verbal subcategorization acquisition, taking as a starting point the problem with statistical filtering. Our investigation shows that statistical filters tend to work poorly because not only is the underlying distribution Zipfian, but there is also very little correlation between conditional distribution of
Developing and evaluating a probabilistic LR parser of part-of-speech and punctuation labels
- In Proceedings of the 4th ACL/SIGPARSE International Workshop on Parsing Technologies
, 1995
Cited by 52 (9 self)
We describe an approach to robust domain-independent syntactic parsing of unrestricted naturally-occurring (English) input. The technique involves parsing sequences of part-of-speech and punctuation labels using a unification-based grammar coupled with a probabilistic LR parser. We describe the coverage of several corpora using this grammar and report the results of a parsing experiment using probabilities derived from bracketed training data. We report the first substantial experiments to assess the contribution of punctuation to deriving an accurate syntactic analysis, by parsing identical texts both with and without naturally-occurring punctuation marks.
GLR*: A Robust Grammar-Focused Parser for Spontaneously Spoken Language
, 1996
Cited by 40 (9 self)
The analysis of spoken language is widely considered to be a more challenging task than the analysis of written text. All of the difficulties of written language can generally be found in spoken language as well. Parsing spontaneous speech must, however, also deal with problems such as speech disfluencies, the looser notion of grammaticality, and the lack of clearly marked sentence boundaries. The contamination of the input with errors of a speech recognizer can further exacerbate these problems. Most natural language parsing algorithms are designed to analyze "clean" grammatical input. Because they reject any input which is found to be ungrammatical in even the slightest way, such parsers are unsuitable for parsing spontaneous speech, where completely grammatical input is the exception more than the rule. This thesis describes GLR*, a parsing system based on Tomita's Generalized LR parsing algorithm, that was designed to be robust to two particular types of extra-grammaticality: noise...
Robust Stochastic Parsing Using the Inside-Outside Algorithm
, 1992
Cited by 38 (0 self)
In this paper, we discuss the application of the Viterbi algorithm and the Baum-Welch algorithm (in wide use for speech recognition) to the parsing problem and describe a recent experiment designed to produce a simple, robust, probabilistic parser which selects an appropriate analysis frequently enough to be useful and deals effectively with the problem of undergeneration. We focus on the application of these stochastic algorithms here because, although other statistically based approaches have been proposed (e.g. Sampson et al., 1989; Garside & Leech, 1985; Magerman & Marcus, 1991a,b), these appear most promising as they are computationally tractable (in principle) and well-integrated with formal language / automata theory. The Viterbi algorithm and Baum-Welch algorithm are optimised algorithms (with polynomial computational complexity) which can be used in conjunction with stochastic regular grammars (finite-state automata, i.e. (hidden) Markov models, Baum, 1972) and with probabilistic context-free grammars (Baker, 1982; Fujisaki
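For readers unfamiliar with the Viterbi algorithm mentioned in this abstract, the following is a minimal sketch of its textbook form for a hidden Markov model (the stochastic regular grammar case). It is not the parser described in the paper; the tag set, the probabilities, and the function name `viterbi` are all illustrative assumptions.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its probability."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path

    for obs in observations[1:]:
        prev = best[-1]
        column, pointers = {}, {}
        for s in states:
            prob, source = max((prev[r] * trans_p[r][s], r) for r in states)
            column[s] = prob * emit_p[s][obs]
            pointers[s] = source
        best.append(column)
        back.append(pointers)

    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model: three tags and a three-word vocabulary.
states = ("DET", "NOUN", "VERB")
start_p = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.0, "NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
emit_p = {
    "DET": {"the": 0.9, "dog": 0.05, "barks": 0.05},
    "NOUN": {"the": 0.05, "dog": 0.8, "barks": 0.15},
    "VERB": {"the": 0.05, "dog": 0.15, "barks": 0.8},
}
path, prob = viterbi(["the", "dog", "barks"], states, start_p, trans_p, emit_p)
```

The running time is proportional to the sequence length times the square of the number of states, which is the polynomial behaviour the abstract refers to. A production implementation would normally work with log probabilities to avoid underflow on long inputs.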
Apportioning Development Effort in a Probabilistic LR Parsing System through Evaluation
- UNIVERSITY OF PENNSYLVANIA
, 1996
Cited by 32 (10 self)
We describe an implemented system for robust domain-independent syntactic parsing of English, using a unification-based grammar of part-of-speech and punctuation labels coupled with a probabilistic LR parser. We present evaluations of the system's performance along several different dimensions; these enable us to assess the contribution that each individual part is making to the success of the system as a whole, and thus prioritise the effort to be devoted to its further enhancement. Currently, the system is able to parse around 80% of sentences in a substantial corpus of general text containing a number of distinct genres. On a random sample of 250 such sentences the system has a mean crossing bracket rate of 0.71 and recall and precision of 83% and 84% respectively when evaluated against manually-disambiguated analyses.
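The bracket-based measures quoted above can be illustrated with a small sketch. It assumes unlabelled constituents represented as (start, end) word spans and counts a test bracket as crossing if it partially overlaps any gold bracket; the abstract does not spell out the exact scoring scheme, so the function names and conventions here are assumptions rather than the authors' procedure.

```python
def crosses(a, b):
    """True if spans a and b overlap without either containing the other."""
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def bracket_scores(test, gold):
    """Return (precision, recall, crossing bracket count) for one sentence."""
    test, gold = set(test), set(gold)
    matched = len(test & gold)
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0
    crossings = sum(1 for t in test if any(crosses(t, g) for g in gold))
    return precision, recall, crossings


# Hypothetical spans for one five-word sentence.
gold = [(0, 5), (0, 2), (2, 5), (3, 5)]
test = [(0, 5), (0, 2), (1, 3), (3, 5)]
scores = bracket_scores(test, gold)
```

Under these assumptions, a mean crossing bracket rate would be the crossing count averaged over all sentences in the evaluation sample.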
Relating complexity to practical performance in parsing with wide-coverage unification grammars
, 1994
Cited by 30 (6 self)
The paper demonstrates that exponential complexities with respect to grammar size and input length have little impact on the performance of three unification-based parsing algorithms, using a wide-coverage grammar. The results imply that the study and optimisation of unification-based parsing must rely on empirical data until complexity theory can more accurately predict the practical behaviour of such parsers.
Regular Approximation Of Context-Free Grammars Through Transformation
, 2000
Cited by 26 (3 self)
We present an algorithm for approximating context-free languages with regular languages. The algorithm is based on a simple transformation that applies to any context-free grammar and guarantees that the result can be compiled into a finite automaton. The resulting grammar contains at most one new nonterminal for any nonterminal symbol of the input grammar. The result thus remains readable and if necessary modifiable. We extend the approximation algorithm to the case of weighted context-free grammars. We also report experiments with several grammars showing that the size of the minimal deterministic automata accepting the resulting approximations is of practical use for applications such as speech recognition. 9.1 Introduction Despite the availability of extensive literature on the topic of efficient context-free parsing, for large and very ambiguous grammars, context-free parsing poses a serious problem in many practical applications such as real-time speech recognition. For most gra...
Balancing Robustness and Efficiency in Unification-augmented Context-Free Parsers for Large Practical Applications
- Robustness in Language and Speech Technology
Cited by 25 (7 self)
Large practical NLP applications require robust analysis components that can effectively handle input that is disfluent or extra-grammatical. The effectiveness and efficiency of any robust parser are a direct function of three main factors: (1) Flexibility: what types of disfluencies and deviations from the grammar can the parser handle?; (2) Search: How does the parser search the space of possible interpretations, and what techniques are applied to prune the search space?; and (3) Parse Selection and Disambiguation: What methods and resources are used to evaluate and rank potential parses and sub-parses, and how does the parser cope with the extreme levels of ambiguity introduced by its flexibility parameters? In this chapter we describe our investigations on how to balance flexibility and efficiency in the context of two different robust parsers - a GLR parser and a left corner Chart parser - both based on a unification-augmented context-free grammar formalism. We demonstrate how the...

