Results 1 - 10
of
28
Part-of-Speech Tagging and Partial Parsing
- Corpus-Based Methods in Language and Speech
, 1996
"... m we can carve o# next. `Partial parsing' is a cover term for a range of di#erent techniques for recovering some but not all of the information contained in a traditional syntactic analysis. Partial parsing techniques, like tagging techniques, aim for reliability and robustness in the face of the va ..."
Abstract
-
Cited by 85 (0 self)
- Add to MetaCart
m we can carve o# next. `Partial parsing' is a cover term for a range of di#erent techniques for recovering some but not all of the information contained in a traditional syntactic analysis. Partial parsing techniques, like tagging techniques, aim for reliability and robustness in the face of the vagaries of natural text, by sacrificing completeness of analysis and accepting a low but non-zero error rate. 1 Tagging The earliest taggers [35, 51] had large sets of hand-constructed rules for assigning tags on the basis of words' character patterns and on the basis of the tags assigned to preceding or following words, but they had only small lexica, primarily for exceptions to the rules. TAGGIT [35] was used to generate an initial tagging of the Brown corpus, which was then hand-edited. (Thus it provided the data that has since been used to train other taggers [20].) The tagger described by Garside [56, 34], CLAWS, was a probabilistic version of TAGGIT, and the DeRose tagger improved on
Figures of Merit for Best-First Probabilistic Chart Parsing
- Computational Linguistics
, 1996
"... Best-first parsing methods for natural language try to parse efficiently by considering the most likely constituents first. Some figure of merit is needed by which to compare the likelihood of constituents, and the choice of this figure has a substantial impact on the efficiency of the parser. While ..."
Abstract
-
Cited by 65 (3 self)
- Add to MetaCart
Best-first parsing methods for natural language try to parse efficiently by considering the most likely constituents first. Some figure of merit is needed by which to compare the likelihood of constituents, and the choice of this figure has a substantial impact on the efficiency of the parser. While several parsers described in the literature have used such techniques, there is no published data on their efficacy, much less attempts to judge their relative merits. We propose and evaluate several figures of merit for best-first parsing.
A* Parsing: Fast Exact Viterbi Parse Selection
- IN PROCEEDINGS OF THE HUMAN LANGUAGE TECHNOLOGY CONFERENCE AND THE NORTH AMERICAN ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (HLT-NAACL
, 2003
"... We present an extension of the classic A* search procedure to tabular PCFG parsing. The use of A* search can dramatically reduce the time required to find a best parse by conservatively estimating the probabilities of parse completions. We discuss various estimates and give efficient algorithms ..."
Abstract
-
Cited by 65 (3 self)
- Add to MetaCart
We present an extension of the classic A* search procedure to tabular PCFG parsing. The use of A* search can dramatically reduce the time required to find a best parse by conservatively estimating the probabilities of parse completions. We discuss various estimates and give efficient algorithms for computing them. On average-length Penn treebank sentences, our most detailed estimate reduces the total number of edges processed to less than 3% of that required by exhaustive parsing, and a simpler estimate, which requires less than a minute of precomputation, reduces the work to less than 5%. Unlike best-first and finite-beam methods for achieving this kind of speed-up, an A* method is guaranteed to find the most likely parse, not just an approximation. Our parser
Subcategorization Acquisition
, 2002
"... Manual development of large subcategorised lexicons has proved difficult because predicates change behaviour between sublanguages, domains and over time. Yet access to a comprehensive subcategorization lexicon is vital for successful parsing capable of recovering predicate-argument relations, and pr ..."
Abstract
-
Cited by 64 (13 self)
- Add to MetaCart
Manual development of large subcategorised lexicons has proved difficult because predicates change behaviour between sublanguages, domains and over time. Yet access to a comprehensive subcategorization lexicon is vital for successful parsing capable of recovering predicate-argument relations, and probabilistic parsers would greatly benefit from accurate information concerning the relative likelihood of different subcategorisation frames (scfs) of a given predicate. Acquisition of subcategorization lexicons from textual corpora has recently become increasingly popular. Although this work has met with some success, resulting lexicons indicate a need for greater accuracy. One significant source of error lies in the statistical filtering used for hypothesis selection, i.e. for removing noise from automatically acquired scfs. This thesis builds on earlier work in verbal subcategorization acquisition, taking as a starting point the problem with statistical filtering. Our investigation shows that statistical filters tend to work poorly because not only is the underlying distribution zipfian, but there is also very little correlation between conditional distribution of
Assigning Function Tags to Parsed Text
, 2000
"... It is generally recognized that the common nonterminal labels for syntactic constituents (NP, VP, etc.) do not exhaust the syntactic and semantic information one would like about parts of a syntactic tree. For example, the Penn Treebank gives each constituent zero or more 'function tags' indicating ..."
Abstract
-
Cited by 49 (1 self)
- Add to MetaCart
It is generally recognized that the common nonterminal labels for syntactic constituents (NP, VP, etc.) do not exhaust the syntactic and semantic information one would like about parts of a syntactic tree. For example, the Penn Treebank gives each constituent zero or more 'function tags' indicating semantic roles and other related information not easily encapsulated in the simple constituent labels. We present a statistical algorithm for assigning these function tags that, on text already parsed to a simplelabel level, achieves an F-measure of 87%, which rises to 99% when considering 'no tag' as a valid choice.
Practical Unification-based Parsing of Natural Language
, 1993
"... The thesis describes novel techniques and algorithms for the practical parsing of realistic Natural Language (NL) texts with a wide-coverage unification-based grammar of English. The thesis tackles two of the major problems in this area: firstly, the fact that parsing realistic inputs with such gr ..."
Abstract
-
Cited by 46 (7 self)
- Add to MetaCart
The thesis describes novel techniques and algorithms for the practical parsing of realistic Natural Language (NL) texts with a wide-coverage unification-based grammar of English. The thesis tackles two of the major problems in this area: firstly, the fact that parsing realistic inputs with such grammars can be computationally very expensive, and secondly, the observation that many analyses are often assigned to an input, only one of which usually forms the basis of the correct interpretation. The thesis starts by presenting a new unification algorithm, justifies why it is well-suited to practical NL parsing, and describes a bottom-up active chart parser which employs this unification algorithm together with several other novel processing and optimisation techniques. Empirical results demonstrate that an implementation of this parser has significantly better practical
Edge-Based Best-First Chart Parsing
- IN PROCEEDINGS OF THE SIXTH WORKSHOP ON VERY LARGE CORPORA
, 1998
"... Best-first probabilistic chart parsing attempts to parse efficiently by working on edges that are judged 'best' by some probabilistic figure of merit (FOM). Recent work has used proba- bilistic context-free grammars (PCFGs) to sign probabilities to constituents, and to use these probabilities as the ..."
Abstract
-
Cited by 45 (4 self)
- Add to MetaCart
Best-first probabilistic chart parsing attempts to parse efficiently by working on edges that are judged 'best' by some probabilistic figure of merit (FOM). Recent work has used proba- bilistic context-free grammars (PCFGs) to sign probabilities to constituents, and to use these probabilities as the starting point for the FOM. This paper extends this approach to us- ing a probabilistic FOM to judge edges (incomplete constituents), thereby giving a much finergrained control over parsing effort. We show how this can be accomplished in a particularly simple way using the common idea of binarizing the PCFG. The results obtained are about a factor of twenty improvement over the best prior results -- that is, our parser achieves equivalent results using one twentieth the number of edges. Furthermore we show that this improvement is obtained with parsing precision and recall levels superior to those achieved by exhaustive parsing.
Efficiency, Robustness and Accuracy in Picky Chart Parsing
- UNIVERSITY OF DELAWARE
, 1992
"... This paper describes Picky, a probabilistic agenda-based chart parsing algorithm which uses a technique called probabilistic prediction to predict which grammar rules are likely to lead to an acceptable parse of the input. Using a suboptimal search method, Picky significantly reduces the number of e ..."
Abstract
-
Cited by 43 (2 self)
- Add to MetaCart
This paper describes Picky, a probabilistic agenda-based chart parsing algorithm which uses a technique called probabilistic prediction to predict which grammar rules are likely to lead to an acceptable parse of the input. Using a suboptimal search method, Picky significantly reduces the number of edges produced by CKY-like chart parsing algorithms, while maintaining the robustness of pure bottom-up parsers and the accuracy of existing probabilistic parsers. Experiments using Picky demonstrate how probabilistic modelling can impact upon the efficiency, robustness and accuracy of a parser.

