Results 1 - 10 of 101
The Proposition Bank: An Annotated Corpus of Semantic Roles
- Computational Linguistics
, 2005
Cited by 256 (8 self)
The Proposition Bank project takes a practical approach to semantic representation, adding a layer of predicate-argument information, or semantic role labels, to the syntactic structures of the Penn Treebank. The resulting resource can be thought of as shallow, in that it does not represent coreference, quantification, and many other higher-order phenomena, but also broad, in that it covers every instance of every verb in the corpus and allows representative statistics to be calculated. We discuss the criteria used to define the sets of semantic roles used in the annotation process and to analyze the frequency of syntactic/semantic alternations in the corpus. We describe an automatic system for semantic role tagging trained on the corpus and discuss the effect on its performance of various types of information, including a comparison of full syntactic parsing with a flat representation and the contribution of the empty “trace” categories of the treebank.
Robust Accurate Statistical Annotation of General Text
, 2002
Cited by 146 (11 self)
We describe a robust, accurate, domain-independent approach to statistical parsing incorporated into the new release of the ANLT toolkit, and publicly available as a research tool. The system has been used to parse many well-known corpora in order to produce data for lexical acquisition efforts; it has also been used as a component in an open-domain question answering project. The performance of the system is competitive with that of statistical parsers using highly lexicalised parse selection models. However, we plan to extend the system to improve parse coverage, depth and accuracy.
Wide-coverage efficient statistical parsing with CCG and log-linear models
- Computational Linguistics
, 2007
Cited by 87 (20 self)
This paper describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are "full" parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminative training is used to estimate the models, which requires incorrect parses for each sentence in the training data as well as the correct parse. The lexicalized grammar formalism used is Combinatory Categorial Grammar (CCG), and the grammar is automatically extracted from CCGbank, a CCG version of the Penn Treebank. The combination of discriminative training and an automatically extracted grammar leads to a significant memory requirement (over 20 GB), which is satisfied using a parallel implementation of the BFGS optimisation algorithm running on a Beowulf cluster. Dynamic programming over a packed chart, in combination with the parallel implementation, allows us to solve one of the largest-scale estimation problems in the statistical parsing literature in under three hours. A key component of the parsing system, for both training and testing, is a Maximum Entropy supertagger which assigns CCG lexical categories to words in a sentence. The supertagger makes the discriminative training feasible, and also leads to a highly efficient parser. Surprisingly,
Subcategorization Acquisition
, 2002
Cited by 64 (13 self)
Manual development of large subcategorised lexicons has proved difficult because predicates change behaviour between sublanguages, domains and over time. Yet access to a comprehensive subcategorization lexicon is vital for successful parsing capable of recovering predicate-argument relations, and probabilistic parsers would greatly benefit from accurate information concerning the relative likelihood of different subcategorisation frames (SCFs) of a given predicate. Acquisition of subcategorization lexicons from textual corpora has recently become increasingly popular. Although this work has met with some success, resulting lexicons indicate a need for greater accuracy. One significant source of error lies in the statistical filtering used for hypothesis selection, i.e. for removing noise from automatically acquired SCFs. This thesis builds on earlier work in verbal subcategorization acquisition, taking as a starting point the problem with statistical filtering. Our investigation shows that statistical filters tend to work poorly because not only is the underlying distribution Zipfian, but there is also very little correlation between conditional distribution of
Alpino: Wide-coverage Computational Analysis of Dutch
- In
, 2000
Cited by 55 (10 self)
Alpino is a wide-coverage computational analyzer of Dutch which aims at accurate, full parsing of unrestricted text. We describe the head-driven lexicalized grammar and the lexical component, which has been derived from existing resources. The grammar produces dependency structures, thus providing a reasonably abstract and theory-neutral level of linguistic representation. An important aspect of wide-coverage parsing is robustness and disambiguation. The dependency relations encoded in the dependency structures have been used to develop and evaluate both hand-coded and statistical disambiguation methods.
Corpus Annotation for Parser Evaluation
- In Proceedings of the EACL workshop on Linguistically Interpreted Corpora (LINC
, 1999
Cited by 50 (5 self)
We describe a recently developed corpus annotation scheme for evaluating parsers that avoids shortcomings of current methods. The scheme encodes grammatical relations between heads and dependents, and has been used to mark up a new public-domain corpus of naturally occurring English text. We show how the corpus can be used to evaluate the accuracy of a robust parser, and relate the corpus to extant resources.
1 Introduction
The evaluation of individual language-processing components forming part of larger-scale natural language processing (NLP) application systems has recently emerged as an important area of research (see e.g. Rubio, 1998; Gaizauskas, 1998). A syntactic parser is often a component of an NLP system; a reliable technique for comparing and assessing the relative strengths and weaknesses of different parsers (or indeed of different versions of the same parser during development) is therefore a necessity. Current methods for evaluating the accuracy of syntactic parsers are...
Identifying Semantic Roles Using Combinatory Categorial Grammar
, 2003
Cited by 40 (2 self)
We present a system for automatically identifying PropBank-style semantic roles based on the output of a statistical parser for Combinatory Categorial Grammar.
Can Subcategorisation Probabilities Help a Statistical Parser?
- In Proceedings of the 6th ACL/SIGDAT Workshop on Very Large Corpora
, 1998
Cited by 39 (5 self)
Research into the automatic acquisition of lexical information from corpora is starting to produce large-scale computational lexicons containing data on the relative frequencies of subcategorisation alternatives for individual verbal predicates. However, the empirical question of whether this type of frequency information can in practice improve the accuracy of a statistical parser has not yet been answered. In this paper we describe an experiment with a wide-coverage statistical grammar and parser for English and subcategorisation frequencies acquired from ten million words of text which shows that this information can significantly improve parse accuracy.
High Precision Extraction of Grammatical Relations
, 2002
Cited by 38 (5 self)
A parsing system returning analyses in the form of sets of grammatical relations can obtain high precision if it hypothesises a particular relation only when it is certain that the relation is correct. We operationalise this technique, in a statistical parser using a manually-developed wide-coverage grammar of English, by only returning relations that form part of all analyses licensed by the grammar. We observe an increase in precision from 75% to over 90% (at the cost of a reduction in recall) on a test corpus of naturally-occurring text.
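The idea of returning only relations shared by every analysis can be illustrated with a minimal Python sketch. It is not the authors' implementation: the triple encoding of a grammatical relation, the relation names, the function names, and the toy sentence are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical encoding of a grammatical relation: (relation type, head, dependent).
GR = Tuple[str, str, str]


def relations_in_all_analyses(analyses: Iterable[Set[GR]]) -> Set[GR]:
    """Keep only the relations that occur in every analysis of a sentence."""
    analyses = list(analyses)
    if not analyses:
        return set()
    common = set(analyses[0])
    for analysis in analyses[1:]:
        common &= analysis  # set intersection drops relations missing from any analysis
    return common


def precision_recall(predicted: Set[GR], gold: Set[GR]) -> Tuple[float, float]:
    """Precision and recall of predicted relations against a gold-standard set."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy example: two analyses that differ only in where the modifier attaches.
analysis_a = {("ncsubj", "saw", "I"), ("dobj", "saw", "man"), ("ncmod", "saw", "telescope")}
analysis_b = {("ncsubj", "saw", "I"), ("dobj", "saw", "man"), ("ncmod", "man", "telescope")}
gold = analysis_a  # assume the first analysis is the correct one

certain = relations_in_all_analyses([analysis_a, analysis_b])
precision, recall = precision_recall(certain, gold)
```

In this toy case the ambiguous modifier relation is withheld, so every returned relation is correct while one gold relation is missed, which mirrors the precision-for-recall trade-off the abstract reports.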
Detecting a Continuum of Compositionality in Phrasal Verbs
- In Proceedings of the ACL-SIGLEX Workshop on Multiword Expressions: Analysis, Acquisition and Treatment
, 2003
Cited by 34 (2 self)
We investigate the use of an automatically acquired thesaurus for measures designed to indicate the compositionality of candidate multiword verbs, specifically English phrasal verbs identified automatically using a robust parser. We examine various measures using the nearest neighbours of the phrasal verb, and in some cases the neighbours of the simplex counterpart, and show that some of these correlate significantly with human rankings of compositionality on the test set. We also show that whilst the compositionality judgements correlate with some statistics commonly used for extracting multiwords, the relationship is not as strong as that using the automatically constructed thesaurus.

