Wide-coverage efficient statistical parsing with CCG and log-linear models
- Computational Linguistics, 2007
Cited by 87 (20 self)
This paper describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are "full" parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminative training is used to estimate the models, which requires incorrect parses for each sentence in the training data as well as the correct parse. The lexicalized grammar formalism used is Combinatory Categorial Grammar (CCG), and the grammar is automatically extracted from CCGbank, a CCG version of the Penn Treebank. The combination of discriminative training and an automatically extracted grammar leads to a significant memory requirement (over 20 GB), which is satisfied using a parallel implementation of the BFGS optimisation algorithm running on a Beowulf cluster. Dynamic programming over a packed chart, in combination with the parallel implementation, allows us to solve one of the largest-scale estimation problems in the statistical parsing literature in under three hours. A key component of the parsing system, for both training and testing, is a Maximum Entropy supertagger which assigns CCG lexical categories to words in a sentence. The supertagger makes the discriminative training feasible, and also leads to a highly efficient parser. Surprisingly,
Recognizing textual entailment with LCC’s GROUNDHOG system
- In Proc. of the Second PASCAL Challenges Workshop, 2005
Cited by 44 (7 self)
We introduce a new system for recognizing textual entailment (known as GROUNDHOG) which utilizes a classification-based approach to combine lexico-semantic information derived from text processing applications with a large collection of paraphrases acquired automatically from the WWW. Trained on 200,000 examples of textual entailment extracted from newswire corpora, our system managed to classify more than 75% of the pairs in the 2006 PASCAL RTE Test Set correctly.
SemEval-2007 Task-17: English lexical sample, SRL and all words
- In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), 2007
Cited by 25 (2 self)
This paper describes our experience in preparing the data and evaluating the results for three subtasks of SemEval-2007 Task-17 – Lexical Sample, Semantic Role Labeling (SRL) and All-Words respectively. We tabulate and analyze the results of participating systems.
Combining lexical resources: Mapping between PropBank and VerbNet
- In Proceedings of the 7th International Workshop on Computational Linguistics, 2007
Cited by 15 (2 self)
A wide variety of lexical resources have been created to allow automatic semantic processing of novel text. However, each resource has its own practical and theoretical idiosyncrasies, making it difficult to combine the information from different resources. We discuss the form that these differences can take, and describe how we overcame some of them in creating a mapping between two important resources: PropBank and VerbNet. Furthermore, we present experimental results that show that this mapping improves performance for PropBank-style semantic role labeling. Since PropBank was designed on a verb-by-verb basis, the argument labels Arg2-Arg5 get used for a wide variety of argument roles. As a result, it can be difficult for automatic classifiers to learn to distinguish these arguments. But by using the mapping that we have created between PropBank and VerbNet, we can train a classifier based on VerbNet argument labels, which are more consistent and therefore easier to learn.
Towards Broad Coverage Surface Realization with CCG
- In Proceedings of the Workshop on Using Corpora for NLG: Language Generation and Machine Translation (UCNLG+MT), 2007
Cited by 10 (3 self)
This paper reports on progress towards developing the first broad coverage English surface realizer for Combinatory Categorial Grammar (CCG). The paper provides initial automatic evaluation results which are roughly comparable to those reported with other formalisms when using a (nonblind) grammar derived from the development section of the CCGbank; the results are worse, though still respectable, when using the standard dev/train/test splits, highlighting the need for better lexical smoothing and more focused search. The paper also shows that factored language models that interpolate word-level n-grams with n-grams over POS tags and supertags provide similar absolute performance improvements over word-level n-grams as have been observed with parsing-inspired log-linear models.
Hypertagging: Supertagging for Surface Realization with CCG
Cited by 9 (5 self)
In lexicalized grammatical formalisms, it is possible to separate lexical category assignment from the combinatory processes that make use of such categories, such as parsing and realization. We adapt techniques from supertagging, a relatively recent technique that performs complex lexical tagging before full parsing (Bangalore and Joshi, 1999; Clark, 2002), for chart realization in OpenCCG, an open-source NLP toolkit for CCG. We call this approach hypertagging, as it operates at a level “above” the syntax, tagging semantic representations with syntactic lexical categories. Our results demonstrate that a hypertagger-informed chart realizer can achieve substantial improvements in realization speed (being approximately twice as fast) with superior realization quality.
Unsupervised Argument Identification for Semantic Role Labeling
Cited by 8 (1 self)
The task of Semantic Role Labeling (SRL) is often divided into two sub-tasks: verb argument identification, and argument classification. Current SRL algorithms show lower results on the identification sub-task. Moreover, most SRL algorithms are supervised, relying on large amounts of manually created data. In this paper we present an unsupervised algorithm for identifying verb arguments, where the only type of annotation required is POS tagging. The algorithm makes use of a fully unsupervised syntactic parser, using its output in order to detect clauses and gather candidate argument collocation statistics. We evaluate our algorithm on PropBank10, achieving a precision of 56%, compared with 47% for a strong baseline. We also obtain an 8% increase in precision for a Spanish corpus. This is the first paper that tackles unsupervised verb argument identification without using manually encoded rules or extensive lexical or syntactic resources.
Investigating the Characteristics of Causal Relations in Japanese Text
- In Annual Meeting of the Association for Computational Linguistics (ACL) Workshop on Frontiers in Corpus Annotations II: Pie in the Sky, 2005
Cited by 4 (0 self)
We investigated the characteristics of in-text causal relations. We designed a set of causal relation tags, and three annotators used this tag set to annotate 750 Japanese newspaper articles. Then, using the annotated corpus, we investigated the causal relation instances from several viewpoints.
Exploiting semantic role resources for preposition disambiguation
- Computational Linguistics, 2009
Cited by 4 (0 self)
This article describes how semantic role resources can be exploited for preposition disambiguation. The main resources include the semantic role annotations provided by the Penn Treebank and FrameNet tagged corpora. The resources also include the assertions contained in the Factotum knowledge base, as well as information from Cyc and Conceptual Graphs. A common inventory is derived from these in support of definition analysis, which is the motivation for this work. The disambiguation concentrates on relations indicated by prepositional phrases, and is framed as word-sense disambiguation for the preposition in question. A new type of feature for word-sense disambiguation is introduced, using WordNet hypernyms as collocations rather than just words. Various experiments over the Penn Treebank and FrameNet data are presented, including prepositions classified separately versus together, and illustrating the effects of filtering. Similar experimentation is done over the Factotum data, including a method for inferring likely preposition usage from corpora, as knowledge bases do not generally indicate how relationships are expressed in English (in contrast to the explicit annotations on this in the Penn Treebank and FrameNet). Other experiments are included with the FrameNet data mapped into the common relation inventory developed for definition analysis, illustrating how preposition disambiguation might be applied in lexical acquisition.

