Towards Broad Coverage Surface Realization with CCG
- In Proceedings of the Workshop on Using Corpora for NLG: Language Generation and Machine Translation (UCNLG+MT), 2007
Cited by 10 (3 self)
This paper reports on progress towards developing the first broad coverage English surface realizer for Combinatory Categorial Grammar (CCG). The paper provides initial automatic evaluation results which are roughly comparable to those reported with other formalisms when using a (nonblind) grammar derived from the development section of the CCGbank; the results are worse, though still respectable, when using the standard dev/train/test splits, highlighting the need for better lexical smoothing and more focused search. The paper also shows that factored language models that interpolate word-level n-grams with n-grams over POS tags and supertags provide similar absolute performance improvements over word-level n-grams as have been observed with parsing-inspired log-linear models.
Hypertagging: Supertagging for Surface Realization with CCG
Cited by 9 (5 self)
In lexicalized grammatical formalisms, it is possible to separate lexical category assignment from the combinatory processes that make use of such categories, such as parsing and realization. We adapt techniques from supertagging, a relatively recent technique that performs complex lexical tagging before full parsing (Bangalore and Joshi, 1999; Clark, 2002), for chart realization in OpenCCG, an open-source NLP toolkit for CCG. We call this approach hypertagging, as it operates at a level “above” the syntax, tagging semantic representations with syntactic lexical categories. Our results demonstrate that a hypertagger-informed chart realizer can achieve substantial improvements in realization speed (being approximately twice as fast) with superior realization quality.
Probabilistic models for disambiguation of an HPSG-based chart generator
- In Proceedings of the 9th International Workshop on Parsing Technologies (pp. 93–102), 2005
Cited by 7 (0 self)
We describe probabilistic models for a chart generator based on HPSG. Within the research field of parsing with lexicalized grammars such as HPSG, recent developments have achieved efficient estimation of probabilistic models and high-speed parsing guided by probabilistic models. The focus of this paper is to show that two essential techniques -- model estimation on packed parse forests and beam search during parsing -- are successfully exported to the task of natural language generation. Additionally, we report empirical evaluation of the performance of several disambiguation models and how the performance changes according to the feature set used in the models and the size of training data.
Exploiting Multi-Word Units in History-Based Probabilistic Generation
Cited by 3 (0 self)
We present a simple history-based model for sentence generation from LFG f-structures, which improves on the accuracy of previous models by breaking down PCFG independence assumptions so that more f-structure conditioning context is used in the prediction of grammar rule expansions. In addition, we present work on experiments with named entities and other multi-word units, showing a statistically significant improvement of generation accuracy. Tested on section 23 of the Penn Wall Street Journal Treebank, the techniques described in this paper improve BLEU scores from 66.52 to 68.82, and coverage from 98.18% to 99.96%.
Learning for Semantic Parsing and Natural Language Generation Using Statistical Machine Translation Techniques
- 2007
Designing Features for Parse Disambiguation and Realisation Ranking
- 2007
Cited by 3 (0 self)
We present log-linear models for use in the tasks of parse disambiguation and realisation ranking in German. Forst (2007a) shows that by extending the set of features used in parse disambiguation to include more linguistically motivated information, disambiguation results can be significantly improved for German data. The question we address in this paper is to what extent this improved set of features can also be used in realisation ranking. We carry out a number of experiments on German newspaper text. In parse disambiguation, we achieve an error reduction of 51%, compared to an error reduction of 34.5% with the original model that does not include the additional features of Forst (2007a). In realisation ranking, BLEU score increases from 0.7306 to 0.7939, and we achieve a 10 point improvement in exact match over a baseline language model. This being said, our results also show that further features need to be taken into account for realisation ranking in order to improve the quality of the corresponding model.
Designing agreement features for realization ranking
- In Proc. Coling 2010: Posters, 2010
Cited by 3 (3 self)
This paper shows that incorporating linguistically motivated features to ensure correct animacy and number agreement in an averaged perceptron ranking model for CCG realization helps improve a state-of-the-art baseline even further. Traditionally, these features have been modelled using hard constraints in the grammar. However, given the graded nature of grammaticality judgements in the case of animacy we argue a case for the use of a statistical model to rank competing preferences. Though subject-verb agreement is generally viewed to be syntactic in nature, a perusal of relevant examples discussed in the theoretical linguistics literature (Kathol, 1999; Pollard and Sag, 1994) points toward the heterogeneous nature of English agreement. Compared to writing grammar rules, our method is more robust and allows incorporating information from diverse sources in realization. We also show that the perceptron model can reduce balanced punctuation errors that would otherwise require a post-filter. The full model yields significant improvements in BLEU scores on Section 23 of the CCGbank and makes many fewer agreement errors.
Parser-Based Retraining for Domain Adaptation of Probabilistic Generators
Cited by 1 (0 self)
While the effect of domain variation on Penn-Treebank-trained probabilistic parsers has been investigated in previous work, we study its effect on a Penn-Treebank-trained probabilistic generator. We show that applying the generator to data from the British National Corpus results in a performance drop (from a BLEU score of 0.66 on the standard WSJ test set to a BLEU score of 0.54 on our BNC test set). We develop a generator retraining method where the domain-specific training data is automatically produced using state-of-the-art parser output. The retraining method recovers a substantial portion of the performance drop, resulting in a generator which achieves a BLEU score of 0.61 on our BNC test data.
Perceptron Reranking for CCG Realization
Cited by 1 (1 self)
This paper shows that discriminative reranking with an averaged perceptron model yields substantial improvements in realization quality with CCG. The paper confirms the utility of including language model log probabilities as features in the model, which prior work on discriminative training with log-linear models for HPSG realization had called into question. The perceptron model allows the combination of multiple n-gram models to be optimized and then augmented with both syntactic features and discriminative n-gram features. The full model yields a state-of-the-art BLEU score of 0.8506 on Section 23 of the CCGbank, to our knowledge the best score reported to date using a reversible, corpus-engineered grammar.

