Maximum Entropy Models for Realization Ranking
- In Proceedings of the 10th Machine Translation Summit (pp. 109…), 2005
Cited by 16 (1 self)
In this paper we describe and evaluate different statistical models for the task of realization ranking, i.e. the problem of discriminating between competing surface realizations generated for a given input semantics. Three models are trained and tested: an n-gram language model, a discriminative maximum entropy model using structural features, and a combination of these two. Our realization component forms part of a larger, hybrid MT system.
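One way to picture the combined model is as a conditional log-linear (maximum entropy) ranker over the competing realizations of one input, with the n-gram language model's log-probability entered as just one more feature. The sketch below assumes hypothetical feature names (`lm_logprob`, `rule:subj-head`, `rule:topicalised`) and hand-set weights; it is an illustration, not the paper's actual feature set or training procedure.

```python
import math
from typing import Dict, List

# Each candidate realization is a sparse feature vector. The feature names
# used below are hypothetical; "lm_logprob" stands in for the n-gram model's score.
Features = Dict[str, float]

def loglinear_score(features: Features, weights: Dict[str, float]) -> float:
    """Unnormalised log-linear score: sum of weight * feature value."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def rank_probabilities(candidates: List[Features], weights: Dict[str, float]) -> List[float]:
    """Conditional probabilities p(realization | input semantics), normalised
    only over the competing realizations of a single input."""
    scores = [loglinear_score(f, weights) for f in candidates]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidates and hand-set weights (real weights would be estimated).
candidates = [
    {"lm_logprob": -12.0, "rule:subj-head": 1.0},
    {"lm_logprob": -15.5, "rule:topicalised": 1.0},
]
weights = {"lm_logprob": 1.0, "rule:subj-head": 0.5, "rule:topicalised": -0.3}
probs = rank_probabilities(candidates, weights)
best = max(range(len(candidates)), key=lambda i: probs[i])
```

Normalising per input (rather than over all strings) is what makes the model discriminative: it only has to separate realizations that compete for the same semantics.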
Paraphrasing Treebanks for Stochastic Realization Ranking
- In Proceedings of the 3rd Workshop on Treebanks and Linguistic Theories, 2004
Cited by 13 (4 self)
This paper describes a novel approach to the task of realization ranking, i.e. the choice among competing paraphrases for a given input semantics, as produced by a generation system. We also introduce a notion of symmetric treebanks, which we define as the combination of (a) a set of pairings of surface forms and associated semantics plus (b) the sets of alternative analyses for the surface form and sets of alternate realizations of the semantics. For inclusion of alternate analyses and realizations in the symmetric treebank, we propose to make the underlying linguistic theory explicit and operational, viz. in the form of a broad-coverage computational grammar. Extending earlier work on grammar-based treebanks in the Redwoods (Oepen et al. [13]) paradigm, we present a fully automated procedure to produce a symmetric treebank from existing resources. To evaluate the utility of an initial (albeit smallish) such 'expanded' treebank, we report on experimental results for training stochastic discriminative models for the realization ranking task. Our work is set...
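The symmetric-treebank definition can be read as a simple record type, from which realization-ranking training events fall out directly: the attested surface form is the preferred realization and the grammar's other realizations of the same semantics are its competitors. A minimal sketch, assuming a hypothetical `SymmetricItem` structure and plain strings for both surface forms and semantics (a real treebank would store full analyses and semantic representations):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SymmetricItem:
    """Hypothetical symmetric-treebank entry: (a) a surface/semantics pairing
    plus (b) alternative analyses and alternative realizations."""
    surface: str
    semantics: str
    alt_analyses: List[str] = field(default_factory=list)
    alt_realizations: List[str] = field(default_factory=list)

def ranking_events(item: SymmetricItem) -> List[Tuple[str, int]]:
    """Label the attested surface form as preferred (1) and every other
    realization of the same semantics as dispreferred (0)."""
    events = [(item.surface, 1)]
    events += [(r, 0) for r in item.alt_realizations if r != item.surface]
    return events

# Toy entry; the semantics string is only a placeholder.
item = SymmetricItem(
    surface="the dog barked",
    semantics="bark(dog)",
    alt_realizations=["the dog barked", "barked the dog"],
)
events = ranking_events(item)
```

The alternative analyses in part (b) would feed the mirror-image parse-selection task in the same way.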
DCG Induction using MDL and Parsed Corpora
- Learning Language in Logic, pages 63–71, Bled, Slovenia, 1999
Cited by 3 (2 self)
We show how partial models of natural language syntax (manually written DCGs, with parameters estimated from a parsed corpus) can be automatically extended when trained upon raw text (using MDL). We also show how we can use a parsed corpus as an alternative constraint upon learning. Empirical evaluation suggests that a parsed corpus is more informative than an MDL-based prior. However, best results are achieved when the learner is supervised with a compression-based prior and a parsed corpus.
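MDL scores a candidate grammar G by a two-part code length, L(G) + L(D | G): the bits needed to encode the grammar plus the bits needed to encode the training data under it, where the second term is the sum of -log2 p(sentence). A minimal sketch, assuming the grammar's size in bits is already known and that each grammar is summarised by a hypothetical table of sentence probabilities:

```python
import math
from typing import Dict, List

def data_bits(corpus: List[str], probs: Dict[str, float]) -> float:
    """L(D | G): bits to encode the corpus, i.e. the sum of -log2 p(sentence)."""
    return -sum(math.log2(probs[s]) for s in corpus)

def description_length(model_bits: float, corpus: List[str], probs: Dict[str, float]) -> float:
    """Two-part MDL code length L(G) + L(D | G); smaller is better."""
    return model_bits + data_bits(corpus, probs)

# Toy corpus of "sentences" and two hypothetical grammars: a smaller one that
# spreads probability thinly, and a larger one that fits the data better.
corpus = ["a", "a", "a", "b"]
dl_small = description_length(10.0, corpus, {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25})
dl_large = description_length(12.0, corpus, {"a": 0.75, "b": 0.25})
```

Here the larger grammar wins (about 15.2 bits against 18 bits) because its extra two bits of model cost are repaid by a shorter encoding of the data; a prior that penalises model size is what stops the learner from simply memorising the corpus.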
Memory-Based Re-Engineering of a Knowledge-Based Dependency Parser
The emulation of a knowledge-based dependency parser for Dutch by a fast approximation of a memory-based learning algorithm is described. During the development of the original parser, hand-parsed test sentences were collected to offer stochastic guidance in the parsing process. Training a memory-based parser directly on these collections yields a reasonable but not very accurate emulation. However, when we train the memory-based parser on a much larger collection of texts that were automatically parsed by the knowledge-based parser, it is possible to prolong the learning curve. The resulting re-engineered parser runs in time linear in the length of the input sequence: through brute force, the costly computations of the parser are precompiled into memory, from which retrieval is cheap.
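Memory-based learning stores training instances verbatim and classifies a new instance by its nearest stored neighbours. The sketch below shows plain k-nearest-neighbour classification with the overlap metric (count of mismatching feature values), not the fast approximation the abstract refers to. The feature layout (focus, left and right part-of-speech tags) and the relation labels are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# (feature vector, class label); features here are hypothetical POS-tag windows.
Instance = Tuple[Tuple[str, ...], str]

def overlap_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Overlap metric: number of feature positions whose values differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def classify(memory: List[Instance], query: Tuple[str, ...], k: int = 1) -> str:
    """Majority label among the k stored instances closest to the query."""
    ranked = sorted(memory, key=lambda inst: overlap_distance(inst[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy memory of parser decisions (hypothetical dependency labels).
memory: List[Instance] = [
    (("NOUN", "DET", "VERB"), "subj"),
    (("NOUN", "VERB", "PUNCT"), "obj"),
    (("ADJ", "DET", "NOUN"), "mod"),
]
prediction = classify(memory, ("NOUN", "DET", "AUX"))
```

Training on automatically parsed text, as the abstract describes, amounts to growing `memory` with instances labelled by the original parser rather than by hand.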
Learning Computational Grammars
This report presents a general overview of the network-related activities at this site and specific reports for the postdoc, the PhD student, the local coordinator and others. An overview of the training activities concludes this section.
Semi-Automatic Extension of . . .
We present a tool that facilitates the efficient extension of morphological lexica. The tool exploits information from a morphological lexicon, a morphological grammar and a text corpus to guide the acquisition process. In particular, it employs statistical models to analyze out-of-vocabulary words and predict lexical information. These models do not require any additional labeled data for training. Furthermore, they are based on generic features that are not specific to any particular language. This paper describes the general design of the tool and evaluates the accuracy of its machine learning components.
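As an illustration of how an existing lexicon can serve as training data for out-of-vocabulary analysis without extra labelling, and with generic rather than language-specific features, the sketch below predicts a lexical class from the longest word-final suffix seen in the lexicon. This simple suffix model is a stand-in chosen for illustration, not the tool's actual statistical models; the class labels and the `max_len` cut-off are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional

def build_suffix_table(lexicon: Dict[str, str], max_len: int = 4) -> Dict[str, Counter]:
    """For every word-final suffix up to max_len characters, count the
    lexical classes of known words ending in it."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for word, cls in lexicon.items():
        for n in range(1, min(max_len, len(word)) + 1):
            table[word[-n:]][cls] += 1
    return table

def guess_class(word: str, table: Dict[str, Counter], max_len: int = 4) -> Optional[str]:
    """Predict a class for an unknown word from its longest suffix seen in the lexicon."""
    for n in range(min(max_len, len(word)), 0, -1):
        counts = table.get(word[-n:])  # .get avoids inserting empty entries
        if counts:
            return counts.most_common(1)[0][0]
    return None

# Hypothetical lexicon with made-up class labels.
lexicon = {
    "walking": "V-ing", "talking": "V-ing",
    "happiness": "N-ness", "kindness": "N-ness",
    "quickly": "ADV-ly",
}
table = build_suffix_table(lexicon)
```

For example, `guess_class("jumping", table)` backs off from the unseen suffix "ping" to "ing" and returns `"V-ing"`.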

