Bilingually-constrained (monolingual) shift-reduce parsing
In EMNLP, 2009
Cited by 13 (5 self)
Jointly parsing two languages has been shown to improve accuracies on either or both sides. However, its search space is much bigger than in the monolingual case, forcing existing approaches to employ complicated modeling and crude approximations. Here we propose a much simpler alternative, bilingually-constrained monolingual parsing, where a source-language parser learns to exploit reorderings as an additional observation, without bothering to build the target-side tree as well. We show specifically how to enhance a shift-reduce dependency parser with alignment features to resolve shift-reduce conflicts. Experiments on the bilingual portion of the Chinese Treebank show that, with just 3 bilingual features, we can improve parsing accuracies by 0.6% (absolute) for both English and Chinese over a state-of-the-art baseline, with negligible (∼6%) efficiency overhead, making it much faster than biparsing.
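The parser described above builds on shift-reduce dependency parsing. The sketch below assumes the common arc-standard transition system (the paper's exact transition set and features may differ) and shows the three actions, a static oracle that derives the action sequence for a projective gold tree, and a replay that rebuilds the tree. The bilingual alignment features are not modeled, and all names are hypothetical.

```python
from typing import List

SHIFT, LEFT, RIGHT = "SH", "LA", "RA"  # arc-standard actions


def apply_actions(n: int, actions: List[str]) -> List[int]:
    """Replay arc-standard transitions over tokens 1..n (token 0 is ROOT).

    Returns heads, where heads[i] is the head of token i and heads[0] == -1.
    """
    heads = [-1] * (n + 1)
    stack, buffer = [0], list(range(1, n + 1))
    for act in actions:
        if act == SHIFT:          # move the next buffer token onto the stack
            stack.append(buffer.pop(0))
        elif act == LEFT:         # second-from-top becomes a dependent of top
            dep = stack.pop(-2)
            heads[dep] = stack[-1]
        elif act == RIGHT:        # top becomes a dependent of second-from-top
            dep = stack.pop()
            heads[dep] = stack[-1]
        else:
            raise ValueError(f"unknown action {act!r}")
    return heads


def static_oracle(gold: List[int]) -> List[str]:
    """Derive the action sequence that rebuilds a projective gold tree.

    Prefers LA, then RA once the top token has collected all its dependents,
    and otherwise shifts.
    """
    n = len(gold) - 1
    stack, buffer, actions = [0], list(range(1, n + 1)), []
    pending = [0] * (n + 1)       # dependents not yet attached, per head
    for d in range(1, n + 1):
        pending[gold[d]] += 1
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            s0, s1 = stack[-1], stack[-2]
            if s1 != 0 and gold[s1] == s0:
                actions.append(LEFT)
                stack.pop(-2)
                pending[s0] -= 1
                continue
            if gold[s0] == s1 and pending[s0] == 0:
                actions.append(RIGHT)
                stack.pop()
                pending[s1] -= 1
                continue
        if not buffer:
            raise ValueError("tree is not projective")
        actions.append(SHIFT)
        stack.append(buffer.pop(0))
    return actions
```

A shift-reduce conflict in this system is a configuration where both SH and a reduce action (LA or RA) are legal. In a trained parser a classifier makes that choice, and that choice is the point where the alignment features are meant to help.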
Using Syntax to Improve Word Alignment Precision for Syntax-Based Machine Translation
Cited by 12 (0 self)
Word alignments that violate syntactic correspondences interfere with the extraction of string-to-tree transducer rules for syntax-based machine translation. We present an algorithm for identifying and deleting incorrect word alignment links, using features of the extracted rules. We obtain gains in both alignment quality and translation quality in Chinese-English and Arabic-English translation experiments relative to a GIZA++ union baseline.
Direct Loss Minimization for Structured Prediction
Cited by 12 (2 self)
In discriminative machine learning, one is interested in training a system to optimize a certain desired measure of performance, such as the BLEU score in machine translation or the intersection-over-union score in the PASCAL segmentation evaluation. We propose here a perceptron-like learning method based on computing a difference of feature vectors between two inferred output values, where at least one of the outputs is inferred by loss-adjusted inference. The main contribution of this paper is a theorem directly relating updates of this form to the gradient of the given loss function with respect to the system parameters. This provides a theoretical foundation for certain training methods that have already gained widespread use in machine translation. Empirical results on phonetic alignment are also given, surpassing all previously reported results on this problem.
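The update described above can be illustrated over an explicitly enumerated candidate set. This minimal sketch assumes the variant that moves toward a lower-loss output: ordinary inference picks the highest-scoring candidate, loss-adjusted inference subtracts `eps` times the task loss before taking the argmax, and the weights move by the difference of the two feature vectors. Real systems would replace the enumeration with a decoder; `eps`, `lr`, and all function names are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def argmax(values: List[float]) -> int:
    # max() keeps the first index on ties, so the result is deterministic
    return max(range(len(values)), key=lambda k: values[k])


def direct_loss_update(w: Vector, feats: List[Vector], losses: List[float],
                       eps: float = 1.0, lr: float = 0.1) -> Vector:
    """One 'toward-better' direct-loss update over an explicit candidate list.

    feats[k] is the feature vector phi(x, y_k); losses[k] is the task loss of y_k.
    """
    scores = [dot(w, f) for f in feats]
    y_w = argmax(scores)                                              # standard inference
    y_direct = argmax([s - eps * l for s, l in zip(scores, losses)])  # loss-adjusted inference
    # Move toward the loss-adjusted output and away from the current prediction.
    # Any 1/eps scaling from the gradient theorem is folded into lr (an assumption).
    return [wi + lr * (fd - fw) for wi, fd, fw in zip(w, feats[y_direct], feats[y_w])]
```

When the current prediction already coincides with the loss-adjusted output, the feature difference is zero and the weights are left unchanged.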
Online Learning Methods For Discriminative Training of Phrase Based Statistical Machine Translation
Cited by 9 (1 self)
This paper investigates the task of discriminatively training a phrase-based SMT system with millions of features using the structured perceptron and the Margin Infused Relaxed Algorithm (MIRA), two popular online learning algorithms. We also compare two different update strategies: one where we update towards an oracle translation candidate extracted from an N-best list, versus a more aggressive approach in which we update towards an oracle extracted prior to training using a min-loss decoder. We evaluate our different training algorithms on the Czech-English translation task. Our results show that while both learning algorithms achieve similar results, with the perceptron converging more rapidly, the aggressive update strategy performs significantly worse than the more conservative strategy, corroborating the findings of Liang et al. (2006).
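The two learners compared above differ mainly in step size. The sketch below shows one update of each toward an oracle taken from an N-best list, assuming dense feature vectors and that `loss` measures how much worse the prediction is than the oracle (for example, a sentence-level metric difference). The MIRA variant shown is the common single-constraint form; the function names and the cap `C` are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def oracle_index(metric_scores: List[float]) -> int:
    """Pick the N-best entry with the best sentence-level metric as the oracle."""
    return max(range(len(metric_scores)), key=lambda k: metric_scores[k])


def perceptron_update(w: Vector, phi_oracle: Vector, phi_pred: Vector) -> Vector:
    """Structured perceptron: add oracle features, subtract predicted features."""
    return [wi + o - p for wi, o, p in zip(w, phi_oracle, phi_pred)]


def mira_update(w: Vector, phi_oracle: Vector, phi_pred: Vector,
                loss: float, C: float = 1.0) -> Vector:
    """Single-constraint MIRA: smallest step making the oracle outscore the
    prediction by at least `loss`, with the step size capped at C."""
    delta = [o - p for o, p in zip(phi_oracle, phi_pred)]
    norm_sq = dot(delta, delta)
    if norm_sq == 0.0:
        return list(w)
    tau = min(C, max(0.0, (loss - dot(w, delta)) / norm_sq))
    return [wi + tau * d for wi, d in zip(w, delta)]
```

The perceptron always takes a full unit step, while MIRA scales its step so that the margin constraint is just met, unless the cap `C` binds first.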
Translation as weighted deduction
In Proc. of EACL, 2009
Cited by 9 (3 self)
We present a unified view of many translation algorithms that synthesizes work on deductive parsing, semiring parsing, and efficient approximate search algorithms. This gives rise to clean analyses and compact descriptions that can serve as the basis for modular implementations. We illustrate this with several examples, showing how to build search spaces for several disparate phrase-based search strategies, integrate non-local features, and devise novel models. Although the framework is drawn from parsing and applied to translation, it is applicable to many dynamic programming problems arising in natural language processing and other areas.
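The semiring view mentioned above can be illustrated with a tiny weighted-deduction solver that is generic in the semiring: the same rules yield a total (inside) weight or a best-derivation (Viterbi) weight depending on which plus/times pair is plugged in. This sketch assumes an acyclic set of items supplied in topological order; the item names, weights, and function names are made up for illustration and are not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]
    times: Callable[[float, float], float]
    zero: float
    one: float


INSIDE = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)  # sum over derivations
VITERBI = Semiring(max, lambda a, b: a * b, 0.0, 1.0)                # best single derivation

# A deduction rule: (consequent item, antecedent items, rule weight)
Rule = Tuple[str, List[str], float]


def solve(rules: List[Rule], axioms: Dict[str, float],
          order: List[str], K: Semiring) -> Dict[str, float]:
    """Evaluate items in the given topological order under semiring K."""
    value = {item: K.zero for item in order}
    for item, weight in axioms.items():
        value[item] = K.plus(value[item], weight)
    for item in order:
        for consequent, antecedents, weight in rules:
            if consequent != item:
                continue
            prod = K.times(K.one, weight)
            for a in antecedents:
                prod = K.times(prod, value[a])
            value[item] = K.plus(value[item], prod)
    return value
```

For example, with axioms `a=0.5`, `b=0.4`, `c=0.3` and rules `goal <- a, b` (weight 0.5) and `goal <- c` (weight 1.0), the inside weight of `goal` is 0.1 + 0.3 = 0.4, while the Viterbi weight is max(0.1, 0.3) = 0.3.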
Tuning as ranking
In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 2011
Cited by 9 (0 self)
We offer a simple, effective, and scalable method for statistical machine translation parameter tuning based on the pairwise approach to ranking (Herbrich et al., 1999). Unlike the popular MERT algorithm (Och, 2003), our pairwise ranking optimization (PRO) method is not limited to a handful of parameters and can easily handle systems with thousands of features. Moreover, unlike recent approaches built upon the MIRA algorithm of Crammer and Singer (2003) (Watanabe et al., 2007; Chiang et al., 2008b), PRO is easy to implement. It uses off-the-shelf linear binary classifier software and can be built on top of an existing MERT framework in a matter of hours. We establish PRO’s scalability and effectiveness by comparing it to MERT and MIRA and demonstrate parity on both phrase-based and syntax-based systems in a variety of language pairs, using large-scale data scenarios.
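The abstract reduces tuning to binary classification over pairs of candidates. The sketch below assumes each N-best entry already carries a sentence-level metric score. It samples candidate pairs, discards near-ties, keeps the pairs with the largest metric difference, and turns each into a pair of mirrored classification examples. A plain perceptron stands in for the off-the-shelf linear classifier. The values of `gamma`, `xi`, and `min_diff` are illustrative assumptions, not necessarily the paper's settings.

```python
import random
from typing import List, Tuple

Vector = List[float]
Candidate = Tuple[Vector, float]   # (feature vector, sentence-level metric score)


def pro_samples(nbest: List[Candidate], gamma: int = 5000, xi: int = 50,
                min_diff: float = 0.05, seed: int = 0) -> List[Tuple[Vector, int]]:
    """Turn one N-best list into binary examples (feature difference, +/-1)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(gamma):                          # sample candidate pairs
        (f1, g1), (f2, g2) = rng.choice(nbest), rng.choice(nbest)
        if abs(g1 - g2) > min_diff:                 # drop near-ties in the metric
            pairs.append((abs(g1 - g2), f1, g1, f2, g2))
    pairs.sort(key=lambda p: p[0], reverse=True)    # keep the most distinct pairs
    examples = []
    for _, f1, g1, f2, g2 in pairs[:xi]:
        diff = [a - b for a, b in zip(f1, f2)]
        label = 1 if g1 > g2 else -1
        examples.append((diff, label))
        examples.append(([-d for d in diff], -label))  # mirrored example
    return examples


def train_linear(examples: List[Tuple[Vector, int]], dim: int, epochs: int = 10) -> Vector:
    """Plain perceptron standing in for an off-the-shelf linear classifier."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:
            if y * sum(wj * xj for wj, xj in zip(w, x)) <= 0:
                w = [wj + y * xj for wj, xj in zip(w, x)]
    return w
```

The learned weight vector is then used directly as the translation model's feature weights. In practice, examples from every sentence in the tuning set would be pooled before training.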
Effective use of linguistic and contextual information for statistical machine translation
In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009
Cited by 8 (1 self)
Current methods of using lexical features in machine translation have difficulty scaling up to realistic MT tasks due to the prohibitively large number of parameters involved. In this paper, we propose methods of using new linguistic and contextual features that do not suffer from this problem and apply them in a state-of-the-art hierarchical MT system. The features used in this work are non-terminal labels, non-terminal length distribution, source string context, and source dependency LM scores. The effectiveness of our techniques is demonstrated by significant improvements over a strong baseline. On Arabic-to-English translation, improvements in lower-cased BLEU are ...
Online large-margin training for statistical machine translation
In Proc. of EMNLP, 2007
Cited by 8 (0 self)
We achieved state-of-the-art performance in statistical machine translation by using a large number of features with an online large-margin training algorithm. The millions of parameters were tuned only on a small development set consisting of fewer than 1K sentences. Experiments on Arabic-to-English translation indicated that a model trained with sparse binary features outperformed a conventional SMT system with a small number of features.
Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
Cited by 8 (1 self)
Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multi-instance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint; for example, they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple). This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level.
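How a corpus-level component can admit overlapping relations is easiest to see in code. This sketch assumes the aggregation is a deterministic OR. That is our reading of the approach, since the abstract only calls it a simple corpus-level component. Each sentence mentioning an entity pair is assigned at most one relation (or "NA") by a sentence-level model, and a fact holds if any sentence expresses it. The per-sentence scores are hypothetical stand-ins for that model, and learning is not shown.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def sentence_label(scores: Dict[str, float]) -> str:
    """Each sentence expresses at most one relation: take its argmax label (may be 'NA')."""
    return max(scores, key=lambda r: scores[r])


def aggregate(mentions: List[Tuple[Pair, Dict[str, float]]]) -> Dict[Pair, Set[str]]:
    """Deterministic OR over sentences: r(e1, e2) holds if any mention is labeled r.

    Different sentences about the same pair can carry different relations,
    so the aggregated facts for a pair may overlap.
    """
    facts: Dict[Pair, Set[str]] = {}
    for pair, scores in mentions:
        label = sentence_label(scores)
        if label != "NA":
            facts.setdefault(pair, set()).add(label)
    return facts
```

With one sentence about (Jobs, Apple) scoring highest for Founded and another scoring highest for CEO-of, both facts are extracted for the same pair.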
A Sequence Alignment Model Based on the Averaged Perceptron
Cited by 7 (4 self)
We describe a discriminatively trained sequence alignment model based on the averaged perceptron. In common with other approaches to sequence modeling using perceptrons, and in contrast with comparable generative models, this model permits and transparently exploits arbitrary features of input strings. The simplicity of perceptron training lends more versatility than comparable approaches, allowing the model to be applied to a variety of problem types for which a learned edit model might be useful. We enumerate some of these problem types, describe a training procedure for each, and evaluate the model’s performance on several problems. We show that the proposed model performs at least as well as an approach based on statistical machine translation on two problems of name transliteration, and provide evidence that the combination of the two approaches promises further improvement.
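The learned edit model described above can be sketched at the character level. A dynamic program finds the highest-scoring sequence of substitutions, deletions, and insertions, and perceptron updates with weight averaging learn the operation weights. The feature set here (one indicator per edit operation) and all function names are hypothetical simplifications, since the paper's model allows arbitrary features of the input strings.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Op = str  # "S:a:b" substitutes a->b, "D:a" deletes a, "I:b" inserts b


def align(src: str, tgt: str, w: Dict[Op, float]) -> List[Op]:
    """Highest-scoring edit sequence under weights w (edit-distance-style DP)."""
    n, m = len(src), len(tgt)
    best = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    back: List[List[Tuple[Op, int, int]]] = [[("", 0, 0)] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            cands = []
            if i > 0 and j > 0:
                op = f"S:{src[i-1]}:{tgt[j-1]}"
                cands.append((best[i-1][j-1] + w.get(op, 0.0), op, i - 1, j - 1))
            if i > 0:
                op = f"D:{src[i-1]}"
                cands.append((best[i-1][j] + w.get(op, 0.0), op, i - 1, j))
            if j > 0:
                op = f"I:{tgt[j-1]}"
                cands.append((best[i][j-1] + w.get(op, 0.0), op, i, j - 1))
            if cands:  # ties go to the first candidate (substitution)
                score, op, pi, pj = max(cands, key=lambda c: c[0])
                best[i][j], back[i][j] = score, (op, pi, pj)
    ops, i, j = [], n, m
    while (i, j) != (0, 0):  # follow back-pointers from the end
        op, i, j = back[i][j]
        ops.append(op)
    return ops[::-1]


def train(data: List[Tuple[str, str, List[Op]]], epochs: int = 5) -> Dict[Op, float]:
    """Averaged perceptron over (source, target, gold edit sequence) triples."""
    w: Dict[Op, float] = defaultdict(float)
    total: Dict[Op, float] = defaultdict(float)
    steps = 0
    for _ in range(epochs):
        for src, tgt, gold in data:
            pred = align(src, tgt, w)
            if pred != gold:
                for op in gold:
                    w[op] += 1.0
                for op in pred:
                    w[op] -= 1.0
            for k, v in w.items():  # accumulate weights after every example
                total[k] += v
            steps += 1
    return {k: v / steps for k, v in total.items()}
```

Averaging the weight vectors seen after every example, rather than keeping only the final one, is what distinguishes the averaged perceptron from the plain perceptron and typically makes it less sensitive to the last few updates.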

