Results 1–10 of 23
Online Large-Margin Training of Syntactic and Structural Translation Features
Abstract

Cited by 114 (12 self)
Minimum-error-rate training (MERT) is a bottleneck for current development in statistical machine translation because it is limited in the number of weights it can reliably optimize. Building on the work of Watanabe et al., we explore the use of the MIRA algorithm of Crammer et al. as an alternative to MERT. We first show that by parallel processing and exploiting more of the parse forest, we can obtain results using MIRA that match or surpass MERT in terms of both translation quality and computational cost. We then test the method on two classes of features that address deficiencies in the Hiero hierarchical phrase-based model: first, we simultaneously train a large number of Marton and Resnik’s soft syntactic constraints, and, second, we introduce a novel structural distortion model. In both cases we obtain significant improvements in translation performance. Optimizing them in combination, for a total of 56 feature weights, we improve performance by 2.6 Bleu on a subset of the NIST 2006 Arabic-English evaluation data.
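The core of the MIRA approach mentioned in this abstract is a passive-aggressive weight update toward an oracle hypothesis. The following is a minimal illustrative sketch of a single such update on toy dictionary-valued feature vectors; the function name and features are hypothetical, not the authors' implementation:

```python
def mira_update(weights, feats_oracle, feats_pred, loss, C=1.0):
    """One MIRA (passive-aggressive) update toward the oracle hypothesis.

    Step size tau = min(C, hinge / ||delta||^2), where hinge is the
    violation of the cost-scaled margin between oracle and prediction.
    """
    delta = {k: feats_oracle.get(k, 0.0) - feats_pred.get(k, 0.0)
             for k in set(feats_oracle) | set(feats_pred)}
    margin = sum(weights.get(k, 0.0) * v for k, v in delta.items())
    hinge = loss - margin              # how badly the margin is violated
    norm_sq = sum(v * v for v in delta.values())
    if hinge <= 0 or norm_sq == 0:
        return weights                 # no violation: leave weights unchanged
    tau = min(C, hinge / norm_sq)      # clipped step size
    for k, v in delta.items():
        weights[k] = weights.get(k, 0.0) + tau * v
    return weights

# Toy usage: the oracle hypothesis has stronger "lm" and weaker "tm" features.
w = mira_update({}, {"lm": 1.0, "tm": 0.5}, {"lm": 0.2, "tm": 0.9}, loss=1.0)
```

After this single update the oracle hypothesis scores higher than the prediction under `w`, which is the behavior MIRA enforces, one example at a time.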
11,001 new features for statistical machine translation
 In North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)
, 2009
Abstract

Cited by 113 (2 self)
We use the Margin Infused Relaxed Algorithm of Crammer et al. to add a large number of new features to two machine translation systems: the Hiero hierarchical phrase-based translation system and our syntax-based translation system. On a large-scale Chinese-English translation task, we obtain statistically significant improvements of +1.5 Bleu and +1.1 Bleu, respectively. We analyze the impact of the new features and the performance of the learning algorithm.
Discriminative log-linear grammars with latent variables
 In Proceedings of NIPS 20
, 2008
Abstract

Cited by 44 (6 self)
We demonstrate that log-linear grammars with latent variables can be practically trained using discriminative methods. Central to efficient discriminative training is a hierarchical pruning procedure which allows feature expectations to be efficiently approximated in a gradient-based procedure. We compare L1 and L2 regularization and show that L1 regularization is superior, requiring fewer iterations to converge, and yielding sparser solutions. On full-scale treebank parsing experiments, the discriminative latent models outperform both the comparable generative latent models as well as the discriminative non-latent baselines.
Learning and Inference in Weighted Logic with Application to Natural Language Processing
, 2008
Perceptron Reranking for CCG Realization
Abstract

Cited by 7 (1 self)
This paper shows that discriminative reranking with an averaged perceptron model yields substantial improvements in realization quality with CCG. The paper confirms the utility of including language model log probabilities as features in the model, which prior work on discriminative training with log-linear models for HPSG realization had called into question. The perceptron model allows the combination of multiple n-gram models to be optimized and then augmented with both syntactic features and discriminative n-gram features. The full model yields a state-of-the-art BLEU score of 0.8506 on Section 23 of the CCGbank, to our knowledge the best score reported to date using a reversible, corpus-engineered grammar.
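Averaged-perceptron reranking of the kind this abstract describes can be sketched in a few lines. This is a simplified stand-in with hypothetical features and data, not the paper's CCG realization system: each training item is an n-best list of feature dictionaries plus the index of the best (gold) candidate, and the returned weights are the running average over all updates, which is what gives the averaged perceptron its robustness:

```python
def train_averaged_perceptron(nbest_lists, epochs=5):
    """nbest_lists: list of (candidates, best_idx) pairs,
    where candidates is a list of sparse feature dicts."""
    w, w_sum, t = {}, {}, 0
    for _ in range(epochs):
        for candidates, best_idx in nbest_lists:
            def score(f):
                return sum(w.get(k, 0.0) * v for k, v in f.items())
            # Current model's top-ranked candidate.
            pred = max(range(len(candidates)), key=lambda i: score(candidates[i]))
            if pred != best_idx:
                # Standard perceptron update: promote gold, demote prediction.
                for k, v in candidates[best_idx].items():
                    w[k] = w.get(k, 0.0) + v
                for k, v in candidates[pred].items():
                    w[k] = w.get(k, 0.0) - v
            # Accumulate for averaging (one snapshot per example).
            for k, v in w.items():
                w_sum[k] = w_sum.get(k, 0.0) + v
            t += 1
    return {k: v / t for k, v in w_sum.items()}   # averaged weights

# Toy n-best list: candidate 1 is gold; candidate 0 carries a "bad" feature.
data = [([{"lm": 0.0, "bad": 1.0}, {"lm": 1.0}], 1)]
avg_w = train_averaged_perceptron(data)
```

At test time the reranker simply returns the candidate with the highest dot product under the averaged weights.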
Automatic Improvement of Machine Translation Systems
, 2007
Abstract

Cited by 2 (0 self)
N66001992891804. Any opinions, findings, conclusions, or recommendations expressed in this material are those of
Maximum Rank Correlation Training for Statistical Machine Translation
Abstract

Cited by 1 (0 self)
We propose Maximum Ranking Correlation (MRC) as an objective function for discriminative tuning of the parameters of a linear model in Statistical Machine Translation (SMT). We try to maximize the ranking correlation between sentence-level BLEU (SBLEU) scores and model scores of the N-best list, whereas the MERT paradigm focuses on the potential 1-best candidates of the N-best list. After optimizing the MER and MRC objectives simultaneously with a multiple-objective optimization algorithm, we interpolate them to obtain parameters which outperform both. Experimental results on the WMT French–English data set confirm that our method significantly outperforms MERT on out-of-domain data sets, and performs marginally better than MERT on in-domain data sets, which validates the usefulness of MRC on both domain-specific and general-domain data.
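A standard measure of the ranking correlation this abstract refers to is Kendall's tau over the paired score lists of an n-best list. The sketch below (with made-up toy scores, and tau-a rather than whatever exact variant the paper uses) shows the quantity such an objective drives toward 1:

```python
def kendall_tau(xs, ys):
    """Kendall's tau-a between two paired score lists: the normalized
    difference between concordant and discordant pairs."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1    # same relative order in both rankings
            elif s < 0:
                discordant += 1    # opposite relative order
    return (concordant - discordant) / (n * (n - 1) / 2)

# Toy n-best list: sentence-level BLEU scores vs. model scores.
tau = kendall_tau([0.31, 0.28, 0.35, 0.22], [1.2, 0.9, 1.5, 0.4])
```

Here every pair is ordered the same way by both score lists, so tau is 1.0; a model whose scores rank hypotheses exactly as SBLEU does attains the maximum of this objective.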
Randomized Pruning: Efficiently Calculating Expectations in Large Dynamic Programs
Abstract
Pruning can massively accelerate the computation of feature expectations in large models. However, any single pruning mask will introduce bias. We present a novel approach which employs a randomized sequence of pruning masks. Formally, we apply auxiliary variable MCMC sampling to generate this sequence of masks, thereby gaining theoretical guarantees about convergence. Because each mask is generally able to skip large portions of an underlying dynamic program, our approach is particularly compelling for high-degree algorithms. Empirically, we demonstrate our method on bilingual parsing, showing decreasing bias as more masks are incorporated, and outperforming fixed tic-tac-toe pruning.
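The bias-reduction idea in this abstract can be illustrated on a deliberately trivial example (a plain mean rather than a dynamic program, and uniform random masks rather than the paper's auxiliary-variable MCMC): a single fixed mask yields a biased estimate, while averaging over a sequence of randomized masks recovers the true quantity:

```python
import random

random.seed(0)

items = list(range(100))
exact = sum(items) / len(items)            # the exact "expectation": 49.5

# One fixed pruning mask (keep only the first half): systematically biased.
fixed = sum(items[:50]) / 50               # = 24.5

def random_mask_mean(keep_prob=0.5):
    """Mean over the items kept by one random pruning mask."""
    kept = [x for x in items if random.random() < keep_prob]
    return sum(kept) / len(kept) if kept else 0.0

# Averaging over a sequence of randomized masks drives the estimate
# toward the exact value while each mask still skips ~half the work.
averaged = sum(random_mask_mean() for _ in range(200)) / 200
```

With the fixed mask the estimate sits at 24.5 regardless of how often it is reused, whereas the randomized-mask average lands close to 49.5; in the paper the same principle applies to feature expectations over a pruned dynamic program.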
A Structured Prediction Approach for Statistical Machine Translation
Abstract
We propose a new formally syntax-based method for statistical machine translation. Transductions between parsing trees are transformed into a problem of sequence tagging, which is then tackled by a search-based structured prediction method. This allows us to automatically acquire translation knowledge from a parallel corpus without the need for complex linguistic parsing. The method achieves results comparable to a phrase-based method (such as Pharaoh) while using only about ten percent of the translation table entries. Experiments show that the structured prediction approach to SMT is promising owing to its strong ability at combining words.