Results 1 - 10
of
33
Better k-best parsing
, 2005
"... We discuss the relevance of k-best parsing to recent applications in natural language processing, and develop efficient algorithms for k-best trees in the framework of hypergraph parsing. To demonstrate the efficiency, scalability and accuracy of these algorithms, we present experiments on Bikel’s i ..."
Abstract
-
Cited by 103 (14 self)
- Add to MetaCart
We discuss the relevance of k-best parsing to recent applications in natural language processing, and develop efficient algorithms for k-best trees in the framework of hypergraph parsing. To demonstrate the efficiency, scalability and accuracy of these algorithms, we present experiments on Bikel’s implementation of Collins ’ lexicalized PCFG model, and on Chiang’s CFG-based decoder for hierarchical phrase-based translation. We show in particular how the improved output of our algorithms has the potential to improve results from parse reranking systems and other applications. 1
An end-to-end discriminative approach to machine translation
- In Proceedings of the Joint International Conference on Computational Linguistics and Association of Computational Linguistics (COLING/ACL
, 2006
"... We present a perceptron-style discriminative approach to machine translation in which large feature sets can be exploited. Unlike discriminative reranking approaches, our system can take advantage of learned features in all stages of decoding. We first discuss several challenges to error-driven disc ..."
Abstract
-
Cited by 77 (2 self)
- Add to MetaCart
We present a perceptron-style discriminative approach to machine translation in which large feature sets can be exploited. Unlike discriminative reranking approaches, our system can take advantage of learned features in all stages of decoding. We first discuss several challenges to error-driven discriminative approaches. In particular, we explore different ways of updating parameters given a training example. We find that making frequent but smaller updates is preferable to making fewer but larger updates. Then, we discuss an array of features and show both how they quantitatively increase BLEU score and how they qualitatively interact on specific examples. One particular feature we investigate is a novel way to introduce learning into the initial phrase extraction process, which has previously been entirely heuristic. 1
A survey of statistical machine translation
, 2007
"... Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular tec ..."
Abstract
-
Cited by 30 (3 self)
- Add to MetaCart
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.
Hidden-Variable Models for Discriminative Reranking
- In Proceedings of HLTEMNLP
, 2005
"... We describe a new method for the representation of NLP structures within reranking approaches. We make use of a conditional log–linear model, with hidden variables representing the assignment of lexical items to word clusters or word senses. The model learns to automatically make these assignments b ..."
Abstract
-
Cited by 26 (2 self)
- Add to MetaCart
We describe a new method for the representation of NLP structures within reranking approaches. We make use of a conditional log–linear model, with hidden variables representing the assignment of lexical items to word clusters or word senses. The model learns to automatically make these assignments based on a discriminative training criterion. Training and decoding with the model requires summing over an exponential number of hidden– variable assignments: the required summations can be computed efficiently and exactly using dynamic programming. As a case study, we apply the model to parse reranking. The model gives an F – measure improvement of ≈ 1.25 % beyond the base parser, and an ≈ 0.25% improvement beyond the Collins (2000) reranker. Although our experiments are focused on parsing, the techniques described generalize naturally to NLP structures other than parse trees. 1
A discriminative global training algorithm for statistical MT
- In Proc. of ACL
, 2006
"... This paper presents a novel training algorithm for a linearly-scored block sequence translation model. The key component is a new procedure to directly optimize the global scoring function used by a SMT decoder. No translation, language, or distortion model probabilities are used as in earlier work ..."
Abstract
-
Cited by 26 (1 self)
- Add to MetaCart
This paper presents a novel training algorithm for a linearly-scored block sequence translation model. The key component is a new procedure to directly optimize the global scoring function used by a SMT decoder. No translation, language, or distortion model probabilities are used as in earlier work on SMT. Therefore our method, which employs less domain specific knowledge, is both simpler and more extensible than previous approaches. Moreover, the training procedure treats the decoder as a black-box, and thus can be used to optimize any decoding scheme. The training algorithm is evaluated on a standard Arabic-English translation task. 1
Dependency tree translation: Syntactically informed phrasal smt
- In ACL
, 2005
"... done while at Microsoft Research We describe a novel approach to statistical machine translation that combines syntactic information in the source language with recent advances in phrasal translation. We depend on a source-language dependency parser and a word-aligned parallel corpus. The only targe ..."
Abstract
-
Cited by 19 (1 self)
- Add to MetaCart
done while at Microsoft Research We describe a novel approach to statistical machine translation that combines syntactic information in the source language with recent advances in phrasal translation. We depend on a source-language dependency parser and a word-aligned parallel corpus. The only target language resource assumed is a word breaker. These are used to produce treelet (“phrase”) translation pairs as well as several models, including a channel model, an order model, and a target language model. Together these models and the treelet translation pairs provide a powerful and promising approach to MT that incorporates the power of phrasal SMT with the linguistic generality available in a parser. We evaluate two decoding approaches, one inspired by dynamic programming and the
On Some Pitfalls in Automatic Evaluation and Significance Testing for MT
, 2005
"... We investigate some pitfalls regarding the discriminatory power of MT evaluation metrics and the accuracy of statistical significance tests. In a discriminative reranking experiment for phrase-based SMT we show that the NIST metric is more sensitive than BLEU or F-score despite their incorpora ..."
Abstract
-
Cited by 18 (1 self)
- Add to MetaCart
We investigate some pitfalls regarding the discriminatory power of MT evaluation metrics and the accuracy of statistical significance tests. In a discriminative reranking experiment for phrase-based SMT we show that the NIST metric is more sensitive than BLEU or F-score despite their incorporation of aspects of fluency or meaning adequacy into MT evaluation. In an experimental comparison of two statistical significance tests we show that p-values are estimated more conservatively by approximate randomization than by bootstrap tests, thus increasing the likelihood of type-I error for the latter. We point out a pitfall of randomly assessing significance in multiple pairwise comparisons, and conclude with a recommendation to combine NIST with approximate randomization, at more stringent rejection levels than is currently standard.
Considerations in Maximum Mutual Information and Minimum Classification Error training for Statistical Machine Translation
- In the Proceedings of EAMT-05
, 2005
"... Discriminative training methods are used in statistical machine translation to effectively introduce and combine additional knowledge sources within the translation process. Although these methods are described in the accompanying literature and comparative studies are available for speech recogniti ..."
Abstract
-
Cited by 14 (4 self)
- Add to MetaCart
Discriminative training methods are used in statistical machine translation to effectively introduce and combine additional knowledge sources within the translation process. Although these methods are described in the accompanying literature and comparative studies are available for speech recognition, additional considerations are introduced when applying discriminative training to statistical machine translation. In this paper we pay special attention to the comparison and formalization of discriminative training criteria and their respective optimization methods with the goal of improving translation performance measured by the corpus level BLEU metric with a Viterbi beam based decoder. We frame this work within the current trends in discriminative training and present reproducible results that highlight the potential as well as shortcomings of N-Best list based discriminative training. 1
Chunk-Level Reordering of Source Language Sentences with Automatically Learned Rules for Statistical Machine Translation
"... In this paper, we describe a sourceside reordering method based on syntactic chunks for phrase-based statistical machine translation. First, we shallow parse the source language sentences. Then, reordering rules are automatically learned from source-side chunks and word alignments. During translatio ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
In this paper, we describe a sourceside reordering method based on syntactic chunks for phrase-based statistical machine translation. First, we shallow parse the source language sentences. Then, reordering rules are automatically learned from source-side chunks and word alignments. During translation, the rules are used to generate a reordering lattice for each sentence. Experimental results are reported for a Chinese-to-English task, showing an improvement of 0.5%–1.8% BLEU score absolute on various test sets and better computational efficiency than reordering during decoding. The experiments also show that the reordering at the chunk-level performs better than at the POS-level. 1
Word Reordering in Statistical Machine Translation with a POS-Based Distortion Model
"... In this paper we describe a word reordering strategy for statistical machine translation that reorders the source side based on Part of Speech (POS) information. Reordering rules are learned from the word aligned corpus. Reordering is integrated into the decoding process by constructing a lattice, w ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
In this paper we describe a word reordering strategy for statistical machine translation that reorders the source side based on Part of Speech (POS) information. Reordering rules are learned from the word aligned corpus. Reordering is integrated into the decoding process by constructing a lattice, which contains all word reorderings according to the reordering rules. Probabilities are assigned to the different reorderings. On this lattice monotone decoding is performed. This reordering strategy is compared with our previous reordering strategy, which looks at all permutations within a sliding window. We extend reordering rules by adding context information. Phrase translation pairs are learned from the original corpus and from a reordered source corpus to better capture the reordered word sequences at decoding time. Results are presented for English → Spanish and

