A cascaded linear model for joint Chinese word segmentation and part-of-speech tagging
- In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, 2008
Cited by 14 (5 self)
We propose a cascaded linear model for joint Chinese word segmentation and part-of-speech tagging. With a character-based perceptron as the core, combined with real-valued features such as language models, the cascaded model is able to efficiently utilize knowledge sources that are inconvenient to incorporate into the perceptron directly. Experiments show that the cascaded model achieves improved accuracies on both segmentation only and joint segmentation and part-of-speech tagging. On the Penn Chinese Treebank 5.0, we obtain an error reduction of 18.5% on segmentation and 12% on joint segmentation and part-of-speech tagging over the perceptron-only baseline.
An ngram-based statistical machine translation decoder
- Proc. of the 9th European Conference on Speech Communication and Technology, INTERSPEECH’05, 2005
Cited by 12 (8 self)
In this paper we describe MARIE, an Ngram-based statistical machine translation decoder. It is implemented using a beam search strategy, with distortion (or reordering) capabilities. The underlying translation model is based on an Ngram approach, extended to introduce reordering at the phrase level. The search graph structure is designed to perform very accurate comparisons, which allows for a high level of pruning and improves decoder efficiency. We report several techniques for efficiently pruning the search space. The combinatorial explosion of the search space derived from the search graph structure is reduced by limiting the number of reorderings a given translation is allowed to perform, and also the maximum distance a word (or a phrase) is allowed to be reordered. Finally, we report translation accuracy results on three different translation tasks.
Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices
Cited by 12 (5 self)
Minimum Error Rate Training (MERT) and Minimum Bayes-Risk (MBR) decoding are used in most current state-of-the-art Statistical Machine Translation (SMT) systems. The algorithms were originally developed to work with N-best lists of translations, and recently extended to lattices that encode many more hypotheses than typical N-best lists. We here extend lattice-based MERT and MBR algorithms to work with hypergraphs that encode a vast number of translations produced by MT systems based on Synchronous Context Free Grammars. These algorithms are more efficient than the lattice-based versions presented earlier. We show how MERT can be employed to optimize parameters for MBR decoding. Our experiments show speedups from MERT and MBR as well as performance improvements from MBR decoding on several language pairs.
PARAEVAL: Using paraphrases to evaluate summaries automatically
- In: Proceedings of HLT-NAACL, 2006
Cited by 12 (1 self)
ParaEval is an automated evaluation method for comparing reference and peer summaries. It facilitates a tiered-comparison strategy where recall-oriented global optimal and local greedy searches for paraphrase matching are enabled in the top tiers. We utilize a domain-independent paraphrase table extracted from a large bilingual parallel corpus using methods from Machine Translation (MT). We show that the quality of ParaEval’s evaluations, measured by correlating with human judgments, closely resembles that of ROUGE’s.
Extracting parallel sub-sentential fragments from non-parallel corpora
- Proceedings of ACL 2006, 2006
Cited by 12 (0 self)
We present a novel method for extracting parallel sub-sentential fragments from comparable, non-parallel bilingual corpora. By analyzing potentially similar sentence pairs using a signal processing-inspired approach, we detect which segments of the source sentence are translated into segments in the target sentence, and which are not. This method enables us to extract useful machine translation training data even from very non-parallel corpora, which contain no parallel sentence pairs. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system.
Bayesian learning of non-compositional phrases with synchronous parsing
- In ACL, 2008
Cited by 12 (2 self)
We combine the strengths of Bayesian modeling and synchronous grammar in unsupervised learning of basic translation phrase pairs. The structured space of a synchronous grammar is a natural fit for phrase pair probability estimation, though the search space can be prohibitively large. Therefore we explore efficient algorithms for pruning this space that lead to empirically effective results. Incorporating a sparse prior using Variational Bayes biases the models toward generalizable, parsimonious parameter sets, leading to significant improvements in word alignment. This preference for sparse solutions together with effective pruning methods forms a phrase alignment regimen that produces better end-to-end translations than standard word alignment approaches.
Reevaluating machine translation results with paraphrase support
- In Proceedings of EMNLP, 2006
Cited by 11 (1 self)
In this paper, we present ParaEval, an automatic evaluation framework that uses paraphrases to improve the quality of machine translation evaluations. Previous work has focused on fixed n-gram evaluation metrics coupled with lexical identity matching. ParaEval addresses three important issues: support for paraphrase/synonym matching, recall measurement, and correlation with human judgments. We show that ParaEval correlates significantly better than BLEU with human assessment in measurements for both fluency and adequacy.
Interactively exploring a machine translation model
- Poster in Proc. ACL, 2005
Cited by 11 (1 self)
This paper describes a method of interactively visualizing and directing the process of translating a sentence. The method allows a user to explore a model of syntax-based statistical machine translation (MT), to understand the model’s strengths and weaknesses, and to compare it to other MT systems. Using this visualization method, we can find and address conceptual and practical problems in an MT system. In our demonstration at ACL, new users of our tool will drive a syntax-based decoder for themselves.
Learning phrase-based spelling error models from clickthrough data
- In ACL, 2010
Cited by 11 (2 self)
This paper explores the use of clickthrough data for query spelling correction. First, large amounts of query-correction pairs are derived by analyzing users' query reformulation behavior encoded in the clickthrough data. Then, a phrase-based error model that accounts for the transformation probability between multi-term phrases is trained and integrated into a query speller system. Experiments are carried out on a human-labeled data set. Results show that the system using the phrase-based error model significantly outperforms its baseline systems.
Machine translation in the year 2004
- In Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2005
Cited by 9 (0 self)
Increased availability of parallel data and recent progress in modeling, decoding, and evaluation have had a major impact on machine translation (MT) accuracy. This paper covers the basic elements of state-of-the-art statistical MT.

