Minimum Bayes-Risk Decoding for Statistical Machine Translation
In Proceedings of HLT-NAACL, 2004
Cited by 78 (10 self)
We present Minimum Bayes-Risk (MBR) decoding for statistical machine translation. This statistical approach aims to minimize the expected loss of translation errors under loss functions that measure translation performance. We describe a hierarchy of loss functions that incorporate different levels of linguistic information: word strings, word-to-word alignments from an MT system, and syntactic structure from parse trees of source- and target-language sentences. We report the performance of the MBR decoders on a Chinese-to-English translation task. Our results show that MBR decoding can be used to tune statistical MT performance for specific loss functions.
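MBR decoding selects the hypothesis e' that minimizes the expected loss Σ_e L(e, e') P(e | f) under the model posterior. The sketch below is a minimal illustration over an N-best list, assuming word-level edit distance as the loss (one of the simplest word-string losses; the alignment- and parse-tree-based losses in the hierarchy are not shown) and treating renormalized N-best scores as the posterior. The names `word_edit_distance` and `mbr_decode` and the toy N-best list are hypothetical.

```python
from typing import List, Sequence, Tuple


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (a word-string loss)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                      # delete tok_a
                            curr[j - 1] + 1,                  # insert tok_b
                            prev[j - 1] + (tok_a != tok_b)))  # substitute or match
        prev = curr
    return prev[-1]


def mbr_decode(nbest: List[Tuple[str, float]], loss=word_edit_distance) -> str:
    """Pick the hypothesis with minimum expected loss, treating the
    renormalized N-best scores as the posterior P(e | f)."""
    total = sum(p for _, p in nbest)
    best, best_risk = None, float("inf")
    for cand, _ in nbest:
        # expected loss of choosing `cand` when the true translation is `ref`
        risk = sum(p / total * loss(ref.split(), cand.split()) for ref, p in nbest)
        if risk < best_risk:
            best, best_risk = cand, risk
    return best


nbest = [("the cat sat", 0.40), ("a cat sat", 0.35), ("a cat sat down", 0.25)]
print(mbr_decode(nbest))  # → a cat sat
```

In this toy list the maximum-probability hypothesis is "the cat sat", but MBR prefers "a cat sat" because it is, on average, closest to the other probable hypotheses (expected loss 0.65 versus 0.85).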
Cross-Lingual Relevance Models
2002
Cited by 66 (5 self)
We propose a formal model of Cross-Language Information Retrieval (CLIR) that relies on neither query translation nor document translation. Our approach leverages recent advances in language modeling to directly estimate an accurate topic model in the target language, starting from a query in the source language. The model integrates popular disambiguation and query-expansion techniques in a unified formal framework. We describe how the topic model can be estimated with either a parallel corpus or a dictionary. We test the framework by constructing Chinese topic models from English queries and using them in the CLIR task of TREC-9. The model achieves around 95% of the average precision of a strong monolingual baseline. In initial precision, it outperforms the monolingual baseline by 20%. The main contribution of this work is the unified formal model, which integrates techniques that are essential for effective cross-language retrieval.
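One way to picture the parallel-corpus estimate is a relevance-model mixture, P(w | R) ≈ Σ_j P(w | t_j) P(Q | s_j) / Σ_j P(Q | s_j), where (s_j, t_j) are aligned source/target sentence pairs and Q is the source-language query. The sketch below assumes a uniform prior over pairs, Jelinek-Mercer smoothing with a hypothetical weight `lam`, and toy data; it covers only the parallel-corpus route (not the dictionary-based one) and is not the authors' implementation.

```python
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram model of a non-empty token list."""
    counts = Counter(tokens)
    return {w: c / len(tokens) for w, c in counts.items()}


def relevance_model(query, parallel, lam=0.5):
    """Estimate a target-language topic model P(w | R) from a source-language query.

    parallel: list of (source_tokens, target_tokens) sentence pairs.
    Each pair is weighted by the likelihood of the query under its smoothed
    source side; target-side unigram models are then mixed with those weights.
    """
    background = unigram([tok for src, _ in parallel for tok in src])
    weights = []
    for src, _ in parallel:
        p_src = unigram(src)
        score = 1.0
        for q in query:
            # Jelinek-Mercer smoothing against the source-side background model
            score *= lam * p_src.get(q, 0.0) + (1 - lam) * background.get(q, 0.0)
        weights.append(score)
    z = sum(weights)
    if z == 0.0:
        return {}  # no query word occurs anywhere on the source side
    model = Counter()
    for weight, (_, tgt) in zip(weights, parallel):
        for w, p in unigram(tgt).items():
            model[w] += p * weight / z
    return dict(model)


parallel = [(["economy", "growth"], ["经济", "增长"]),
            (["football", "match"], ["足球", "比赛"])]
rm = relevance_model(["economy"], parallel)
print(sorted(rm.items(), key=lambda kv: -kv[1]))
```

Because the query term is smoothed rather than required to match exactly, the unrelated pair still receives a small weight, which is how this kind of model spreads probability to related target-language words.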
Do We Need Chinese Word Segmentation for Statistical Machine Translation?
In Proceedings of the Third SIGHAN Workshop on Chinese Language Processing, 2004
Cited by 12 (2 self)
In Chinese texts, words are not separated by white space. This is problematic for many natural language processing tasks. The standard approach is to segment the Chinese character sequence into words. Here, we investigate Chinese word segmentation for statistical machine translation. We pursue two goals: the first is to maximize the final translation quality; the second is to minimize the manual effort of building a translation system. The commonly used method for obtaining word boundaries relies on a word segmentation tool and a predefined monolingual dictionary. To avoid making the translation system depend on an external dictionary, we have developed a system that learns a domain-specific dictionary from the parallel training corpus. This method produces results comparable to those obtained with the predefined dictionary. Furthermore, our translation system is able to work without word segmentation, with only a minor loss in translation quality.
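The abstract does not give the dictionary-learning procedure. One simple way to picture it, assuming a character-level word alignment between each Chinese sentence and its English translation, is to treat runs of adjacent characters aligned to the same English word(s) as dictionary entries, then segment new text by greedy forward maximum matching. Both functions and the toy sentence pair are hypothetical illustrations, not the authors' system. With an empty dictionary, `max_match` falls back to one token per character, which corresponds to translating without word segmentation.

```python
def learn_dictionary(corpus):
    """Collect multi-character words from character-aligned sentence pairs.

    corpus: iterable of (chars, links), where chars is a Chinese string and
    links is a set of (i, j) pairs aligning character i to English word j.
    """
    dictionary = set()
    for chars, links in corpus:
        targets = {}  # character index -> set of aligned English word indices
        for i, j in links:
            targets.setdefault(i, set()).add(j)
        i = 0
        while i < len(chars):
            j = i + 1
            # extend the run while the next character aligns to the same English word(s)
            while j < len(chars) and i in targets and targets.get(j) == targets[i]:
                j += 1
            if j - i > 1:
                dictionary.add(chars[i:j])
            i = j
    return dictionary


def max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching; unknown characters become single tokens."""
    words, i = [], 0
    while i < len(text):
        for k in range(min(max_len, len(text) - i), 0, -1):
            if k == 1 or text[i:i + k] in dictionary:
                words.append(text[i:i + k])
                i += k
                break
    return words


# "我喜欢北京" / "I like Beijing": 我->I, 喜欢->like, 北京->Beijing
corpus = [("我喜欢北京", {(0, 0), (1, 1), (2, 1), (3, 2), (4, 2)})]
lexicon = learn_dictionary(corpus)
print(sorted(lexicon))
print(max_match("北京很大", lexicon))  # → ['北京', '很', '大']
```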
Re-Engineering Letter-to-Sound Rules
2001
Cited by 2 (0 self)
Using finite-state automata for the text analysis component in a text-to-speech system is problematic in several respects: the rewrite rules from which the automata are compiled are difficult to write and maintain, and the resulting automata can become very large and therefore inefficient. Converting the knowledge represented explicitly in rewrite rules into a more efficient format is difficult. We take an indirect route, learning an efficient decision tree representation from data and tapping information contained in existing rewrite rules, which increases performance compared to learning exclusively from a pronunciation lexicon.
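The abstract does not spell out the feature set. As a minimal sketch, assuming letters and phonemes are already aligned one-to-one (with a placeholder such as "_" for silent letters), each letter can become one decision-tree training instance. Its features are a window of neighboring letters plus the phoneme an existing rewrite rule predicts for that position. `letter_windows` and `toy_rule` are hypothetical names, and the tree learner itself is omitted.

```python
def letter_windows(word, phonemes, rule_predict, width=2):
    """Turn a letter-aligned (word, phonemes) pair into per-letter training
    instances for a decision-tree learner."""
    if len(phonemes) != len(word):
        raise ValueError("expected one phoneme (or '_') per letter")
    padded = "#" * width + word + "#" * width   # '#' marks word boundaries
    rule_out = rule_predict(word)               # hypothetical: one rule output per letter
    instances = []
    for i, target in enumerate(phonemes):
        # letters from position i - width to i + width; L0 is the current letter
        features = {f"L{k - width}": padded[i + k] for k in range(2 * width + 1)}
        features["rule"] = rule_out[i]          # knowledge tapped from the rewrite rules
        instances.append((features, target))
    return instances


def toy_rule(word):
    """Stand-in for an existing rewrite-rule set: 'c' -> k, other letters unchanged."""
    return ["k" if c == "c" else c for c in word]


for feats, phone in letter_windows("cat", ["k", "ae", "t"], toy_rule):
    print(feats, "->", phone)
```

Feeding the rule output in as just another feature lets the learner use it where it helps and override it where the lexicon disagrees, rather than having to rediscover every pattern from the lexicon alone.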
Towards a Model of Competence for Corpus-Based Machine Translation
IAI Working Papers, 1999
Cited by 1 (0 self)
A translation converts text from a source language into a target language while preserving its meaning. A huge number of techniques and computational approaches have been tried for translating natural languages automatically, yet no satisfactory solution has been found. This paper examines approaches to corpus-based machine translation (CBMT). In CBMT, a set of reference example translations is given to the MT system. These are analyzed and compiled into the system's internal representation according to the theory of meaning the system implements. These representations then serve as a basis for translating new sentences. This paper discusses three main approaches in the CBMT paradigm: the memory-based approach (e.g. translation memories (TM)), the example-based approach (EBMT) and the statistics-based approach (SBMT). Concrete CBMT systems are discussed in light of the theory of meaning (preservation) they implement. This discussion then leads to a model of competence for CBMT systems. The paper concludes that CBMT systems can be designed to achieve either high reliability or broad coverage, but that the two qualities seem to be mutually exclusive.
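The memory-based approach can be illustrated with a translation-memory fuzzy match: a new sentence is compared with the stored source sentences, and the translation of the closest one is proposed if it is similar enough. This sketch assumes token-level similarity from Python's `difflib.SequenceMatcher.ratio()` and a hypothetical threshold of 0.7. Commercial TM tools use their own, typically different, match metrics.

```python
from difflib import SequenceMatcher


def fuzzy_lookup(sentence, memory, threshold=0.7):
    """Return (source, target, score) for the stored source sentence most similar
    to `sentence`, or None if no entry reaches `threshold`."""
    tokens = sentence.split()
    best = None
    for src, tgt in memory:
        # ratio() = 2 * matching tokens / total tokens in both sequences
        score = SequenceMatcher(None, tokens, src.split()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (src, tgt, score)
    return best


memory = [("the printer is out of paper", "der Drucker hat kein Papier mehr"),
          ("the printer is on", "der Drucker ist an")]
print(fuzzy_lookup("the printer is out of toner", memory))
```

The threshold is where the reliability/coverage trade-off shows up: raising it yields fewer but safer matches, while lowering it covers more input with less trustworthy suggestions.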
A Modular Architecture for Separating Hypothesis Formation from Hypothesis Evaluation in Data-driven Machine Translation
Cited by 1 (1 self)
Recent research in statistical and example-based machine translation integrates rule-induced structured representations with statistics and lexicalised exceptions. While rule-based approaches concentrate on the formation of partial translation hypotheses, probabilistic approaches are concerned with the evaluation and selection of the best hypotheses. Within the METIS-II framework, we propose a machine translation system that uses transfer and expander rules to build an AND/OR graph of partial translations and a statistical ranker to find the best path through the graph. The paper gives an overview of the architecture and an evaluation of the system for several languages.
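The abstract does not describe the ranker's model. As a minimal sketch, assuming the AND/OR graph has been flattened into a sequence of slots (AND) that each hold alternative phrases (OR), and assuming a bigram language model as the statistical ranker, the best path can be found with Viterbi-style dynamic programming that keeps the best partial path for each last word. `best_path` and the toy log-probability table are hypothetical. A real METIS-II graph also allows reordering, which this sketch ignores.

```python
def best_path(slots, bigram_logprob):
    """Return the highest-scoring word sequence through a slot graph.

    slots: list of lists of alternative phrases; one phrase is chosen per slot
    (OR) and the chosen phrases are concatenated in slot order (AND).
    bigram_logprob(prev, word): log-probability from the ranker's language model.
    """
    # For a bigram model the last word is a sufficient state, so we keep the
    # best (score, words) for each distinct last word.
    beams = {"<s>": (0.0, [])}
    for alternatives in slots:
        extended = {}
        for prev, (score, words) in beams.items():
            for phrase in alternatives:
                s, last = score, prev
                for w in phrase.split():
                    s += bigram_logprob(last, w)
                    last = w
                if last not in extended or s > extended[last][0]:
                    extended[last] = (s, words + phrase.split())
        beams = extended
    finals = [(s + bigram_logprob(last, "</s>"), words)
              for last, (s, words) in beams.items()]
    return max(finals, key=lambda f: f[0])[1]


TOY_LM = {("<s>", "the"): -0.5, ("<s>", "a"): -1.0,
          ("the", "house"): -0.7, ("the", "home"): -2.0,
          ("a", "house"): -1.0, ("a", "home"): -1.0,
          ("house", "</s>"): -0.3, ("home", "</s>"): -0.3}


def toy_logprob(prev, word):
    return TOY_LM.get((prev, word), -5.0)  # flat penalty for unseen bigrams


print(best_path([["the", "a"], ["house", "home"]], toy_logprob))  # → ['the', 'house']
```

Keeping one entry per last word keeps the search linear in the number of slots instead of enumerating every combination of alternatives.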
Structural Transfer in an English to Turkish Machine Translation System
1998
Turhan, Çiğdem Keyder. Ph.D. thesis, Department of Computer Engineering. Supervisor: Assoc. Prof. Mehmet Tolun. January 1998, 149 pages. This thesis discusses the design and implementation of the transfer component of a transfer-based, human-assisted English-to-Turkish machine translation system that uses structural mapping. The main objective of the system is to produce high-quality, low-cost, timely translations in technical domains. The source-language intermediate representation stores the feature structures and grammatical functions of the input sentence in a language-independent formalism, which allows the system to be extended to other source languages. Important differences between the two languages are reflected in complex transfer rules in the transfer module. The system has been evaluated in terms of its performance and linguistic coverage. Keywords: Machin...
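The abstract does not show the rule formalism. The toy sketch below assumes a hypothetical feature structure keyed by grammatical function (`subj`, `pred`, `obj`) and a hypothetical three-word lexicon. It shows the kind of structural mapping such rules perform: English SVO order becomes Turkish SOV order, and a definite direct object receives accusative case. The output is a sequence of lemma+tag strings that a later morphological generation step would inflect (for example kitap+ACC as kitabı, giving "Çocuk kitabı okudu").

```python
# Toy English-to-Turkish lemma lexicon (hypothetical)
LEXICON = {"boy": "çocuk", "book": "kitap", "read": "oku"}


def transfer_clause(fs, lexicon=LEXICON):
    """Map an English clause feature structure to a Turkish SOV sequence of
    lemma+tag strings for later morphological generation.

    fs: hypothetical dict keyed by grammatical function, e.g.
        {"subj": {...}, "pred": {...}, "obj": {...}}, each with a "lemma".
    """
    out = [lexicon[fs["subj"]["lemma"]]]
    if "obj" in fs:
        obj = lexicon[fs["obj"]["lemma"]]
        if fs["obj"].get("def"):
            obj += "+ACC"  # definite direct objects take accusative case
        out.append(obj)
    verb = lexicon[fs["pred"]["lemma"]]
    if fs["pred"].get("tense") == "past":
        verb += "+PAST"
    out.append(verb)  # Turkish is verb-final, unlike English SVO
    return out


# "The boy read the book."
clause = {"subj": {"lemma": "boy", "def": True},
          "pred": {"lemma": "read", "tense": "past"},
          "obj": {"lemma": "book", "def": True}}
print(transfer_clause(clause))  # → ['çocuk', 'kitap+ACC', 'oku+PAST']
```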
Review of "Statistical Language Learning" by Eugene Charniak
1993
Introduction
The $64,000 question in computational linguistics these days is: "What should I read to learn about statistical natural language processing?" I have been asked this question over and over, and each time I have given basically the same reply: there is no text that addresses this topic directly, and the best one can do is find a good probability-theory textbook and a good information-theory textbook, and supplement those texts with an assortment of conference papers and journal articles. Understanding the disappointment this answer provoked, I was delighted to hear that someone had finally written a book directly addressing this topic. However, after reading Eugene Charniak's Statistical Language Learning, I have very mixed feelings about the impact this book might have on the ever-growing field of statistical NLP. The book begins with a very brief description of the classic artificial intelligence approach to NLP (chapter 1), including morphology, s
Modeling Morphologically Rich Languages Using Split Words and Unstructured Dependencies
We experiment with splitting words into their stem and suffix components for modeling morphologically rich languages. We show that using a morphological analyzer and disambiguator results in a significant perplexity reduction in Turkish. We present flexible n-gram models, Flex-Grams, which assume that the n−1 tokens that determine the probability of a given token can be chosen anywhere in the sentence rather than only in the preceding n−1 positions. Our final model achieves a 27% perplexity reduction compared to the standard n-gram model.
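Perplexity is PP = exp(-(1/N) Σ_i log P(t_i | context_i)), so a 27% reduction means the final model's perplexity is roughly 0.73 times that of the standard n-gram model. The sketch below contrasts a standard bigram with one flexible choice of conditioning token on stem/suffix-split Turkish. It assumes suffix tokens are marked with a leading "+" and uses a hypothetical fixed rule (condition on the nearest preceding stem) with add-one smoothing. The abstract does not say how Flex-Grams select their context positions, so this is only an illustration of the idea, not the paper's model.

```python
import math
from collections import Counter


def context(tokens, i, flex):
    """Conditioning token for position i (bigram case).

    Standard model: the immediately preceding token.
    Flex variant (a hypothetical fixed rule): the nearest preceding stem,
    skipping suffix tokens, which are marked here with a leading '+'.
    """
    j = i - 1
    if flex:
        while j >= 0 and tokens[j].startswith("+"):
            j -= 1
    return tokens[j] if j >= 0 else "<s>"


def train(sentences, flex):
    """Count (context, token) pairs; returns counts and training vocabulary size."""
    pairs, ctx, vocab = Counter(), Counter(), set()
    for toks in sentences:
        vocab.update(toks)
        for i, t in enumerate(toks):
            c = context(toks, i, flex)
            pairs[c, t] += 1
            ctx[c] += 1
    return pairs, ctx, len(vocab)


def perplexity(sentences, model, flex):
    """PP = exp(-(1/N) * sum of log P(token | context)), with add-one smoothing."""
    pairs, ctx, v = model
    log_sum, n = 0.0, 0
    for toks in sentences:
        for i, t in enumerate(toks):
            c = context(toks, i, flex)
            log_sum += math.log((pairs[c, t] + 1) / (ctx[c] + v))
            n += 1
    return math.exp(-log_sum / n)


# Split Turkish words: evlerde = ev +ler +de ("in the houses"), okulda = okul +da
data = [["ev", "+ler", "+de"], ["okul", "+da"], ["ev", "+de"]]
for flex in (False, True):
    print("flex" if flex else "standard", perplexity(data, train(data, flex), flex))
```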
Keyword Translation Accuracy and Cross-Lingual Question Answering in Chinese and Japanese
EACL 2006 Workshop on Multilingual Question Answering (MLQA06)
In this paper, we describe the extension of an existing monolingual QA system for English-to-Chinese and English-to-Japanese cross-lingual question answering (CLQA). We also attempt to characterize the influence of translation on CLQA performance through experimental evaluation and analysis. The paper also describes some language-specific issues for keyword translation in CLQA.
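The abstract does not define keyword translation accuracy. Assuming it is the fraction of question keywords whose translation matches an acceptable reference translation, it could be computed as in this sketch. The function name, the exact-match criterion and the toy English-to-Chinese keywords are hypothetical.

```python
def keyword_translation_accuracy(translated, gold):
    """Fraction of question keywords whose translation is an acceptable one.

    translated: keyword -> system translation
    gold: keyword -> set of acceptable target-language translations
    """
    if not gold:
        return 0.0
    correct = sum(1 for kw, refs in gold.items() if translated.get(kw) in refs)
    return correct / len(gold)


gold = {"Nobel Prize": {"诺贝尔奖"}, "physics": {"物理", "物理学"}, "won": {"获得", "荣获"}}
translated = {"Nobel Prize": "诺贝尔奖", "physics": "物理学", "won": "赢"}
print(keyword_translation_accuracy(translated, gold))  # 2 of 3 keywords correct
```

A per-question score like this can then be correlated with whether the CLQA system returned a correct answer, which is one way to characterize how much translation errors cost downstream.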

