Results 1 - 10 of 40
An Efficient Method for Determining Bilingual Word Classes
Cited by 40 (7 self)
Abstract
In statistical natural language processing we always face the problem of sparse data. One way to reduce this problem is to group words into equivalence classes, which is a standard method in statistical language modeling. In this paper we describe a method to determine bilingual word classes suitable for statistical machine translation. We develop an optimization criterion based on a maximum-likelihood approach and describe a clustering algorithm. We show that using the resulting bilingual word classes can improve statistical machine translation.
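The abstract names a maximum-likelihood criterion and a clustering algorithm without stating them. As a hedged illustration, the sketch below implements a common monolingual stand-in rather than the paper's bilingual criterion: a class-bigram likelihood optimized by an exchange algorithm that moves one word at a time to whichever class most increases the criterion. Up to a term that does not depend on the word-to-class mapping, the criterion is F = Σ N(c,c') log N(c,c') - Σ N(c,·) log N(c,·) - Σ N(·,c') log N(·,c'), where N counts class bigrams. The round-robin initialisation and the function names are assumptions made for this example.

```python
import math
from collections import Counter

def nlogn(x):
    return x * math.log(x) if x > 0 else 0.0

def class_criterion(bigrams, word_class):
    """Class-bigram log-likelihood, up to a term that does not depend on the mapping."""
    pair, left, right = Counter(), Counter(), Counter()
    for (v, w), n in bigrams.items():
        cv, cw = word_class[v], word_class[w]
        pair[(cv, cw)] += n   # N(c, c')
        left[cv] += n         # N(c, .)
        right[cw] += n        # N(., c')
    return (sum(nlogn(n) for n in pair.values())
            - sum(nlogn(n) for n in left.values())
            - sum(nlogn(n) for n in right.values()))

def exchange_clustering(tokens, num_classes, max_iter=10):
    """Monolingual exchange clustering (illustrative stand-in, not the paper's bilingual method)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = sorted(set(tokens))
    # Hypothetical initialisation: round-robin assignment of the sorted vocabulary
    word_class = {w: i % num_classes for i, w in enumerate(vocab)}
    best = class_criterion(bigrams, word_class)
    for _ in range(max_iter):
        moved = False
        for w in vocab:
            original, best_class = word_class[w], word_class[w]
            for c in range(num_classes):
                if c == original:
                    continue
                word_class[w] = c            # tentatively move w to class c
                score = class_criterion(bigrams, word_class)
                if score > best + 1e-12:
                    best, best_class = score, c
            word_class[w] = best_class       # keep the best class found (possibly the original)
            moved |= best_class != original
        if not moved:
            break                            # no single move improves F: local optimum
    return word_class, best
```

Recomputing F from scratch for every tentative move keeps the sketch short; practical implementations update the affected counts incrementally instead.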
A DP based Search Using Monotone Alignments in Statistical Translation
In Proc. 35th Annual Conf. of the Association for Computational Linguistics, 1997
Cited by 39 (13 self)
Abstract
In this paper, we describe a Dynamic Programming (DP) based search algorithm for statistical translation and present experimental results. The statistical translation uses two sources of information: a translation model and a language model.
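To make the DP idea concrete, here is a minimal Viterbi-style sketch under strong simplifying assumptions: each source word is translated by exactly one target word, in the same order, and the score combines a translation table p(f | e) with a bigram language model p(e | e'). The recursion is Q(j, e) = max over e' of Q(j-1, e') · p(e | e') · p(f_j | e), computed in log space. The toy tables and names are hypothetical, and the paper's search may allow more general monotone alignments than this one-to-one case.

```python
import math

def monotone_dp_decode(source, trans_prob, lm_prob, start="<s>"):
    """Viterbi search over monotone one-to-one translations.

    trans_prob[e][f] = p(f | e)           (toy translation table, assumed)
    lm_prob[(e_prev, e)] = p(e | e_prev)  (toy bigram language model, assumed)
    """
    if not source:
        return [], 0.0
    Q = {start: 0.0}          # Q[e]: best log score of a prefix translation ending in e
    backpointers = []
    for f in source:
        new_Q, bp = {}, {}
        for e, table in trans_prob.items():
            emit = table.get(f, 0.0)
            if emit == 0.0:
                continue
            for e_prev, score in Q.items():
                lm = lm_prob.get((e_prev, e), 0.0)
                if lm == 0.0:
                    continue
                s = score + math.log(lm) + math.log(emit)
                if s > new_Q.get(e, -math.inf):
                    new_Q[e], bp[e] = s, e_prev   # DP recursion: keep the best predecessor
        if not new_Q:
            return None, -math.inf                 # no translation covers this source word
        Q = new_Q
        backpointers.append(bp)
    last = max(Q, key=Q.get)
    words = [last]
    for bp in reversed(backpointers[1:]):          # trace back to the first position
        words.append(bp[words[-1]])
    return list(reversed(words)), Q[last]
```

For a German-English toy table with p(das | the) = 0.7 and p(haus | house) = 0.9, the call `monotone_dp_decode(["das", "haus"], tp, lm)` returns `["the", "house"]` together with its log score.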
Machine Translation with Inferred Stochastic Finite-State Transducers
Computational Linguistics, 2004
Cited by 35 (11 self)
Abstract
Finite-state transducers are models that are being used in different areas of pattern recognition and computational linguistics. One of these areas is machine translation, in which approaches based on building models automatically from training examples are becoming increasingly attractive. Finite-state transducers are very well suited to constrained tasks in which training samples of pairs of sentences are available. A technique for inferring finite-state transducers is proposed in this article. This technique is based on formal relations between finite-state transducers and rational grammars. Given a training corpus of source-target pairs of sentences, the proposed approach uses statistical alignment methods to produce a set of conventional strings from which a stochastic rational grammar (e.g., an n-gram) is inferred. This grammar is finally converted into a finite-state transducer. The proposed methods are assessed through a series of machine translation experiments within the framework of the EuTrans project.
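The pipeline in the abstract (align, label, infer an n-gram, convert to a transducer) can be sketched as follows. This is a hedged toy version: alignments are given as input, each target word is attached to the rightmost source position aligned so far so that output order stays monotone (one possible labelling rule, assumed here), an unsmoothed bigram model is estimated over the resulting extended symbols, and translation is a Viterbi search in which each extended symbol acts as a transducer edge that reads a source word and writes an output string. There is no end-of-sentence symbol or smoothing, and all names are illustrative.

```python
import math
from collections import Counter, defaultdict

START = ("<s>", ())  # hypothetical start symbol of the n-gram model

def to_extended(src, tgt, align):
    """Label each source word with the target words attached to it.

    align[i] is the source position aligned to target word i (assumed given).
    A target word is attached to the rightmost source position seen so far,
    so the concatenated outputs stay in target order.
    """
    attached = [[] for _ in src]
    pos = 0
    for i, e in enumerate(tgt):
        pos = max(pos, align[i])
        attached[pos].append(e)
    return [(f, tuple(out)) for f, out in zip(src, attached)]

def train_bigram(extended_corpus):
    """Unsmoothed maximum-likelihood bigram model over extended symbols."""
    counts = defaultdict(Counter)
    for seq in extended_corpus:
        prev = START
        for sym in seq:
            counts[prev][sym] += 1
            prev = sym
    return {prev: {s: n / sum(c.values()) for s, n in c.items()}
            for prev, c in counts.items()}

def translate(src, bigram):
    """Viterbi search: each extended symbol (f, out) acts as an edge reading f and writing out."""
    beams = {START: (0.0, [])}          # state -> (log score, output so far)
    for f in src:
        new = {}
        for state, (score, output) in beams.items():
            for sym, prob in bigram.get(state, {}).items():
                if sym[0] != f:
                    continue            # edge input label must match the source word
                cand = (score + math.log(prob), output + list(sym[1]))
                if sym not in new or cand[0] > new[sym][0]:
                    new[sym] = cand
        if not new:
            return None                 # input not accepted by the transducer
        beams = new
    return max(beams.values(), key=lambda t: t[0])[1]
```

Because the n-gram states are themselves extended symbols, "converting the grammar into a transducer" here amounts to splitting each symbol into its input word and its output string.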
Greedy Decoding for Statistical Machine Translation in Almost Linear Time
2003
Cited by 20 (2 self)
Abstract
We present improvements to a greedy decoding algorithm for statistical machine translation that reduce its time complexity from at least cubic (O(n^6) when applied naïvely) to practically linear time without sacrificing translation quality. We achieve this by integrating hypothesis evaluation into hypothesis creation, by tiling improvements over the translation hypothesis at the end of each search iteration, and by imposing restrictions on the amount of word reordering during decoding.
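A greedy decoder of this kind is a hill climber: start from a complete hypothesis, try local changes, keep the single best improvement, and stop at a local optimum. The sketch below assumes two hypothetical operators (replacing a word by an alternative translation, and swapping two words at most `max_swap` positions apart, which mirrors the restriction on reordering mentioned above) and a caller-supplied scoring function standing in for the translation and language models. It does not reproduce the paper's model or its speed-ups such as integrated hypothesis evaluation or tiling.

```python
from typing import Callable, Dict, List, Sequence, Tuple

def greedy_decode(initial: Sequence[str],
                  alternatives: Dict[str, List[str]],
                  score: Callable[[List[str]], float],
                  max_swap: int = 2) -> Tuple[List[str], float]:
    """Hill-climbing decoder with two illustrative operators (assumed, not the paper's set)."""
    hyp = list(initial)
    best = score(hyp)
    while True:
        best_move = None
        # Operator 1: replace a word by one of its alternative translations
        for i, word in enumerate(hyp):
            for alt in alternatives.get(word, []):
                cand = hyp[:i] + [alt] + hyp[i + 1:]
                s = score(cand)
                if s > best:
                    best, best_move = s, cand
        # Operator 2: swap two words at most max_swap positions apart (restricted reordering)
        for i in range(len(hyp)):
            for j in range(i + 1, min(len(hyp), i + max_swap + 1)):
                cand = hyp.copy()
                cand[i], cand[j] = cand[j], cand[i]
                s = score(cand)
                if s > best:
                    best, best_move = s, cand
        if best_move is None:
            return hyp, best   # local optimum: no operator improves the score
        hyp = best_move        # apply the single best improvement and iterate
```

Limiting swaps to a window of `max_swap` positions keeps each iteration linear in the hypothesis length instead of quadratic, which is the kind of trade-off the abstract describes.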
The EuTRANS-I Speech Translation System
1999
Cited by 18 (10 self)
Abstract
The EuTRANS project aims at using example-based approaches for the automatic development of machine translation systems (accepting text and speech input) for limited-domain applications. During the first phase of the project, a speech translation system based on automatically learnt subsequential transducers has been built. This paper contains a detailed and, to a large extent, self-contained overview of the transducer learning algorithms and system architecture, along with a new approach for using categories representing words or short phrases in both the input and output languages. Experimental results using this approach are reported for a task involving the recognition and translation of sentences in the hotel reception communication domain, with a vocabulary of 683 words in Spanish. A translation word error rate of 1.97% is achieved at a real-time factor of 2.7 on a personal computer.
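A subsequential transducer is deterministic on its input, emits an output string on every transition, and appends a final output string that depends on the state where the input ends; this lets it delay output until enough context has been read. The minimal sketch below shows only how such a transducer is applied, not how EuTRANS learns it. The toy Spanish-to-English states and words are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class SubsequentialTransducer:
    initial: int
    # delta[(state, input_word)] = (next_state, output_words); deterministic on the input
    delta: Dict[Tuple[int, str], Tuple[int, Tuple[str, ...]]]
    # final_output[state] = words emitted when the input ends in that state
    final_output: Dict[int, Tuple[str, ...]]

    def translate(self, words: List[str]) -> Optional[List[str]]:
        state, out = self.initial, []
        for w in words:
            step = self.delta.get((state, w))
            if step is None:
                return None              # input rejected: no transition defined
            state, emitted = step
            out.extend(emitted)
        if state not in self.final_output:
            return None                  # input ended in a non-final state
        return out + list(self.final_output[state])
```

In a toy transducer where "habitación" emits nothing and "room" is produced either by the final output of that state or together with a following "doble", both "una habitación" and "una habitación doble" come out in correct English word order.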
FSA: An Efficient and Flexible C++ Toolkit for Finite State Automata Using On-Demand Computation
In ACL Proceedings, 2004
Cited by 14 (4 self)
Abstract
In this paper we present the RWTH FSA toolkit, an efficient implementation of algorithms for creating and manipulating weighted finite-state automata. The toolkit has been designed using the principle of on-demand computation and offers a large range of widely used algorithms. To demonstrate the superior efficiency of the toolkit, we compare the implementation to that of other publicly available toolkits. We also show that on-demand computations help to reduce memory requirements significantly without any loss in speed. To increase its flexibility, the RWTH FSA toolkit supports high-level interfaces to the programming language Python as well as a command-line tool for interactive manipulation of FSAs. Furthermore, we show how to use the toolkit to rapidly build a fast and accurate statistical machine translation system. Future extensibility of the toolkit is ensured, as it will be publicly available as open-source software.
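On-demand computation means that the states and arcs of a derived automaton are built only when a search actually visits them. As a hedged Python sketch of the principle (not the RWTH FSA API, which is a C++ library), the class below computes the arcs of the intersection of two weighted acceptors lazily and caches them, and a Dijkstra shortest-distance search in the tropical semiring (weights added, non-negative) expands only the product states it reaches. The class and function names are assumptions.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

Arc = Tuple[str, float, int]   # (label, weight, next state)

class Acceptor:
    """Explicit weighted acceptor with tropical weights (lower is better)."""
    def __init__(self, start: int, finals: Dict[int, float], arcs: Dict[int, List[Arc]]):
        self.start, self.finals, self._arcs = start, finals, arcs

    def arcs(self, state: int) -> List[Arc]:
        return self._arcs.get(state, [])

class LazyIntersection:
    """Intersection whose states and arcs are computed only when first requested."""
    def __init__(self, a: Acceptor, b: Acceptor):
        self.a, self.b = a, b
        self.start = (a.start, b.start)
        self._cache = {}

    def arcs(self, state):
        if state not in self._cache:          # expand on demand, then memoize
            qa, qb = state
            self._cache[state] = [(la, wa + wb, (na, nb))
                                  for la, wa, na in self.a.arcs(qa)
                                  for lb, wb, nb in self.b.arcs(qb) if la == lb]
        return self._cache[state]

    def final_weight(self, state) -> Optional[float]:
        qa, qb = state
        if qa in self.a.finals and qb in self.b.finals:
            return self.a.finals[qa] + self.b.finals[qb]
        return None

    @property
    def expanded(self) -> int:
        return len(self._cache)

def shortest_distance(fsa) -> float:
    """Dijkstra over non-negative weights; only popped states are expanded."""
    dist = {fsa.start: 0.0}
    heap = [(0.0, fsa.start)]
    best = math.inf
    while heap:
        d, q = heapq.heappop(heap)
        if d > dist[q]:
            continue                          # stale queue entry
        if d >= best:
            break                             # no cheaper complete path remains
        fw = fsa.final_weight(q)
        if fw is not None:
            best = min(best, d + fw)
        for _, w, nq in fsa.arcs(q):
            if d + w < dist.get(nq, math.inf):
                dist[nq] = d + w
                heapq.heappush(heap, (d + w, nq))
    return best
```

Comparing `expanded` with |A|·|B| on a small example shows how much of the full product is never built. The sketch omits epsilon arcs and pruning, which a real toolkit must handle.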
On the use of Bernoulli mixture models for text classification
Pattern Recognition, 2001
Cited by 11 (4 self)
Abstract
Mixture modelling of class-conditional densities is a standard pattern recognition technique. Although most research on mixture models has concentrated on mixtures for continuous data, emerging pattern recognition applications demand extending research efforts to other data types. This paper focuses on the application of mixtures of multivariate Bernoulli distributions to binary data. More concretely, a text classification task aimed at improving language modelling for machine translation is considered.
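For binary vectors x in {0,1}^D, a K-component Bernoulli mixture has p(x) = Σ_k π_k Π_d p_kd^x_d (1 - p_kd)^(1 - x_d), and its parameters are usually fitted with EM. The pure-Python sketch below is a generic EM implementation for illustration, not necessarily the paper's training procedure; the clipping constant `eps`, the random initialization, and the function name are assumptions. For text classification, documents would be encoded as word-presence vectors and each class-conditional density would get its own mixture.

```python
import math
import random

def bernoulli_mixture_em(X, k, iters=20, seed=0, eps=1e-6):
    """EM for a mixture of k multivariate Bernoulli distributions over binary vectors X."""
    rng = random.Random(seed)
    d = len(X[0])
    pi = [1.0 / k] * k                                                   # mixing weights
    p = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]  # p[j][i] = P(x_i = 1 | j)
    history = []
    for _ in range(iters):
        # E-step: posterior z[n][j] of component j for sample n, computed in log space
        z, loglik = [], 0.0
        for x in X:
            logs = [(math.log(pi[j]) if pi[j] > 0 else -math.inf)
                    + sum(math.log(p[j][i] if xi else 1.0 - p[j][i]) for i, xi in enumerate(x))
                    for j in range(k)]
            m = max(logs)
            s = sum(math.exp(l - m) for l in logs)
            loglik += m + math.log(s)
            z.append([math.exp(l - m) / s for l in logs])
        history.append(loglik)   # log-likelihood of the parameters before this M-step
        # M-step: re-estimate weights and Bernoulli parameters, clipped away from 0 and 1
        for j in range(k):
            nj = sum(zn[j] for zn in z)
            pi[j] = nj / len(X)
            if nj == 0.0:
                continue         # empty component: keep its previous parameters
            for i in range(d):
                pij = sum(zn[j] * x[i] for zn, x in zip(z, X)) / nj
                p[j][i] = min(max(pij, eps), 1.0 - eps)
    return pi, p, history
```

Clipping keeps log(1 - p) finite for features never seen in a component; because the clipped value is still the best choice inside [eps, 1 - eps], the log-likelihood in `history` should not decrease.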
An Iterative, DP-based Search Algorithm for Statistical Machine Translation
In Proceedings of the International Conference on Spoken Language Processing (ICSLP’98), 1998
Cited by 10 (3 self)
Abstract
The increasing interest in the statistical approach to machine translation is due to the development of effective algorithms for training the probabilistic models proposed so far. However, one of the open problems in statistical machine translation is the design of efficient algorithms for translating a given input string. For some interesting models, only (good) approximate solutions can be found. Recently, a dynamic-programming-like algorithm has been introduced that computes approximate solutions for some models. These solutions can be improved by an iterative algorithm that refines the successive solutions and smooths some of the models' probability distributions by interpolating different distributions. The technique resulting from this combination has been tested on the “Tourist Task” corpus, which was generated in a semi-automated way. The best results achieved were a word error rate of 9.3% and a sentence error rate of 44.4%.
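The reported figures use two standard metrics. The sketch below assumes the usual definitions: word error rate as the total word-level Levenshtein distance between hypotheses and references divided by the total number of reference words, and sentence error rate as the percentage of hypotheses that differ from their reference at all. The abstract does not spell out its normalization, so treat this as an illustration.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance (substitutions, insertions, deletions cost 1)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # extra hypothesis word
                           cur[j - 1] + 1,           # missing reference word
                           prev[j - 1] + (h != r)))  # match or substitution
        prev = cur
    return prev[-1]

def corpus_error_rates(hyps, refs):
    """Return (WER %, SER %) over parallel lists of tokenized hypotheses and references."""
    edits = sum(edit_distance(h, r) for h, r in zip(hyps, refs))
    words = sum(len(r) for r in refs)
    wrong = sum(h != r for h, r in zip(hyps, refs))
    return 100.0 * edits / words, 100.0 * wrong / len(refs)
```

With one substitution in a three-word sentence and a second sentence translated exactly, this gives a WER of 20% (1 edit over 5 reference words) and an SER of 50%.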
Inference of Finite-State Transducers By Using Regular Grammars and Morphisms
2000
Cited by 10 (7 self)
Abstract
A technique to infer finite-state transducers is proposed in this work. This technique is based on the formal relations between finite-state transducers and regular grammars. It consists of three steps: 1) building a corpus of training strings from the corpus of training pairs; 2) inferring a regular grammar from these strings; and 3) transforming the grammar into a finite-state transducer.
A Paraphrase-Based Approach to Machine Translation Evaluation
University of Maryland, College Park, 2005
Cited by 10 (0 self)
Abstract
We propose a novel approach to automatic machine translation evaluation based on paraphrase identification. The quality of machine-generated output can be viewed as the extent to which the conveyed meaning matches the semantics of reference translations, independent of lexical and syntactic divergences. This idea is implemented in linear regression models that attempt to capture human judgments of adequacy and fluency, based on features that have previously been shown to be effective for paraphrase identification. We evaluated our model using the output of three different MT systems from the 2004 NIST Arabic-to-English MT evaluation. Results show that models employing paraphrase-based features correlate better with human judgments than models based purely on existing automatic MT metrics.
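As a hedged sketch of the modelling step, the code below fits an ordinary least-squares regression from per-segment feature vectors to human scores and measures agreement with Pearson's correlation. The feature values, the choice of Pearson's r, and the function names are assumptions; the abstract does not list the paraphrase features or the correlation statistic used.

```python
import math

def fit_linear(features, targets):
    """Ordinary least squares with intercept, solved via the normal equations."""
    X = [[1.0] + list(row) for row in features]
    k = len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(X, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]   # [intercept, w_1, ..., w_n]

def predict(w, row):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))

def pearson(a, b):
    """Pearson correlation between predicted and human scores."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)
```

Solving the normal equations directly is adequate for a handful of features; with many correlated features a QR or SVD-based solver would be numerically safer.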

