A New Approach for English-Chinese Named Entity Alignment
- Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2004
Cited by 7 (1 self)
Traditional word alignment approaches cannot come up with satisfactory results for Named Entities. In this paper, we propose a novel approach using a maximum entropy model for named entity alignment. To ease the training of the maximum entropy model, bootstrapping is used to help supervised learning. Unlike previous work reported in the literature, our work conducts bilingual Named Entity alignment without word segmentation for Chinese, and its performance is much better than that with word segmentation. When compared with IBM and HMM alignment models, experimental results show that our approach outperforms IBM Model 4 and HMM significantly.
Merging Example-Based and Statistical Machine Translation
- Proceedings of the Fifth Conference of the Association for Machine Translation in the Americas, 2002
Cited by 5 (0 self)
Despite the exciting work accomplished over the past decade in the field of Statistical Machine Translation (SMT), we are still far from the point of being able to say that machine translation fully meets the needs of real-life users. In a previous study [6], we have shown how an SMT engine could benefit from terminological resources, especially when translating texts very different from those used to train the system. In the present paper, we discuss the opening of SMT to examples automatically extracted from a Translation Memory (TM). We report results on a fair-sized translation task using the database of a commercial bilingual concordancer.
A block bigram prediction model for statistical machine translation
- ACM Transactions on Speech and Language Processing, 2007
Cited by 3 (1 self)
In this paper, we present a novel training method for a localized phrase-based prediction model for statistical machine translation (SMT). The model predicts block neighbors to carry out a phrase-based translation that explicitly handles local phrase re-ordering. We use a maximum likelihood criterion to train a log-linear block bigram model which uses real-valued features (e.g. a language model score) as well as binary features based on the block identities themselves (e.g. block bigram features). The model training relies on an efficient enumeration of local block neighbors in parallel training data. A novel stochastic gradient descent (SGD) training algorithm is presented that can easily handle millions of features. Moreover, when viewing SMT as a block generation process, it becomes quite similar to sequential natural language annotation problems such as part-of-speech tagging, phrase chunking, or shallow parsing. The novel approach is successfully tested on a standard Arabic-English translation task using two different phrase re-ordering models: a block orientation model and a phrase-distortion model.
Categories and Subject Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing - statistical machine translation; G.3 [Probability and Statistics]: Statistical computing - stochastic gradient descent
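The abstract above describes maximum likelihood training of a log-linear model over sparse features with stochastic gradient descent. A minimal sketch of that general idea follows; it is not the paper's algorithm. The function names, the feature names (`lm`, `bigram:...`), the learning rate, and the toy data are all assumptions made for illustration, and each training example is simply a list of candidate "neighbor" feature dictionaries plus the index of the observed one.

```python
import math
from collections import defaultdict


def candidate_probs(weights, candidates):
    """Softmax over candidates; each candidate is a sparse {feature: value} dict."""
    scores = [
        sum(weights[name] * value for name, value in feats.items())
        for feats in candidates
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_train(examples, epochs=20, learning_rate=0.5):
    """Maximize log-likelihood with one gradient step per training example.

    The per-example gradient is the observed feature vector minus the
    expected feature vector under the current model.
    """
    weights = defaultdict(float)  # only features that occur get stored
    for _ in range(epochs):
        for candidates, gold in examples:
            probs = candidate_probs(weights, candidates)
            for index, feats in enumerate(candidates):
                step = (1.0 if index == gold else 0.0) - probs[index]
                for name, value in feats.items():
                    weights[name] += learning_rate * step * value
    return weights


# Hypothetical toy data: a real-valued "lm" score plus binary bigram features.
examples = [
    ([{"lm": 0.9, "bigram:A>B": 1.0}, {"lm": 0.2, "bigram:A>C": 1.0}], 0),
    ([{"lm": 0.8, "bigram:B>D": 1.0}, {"lm": 0.1, "bigram:B>A": 1.0}], 0),
]
weights = sgd_train(examples)
```

Because the weights live in a dictionary keyed by feature name, memory grows only with the features actually seen, which is one common way to cope with very large binary feature sets.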
Example-based decoding for statistical machine translation
- in Proc. of MT Summit IX, 2003
Cited by 3 (2 self)
This paper presents a decoder for statistical machine translation that can take advantage of the example-based machine translation framework. The decoder presented here is based on the greedy approach to the decoding problem, but the search is initiated from a similar translation extracted from a bilingual corpus. The experiments on multilingual translations showed that the proposed method was far superior to a word-by-word generation beam search algorithm.
Language model data filtering via user simulation and dialogue resynthesis
- in Proc. of INTERSPEECH, 2005
Cited by 2 (2 self)
In this paper, we address the issue of generating language model training data during the initial stages of dialogue system development. The process begins with a large set of sentence templates, automatically adapted from other application domains. We propose two methods to filter the raw data set to achieve a desired probability distribution of the semantic content, both on the sentence level and on the class level. The first method utilizes user simulation technology, which obtains the probability model via an interplay between a probabilistic user model and the dialogue system. The second method synthesizes novel dialogue interactions by modeling after a small set of dialogues produced by the developers during the course of system refinement. We evaluated our methodology by speech recognition performance on a set of 520 unseen utterances from naive users interacting with a restaurant domain dialogue system.
Dynamic Translation Memory: Using Statistical Machine Translation to improve Translation Memory Fuzzy Matches
Cited by 2 (0 self)
Professional translators of technical documents often use Translation Memory (TM) systems in order to capitalize on the repetitions frequently observed in these documents. TM systems typically exploit not only complete matches between the source sentence to be translated and some previously translated sentence, but also so-called fuzzy matches, where the source sentence has some substantial commonality with a previously translated sentence. These fuzzy matches can be very worthwhile as a starting point for the human translator, but the translator then needs to manually edit the associated TM-based translation to accommodate the differences with the source sentence to be translated. If part of this process could be automated, the cost of human translation could be significantly reduced. The paper proposes to perform this automation in the following way: a phrase-based Statistical Machine Translation (SMT) system (trained on a bilingual corpus in the same domain as the TM) is combined with the TM fuzzy match, by extracting from the fuzzy match a large (possibly gapped) bi-phrase that is dynamically added to the usual set of “static” bi-phrases used for decoding the source. We report experiments that show significant improvements in terms of BLEU and NIST scores over both the translations produced by the stand-alone SMT system and the fuzzy-match translations proposed by the stand-alone TM system.
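To make the notion of a fuzzy match concrete, here is a minimal sketch of a TM lookup that scores token-level similarity with Python's standard `difflib.SequenceMatcher`. It covers only the retrieval step, not the bi-phrase extraction or SMT combination the paper proposes. The function name `best_fuzzy_match`, the 0.6 threshold, and the toy memory are assumptions for illustration; commercial TM systems use their own similarity measures.

```python
from difflib import SequenceMatcher


def best_fuzzy_match(source, memory, threshold=0.6):
    """Return (score, tm_source, tm_target) for the closest TM entry, or None.

    `memory` is a list of (source sentence, target sentence) pairs.
    Similarity is SequenceMatcher.ratio() computed over whitespace tokens.
    """
    source_tokens = source.split()
    best = None
    for tm_source, tm_target in memory:
        score = SequenceMatcher(None, source_tokens, tm_source.split()).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, tm_source, tm_target)
    return best


# Hypothetical two-entry translation memory.
memory = [
    ("press the red button to start", "appuyez sur le bouton rouge pour lancer"),
    ("close the cover", "fermez le capot"),
]
match = best_fuzzy_match("press the green button to start", memory)
```

Here the query differs from the first entry by one token, so that entry is returned as the fuzzy match; the translator (or a downstream system) would still need to repair the part of the target that corresponds to the mismatched word.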
Web-Based Machine Translation
- 2003
Cited by 2 (1 self)
This chapter has two main aims: (i) to present the state-of-the-art in Machine Translation (MT), namely Phrase-Based Statistical MT, together with the major competing paradigms used in MT research and development today; and (ii) to provide an overview of the MT research carried out by my team here at DCU, characterised here in terms of ‘hybrid MT’. In addition, we provide our views on the directions that MT research might take in the near future, and conclude the chapter with lists of further reading for the interested reader.
Using Multilingual Content on the Web to Build Fast Finite-State Direct Translation Systems
- 2002
Cited by 1 (0 self)
In this paper I try to identify and describe in certain detail a possible avenue of research in machine translation: the use of existing multilingual content on the web and finite-state technology to automatically build and maintain fast web-based direct machine translation systems, especially for language pairs lacking machine translation resources. The term direct is used to refer to systems performing no linguistic analysis, working similarly to pretranslators based on translation memories. Considering the
Integrated Phrase Segmentation and Alignment Algorithm for Statistical Machine Translation
- In Proceedings of International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE'03), 2003
We present an integrated phrase segmentation/alignment algorithm (ISA) for Statistical Machine Translation. Without the need to build an initial word-to-word alignment or to initially segment the monolingual text into phrases as other methods do, this algorithm segments the sentences into phrases and finds their alignments simultaneously. For each sentence pair, ISA builds a two-dimensional matrix in which the value of each cell corresponds to the Point-wise Mutual Information (MI) between the source and target words. Based on the similarities of MI values among cells, we identify the aligned phrase pairs. Once all the phrase pairs are found, we know both how to segment one sentence into phrases and also the alignments between the source and target sentences. We use monolingual bigram language models to estimate the joint probabilities of the identified phrase pairs. The joint probabilities are then normalized to conditional probabilities, which are used by the decoder. Despite its simplicity, this approach yields phrase-to-phrase translations with significantly higher precision than our baseline system, where phrase translations are extracted from the HMM word alignment. When we combine the phrase-to-phrase translations generated by this algorithm with the baseline system, the improvement in translation quality is even larger.
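The first step described above, a two-dimensional matrix of point-wise mutual information values for one sentence pair, can be sketched as follows. This is only an illustration of that matrix, assuming sentence-level co-occurrence counts as the probability estimates; the abstract does not say how the paper estimates them, and the phrase-identification step is not shown. The names `pmi_table` and `mi_matrix` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def pmi_table(sentence_pairs):
    """Build a PMI scorer from sentence-level co-occurrence counts.

    PMI(s, t) = log( P(s, t) / (P(s) * P(t)) ), where each probability is
    the fraction of sentence pairs in which the word(s) occur.
    """
    n = len(sentence_pairs)
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_set, tgt_set = set(src), set(tgt)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        joint_count.update((s, t) for s in src_set for t in tgt_set)

    def pmi(s, t):
        joint = joint_count[(s, t)]
        if joint == 0:
            return float("-inf")  # the two words never co-occur
        return math.log((joint * n) / (src_count[s] * tgt_count[t]))

    return pmi


def mi_matrix(src, tgt, pmi):
    """Rows follow source word positions, columns follow target word positions."""
    return [[pmi(s, t) for t in tgt] for s in src]


# Hypothetical three-pair corpus with placeholder tokens.
corpus = [(["a", "b"], ["x", "y"]), (["a"], ["x"]), (["b"], ["y"])]
pmi = pmi_table(corpus)
matrix = mi_matrix(["a", "b"], ["x", "y"], pmi)
```

In this toy corpus the diagonal cells (a with x, b with y) receive higher values than the off-diagonal ones, which is the kind of contrast between neighboring cells that a phrase-identification step could exploit.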

