Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora
, 1997
Grammarless Extraction of Phrasal Translation Examples from Parallel Texts
- In Proceedings of the Sixth International Conference on Theoretical and Methodological Issues in Machine Translation
, 1995
Cited by 31 (7 self)
We describe a method for identifying subsentential phrasal translation examples in sentence-aligned parallel corpora, using only a probabilistic translation lexicon for the language pair. Our method differs from previous approaches in that (1) it is founded on a formal basis, making use of an inversion transduction grammar (ITG) formalism that we recently developed for bilingual language modeling, and (2) it requires no language-specific monolingual grammars for the source and target languages. Instead, we devise a generic, language-independent constituent-matching ITG with inherent expressiveness properties that correspond to a desirable level of matching flexibility. Bilingual parsing, in conjunction with a stochastic version of the ITG formalism, performs the phrasal translation extraction. (The Hong Kong University of Science & Technology Technical Report Series, Department of Computer Science.)
Trainable Coarse Bilingual Grammars for Parallel Text Bracketing
- In Proceedings of the Third Annual Workshop on Very Large Corpora
Cited by 13 (9 self)
We describe two new strategies for automatic bracketing of parallel corpora, with particular application to languages where prior grammar resources are scarce: (1) coarse bilingual grammars, and (2) unsupervised training of such grammars via EM (expectation-maximization). Both methods build upon a formalism we recently introduced called stochastic inversion transduction grammars. The first approach borrows a coarse monolingual grammar into our bilingual formalism, in order to transfer knowledge of one language's constraints to the task of bracketing the texts in both languages. The second approach generalizes the inside-outside algorithm to adjust the grammar parameters so as to improve the likelihood of a training corpus. Preliminary experiments on parallel English-Chinese text support these strategies.
Example-based machine translation using DP-matching between word sequences
- In Proc. of ACL 2001 Workshop on DDMT
, 2001
Cited by 5 (3 self)
We propose a new approach under the example-based machine translation paradigm. First, the proposed approach retrieves the most similar example by carrying out DP-matching of the input sentence and example sentences while measuring the semantic distance of the words. Second, the approach adjusts the gap between the input and the most similar example by using a bilingual dictionary. We show the results of a computational experiment.
The Effects of Word Order and Segmentation on Translation Retrieval Performance
- In Proc. of the 18th International Conference on Computational Linguistics (COLING 2000)
, 2000
Cited by 3 (1 self)
This research looks at the effects of word order and segmentation on translation retrieval performance for an experimental Japanese-English translation memory system. We implement a number of both bag-of-words and word order-sensitive similarity metrics, and test each over character-based and word-based indexing. The translation retrieval performance of each system configuration is evaluated empirically through the notion of word edit distance between translation candidate outputs and the model translation. Our results indicate that character-based indexing is consistently superior to word-based indexing, suggesting that segmentation is an unnecessary luxury in the given domain. Word order-sensitive approaches are demonstrated to generally outperform bag-of-words methods, with source language segment-level edit distance proving the most effective similarity metric.
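As a rough illustration of the edit distance mentioned in the abstract above, the following sketch computes standard Levenshtein distance over arbitrary sequences, so the same function can serve character-based indexing (raw strings) and word-based indexing (token lists). The function name `edit_distance` and the sample inputs are assumptions made for illustration; the paper's exact metric and weighting are not given in this listing.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical inputs: word-based compares token lists,
# character-based compares the raw strings.
word_dist = edit_distance("the cat sat".split(), "the dog sat".split())
char_dist = edit_distance("kitten", "sitting")
```

Keeping only two rows of the table at a time holds memory linear in the length of the second sequence, which matters when many candidate segments are scored against one query.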
Reducing Parsing Complexity by Intra-Sentence Segmentation based on Maximum Entropy Model
- Genetic Learning, Machine Translation
, 2000
Cited by 3 (1 self)
... problem because of high complexity. This paper addresses the reduction of parsing complexity by intra-sentence segmentation, and presents a maximum entropy model for determining segmentation positions. The model uses the lexical contexts of segmentation positions as features, giving a probability to each potential position. Segmentation coverage and accuracy of the proposed method are 96% and 88%, respectively. Parsing efficiency is improved by 77% in time and 71% in space.
Bracketing and aligning words and constituents in parallel text using stochastic inversion transduction grammars
- in Parallel Text Processing: Alignment and Use of Translation Corpora
, 2000
Cited by 2 (0 self)
We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with a variety of parallel corpus analysis applications. Aside from the bilingual orientation, three major features distinguish the formalism from the finite-state transducers more traditionally found in computational linguistics: it skips directly to a context-free rather than finite-state base, it permits a minimal extra degree of ordering flexibility, and its probabilistic formulation admits an efficient maximum-likelihood bilingual parsing algorithm. A convenient normal form is shown to exist. Analysis of the formalism's expressiveness suggests that it is particularly well-suited to model ordering shifts between languages, balancing needed flexibility against complexity constraints. We discuss a number of examples of how stochastic inversion transduction grammars bring bilingual constraints to bear upon problematic corpus analysis tasks such as segmentation, bracketing, phrasal alignment, and parsing.
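In an inversion transduction grammar, the ordering flexibility comes from letting each binary production be read in one of two orientations: straight (both languages keep the constituents in the same order) or inverted (the second language reverses them). The following is a minimal sketch of that idea only, assuming a hypothetical tuple-based parse tree; the name `yields`, the tree encoding, and the toy symbols are illustrative and not taken from the paper, and no probabilities or parsing are modeled.

```python
def yields(node):
    """Return (language-1 words, language-2 words) generated by a toy ITG tree.

    A leaf is a pair (word1, word2); None marks a word with no counterpart.
    An internal node is a triple (orientation, left, right), where
    orientation is "straight" or "inverted".
    """
    if len(node) == 2:  # leaf: a lexical translation pair
        w1, w2 = node
        return ([w1] if w1 is not None else []), ([w2] if w2 is not None else [])
    orientation, left, right = node
    l1, l2 = yields(left)
    r1, r2 = yields(right)
    if orientation == "straight":
        return l1 + r1, l2 + r2  # same constituent order in both languages
    return l1 + r1, r2 + l2      # constituent order reversed in language 2


# Hypothetical tree: the inner node swaps its children on the language-2 side.
tree = ("straight", ("a", "x"), ("inverted", ("b", "y"), ("c", "z")))
lang1, lang2 = yields(tree)
```

Here `lang1` keeps the order a, b, c while `lang2` reads x, z, y, because only the inner constituent is inverted.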
Rhetorical and semantic environment for text alignment
- In Proceedings of Corpus Linguistics 2001
, 2001
Cited by 1 (0 self)
In the framework of machine translation of multilingual parallel texts, the technique of alignment is based on statistical models and shallow linguistic parsing methods. When the problem is extended to corpora in which different versions are derived or interpreted from the same source, we need further criteria to account for the forms of disparity between these versions. In this article we propose a content-driven approach based on the semantic and pragmatic structure of texts to aid in the process of alignment and comparison between their versions.
Confidence Factor Assignment to Translation Templates
, 1998
Cited by 1 (0 self)
that I have read this thesis and that in my opinion it is fully adequate, in scope
Utilizing Contextually Relevant Terms in Bilingual Lexicon Extraction
Cited by 1 (1 self)
This paper demonstrates an efficient technique for extracting bilingual word pairs from non-parallel but comparable corpora. Instead of using the common approach of taking high-frequency words to build up the initial bilingual lexicon, we show that contextually relevant terms that co-occur with cognate pairs can be efficiently utilized to build a bilingual dictionary. The results show that our models using this technique improve significantly over baseline models, especially when only the highest-ranked translation candidate per word is considered.

