Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora, 1997.
Word Sense Disambiguation Using a Second Language Monolingual Corpus. Computational Linguistics, 1994.
Cited by 129 (1 self)
This paper presents a new approach for resolving lexical ambiguities in one language using statistical data from a monolingual corpus of another language. This approach exploits the differences between mappings of words to senses in different languages. The paper concentrates on the problem of target word selection in machine translation, for which the approach is directly applicable. The presented algorithm identifies syntactic relationships between words, using a source language parser, and maps the alternative interpretations of these relationships to the target language, using a bilingual lexicon. The preferred senses are then selected according to statistics on lexical relations in the target language. The selection is based on a statistical model and on a constraint propagation algorithm, which handles all ambiguities in the sentence simultaneously. The method was evaluated using three sets of Hebrew and German examples and was found to be very useful for disambiguation. The paper includes a detailed comparative analysis of statistical sense disambiguation methods.
1. Introduction
The resolution of lexical ambiguities in unrestricted text is one of the most difficult tasks of natural language processing. A related task in machine translation, on which we focus in this paper, is target word selection: deciding which target language word is the most appropriate equivalent of a source language word in context. In addition to the alternatives introduced by the different senses of the source language word, the target language may offer additional alternatives that differ mainly in their usage. Traditionally, several linguistic levels have been used to deal with this problem: syntactic, semantic and pragmatic. Computationally the syntactic methods...
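The selection step described above can be illustrated with a toy sketch. It assumes the source parser has already produced relation tuples such as `("verb-obj", head, dependent)`, that the bilingual lexicon is a plain dict of candidate translations, and that the target-language statistics are raw counts of the same relation tuples. The paper's statistical model and constraint propagation algorithm are not reproduced here: this sketch simply commits to the best-supported relation first (by raw count), fixes those word choices, and lets them constrain the remaining relations. The function name `select_targets` and all tokens and counts are hypothetical.

```python
from collections import Counter
from itertools import product


def select_targets(relations, lexicon, target_counts):
    """Pick one target word per source word from target-language relation counts.

    relations:     (relation, source_head, source_dependent) tuples from a parser.
    lexicon:       source word -> list of candidate target words.
    target_counts: Counter of (relation, target_head, target_dependent) tuples
                   collected from a monolingual target-language corpus.
    """
    choice = {}
    pending = list(relations)
    while pending:
        best = None
        for rel, head, dep in pending:
            # Words already decided are no longer ambiguous.
            heads = [choice[head]] if head in choice else lexicon[head]
            deps = [choice[dep]] if dep in choice else lexicon[dep]
            for t_head, t_dep in product(heads, deps):
                count = target_counts[(rel, t_head, t_dep)]
                if best is None or count > best[0]:
                    best = (count, (rel, head, dep), t_head, t_dep)
        # Commit the best-supported relation and propagate its choices.
        _, resolved, t_head, t_dep = best
        choice.setdefault(resolved[1], t_head)
        choice.setdefault(resolved[2], t_dep)
        pending.remove(resolved)
    return choice


# Hypothetical source tokens V (verb), N (object), S (subject).
lexicon = {"V": ["sign", "seal"], "N": ["treaty", "contract"],
           "S": ["minister", "secretary"]}
relations = [("verb-obj", "V", "N"), ("verb-subj", "V", "S")]
counts = Counter({("verb-obj", "sign", "treaty"): 10,
                  ("verb-obj", "seal", "contract"): 4,
                  ("verb-subj", "seal", "minister"): 6,
                  ("verb-subj", "sign", "secretary"): 2})
print(select_targets(relations, lexicon, counts))
# {'V': 'sign', 'N': 'treaty', 'S': 'secretary'}
```

Fixing `V` to "sign" from the strongest evidence stops the weaker subject-verb evidence for "seal" from overriding it, which is the intuition behind resolving all ambiguities in a sentence together rather than one relation at a time.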
Automatic Identification of Word Translations from Unrelated English and German Corpora, 1999.
Cited by 112 (1 self)
Algorithms for the alignment of words in translated texts are well established. However, only recently have new approaches been proposed to identify word translations from non-parallel or even unrelated texts. This task is ...
A Polynomial-Time Algorithm for Statistical Machine Translation. In 34th Annual Meeting of the Association for Computational Linguistics, 1996.
Cited by 68 (6 self)
We introduce a polynomial-time algorithm for statistical machine translation. This algorithm can be used in place of the expensive, slow best-first search strategies in current statistical translation architectures.
Aligning A Parallel English-Chinese Corpus Statistically With Lexical Criteria, 1994.
Cited by 63 (13 self)
We describe our experience with automatic alignment of sentences in parallel English-Chinese texts. Our report concerns three related topics: (1) progress on the HKUST English-Chinese Parallel Bilingual Corpus; (2) experiments addressing the applicability of Gale & Church's (1991) length-based statistical method to the task of alignment involving a non-Indo-European language; and (3) an improved statistical method that also incorporates domain-specific lexical cues.
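The length-based baseline in point (2) can be sketched as a dynamic program over sentence lengths. This is a minimal sketch of the Gale & Church idea, not the HKUST system: each candidate bead (1-1, 1-0, 0-1, 2-1, 1-2, 2-2) is scored by a prior probability plus how far its length ratio deviates from the expected value, and the lexical cues of point (3) are left out. The constants `C` and `S2` and the bead priors are the values commonly quoted for European language pairs and should be treated as assumptions; for English-Chinese they would need to be re-estimated, and measuring length in characters is itself an assumption.

```python
import math

# Prior probability of each bead type (source sentences, target sentences).
# Commonly quoted Gale & Church (1991) values; splitting the 1-0/0-1 and
# 2-1/1-2 masses evenly between mirror-image beads is an assumption.
PRIORS = {(1, 1): 0.89, (1, 0): 0.00495, (0, 1): 0.00495,
          (2, 1): 0.0445, (1, 2): 0.0445, (2, 2): 0.011}
C = 1.0    # assumed expected target/source character ratio
S2 = 6.8   # assumed variance of that ratio per character


def bead_cost(len1, len2, bead):
    """-log P(bead) - log P(delta | match) for character lengths len1, len2."""
    mean = (len1 + len2 / C) / 2.0  # averaged so 1-0 and 0-1 beads stay defined
    delta = (len2 - len1 * C) / math.sqrt(mean * S2) if mean > 0 else 0.0
    p_delta = math.erfc(abs(delta) / math.sqrt(2.0))  # 2 * (1 - Phi(|delta|))
    return -math.log(PRIORS[bead]) - math.log(max(p_delta, 1e-300))


def align(src_lens, tgt_lens):
    """Minimum-cost monotone alignment of two sequences of sentence lengths."""
    n, m = len(src_lens), len(tgt_lens)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == math.inf:
                continue
            for a, b in PRIORS:  # extend the best path ending at (i, j)
                if i + a > n or j + b > m:
                    continue
                c = cost[i][j] + bead_cost(sum(src_lens[i:i + a]),
                                           sum(tgt_lens[j:j + b]), (a, b))
                if c < cost[i + a][j + b]:
                    cost[i + a][j + b] = c
                    back[i + a][j + b] = (a, b)
    beads, i, j = [], n, m
    while (i, j) != (0, 0):  # trace back the chosen beads
        a, b = back[i][j]
        beads.append((list(range(i - a, i)), list(range(j - b, j))))
        i, j = i - a, j - b
    return beads[::-1]


print(align([20, 30, 25], [50, 25]))
# [([0, 1], [0]), ([2], [1])]
```

In the usage line, two short source sentences are merged into one target sentence (a 2-1 bead) because their combined length matches it closely.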
Identifying Word Translations in Non-Parallel Texts, 1995.
Cited by 59 (1 self)
Common algorithms for sentence and word alignment allow the automatic identification of word translations from parallel texts. This study suggests that the identification of word translations should also be possible with non-parallel and even unrelated texts. The method proposed is based on the assumption that there is a correlation between the patterns of word co-occurrences in texts of different languages.
1 Introduction
In a number of recent studies it has been shown that word translations can be automatically derived from the statistical distribution of words in bilingual parallel texts (e.g., Catizone, Russell & Warwick, 1989; Brown et al., 1990; Dagan, Church & Gale, 1993; Kay & Röscheisen, 1993). Most of the proposed algorithms first conduct an alignment of sentences, i.e., they locate the pairs of sentences that are translations of each other. In a second step, a word alignment is performed by analyzing the correspondences of words in each pair of sentences. The results achie...
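The assumption stated in the abstract, that co-occurrence patterns correlate across languages, can be checked on a toy example: build a normalized co-occurrence matrix for a few words in each language and prefer the word mapping under which the two matrices look most alike. The window size, the L1 distance and the exhaustive search over permutations are choices made for this sketch (exhaustive search is only feasible for a handful of words), and the function names and toy sentences are hypothetical.

```python
from itertools import permutations


def cooccurrence(sentences, vocab, window=2):
    """Normalized counts of vocab words occurring within `window` tokens of each other."""
    index = {w: k for k, w in enumerate(vocab)}
    m = [[0] * len(vocab) for _ in vocab]
    for sent in sentences:
        for i, w in enumerate(sent):
            if w not in index:
                continue
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i and sent[j] in index:
                    m[index[w]][index[sent[j]]] += 1
    total = sum(map(sum, m)) or 1
    # Normalizing makes matrices from corpora of different sizes comparable.
    return [[x / total for x in row] for row in m]


def mapping_distance(a, b, perm):
    """Sum of absolute differences between a and b after reordering b by perm.

    perm[k] is the row/column of b hypothesized to translate word k of a."""
    n = len(a)
    return sum(abs(a[x][y] - b[perm[x]][perm[y]])
               for x in range(n) for y in range(n))


def best_mapping(a, b):
    """Exhaustive search over all n! mappings: only for tiny vocabularies."""
    return min(permutations(range(len(a))), key=lambda p: mapping_distance(a, b, p))


en = [["the", "dog", "eats", "meat"], ["the", "dog", "eats", "meat"],
      ["the", "cat", "eats", "fish"], ["dog", "and", "cat"]]
to_de = {"the": "der", "dog": "Hund", "eats": "frisst", "meat": "Fleisch",
         "cat": "Katze", "fish": "Fisch", "and": "und"}
# Sentence order is reversed to stress that no sentence alignment is used.
de = [[to_de[w] for w in s] for s in reversed(en)]
A = cooccurrence(en, ["dog", "cat", "eats", "fish"])
B = cooccurrence(de, ["frisst", "Fisch", "Hund", "Katze"])
print(best_mapping(A, B))  # (2, 3, 0, 1): dog->Hund, cat->Katze, eats->frisst, fish->Fisch
```

The German side here is an idealized word-for-word rendering, so the correct mapping matches perfectly; real unrelated corpora only correlate approximately, which is why the abstract frames it as an assumption.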
A Pattern Matching Method for Finding Noun and Proper Noun Translations from Noisy Parallel Corpora. In Proceedings of the 33rd Annual Conference of the Association for Computational Linguistics, 1995.
Cited by 54 (5 self)
We present a pattern matching method for compiling a bilingual lexicon of nouns and proper nouns from unaligned, noisy parallel texts of Asian/Indo-European language pairs. Tagging information of one language is used. Word frequency and position information for high- and low-frequency words are represented in two different vector forms for pattern matching. New anchor point finding and noise elimination techniques are introduced. We obtained 73.1% precision. We also show how the results can be used in the compilation of domain-specific noun phrases.
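Position information can be turned into comparable vectors in many ways. One simple variant, in the spirit of the K-vec idea of cutting both texts into K equal segments, marks which segments contain a word and scores candidate translations by the pointwise mutual information of occurring in the same segment. Because segments are defined by relative position, no sentence alignment is required, which is what makes the idea usable on unaligned, noisy texts. The abstract does not spell out its two vector forms, so this illustrates the general technique rather than the paper's method; `k`, the function names and the positions below are hypothetical.

```python
import math


def segment_vector(positions, text_len, k):
    """1 in slot s if the word occurs in segment s of a text cut into k equal parts."""
    vec = [0] * k
    for p in positions:
        vec[min(k - 1, p * k // text_len)] = 1
    return vec


def pointwise_mi(v1, v2):
    """log2 of P(both words in a segment) / (P(word 1) * P(word 2))."""
    k = len(v1)
    both = sum(1 for a, b in zip(v1, v2) if a and b)
    if both == 0:
        return -math.inf
    return math.log2((both / k) / ((sum(v1) / k) * (sum(v2) / k)))


def best_translation(src_positions, src_len, candidates, tgt_len, k=10):
    """candidates: target word -> list of token positions in the target text."""
    sv = segment_vector(src_positions, src_len, k)
    return max(candidates, key=lambda w: pointwise_mi(
        sv, segment_vector(candidates[w], tgt_len, k)))


candidates = {"house": [7, 67, 115], "tree": [30, 40]}
print(best_translation([5, 55, 95], 100, candidates, 120))  # house
```

The source text has 100 tokens and the target 120, yet "house" lands in the same relative segments (0, 5 and 9) as the source word, while "tree" shares none.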
A Statistical View on Bilingual Lexicon Extraction: From Parallel Corpora to Non-Parallel Corpora. Parallel Text Processing, 1998.
Cited by 48 (3 self)
We present two problems in statistically extracting bilingual lexicons: (1) How can noisy parallel corpora be used? (2) How can non-parallel yet comparable corpora be used? We describe our own work and contribution in relaxing the constraint of using only clean parallel corpora. DKvec is a method for extracting bilingual lexicons from noisy parallel corpora based on the arrival distances of words. Using DKvec on noisy parallel corpora in English/Japanese and English/Chinese, our evaluations show 55.35% precision on a small corpus and 89.93% precision on a larger corpus. Our major contribution is in the extraction of bilingual lexicons from non-parallel corpora. We present a first such result in this area, obtained with a new method, Convec. Convec is based on the context information of a word to be translated. We obtain 30% to 76% precision when the top 1 to top 20 translation candidates are considered. Most of the top-20 candidates are either collocations or words rela...
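The arrival-distance idea behind DKvec can be sketched as follows: for each word, record the gaps between successive occurrences (a recency vector), then compare a source word's vector with each candidate's using dynamic time warping, which tolerates the insertions and deletions typical of noisy parallel text. To our understanding DKvec matches these vectors with dynamic time warping, but the absolute-difference cost, the lack of normalization and the word positions here are assumptions made for illustration.

```python
def recency_vector(positions):
    """Arrival distances: gaps between successive occurrences of a word."""
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def dtw_distance(x, y):
    """Dynamic time warping distance with |x_i - y_j| as the local cost."""
    inf = float("inf")
    d = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            # Extend the cheapest of: skip in x, skip in y, or match both.
            d[i][j] = abs(x[i - 1] - y[j - 1]) + min(
                d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(x)][len(y)]


def rank_candidates(src_positions, candidates):
    """Sort target words by DTW distance between recency vectors (closest first)."""
    src_vec = recency_vector(src_positions)
    return sorted(candidates, key=lambda w: dtw_distance(
        src_vec, recency_vector(candidates[w])))


candidates = {"A": [12, 63, 75, 160], "B": [5, 100, 105, 110, 300]}
print(rank_candidates([10, 60, 70, 150], candidates))  # ['A', 'B']
```

A word that occurs only once has an empty recency vector and is infinitely far from every longer one, so very low-frequency words need separate treatment.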
A DP based Search Using Monotone Alignments in Statistical Translation. In Proc. 35th Annual Conf. of the Association for Computational Linguistics, 1997.
Cited by 39 (13 self)
In this paper, we describe a Dynamic Programming (DP) based search algorithm for statistical translation and present experimental results. The statistical translation uses two sources of information: a translation model and a language model.
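How the two information sources combine in a DP search can be shown with a deliberately simplified case: each source word f_j yields exactly one target word e_j in the same order, and a Viterbi recursion maximizes the language model probability p(e_j | e_{j-1}) times the translation probability p(f_j | e_j). The paper's monotone alignment model is more general than this one-to-one case, so treat the sketch as an illustration of DP search with a bigram language model, not as the published algorithm; `trans`, `bigram` and all probabilities are hypothetical.

```python
import math


def monotone_search(src, trans, bigram, start="<s>"):
    """Viterbi search for target words e_1..e_J (one per source word f_j, same order)
    maximizing the product of p(e_j | e_{j-1}) * p(f_j | e_j)."""
    layers = [{start: (0.0, None)}]  # target word -> (log score, previous word)
    for f in src:
        layer = {}
        for e, p_f in trans.get(f, {}).items():
            best = None
            for prev, (score, _) in layers[-1].items():
                p_lm = bigram.get((prev, e), 0.0)
                if p_lm <= 0.0 or p_f <= 0.0:
                    continue
                s = score + math.log(p_lm) + math.log(p_f)
                if best is None or s > best[0]:
                    best = (s, prev)
            if best is not None:
                layer[e] = best
        if not layer:
            return None  # no target hypothesis covers this source word
        layers.append(layer)
    # Trace back from the best final hypothesis.
    e = max(layers[-1], key=lambda w: layers[-1][w][0])
    out = []
    for layer in reversed(layers[1:]):
        out.append(e)
        e = layer[e][1]
    return out[::-1]


trans = {"das": {"the": 0.7, "that": 0.3}, "haus": {"house": 0.8, "home": 0.2}}
bigram = {("<s>", "the"): 0.5, ("<s>", "that"): 0.1,
          ("the", "house"): 0.4, ("the", "home"): 0.05,
          ("that", "house"): 0.2, ("that", "home"): 0.01}
print(monotone_search(["das", "haus"], trans, bigram))  # ['the', 'house']
```

Keeping only the best score per (position, last target word) is what makes the search polynomial: the bigram language model only looks one word back, so nothing else about the history matters.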
Grammarless Extraction of Phrasal Translation Examples from Parallel Texts. In Proceedings of the Sixth International Conference on Theoretical and Methodological Issues in Machine Translation, 1995.
Cited by 31 (7 self)
We describe a method for identifying subsentential phrasal translation examples in sentence-aligned parallel corpora, using only a probabilistic translation lexicon for the language pair. Our method differs from previous approaches in that (1) it is founded on a formal basis, making use of an inversion transduction grammar (ITG) formalism that we recently developed for bilingual language modeling, and (2) it requires no language-specific monolingual grammars for the source and target languages. Instead, we devise a generic, language-independent constituent-matching ITG with inherent expressiveness properties that correspond to a desirable level of matching flexibility. Bilingual parsing, in conjunction with a stochastic version of the ITG formalism, performs the phrasal translation extraction.
(The Hong Kong University of Science & Technology Technical Report Series, Department of Computer Science, TMI-95)
1 Introduction
Phrasal translation examples at the subsentential level are an...
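The bilingual parsing step rests on inversion transduction grammars, in which two constituents are combined either in the same order in both languages (straight, written [ ]) or in reverse order on the target side (inverted, written < >). A minimal Viterbi parser for a bracketing ITG might look like the following, assuming a single nonterminal, a word-pair lexicon of translation probabilities, and no null-aligned (singleton) words; the paper's grammar is richer than this. The rule probabilities `p_straight` and `p_inverted` and the toy lexicon are hypothetical.

```python
import math
from functools import lru_cache


def itg_parse(src, tgt, lexicon, p_straight=0.5, p_inverted=0.5):
    """Viterbi bilingual parse under a bracketing ITG without null (singleton) links.

    lexicon[(s, t)] is the probability that source word s pairs with target word t.
    Returns (log probability, tree); tree nodes are ("[]", l, r) for straight
    order, ("<>", l, r) for inverted order, and (s, t) word pairs at the leaves.
    """
    @lru_cache(maxsize=None)
    def best(s0, s1, t0, t1):
        # Best analysis of source span [s0, s1) paired with target span [t0, t1).
        if s1 - s0 == 1 and t1 - t0 == 1:
            p = lexicon.get((src[s0], tgt[t0]), 0.0)
            return (math.log(p), (src[s0], tgt[t0])) if p > 0 else (-math.inf, None)
        result = (-math.inf, None)
        for s in range(s0 + 1, s1):
            for t in range(t0 + 1, t1):
                # Straight: left source half pairs with left target half.
                left, right = best(s0, s, t0, t), best(s, s1, t, t1)
                score = left[0] + right[0] + math.log(p_straight)
                if score > result[0]:
                    result = (score, ("[]", left[1], right[1]))
                # Inverted: left source half pairs with right target half.
                left, right = best(s0, s, t, t1), best(s, s1, t0, t)
                score = left[0] + right[0] + math.log(p_inverted)
                if score > result[0]:
                    result = (score, ("<>", left[1], right[1]))
        return result

    return best(0, len(src), 0, len(tgt))


lex = {("a", "A"): 0.9, ("b", "B"): 0.9, ("a", "B"): 0.1, ("b", "A"): 0.1}
score, tree = itg_parse(["a", "b"], ["B", "A"], lex)
print(tree)  # ('<>', ('a', 'A'), ('b', 'B'))
```

The chart has O(n^2 m^2) span pairs, each tried with O(nm) split points, so parsing stays polynomial even though the inverted rule allows reordering; the constituents in the best tree are the candidate phrasal translation pairs.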

