Results 1 - 10
of
26
Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora
, 1997
"... ..."
A Polynomial-Time Algorithm for Statistical Machine Translation
- In 34th Annual Meeting of the Association for Computational Linguistics
, 1996
"... We introduce a polynomial-time algorithm for statistical machine translation. This algorithm can be used in place of the expensive, slow best-first search strategies in current statistical translation architectures. ..."
Abstract
-
Cited by 68 (6 self)
- Add to MetaCart
We introduce a polynomial-time algorithm for statistical machine translation. This algorithm can be used in place of the expensive, slow best-first search strategies in current statistical translation architectures.
Automated Dictionary Extraction for "Knowledge-Free" Example-Based Translation
- In Proceedings of the Seventh International Conference on Theoretical and Methodological Issues in Machine Translation
, 1997
"... An Example-Based Machine Translation system is supplied with a sentence-aligned bilingual corpus, but no other knowledge sources. Using the knowledge implicit in the corpus, it generates a bilingual word-for-word dictionary for alignment during translation. With such an automatically-generated dicti ..."
Abstract
-
Cited by 44 (5 self)
- Add to MetaCart
An Example-Based Machine Translation system is supplied with a sentence-aligned bilingual corpus, but no other knowledge sources. Using the knowledge implicit in the corpus, it generates a bilingual word-for-word dictionary for alignment during translation. With such an automatically-generated dictionary, the system covers (with equivalent quality) more of its input on unseen texts than the same system does when provided with a manually-created general-purpose dictionary and other knowledge sources.
A survey of statistical machine translation
, 2007
"... Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular tec ..."
Abstract
-
Cited by 30 (3 self)
- Add to MetaCart
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.
A Class-based Approach to Word Alignment
- Computational Linguistics
, 1997
"... This paper presents an algorithm capable of identifying the translation for each word in a bilingual corpus. Previously proposed methods rely heavily on word-based statistics. Under a word-based approach, frequent words with a consistent translation can be aligned at a high rate of precision. Howeve ..."
Abstract
-
Cited by 23 (2 self)
- Add to MetaCart
This paper presents an algorithm capable of identifying the translation for each word in a bilingual corpus. Previously proposed methods rely heavily on word-based statistics. Under a word-based approach, frequent words with a consistent translation can be aligned at a high rate of precision. However, words that are less frequent or exhibit diverse translations generally do not have statistically significant evidence for confident alignment, thereby leading to incomplete or incorrect alignments. The algorithm proposed herein attempts to broaden coverage by exploiting lexicographic resources. To this end, we draw on the two classification systems of words in Longman Lexicon of Contemporary English (LLOCE) and Tongyici Cilin (Synonym Forest, CILIN). Automatically acquired class-based alignment rules are used to compensate for what is lacking in a bilingual dictionary such as the English-Chinese version of the Longman Dictionary of Contemporary English (LecDOCE). In addition, this alignment method is implemented using LecDOCE examples and their translations for training and testing, while further examples from a technical manual in both English and Chinese are used for an open test. Quantitative results of the closed and open tests are also summarized
Learning Translation Rules From A Bilingual Corpus
- IN: PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON NEW METHODS IN LANGUAGE PROCESSING (NEMLAP-2
, 1996
"... This paper proposes a mechanism for learning pattern correspondences between two languages from a corpus of translated sentence pairs. The proposed mechanism uses analogical reasoning between two translations. Given a pair of translations, the similar parts of the sentences in the source language mu ..."
Abstract
-
Cited by 17 (5 self)
- Add to MetaCart
This paper proposes a mechanism for learning pattern correspondences between two languages from a corpus of translated sentence pairs. The proposed mechanism uses analogical reasoning between two translations. Given a pair of translations, the similar parts of the sentences in the source language must correspond the similar parts of the sentences in the target language. Similarly, the di erent parts should correspond to the respective parts in the translated sentences. The correspondences between the similarities, and also di erences are learned in the form of translation rules. The system is tested on a small training dataset and produced promising results for further investigation.
Learning Translations of Named-Entity Phrases from Parallel Corpora
- IN PROC. OF EACL
, 2003
"... We develop a new approach to learning phrase translations from parallel corpora, and show that it performs with very high coverage and accuracy in choosing French translations of English named-entity phrases in a test corpus of software manuals. Analysis of a subset of our results suggests th ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
We develop a new approach to learning phrase translations from parallel corpora, and show that it performs with very high coverage and accuracy in choosing French translations of English named-entity phrases in a test corpus of software manuals. Analysis of a subset of our results suggests that the method should also perform well on more general phrase translation tasks.
Trainable Coarse Bilingual Grammars for Parallel Text Bracketing
- In Proceedings of the Third Annual Workshop on Very Large Corpora
"... We describe two new strategies to automatic bracketing of parallel corpora, with particular application to languages where prior grammar resources are scarce: (1) coarse bilingual grammars, and (2) unsupervised training of such grammars via EM (expectation-maximization). Both methods build upon a ..."
Abstract
-
Cited by 13 (9 self)
- Add to MetaCart
We describe two new strategies to automatic bracketing of parallel corpora, with particular application to languages where prior grammar resources are scarce: (1) coarse bilingual grammars, and (2) unsupervised training of such grammars via EM (expectation-maximization). Both methods build upon a formalism we recently introduced called stochastic inversion transduction grammars. The first approach borrows a coarse monolingual grammar into our bilingual formalism, in order to transfer knowledge of one language's constraints to the task of bracketing the texts in both languages. The second approach generalizes the inside-outside algorithm to adjust the grammar parameters so as to improve the likelihood of a training corpus. Preliminary experiments on parallel English-Chinese text are supportive of these strategies.
Mixed Language Query Disambiguation
- IN ACL-99. THE 37TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
, 1999
"... We propose a mixed language query disambiguation approach by using co-occurrence information from monolingual data only. A mixed language query consists of words in a primary language and a secondary language. Our method translates the query into monolingual queries in either language. Two novel fea ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
We propose a mixed language query disambiguation approach by using co-occurrence information from monolingual data only. A mixed language query consists of words in a primary language and a secondary language. Our method translates the query into monolingual queries in either language. Two novel features for disambiguation, namely contextual word voting and 1-best contextual word, are introduced and compared to a baseline feature, the nearest neighbor. Average query translation accuracy for the two features are 81.37% and 83.72%, compared to the baseline accuracy of 75.50%.
Introduction to Corpus-based Statistics-oriented (CBSO) Techniques
- Pre-Conference Workshop on Corpus-based NLP, ROCLING VII, National Tsing-Hua Univ
, 1994
"... A Corpus-Based Statistics-Oriented (CBSO) methodology, which is an attempt to avoid the drawbacks of traditional rule-based approaches and purely statistical approaches, is introduced in this paper. Rule-based approaches, with rules induced by human experts, had been the dominant paradigm in the nat ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
A Corpus-Based Statistics-Oriented (CBSO) methodology, which is an attempt to avoid the drawbacks of traditional rule-based approaches and purely statistical approaches, is introduced in this paper. Rule-based approaches, with rules induced by human experts, had been the dominant paradigm in the natural language processing community. Such approaches, however, suffer from serious difficulties in knowledge acquisition in terms of cost and consistency. Therefore, it is very difficult for such systems to be scaled-up. Statistical methods, with the capability of automatically acquiring knowledge from corpora, are becoming more and more popular, in part, to amend the shortcomings of rule-based approaches. However, most simple statistical models, which adopt almost nothing from existing linguistic knowledge, often result in a large parameter space and, thus, require an unaffordably large training corpus for even well-justified linguistic phenomena. The corpus-based statistics-oriented (CBSO) approach is a compromise between the two extremes of the spectrum for knowledge acquisition. CBSO approach

