Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the web
- In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
, 1999
Cited by 90 (8 self)
This paper describes the use of a probabilistic translation model for cross-language IR (CLIR). The performance of this approach is compared with that of machine translation (MT). It is shown that, using a probabilistic model, we are able to obtain performance close to that of an MT system. In addition, we investigated the possibility of automatically gathering parallel texts from the Web in an attempt to construct a reasonable training corpus. The results are very encouraging: in several tests, such a training corpus proved as good as a manually constructed one for CLIR purposes.
Gaijin: A Bootstrapping, Template-Driven Approach to Example-Based MT
- In International Conference, Recent Advances in Natural Language Processing, Tzigov Chark
, 1997
Cited by 12 (4 self)
Example-based Machine Translation (EBMT) is a recent approach to MT that offers robustness, scalability and graceful degradation, deriving as it does its competence not from explicit linguistic models of source and target languages, but from the wealth of bilingual corpora that are now available. Gaijin is such a system, employing statistical methods, string-matching, case-based reasoning and template-matching to provide a linguistics-lite EBMT solution. The only linguistics employed by Gaijin is a psycholinguistic constraint, the marker hypothesis, that is minimal, simple to apply, and arguably universal. The scope and current state of Gaijin are described, and some initial evaluation results are reported.
Bi-Textual Aids for Translators
- University of Waterloo
, 1992
Cited by 10 (0 self)
While machine translation can successfully tackle some highly restricted sublanguages, it is in most cases more productive to turn to support tools for human translators. The functions taken over by existing translator's workstations are rather peripheral with respect to the core aspects of the translation task. However, recent developments show that it is possible to automatically produce explicit (partial) representations of the translation correspondences that link pairs of source and target texts. These representations, called bitexts, provide the foundation required for the design of support tools that delve deeper into the realm of translation proper, such as: a) a translation memory that can be accessed by various means, including bilingual concordancing; b) translation critiquing tools capable of detecting correspondence errors such as omissions or deceptive cognates; and c) translator-oriented speech recognition systems capable of taking advantage of correspondence constraints wi...
Aligning parallel texts: Do methods developed for English-French generalize to Asian languages?
- In Proceedings of Pacific Asia Conference on Formal and Computational Linguistics
, 1993
Merging Example-Based and Statistical Machine Translation
- Proceedings of the Fifth Conference of Association for Machine Translation in the Americas
, 2002
Cited by 5 (0 self)
Despite the exciting work accomplished over the past decade in the field of Statistical Machine Translation (SMT), we are still far from the point of being able to say that machine translation fully meets the needs of real-life users. In a previous study [6], we have shown how an SMT engine could benefit from terminological resources, especially when translating texts very different from those used to train the system. In the present paper, we discuss the opening of SMT to examples automatically extracted from a Translation Memory (TM). We report results on a fair-sized translation task using the database of a commercial bilingual concordancer.
Machine-Aided Human Translation and the Paradigm Shift
- Proceedings of the Fourth Machine Translation
, 1993
Cited by 4 (0 self)
ingual texts. This simple result turns out to be of fundamental importance from the point of view of MAHT. It constitutes in itself a suitable foundation for many kinds of new translation support tools. More on this below. 5. Why should there be such a difference between the two paradigms? The explanation, I think, is as follows. Rule-based MT tends to focus exclusively on the translation production problem. In the rare cases where it is possible to define good and complete translation models, this approach yields effective MT systems. But in all other cases, those where MAHT is called for, it turns out to be very difficult to make any use of production-oriented models. For example, it is hard to see how the particular target text intended by some translator could be partially generated in advance. Corpus-based methods, on the other hand, start from translations that have already been produced by humans and seek to discover their structure, completely o
TransSearch: A Bilingual Concordance Tool
, 1993
Cited by 1 (0 self)
TransSearch is a system for building and exploiting a translation memory, i.e. a textual database consisting of pairs of documents that are mutual translations. Each such pair of documents, when inserted into the database, is submitted to an alignment process, which makes explicit the relations that exist between the sentences of the two texts. A user may then search for ready-made solutions to specific translation problems, using a graphical query language. 1 Introduction Computer automation is gradually coming into its own in the growing translation industry. Not, as one might have hoped, in the sense of automating the translation process itself, but rather through custom environments and tools to assist human translators. The earliest steps in this direction involved peripheral tasks like word processing, spell checking, access to on-line lexicons and terminology banks, storage and transfer of electronic documents, etc. CITI's PTT is a classic example of this type of environment. R...
Detection of Translational Equivalence
, 2001
Cited by 1 (0 self)
I propose a general algorithm for detecting translational equivalence between text samples in different languages. This algorithm is based on current approaches to duplicate detection, and it relies on information which can be automatically learned from parallel text. I also show experimental results which support the hypothesis that translational equivalence is empirically observable. In addition, these results suggest profitable directions for improving performance on this recognition task. This work is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science with High Honors at the University of Maryland, College Park. Thesis advisor: Professor Philip S. Resnik, Departments of Linguistics and Computer Science and Institute for Advanced Computer Study. The research presented here was supported in part by the National Science Foundation, Johns Hopkins University, and DARPA/ITO Cooperative Agreement N660010028910.
Translation Pattern Extraction and Recombination for Example-Based Machine Translation
, 2001
No portion of the work referred to in this thesis has been submitted in support of an application for another degree or qualification of this or any other university or institute of learning. An approach to Example-Based Machine Translation is presented which operates by extracting and recombining translation patterns from a bilingual corpus aligned at the level of the sentence. The translation patterns are extracted using a recursive machine-learning algorithm based on the principle of similar distributions of strings: source and target language lexical items that co-occur in the same two sentence-pairs are likely to be translations of each other. The translation patterns extracted represent generalisations of sentences that are translations of each other in that certain sequences of words are replaced by variables. The translation patterns resemble, to a certain extent, transfer rules but with fewer constraints since there is no concept of syntactic structure in this approach: translation patterns are extracted based on the
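
The co-occurrence principle stated in the abstract above can be illustrated with a minimal Python sketch. This is not the thesis's recursive algorithm, which the abstract does not spell out; it only shows the counting step. The function name `cooccurrence_candidates`, the `min_pairs` threshold, the whitespace tokenisation, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def cooccurrence_candidates(sentence_pairs, min_pairs=2):
    """Count how many aligned sentence pairs each (source, target) word pair shares.

    Hypothetical helper: keeps word pairs that co-occur in at least
    `min_pairs` sentence pairs, following the "same two sentence-pairs" idea.
    """
    counts = Counter()
    for source, target in sentence_pairs:
        # Use sets so a word repeated inside one sentence is counted once.
        src_words = set(source.lower().split())
        tgt_words = set(target.lower().split())
        counts.update(product(src_words, tgt_words))
    return {pair: n for pair, n in counts.items() if n >= min_pairs}


# Toy sentence-aligned corpus (invented for this example).
corpus = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("a cat eats", "un chat mange"),
]
candidates = cooccurrence_candidates(corpus)
```

On this toy corpus, `("cat", "chat")` survives the threshold while `("cat", "le")` does not, but so do ambiguous pairs such as `("the", "dort")`. Raw co-occurrence alone leaves such ambiguity in place, so a real system would need further evidence to separate them.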

