Results 1 - 10 of 61
Translingual information retrieval: A comparative evaluation
- In Proceedings of the 15th International Joint Conference on Artificial Intelligence, 1997
Cited by 59 (7 self)
Translingual information retrieval (TIR) consists of providing a query in one language and searching document collections in one or more different languages. This paper introduces new TIR methods and reports on comparative TIR experiments with these new methods and with previously reported ones in a realistic setting. Methods fall into two categories: query translation based, and statistical-IR approaches establishing translingual associations. The results show that using bilingual corpora for automated extraction of term equivalences in context outperforms other methods. Translingual versions of the Generalized Vector Space Model (GVSM) and Latent Semantic Indexing (LSI) perform relatively well, as does translingual pseudo relevance feedback (PRF). All showed relatively small performance loss between monolingual and translingual versions. Query translation based on a general machine-readable bilingual dictionary, heretofore the most popular method, did not match the performance of other, more sophisticated methods. Also, the previous very high LSI results in the literature were disconfirmed by more realistic relevance-based evaluations.
Adding Linguistic Knowledge to a Lexical Example-Based Translation System
- In Proceedings of the Eighth International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-99), 1999
Cited by 56 (5 self)
Example-Based Machine Translation (EBMT) using partial exact matching against a database of translation examples has proven quite successful, but requires a large amount of pre-translated text in order to achieve broad coverage of unrestricted text. By adding linguistically tagged entries to the example base and permitting recursive matches that replace the matched text with the associated tag, substantial reductions in the required amount of pre-translated text can be achieved. A modest investment of time -- on the order of two person-weeks -- adding linguistic knowledge reduces the required example text by a factor of six or more, while retaining comparable translation quality. This reduction makes EBMT more attractive for so-called "low-density" languages for which little data is available.
Automated Dictionary Extraction for "Knowledge-Free" Example-Based Translation
- In Proceedings of the Seventh International Conference on Theoretical and Methodological Issues in Machine Translation, 1997
Cited by 44 (5 self)
An Example-Based Machine Translation system is supplied with a sentence-aligned bilingual corpus, but no other knowledge sources. Using the knowledge implicit in the corpus, it generates a bilingual word-for-word dictionary for alignment during translation. With such an automatically-generated dictionary, the system covers (with equivalent quality) more of its input on unseen texts than the same system does when provided with a manually-created general-purpose dictionary and other knowledge sources.
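As a rough illustration of how a word-for-word dictionary might be derived from nothing but a sentence-aligned corpus, the sketch below scores word pairs by the Dice coefficient of their sentence-level co-occurrence. The abstract does not say which statistic the system uses, so the Dice scoring, the function name `extract_dictionary`, the whitespace tokenization, and the `threshold` parameter are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def extract_dictionary(pairs, threshold=0.5):
    """Build a word-for-word dictionary from sentence-aligned pairs.

    pairs: iterable of (source_sentence, target_sentence) strings.
    Hypothetical scoring: Dice coefficient over sentence co-occurrence,
    which is an assumption, not the method described in the abstract.
    """
    src_count = Counter()   # number of sentences containing each source word
    tgt_count = Counter()   # number of sentences containing each target word
    joint = Counter()       # number of sentence pairs containing both words

    for src, tgt in pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint.update(product(src_words, tgt_words))

    dictionary = {}
    for (s, t), both in joint.items():
        dice = 2 * both / (src_count[s] + tgt_count[t])
        if dice >= threshold:
            dictionary.setdefault(s, []).append((t, dice))

    # Best-scoring candidate first for each source word.
    for candidates in dictionary.values():
        candidates.sort(key=lambda item: item[1], reverse=True)
    return dictionary


if __name__ == "__main__":
    corpus = [("the cat", "le chat"), ("the dog", "le chien"), ("a cat", "un chat")]
    for word, candidates in extract_dictionary(corpus, threshold=0.8).items():
        print(word, candidates)
```

A higher `threshold` trades coverage for precision; a real system would likely need larger corpora and better tokenization than this toy example assumes.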
Translingual Information Retrieval: Learning from Bilingual Corpora
- Artificial Intelligence, 1997
Cited by 34 (5 self)
Translingual information retrieval (TLIR) consists of providing a query in one language and searching document collections in one or more different languages. This paper introduces new TLIR methods and reports on comparative TLIR experiments with these new methods and with previously reported ones in a realistic setting. Methods fall into two categories: query translation and statistical-IR approaches establishing translingual associations. The results show that using bilingual corpora for automated extraction of term equivalences in context outperforms dictionary-based methods. Translingual versions of the Generalized Vector Space Model (GVSM) and Latent Semantic Indexing (LSI) also perform well, as do translingual pseudo relevance feedback (PRF) and Example-Based Term-in-context Translation (EBT). All showed relatively small performance loss between monolingual and translingual versions, ranging from 87% to 101% of monolingual IR performance. Query translation based on a general...
The Pangloss-Lite Machine Translation System
- 1996
Cited by 27 (7 self)
performance are presented in section 2 below. To allow its use in the widest variety of applications, PanLite has been designed to translate strings provided either on the standard input or via network sockets, and to produce as output either the best composite string or the full chart of scored translated segments. The latter is necessary, for example, when the output will be supplied to an external graphical user interface (GUI) for post-editing. PanLite has already been included as the MT component of the prototype DIPLOMAT rapid-deployment speech-to-speech translation system (see section 3, below). A potential future application of PanLite is as
Footnote 1: Pangloss was a joint project between three sites: the Computing Research Laboratory of New Mexico State University, the Information Sciences Institute of the University of Southern California, and the Center for Machine Translation of Carnegie Mellon University. It was funded by the U.S. Department of Defense.
Searching for Common Sense: Populating Cyc from the Web
- In Proceedings of the Twentieth National Conference on Artificial Intelligence, 2005
Cited by 26 (5 self)
The Cyc project is predicated on the idea that effective machine learning depends on having a core of knowledge that provides a context for novel learned information -- what is known informally as "common sense." Over the last twenty years, a sufficient core of common sense knowledge has been entered into Cyc to allow it to begin effectively and flexibly supporting its most important task: increasing its own store of world knowledge. In this paper, we present initial work on a method of using a combination of Cyc and the World Wide Web, accessed via Google, to assist in entering knowledge into Cyc. The long-term goal is automating the process of building a consistent, formalized representation of the world in the Cyc knowledge base via machine learning. We present preliminary results of this work and describe how we expect the knowledge acquisition process to become more accurate, faster, and more automated in the future.
An Adaptive Algorithm for Text Detection from Natural Scenes
- Proceedings of Computer Vision and Pattern Recognition (CVPR), 2001
Cited by 24 (4 self)
We present a new adaptive algorithm for automatic detection of text from a natural scene. The initial cues of text regions are first detected from the captured image/video. An adaptive color modeling and searching algorithm is then applied near the initial text cues to discriminate text from non-text regions. An EM optimization algorithm is used for color modeling, under the constraint of text layout relations for a specific language. The proposed algorithm combines the advantages of several previous approaches for text detection, and uses a focus-of-attention approach for text finding. The whole algorithm is applied in a prototype system that can automatically detect and recognize sign input from a video camera, and translate the signs into English text or voice streams. We present evaluation results of our algorithm on this system.
Interactive Speech Translation in the DIPLOMAT Project
- 1997
Cited by 17 (3 self)
The DIPLOMAT rapid-deployment speech translation system is intended to allow naive users to communicate across a language barrier, without strong domain restrictions, despite the error-prone nature of current speech and translation technologies. Achieving this ambitious goal depends in large part on allowing the users to interactively correct recognition and translation errors.
Reducing Boundary Friction Using Translation-Fragment Overlap
- In Proceedings of MT Summit IX, 2003
Cited by 13 (10 self)
Many corpus-based Machine Translation (MT) systems generate a number of partial translations which are then pieced together rather than immediately producing one overall translation. While this makes them more robust to ill-formed input, they are subject to disfluencies at phrasal translation boundaries even for well-formed input. We address this "boundary friction" problem by introducing a method that exploits overlapping phrasal translations and the increased confidence in translation accuracy they imply. We specify an efficient algorithm for producing translations using overlap. Finally, our empirical analysis indicates that this approach produces higher quality translations than the standard method of combining non-overlapping fragments generated by our Example-Based MT (EBMT) system in a peak-to-peak comparison.
Ordering Translation Templates by Assigning Confidence Factors
- IN: LECTURE
Cited by 11 (3 self)
The TTL (Translation Template Learner) algorithm learns lexical-level correspondences between two translation examples by using analogical reasoning. The sentences used as translation examples have similar and different parts in the source language, which must correspond to the similar and different parts in the target language. These correspondences are therefore learned as translation templates, which are then used to translate other sentences. However, confidence factors must be assigned to these templates so that translation results can be ordered by them. This paper proposes a method for assigning confidence factors to translation templates learned by the TTL algorithm. Training data is used to collect the statistical information on which the confidence factor assignment is based: each template is assigned a confidence factor according to the statistics obtained from the training data. Furthermore, some template combinations are also assigned confidence factors in order to eliminate certain combinations that result in bad translations.
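The idea of ordering results by template confidence can be sketched as follows. The abstract does not give the formula, so this example assumes the simplest plausible statistic: the fraction of training uses in which a template produced a correct translation. The function names `assign_confidence` and `rank_translations`, the `(template_id, is_correct)` record format, and the default score of 0.0 for unseen templates are all hypothetical.

```python
from collections import defaultdict


def assign_confidence(observations):
    """Estimate a confidence factor per template from training data.

    observations: iterable of (template_id, is_correct) records.
    Assumed statistic: correct uses divided by total uses.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for template_id, is_correct in observations:
        totals[template_id] += 1
        if is_correct:
            correct[template_id] += 1
    return {t: correct[t] / totals[t] for t in totals}


def rank_translations(candidates, confidence):
    """Order candidate translations by the confidence of their template.

    candidates: list of (translation, template_id) pairs.
    Templates never seen in training get 0.0 (an assumption).
    """
    return sorted(candidates,
                  key=lambda c: confidence.get(c[1], 0.0),
                  reverse=True)


if __name__ == "__main__":
    training = [("t1", True), ("t1", True), ("t1", False),
                ("t2", False), ("t2", True)]
    conf = assign_confidence(training)
    print(rank_translations([("output B", "t2"), ("output A", "t1")], conf))
```

The same counting scheme could in principle be keyed on tuples of template identifiers to score template combinations, which the abstract also mentions.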

