A Survey of Multilingual Text Retrieval (1996)

by D W Oard, B J Dorr

Citing documents (results 1 - 10 of 46)

A Word-to-Word Model of Translational Equivalence

by I. Dan Melamed, 1997
Cited by 73 (6 self)
Many multilingual NLP applications need to translate words between different languages but cannot afford the computational expense of inducing or applying a full translation model. For these applications, we have designed a fast algorithm for estimating a partial translation model, which accounts for translational equivalence only at the word level. The model's precision/recall trade-off can be directly controlled via one threshold parameter. This feature makes the model more suitable for applications that are not fully statistical. The model's hidden parameters can be easily conditioned on information extrinsic to the model, providing an easy way to integrate pre-existing knowledge such as part-of-speech, dictionaries, word order, etc. Our model can link word tokens in parallel texts as well as other translation models in the literature can. Unlike other translation models, it can automatically produce dictionary-sized translation lexicons, and it can do so with over 99% accuracy.
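
The threshold-controlled trade-off described above can be illustrated with a greedy one-to-one linking step of the kind used in this line of work. This is only an approximation under stated assumptions: association scores between word types are taken as already estimated (the `score` table and its toy values are hypothetical), the threshold is applied as a minimum score, and the model's iterative re-estimation of its hidden parameters is omitted.

```python
from typing import Dict, List, Tuple


def link_tokens(
    source: List[str],
    target: List[str],
    score: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[Tuple[int, int]]:
    """Greedy one-to-one linking of token positions in one sentence pair.

    `score` maps (source_word, target_word) to a precomputed association
    score (an assumption of this sketch); word pairs missing from it are
    never linked. Only candidates scoring at least `threshold` are kept,
    so raising the threshold yields fewer, higher-scoring links.
    """
    candidates = []
    for i, s in enumerate(source):
        for j, t in enumerate(target):
            value = score.get((s, t))
            if value is not None and value >= threshold:
                candidates.append((value, i, j))
    # Highest-scoring candidates claim their tokens first.
    candidates.sort(key=lambda c: c[0], reverse=True)

    used_source, used_target = set(), set()
    links = []
    for _, i, j in candidates:
        # Each token may take part in at most one link.
        if i not in used_source and j not in used_target:
            links.append((i, j))
            used_source.add(i)
            used_target.add(j)
    return sorted(links)


# Toy association scores, made up for illustration only.
toy_scores = {("the", "la"): 0.4, ("house", "maison"): 0.9, ("house", "la"): 0.5}
print(link_tokens(["the", "house"], ["la", "maison"], toy_scores, threshold=0.3))
# [(0, 0), (1, 1)]
print(link_tokens(["the", "house"], ["la", "maison"], toy_scores, threshold=0.45))
# [(1, 1)]
```

Raising the threshold from 0.3 to 0.45 drops the weaker the/la link while keeping house/maison, which is the kind of precision-for-recall trade the abstract attributes to its single parameter.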

Information retrieval on the Web

by Mei Kobayashi, Koichi Takeda - ACM Computing Surveys, 2000
Cited by 58 (0 self)
In this paper we review studies of the growth of the Internet and technologies that are useful for information search and retrieval on the Web. We present data on the Internet from several different sources, e.g., current as well as projected number of users, hosts, and Web sites. Although numerical figures vary, overall trends cited ...

Automatic Discovery of Non-Compositional Compounds in Parallel Data

by I. Dan Melamed, 1997
Cited by 58 (1 self)
Automatic segmentation of text into minimal content-bearing units is an unsolved problem even for languages like English. Spaces between words offer an easy first approximation, but this approximation is not good enough for machine translation (MT), where many word sequences are not translated word-for-word. This paper presents an efficient automatic method for discovering sequences of words that are translated as a unit. The method proceeds by comparing pairs of statistical translation models induced from parallel texts in two languages. It can discover hundreds of noncompositional compounds on each iteration, and constructs longer compounds out of shorter ones. Objective evaluation on a simple machine translation task has shown the method's potential to improve the quality of MT output. The method makes few assumptions about the data, so it can be applied to parallel data other than parallel texts, such as word spellings and pronunciations.

Alternative approaches for cross-language text retrieval

by Douglas W. Oard - In AAAI Symposium on Cross-Language Text and Speech Retrieval, American Association for Artificial Intelligence, 1997
Cited by 42 (5 self)
The explosive growth of the Internet and other sources of networked information has made automatic mediation of access to networked information sources an increasingly important problem. Much of this information ...

Disambiguation strategies for cross-language information retrieval

by Djoerd Hiemstra - In Proceedings of the Third European Conference on Research and Advanced Technology for Digital Libraries (ECDL), 1999
Cited by 33 (11 self)
Keywords: Cross-Language Information Retrieval, Statistical Machine ...

Image Retrieval: Content versus Context

by Thijs Westerveld - In Content-Based Multimedia Information Access, RIAO 2000 Conference, 2000
Cited by 29 (2 self)
In this paper, we introduce a new approach to image retrieval. This new approach takes the best of two worlds: it combines image features (content) and words from collateral text (context) into one semantic space. Our approach uses Latent Semantic Indexing, a method that uses co-occurrence statistics to uncover hidden semantics. This paper shows how this method, which has proven successful in both monolingual and cross-lingual text retrieval, can be used for multi-modal and cross-modal information retrieval. Experiments with an on-line newspaper archive show that Latent Semantic Indexing can outperform both content-based and context-based approaches and that it is a promising approach for indexing visual and multi-modal data.
1 Introduction
In the last few years, several research groups have been investigating content-based image retrieval (Flickner et al., 1997; Gevers and Smeulders, 1997; Marsicoi, Clinque and Levialdi, 1997). A popular approach is querying by example and computing re...
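
Latent Semantic Indexing builds a feature-by-document matrix, takes a truncated singular value decomposition, and compares items in the resulting low-dimensional space. A minimal NumPy sketch follows. Placing caption words and quantized image features as rows of one matrix is an assumption about how content and context could share a space; the matrix values, the rank `k`, and the helper names (`lsi_fit`, `fold_in`, `cosine_scores`) are illustrative and not taken from the paper.

```python
import numpy as np


def lsi_fit(X: np.ndarray, k: int):
    """Rank-k truncated SVD of a features-by-documents matrix, X ~ U_k S_k V_k^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in(q: np.ndarray, U_k: np.ndarray, s_k: np.ndarray) -> np.ndarray:
    """Map a feature vector into latent space: q^T U_k S_k^{-1}."""
    return (q @ U_k) / s_k


def cosine_scores(q: np.ndarray, U_k, s_k, Vt_k) -> np.ndarray:
    """Cosine similarity between the folded-in query and every document."""
    q_hat = fold_in(q, U_k, s_k)
    docs = Vt_k.T  # row j holds document j's latent coordinates
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    return docs @ q_hat / np.maximum(norms, 1e-12)


# Hypothetical rows: caption words "ship", "boat", "car", then two
# quantized image features; columns: four illustrated documents.
X = np.array([
    [2, 1, 0, 0],  # word: "ship"
    [1, 2, 0, 0],  # word: "boat"
    [0, 0, 1, 2],  # word: "car"
    [1, 1, 0, 0],  # image feature: "blue region"
    [0, 0, 2, 1],  # image feature: "gray region"
], dtype=float)

U_k, s_k, Vt_k = lsi_fit(X, k=2)
query = np.array([0, 1, 0, 0, 0], dtype=float)  # text-only query: "boat"
sims = cosine_scores(query, U_k, s_k, Vt_k)
print(np.round(sims, 3))
```

With this toy matrix, the text query "boat" gives documents 0 and 1 a cosine close to 1 and documents 2 and 3 a cosine near 0. Folding in a document's own column recovers its latent coordinates, which is why queries and documents can be compared in the same space.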

Using Structured Queries for Disambiguation in Cross-Language Information Retrieval

by David A. Hull, 1997
Cited by 27 (1 self)
Bilingual transfer dictionaries are an important resource for query translation in cross-language text retrieval. However, term translation is not an isomorphic process, so dictionary-based systems must address the problem of ambiguity in language translation. In this paper, we claim that Boolean conjunction (the AND operator) provides simple and automatic disambiguation in the target language. We derive a new weighted Boolean model based on a probabilistic formulation and apply it to the cross-language text retrieval problem. The results suggest that the weighted Boolean model is highly effective for general text retrieval, but more experimental evidence is needed to conclude that it is particularly advantageous for cross-language applications. Nonetheless, the preliminary results are quite promising.
1 Introduction
With the ongoing development of multilingual information retrieval systems, researchers are becoming increasingly interested in the problem of cross-language information retrie...
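
The intuition behind conjunction-based disambiguation is that a document matching only a wrong-sense translation of one query word will usually miss the other query words. The paper's exact weighted Boolean formulation is not reproduced here. As an assumption, this sketch uses a common probabilistic reading of the operators (OR as 1 minus the product of (1 - p), AND as the product of p) over per-document match probabilities taken as given, with a hypothetical French translation of the query "bank interest".

```python
from math import prod
from typing import Dict, Iterable, List


def p_or(probs: Iterable[float]) -> float:
    """Probability that at least one of several independent events holds."""
    return 1.0 - prod(1.0 - p for p in probs)


def p_and(probs: Iterable[float]) -> float:
    """Probability that all of several independent events hold."""
    return prod(probs)


def weighted_boolean_score(query: List[List[str]], doc: Dict[str, float]) -> float:
    """AND across source-term groups, OR within each group of translations.

    `doc` maps a target-language term to an assumed match probability for
    this document; absent terms count as 0.
    """
    return p_and(p_or(doc.get(term, 0.0) for term in group) for group in query)


# Hypothetical: "bank" -> {banque, rive}, "interest" -> {intérêt}.
query = [["banque", "rive"], ["intérêt"]]
finance_doc = {"banque": 0.8, "intérêt": 0.7}
river_doc = {"rive": 0.9}
print(round(weighted_boolean_score(query, finance_doc), 2))  # 0.56
print(weighted_boolean_score(query, river_doc))              # 0.0
```

The river-bank document matches a translation of "bank" but nothing for "interest", so the conjunction drives its score to zero, while the finance document keeps a positive score.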

Cross-Language Information Retrieval with the UMLS Metathesaurus

by David Eichmann, Miguel E. Ruiz - In: Proc. of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1998
Cited by 27 (0 self)
We investigate an automatic method for Cross-Language Information Retrieval (CLIR) that utilizes the multilingual UMLS Metathesaurus to translate Spanish and French natural language queries into English. Two experiments are presented using OHSUMED, a subset of MEDLINE. Both experiments examine retrieval effectiveness of the translated queries. However, in the second experiment, the query translation procedure is augmented with digram-based vocabulary normalization procedures. In this comparative study of retrieval effectiveness, the measures used are: 11-point average precision (11-AvgP); average interpolated precision at recall of 0.1; and noninterpolated (i.e., exact) precision after 10 retrieved documents. Our results indicate that for Spanish the UMLS Metathesaurus-based CLIR method appears equivalent to multilingual dictionary-based approaches investigated in the current literature. French yields less favorable results, and our analysis suggests that linguistic differences may have caused the performance differences.
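
The three effectiveness measures listed in the abstract can be computed from a ranked result list and a set of relevance judgments. A minimal sketch, assuming a non-empty relevant set and the usual interpolation convention (interpolated precision at recall level r is the highest precision at any rank whose recall is at least r); the document ids are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def pr_points(ranking: Sequence[str], relevant: Set[str]) -> List[Tuple[float, float]]:
    """(precision, recall) after each rank; assumes `relevant` is non-empty."""
    points, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / rank, hits / len(relevant)))
    return points


def interpolated_precision(ranking, relevant, level: float) -> float:
    """Highest precision at any rank whose recall reaches `level` (0 if never)."""
    return max(
        (p for p, r in pr_points(ranking, relevant) if r >= level - 1e-9),
        default=0.0,
    )


def eleven_point_average(ranking, relevant) -> float:
    """Mean interpolated precision at recall 0.0, 0.1, ..., 1.0."""
    levels = [i / 10 for i in range(11)]
    return sum(interpolated_precision(ranking, relevant, lvl) for lvl in levels) / 11


def precision_at(ranking, relevant, k: int = 10) -> float:
    """Non-interpolated precision after k documents; missing ranks count as misses."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3"}
print(round(eleven_point_average(ranking, relevant), 4))  # 0.8485
print(interpolated_precision(ranking, relevant, 0.1))     # 1.0
print(precision_at(ranking, relevant))                    # 0.2
```

Here precision is 1.0 at every recall level up to 0.5 and 2/3 from 0.6 to 1.0, so the 11-point average is (6 + 5 x 2/3) / 11 = 28/33.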

Dictionary-Based Cross-Language Information Retrieval: Problems, Methods, and Research Findings

by Ari Pirkola, Turid Hedlund, Heikki Keskustalo, Kalervo Järvelin - Information Retrieval, 2001
Cited by 20 (3 self)
This paper reviews the literature on dictionary-based cross-language information retrieval (CLIR) and presents CLIR research done at the University of Tampere (UTA). The main problems associated with dictionary-based CLIR, as well as appropriate methods to deal with them, are discussed. We present the structured query model by Pirkola and report findings for four different language pairs concerning the effectiveness of query structuring. The architecture of our automatic query translation and construction system is also presented.
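
In Pirkola's structured queries, all dictionary translations of one source word are grouped under a synonym operator (InQuery's #syn) so the group is weighted like a single term: its frequency in a document is the sum of its members' frequencies, and its document frequency counts documents that contain any member. The sketch below illustrates that grouping only; the saturating tf function, the log idf, the toy collection, and the translation lists are assumptions, not the UTA system's actual weighting.

```python
import math
from collections import Counter
from typing import List


def saturate(tf: float) -> float:
    """Assumed stand-in for a tf belief function: grows with tf but levels off."""
    return tf / (tf + 1.0)


def group_weight(group: List[str], doc: Counter, collection: List[Counter]) -> float:
    """Weight a set of alternative translations as if it were one term."""
    tf = sum(doc[t] for t in group)  # Counter yields 0 for absent terms
    df = sum(1 for d in collection if any(d[t] > 0 for t in group))
    if tf == 0 or df == 0:
        return 0.0
    return saturate(tf) * math.log(len(collection) / df)


def structured_score(query: List[List[str]], doc: Counter, collection: List[Counter]) -> float:
    # One synonym group per source word.
    return sum(group_weight(group, doc, collection) for group in query)


def unstructured_score(query: List[List[str]], doc: Counter, collection: List[Counter]) -> float:
    # Every translation scored as an independent query term.
    return sum(group_weight([t], doc, collection) for group in query for t in group)


# Hypothetical translations of the query "bank fee" and a toy collection.
query = [["banque", "rive", "berge", "banc"], ["frais"]]
finance = Counter({"banque": 1, "frais": 1})
river = Counter({"rive": 1, "berge": 1, "banc": 1})
collection = [finance, river, Counter({"maison": 1}), Counter({"jardin": 1})]

print(unstructured_score(query, river, collection) > unstructured_score(query, finance, collection))  # True
print(structured_score(query, finance, collection) > structured_score(query, river, collection))      # True
```

Treating each translation as a separate term lets the river document, which happens to contain three translations of "bank" but no "fee", outscore the finance document; grouping the translations reverses that order.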

The SYSTRAN NLP browser: An application of machine translation technology in cross-language information retrieval

by Denis A. Gachot, Elke Lange, Jin Yang, 1998
Cited by 14 (1 self)
The approach of using an existing machine translation system in multilingual information retrieval, as usually proposed, consists of automatically translating queries, or even the entire textual database, from one language to another. The information that machine translation technology can provide to multilingual information retrieval has been more extensively explored at SYSTRAN. An existing information retrieval tool, which is based on SYSTRAN parsing and machine translation technology, has been investigated for use in information retrieval. This paper is a description of the implementation of what is now called the SYSTRAN NLP Browser, a cross-linguistic multilingual information retrieval system. The first section discusses the utilization of machine translation technology in multilingual information retrieval in general. The second section describes the implementation of the NLP Browser. The third section discusses the present approach and the current development status, followed by the conclusion.

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University