Results 1 - 4 of 4
Automatic Cross-Linguistic Information Retrieval using Latent Semantic Indexing, 1997
Cited by 52 (2 self)
...this document as a bag of freely intermingled French and English words. A set of training documents like this is analyzed using LSI, and the result is a reduced-dimension semantic space in which related terms are near each other. Because the documents contained both French and English terms, the LSI space will contain terms from both languages; this is what makes it possible for the CL-LSI method to avoid query translation. Words that are consistently paired in translation (e.g., "Libya" and "Libye") will be given identical representations in the LSI space, whereas words that are frequently associated with one another (e.g., "not" and "pas") will be given similar representations. The next step in the CL-LSI method is to add (or "fold in") documents in just French or English. As described above, this is done by locating a new document at the weighted vector sum of its constituent terms. The result of this process is that each document in the database has a language-independent representation in terms of numerical vectors. Users can now pose queries in either French or English and get back the most similar documents regardless of language.
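The CL-LSI steps described in this excerpt (analyze dual-language training documents with LSI, fold in monolingual documents, then compare a query against all of them) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the four-document corpus, the raw-count weighting (no log-entropy or other term weighting), k = 2, and the function names `build_space`, `fold_in`, and `cosine` are all hypothetical. Folding in follows one common LSI convention, placing a document at d^T U_k S_k^-1, i.e. the count-weighted sum of its term vectors scaled by the inverse singular values.

```python
import numpy as np

# Hypothetical toy corpus: each training document is an English text followed by
# its French translation, treated as one bag of intermingled words.
TRAINING = [
    "libya oil libye pétrole",
    "libya embassy libye ambassade",
    "court guilty tribunal coupable",
    "court judge tribunal juge",
]

def build_space(docs, k):
    """Run LSI on the dual-language training docs; return term index, U_k, s_k."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-by-document matrix of raw counts (an assumption: real systems
    # usually apply a term weighting first).
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w], j] += 1
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return index, U[:, :k], s[:k]

def fold_in(text, index, U_k, s_k):
    """Place a monolingual document or query at d^T U_k S_k^-1."""
    d = np.zeros(U_k.shape[0])
    for w in text.split():
        if w in index:  # words unseen in training are ignored
            d[index[w]] += 1
    return (d @ U_k) / s_k

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

index, U_k, s_k = build_space(TRAINING, k=2)

# Documents that exist in only one language are folded into the shared space.
collection = {
    "en1": "oil libya",
    "en2": "court guilty",
    "fr1": "pétrole libye",
    "fr2": "tribunal juge coupable",
}
doc_vecs = {name: fold_in(t, index, U_k, s_k) for name, t in collection.items()}

# An English query is scored against every document, regardless of language.
query = fold_in("libya oil", index, U_k, s_k)
ranking = sorted(doc_vecs, key=lambda n: cosine(query, doc_vecs[n]), reverse=True)
print(sorted(ranking[:2]))  # ['en1', 'fr1']

# "libya" and "libye" occur in exactly the same training documents.
print(np.allclose(U_k[index["libya"]], U_k[index["libye"]]))  # True
```

Because "libya" and "libye" appear in the same training documents, their rows in U_k coincide, mirroring the "identical representations" point above, and the English query ranks the French-only document "fr1" alongside its English counterpart without any translation step.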
Automatic Cross-Language Information Retrieval using Latent Semantic Indexing
- Cross-Language Information Retrieval, chapter 5, 1998
Cited by 36 (5 self)
We describe a method for fully automated cross-language document retrieval in which no query translation is required. Queries in one language can retrieve documents in other languages (as well as the original language). This is accomplished by a method that automatically constructs a multilingual semantic space using Latent Semantic Indexing (LSI). We present strong preliminary test results for our cross-language LSI (CL-LSI) method for a French-English collection. We also provide some evidence that this automatic method performs comparably to a retrieval method based on machine translation (MT-LSI).
Automatic 3-Language Cross-Language Information Retrieval with Latent Semantic Indexing
- In The Sixth Text Retrieval Conference Notebook Papers (TREC6), 103-110. National Institute of Standards and Technology Special Publication, 1998
Cited by 15 (2 self)
This paper describes cross-language information retrieval experiments carried out for TREC-6. Our retrieval method, cross-language latent semantic indexing (CL-LSI), is completely automatic, and we were able to use it to create a 3-way English-French-German IR system. This study extends our previous work in terms of the large size of the training and testing corpora, the use of low-quality training data, the evaluation using relevance judgments, and the number of languages analyzed.
Introduction
Cross-language LSI (CL-LSI) is a fully automatic method for cross-language document retrieval in which no query translation is required. Queries in one language can retrieve documents in other languages (as well as the original language). This is accomplished by a method that automatically constructs a multilingual semantic space using latent semantic indexing (LSI); this semantic space is exploited in the form of a vector lexicon, which assigns each word in each language to a point in the high-dim...
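The vector lexicon and the three-language extension can be illustrated in the same style. In this sketch each training document concatenates an English text with its French and German translations. The `en:`/`fr:`/`de:` token prefixes, the toy corpus (whose last document lacks a German side, loosely standing in for imperfect training data), k = 3, the U_k S_k scaling of term coordinates, and the helper names `vector_lexicon` and `nearest_in_language` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Hypothetical 3-language training set; tokens carry a language prefix.
TRAINING = [
    "en:libya en:oil fr:libye fr:pétrole de:libyen de:öl",
    "en:libya en:embassy fr:libye fr:ambassade de:libyen de:botschaft",
    "en:court en:guilty fr:tribunal fr:coupable de:gericht de:schuldig",
    "en:weather fr:météo",  # incomplete document: no German side
]

def vector_lexicon(docs, k):
    """Map every term, in every language, to a point in the k-dim LSI space."""
    vocab = sorted({w for d in docs for w in d.split()})
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[vocab.index(w), j] += 1
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    # Term coordinates scaled by the singular values (U_k S_k), an assumed
    # convention for term-to-term comparison.
    T = U[:, :k] * s[:k]
    return dict(zip(vocab, T))

def nearest_in_language(word, lang, lexicon):
    """Return the term in `lang` whose vector has the highest cosine with `word`."""
    q = lexicon[word]
    best, best_sim = None, -np.inf
    for term, v in lexicon.items():
        norm = np.linalg.norm(v)
        if not term.startswith(lang + ":") or norm < 1e-9:
            continue  # other language, or term collapsed to the origin at this k
        sim = float(q @ v) / (np.linalg.norm(q) * norm)
        if sim > best_sim:
            best, best_sim = term, sim
    return best

lex = vector_lexicon(TRAINING, k=3)
print(nearest_in_language("en:libya", "de", lex))  # de:libyen
print(nearest_in_language("en:oil", "fr", lex))    # fr:pétrole
```

With k = 3 the weakest dimension, carried only by the incomplete weather document, is dropped, so "en:weather" and "fr:météo" land at the origin; that is why the lookup skips zero-length vectors rather than dividing by zero.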

