A Wikipedia-Based Multilingual Retrieval Model
Cited by 12 (1 self)
This paper introduces CL-ESA, a new multilingual retrieval model for the analysis of cross-language similarity. The retrieval model exploits the multilingual alignment of Wikipedia: given a document $d$ written in language $L$, we construct a concept vector $\mathbf{d}$ for $d$, where each dimension $i$ of $\mathbf{d}$ quantifies the similarity of $d$ with respect to a document $d^{*}_{i}$ chosen from the "$L$-subset" of Wikipedia. Likewise, for a second document $d'$ written in language $L'$, $L \neq L'$, we construct a concept vector $\mathbf{d}'$, using from the $L'$-subset of Wikipedia the topic-aligned counterparts ${d'}^{*}_{i}$ of our previously chosen documents. Since the two concept vectors $\mathbf{d}$ and $\mathbf{d}'$ are collection-relative representations of $d$ and $d'$, they are language-independent; i.e., their similarity can be computed directly with, for instance, the cosine similarity measure. We present results of an extensive analysis that demonstrates the power of this new retrieval model: for a query document $d$, the topically most similar documents from a corpus in another language are properly ranked. A salient property of the new retrieval model is its robustness with respect to both the size and the quality of the index document collection.
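The construction described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny English/German index collections stand in for topic-aligned Wikipedia articles, the whitespace tokenizer and raw term-frequency weighting are simplifying assumptions, and all function names (`bow`, `concept_vector`, `cl_esa_similarity`) are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term frequencies (assumed weighting; whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_sparse(u, v):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cosine_dense(x, y):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def concept_vector(doc, index_docs):
    """Dimension i holds the similarity of doc to the i-th index document."""
    d = bow(doc)
    return [cosine_sparse(d, bow(index_doc)) for index_doc in index_docs]


def cl_esa_similarity(doc_a, index_a, doc_b, index_b):
    """Compare two documents in different languages via their concept vectors.

    index_a[i] and index_b[i] must describe the same topic (aligned pair).
    """
    return cosine_dense(concept_vector(doc_a, index_a),
                        concept_vector(doc_b, index_b))


# Toy topic-aligned index collections (stand-ins for Wikipedia articles).
index_en = ["dog animal pet bark", "car engine road drive"]
index_de = ["hund tier haustier bellen", "auto motor strasse fahren"]

doc_en = "my dog is a loyal pet"
doc_de_dog = "der hund ist ein tier"
doc_de_car = "das auto hat einen motor"

sim_dog = cl_esa_similarity(doc_en, index_en, doc_de_dog, index_de)
sim_car = cl_esa_similarity(doc_en, index_en, doc_de_car, index_de)
print(sim_dog, sim_car)
```

In this toy setup the English query about dogs scores higher against the German dog document than against the German car document, even though the two languages share no vocabulary; the comparison happens entirely in the space spanned by the aligned index documents.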
Corpus-Based Terminology Extraction Applied to Information Access
In Proceedings of Corpus Linguistics 2001, 2001
Cited by 10 (4 self)
This paper presents an application of corpus-based terminology extraction in interactive information retrieval. In this approach, the terminology obtained in an automatic extraction procedure is used, without any manual revision, to provide retrieval indexes and a "browsing by phrases" facility for document access in an interactive search interface. We argue that the combination of automatic terminology extraction and interactive search provides an optimal balance between controlled-vocabulary document retrieval (where thesauri are costly to acquire and maintain) and free-text retrieval (where complex terms associated with domain-specific concepts are largely overlooked).
Building a Chinese-English WordNet for Translingual Applications
ACM Transactions on Asian Languages Information Processing, 2002
Cited by 6 (0 self)
A WordNet-like linguistic resource is useful, but difficult to construct. This article proposes a method to integrate five linguistic resources, including English/Chinese sense-tagged corpora, English/Chinese thesauruses, and a bilingual dictionary. Chinese words are mapped into WordNet. A Chinese WordNet and a Chinese-English WordNet are derived by following the structures of WordNet. Experiments with Chinese-English information retrieval are conducted to evaluate the applicability of the Chinese-English WordNet. The best model achieves an average precision of 0.1010, which is 69.23% of monolingual information retrieval performance. It also achieves a 10.02% increase relative to a model that resolves translation ambiguity and target polysemy problems together.
Website term browser: Overcoming language barriers in text retrieval
Journal of Intelligent and Fuzzy Systems, 2002
Cited by 1 (0 self)
Current search systems fail to satisfy users when the relevant information is written in a foreign language; when the user is not aware of the relevant (perhaps specialized) terminology for a given topic; or when the user need is fuzzy and requires assisted search once inside an appropriate web portal. This paper describes an interactive multilingual search system that alleviates such limitations through the browsing of phrases in different languages, which are automatically extracted from the text collection. The evaluation of the Website Term Browser (WTB) focused on two aspects: its capability to offer translingual terminology to users, and the usefulness of phrase browsing. The evaluation shows that users consider the new level of terminological information useful, as it complements the traditional document ranking.
Browsing by Phrases: Terminological Information in Interactive Multilingual Text Retrieval
2001
This paper presents an interactive search engine (Website Term Browser) which makes use of phrasal information to process queries and suggest relevant topics in a fully multilingual setting.
Categories and Subject Descriptors: Retrieval Issues: Cross-lingual retrieval, Text Retrieval, Browsing. Social Issues: Multilingual access.
Keywords: Multilingual Information Access, Interaction, Natural Language Processing, Terminology Extraction.
1. INTRODUCTION
In an interactive setting, phrasal information has been used to suggest to the user ways of enhancing and refining queries or browsing/classifying search results:
- Handcrafted hierarchies based on thesauri (e.g. ERIC) or topic hierarchies (e.g. Yahoo) to browse the document space.
- Automatic building of terminological hierarchies, for instance automatic clustering of documents into nested classes [3] or subsumption relations between terms [7].
- Extraction of links between documents with similar keywords [4].
- Query expansion with ...
Cross-Language Information Access through Phrase Browsing
In Applications of Natural Language to Information Systems, Lecture Notes in Informatics, 2001
"... This paper presents a cross-language retrieval system which integrates ..."
Browsing by Phrases: Terminological Information in
This paper presents an interactive search engine (Website Term Browser) which makes use of phrasal information to process queries and suggest relevant topics in a fully multilingual setting.

