Latent Semantic Kernels
Cited by 74 (7 self)
Kernel methods like Support Vector Machines have successfully been used for text categorization. A standard choice of kernel function has been the inner product between the vector-space representation of two documents, in analogy with classical information retrieval (IR) approaches. Latent Semantic Indexing (LSI) has been successfully used for IR purposes as a technique for capturing semantic relations between terms and inserting them into the similarity measure between two documents. One of its main drawbacks, in IR, is its computational cost. In this paper we describe how the LSI approach can be implemented in a kernel-defined feature space. We provide experimental results demonstrating that the approach can significantly improve performance, and that it does not impair it.
Stylistic Experiments For Information Retrieval
, 2000
Cited by 47 (8 self)
Information retrieval systems are built to handle texts as topical items: texts are tabulated by occurrence frequencies of content words in them, under the assumption that text topic is reasonably well modeled by content word occurrence. But texts have several interesting characteristics beyond topic. The experiments described in this text investigate stylistic variation. Roughly put, style is the difference between two ways of saying the same thing -- and systematic stylistic variation can be used to characterize the genre of documents. These experiments investigate if stylistic information is distinguishable using simple language engineering methods, and if in that case this type of information can be used to improve information retrieval systems.
Alternative approaches for cross-language text retrieval
- In AAAI Symposium on cross-language text and speech retrieval. American Association for Artificial Intelligence
, 1997
Cited by 42 (5 self)
The explosive growth of the Internet and other sources of networked information have made automatic mediation of access to networked information sources an increasingly important problem. Much of this information
Should we Translate the Documents or the Queries in Cross-language Information Retrieval?
, 1999
Cited by 38 (1 self)
Previous comparisons of document and query translation suffered difficulty due to differing quality of machine translation in these two opposite directions. We avoid this difficulty by training identical statistical translation models for both translation directions using the same training data. We investigate information retrieval between English and French, incorporating both translation directions into both document translation and query translation-based information retrieval, as well as into hybrid systems. We find that hybrids of document and query translation-based systems outperform query translation systems, even human-quality query translation systems.
Embedding web-based statistical translation models in cross-language information retrieval
- Computational Linguistics
, 2003
Cited by 29 (3 self)
Although more and more language pairs are covered by machine translation (MT) services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an application that needs translation functionality of a relatively low level of sophistication, since current models for information retrieval (IR) are still based on a bag of words. The Web provides a vast resource for the automatic construction of parallel corpora that can be used to train statistical translation models automatically. The resulting translation models can be embedded in several ways in a retrieval model. In this article, we will investigate the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process. Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open the perspective of constructing a fully automatic query translation device for CLIR at a very low cost.
Cross-Language Information Retrieval with the UMLS Metathesaurus
- In: Proc. of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
, 1998
Cited by 27 (0 self)
(david-eichmann@uiowa.edu, mruiz@cs.uiowa.edu) We investigate an automatic method for Cross Language Information Retrieval (CLIR) that utilizes the multilingual UMLS Metathesaurus to translate Spanish and French natural language queries into English. Two experiments are presented using OHSUMED, a subset of MEDLINE. Both experiments examine retrieval effectiveness of the translated queries. However, in the second experiment, the query translation procedure is augmented with digram-based vocabulary normalization procedures. In this comparative study of retrieval effectiveness the measures used are: 11-point-average precision score (11-AvgP); average interpolated precision at recall of 0.1; and noninterpolated (i.e., exact) precision after 10 retrieved documents. Our results indicate that for Spanish the UMLS Metathesaurus-based CLIR method appears equivalent to multilingual dictionary-based approaches investigated in the current literature. French yields less favorable results, and our analysis suggests that linguistic differences may have caused the performance differences.
Improving Cross-Language Text Retrieval with Human Interactions
, 2000
Cited by 18 (0 self)
Can we expect people to be able to get information from texts in languages they cannot read? In this paper we review two relevant lines of research bearing on this question and will show how our results are being used in the design of a new Web interface for cross-language text retrieval. One line of research, "Interactive IR", is concerned with the user interface issues for information retrieval systems such as how best to display the results of a text search. We review our current research, on "document thumbnail" visualizations, and discuss current Web conventions, practices and folklore. The other area of research, "Cross-Language Text Retrieval", is concerned with the design of automatic techniques, including Machine Translation, to retrieve texts in languages other than the language of the query. We review work we have done concerning query translation and multilingual text summarization. We then describe how these results are being applied and extended in the design a new demons...
The effect of bilingual term list size on dictionary-based cross-language information retrieval
, 2003
Cited by 18 (6 self)
Bilingual term lists are extensively used as a resource for dictionary-based Cross-Language Information Retrieval (CLIR), in which the goal is to find documents written in one natural language based on queries that are expressed in another. This paper identifies eight types of terms that affect retrieval effectiveness in CLIR applications through their coverage by general-purpose bilingual term lists, and reports results from an experimental evaluation of the coverage of 35 bilingual term lists in a news retrieval application. Retrieval effectiveness was found to be strongly influenced by term list size for lists that contain between 3,000 and 30,000 unique terms per language. Supplemental techniques for named entity translation were found to be useful with even the largest lexicons. The contribution of named entity translation was evaluated in a cross-language experiment involving English and Chinese. Smaller effects were observed from deficiencies in the coverage of domain-specific terminology when searching news stories.
Observing users, designing clarity: A case study on the user-centered design of a cross-language information retrieval system
- Journal of the American Society for Information Science and Technology
, 2004
Cited by 14 (3 self)
This paper presents a case study of the development of an interface to a novel and complex form of document retrieval: searching for texts written in foreign languages based on native language queries. Although the underlying technology for achieving such a search is relatively well understood, the appropriate interface design is not. A study involving users (with such searching needs) from the start of the design process is described covering initial examination of user needs and tasks; preliminary design and testing of interface components; building, testing, and further refining an interface; before finally conducting usability tests of the system. Lessons are learned at every stage of the process leading to a much more informed view of how such an interface should be built.
Keizai: An Interactive Cross-Language Text Retrieval System
- In Machine Translation Summit VII, Workshop on Machine Translation for Cross Language Information Retrieval
, 1999
Cited by 14 (1 self)
Can we expect people to be able to get information from texts in languages they cannot read? In this paper we review two relevant lines of research bearing on this question and will show how our results are being used in the design of a new Web interface for cross-language text retrieval. One line of research, "Interactive IR", is concerned with the user interface issues for information retrieval systems such as how best to display the results of a text search. We review our current research, on "document thumbnail" visualizations, and discuss current Web conventions, practices and folklore. The other area of research, "Cross-Language Text Retrieval", is concerned with the design of automatic techniques, including Machine Translation, to retrieve texts in languages other than the language of the query. We review work we have done concerning query translation and multilingual text summarization. We then describe how these results are being applied and extended in t...

