Using Linear Algebra for Intelligent Information Retrieval
- SIAM Review
, 1995
Cited by 450 (14 self)
Currently, most approaches to retrieving textual materials from scientific databases depend on a lexical match between words in users' requests and those in or assigned to documents in a database. Because of the tremendous diversity in the words people use to describe the same document, lexical methods are necessarily incomplete and imprecise. Using the singular value decomposition (SVD), one can take advantage of the implicit higher-order structure in the association of terms with documents by determining the SVD of large sparse term by document matrices. Terms and documents represented by 200-300 of the largest singular vectors are then matched against user queries. We call this retrieval method Latent Semantic Indexing (LSI) because the subspace represents important associative relationships between terms and documents that are not evident in individual documents. LSI is a completely automatic yet intelligent indexing method, widely applicable, and a promising way to improve users...
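The retrieval scheme this abstract describes (truncate the SVD of a term-by-document matrix, then match queries in the reduced space) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy vocabulary, the counts, the helper names `lsi_fit`, `project_query`, and `cosine_scores`, and the choice of k = 2 are all assumptions made for the example, and a dense NumPy SVD stands in for the sparse solvers a real collection would need.

```python
import numpy as np

# Toy term-by-document matrix (rows = terms, columns = documents).
# Vocabulary and counts are invented for illustration only.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0],   # car
    [0, 1, 1, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 0, 1],   # flower
    [0, 0, 0, 1],   # petal
], dtype=float)

def lsi_fit(A, k):
    """Return the rank-k SVD factors (U_k, s_k, Vt_k) of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def project_query(q, U_k, s_k):
    """Map a term-space query vector into the k-dimensional space: q^T U_k S_k^{-1}."""
    return (q @ U_k) / s_k

def cosine_scores(q_hat, Vt_k):
    """Cosine similarity between the projected query and every document vector."""
    docs = Vt_k.T  # one row per document
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    return (docs @ q_hat) / np.where(norms == 0, 1.0, norms)

U_k, s_k, Vt_k = lsi_fit(A, k=2)
q = np.array([1, 0, 0, 0, 0], dtype=float)  # query containing only "car"
scores = cosine_scores(project_query(q, U_k, s_k), Vt_k)
```

In this toy setup the second document never contains "car", yet it scores as highly as the documents that do, because "automobile" and "engine" load on the same latent factor; the flower document scores near zero.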
Phrasal Translation and Query Expansion Techniques for Cross-Language Information Retrieval
- In Proceedings of the 20th International ACM SIGIR Conference on Research and Development in Information Retrieval
, 1997
Cited by 143 (3 self)
Dictionary methods for cross-language information retrieval give performance below that for mono-lingual retrieval. Failure to translate multi-term phrases has been shown to be one of the factors responsible for the errors associated with dictionary methods. First, we study the importance of phrasal translation for this approach. Second, we explore the role of phrases in query expansion via local context analysis and local feedback and show how they can be used to significantly reduce the error associated with automatic dictionary translation. 1 Introduction The development of IR systems for languages other than English has focused on building mono-lingual systems. Increased availability of on-line text in languages other than English and increased multi-national collaboration have motivated research in cross-language information retrieval (CLIR) - the development of systems to perform retrieval across languages. There have been three main approaches to CLIR: translation via machine t...
Improving the Effectiveness of Information Retrieval with Local Context Analysis
- ACM TRANSACTIONS ON INFORMATION SYSTEMS
, 2000
Cited by 115 (4 self)
Techniques for automatic query expansion have been extensively studied in information retrieval research as a means of addressing the word mismatch between queries and documents. These techniques can be categorized as either global or local. While global techniques rely on analysis of a whole collection to discover word relationships, local techniques emphasize analysis of the top ranked documents retrieved for a query. Both types of techniques have advantages and limitations. In this paper we propose a new technique, called local context analysis, which combines the advantages of a global technique called Phrasefinder and a local technique known as local feedback. Experiments on a number of collections, both English and non-English, show that local context analysis offers more effective and consistent retrieval results.
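As a rough illustration of the "local" family mentioned in this abstract, the sketch below implements plain local feedback: rank documents for the query, then expand the query with the most frequent new terms found in the top-ranked documents. It is not local context analysis itself, which the paper describes as combining this idea with Phrasefinder. The scoring function, the function names, and the toy documents are all assumptions made for the example.

```python
from collections import Counter

def score(query_terms, doc_tokens):
    """Toy ranking function: number of query-term occurrences in the document."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in query_terms)

def local_feedback(query_terms, docs, top_n=2, n_expand=2):
    """Expand the query with the most frequent new terms in the top_n ranked documents."""
    # sorted() is stable, so documents with equal scores keep their input order.
    ranked = sorted(docs, key=lambda d: score(query_terms, d), reverse=True)
    pool = Counter()
    for doc in ranked[:top_n]:
        pool.update(t for t in doc if t not in query_terms)
    expansion = [t for t, _ in pool.most_common(n_expand)]
    return list(query_terms) + expansion

# Invented three-document collection.
docs = [
    "car engine repair engine oil".split(),
    "car engine tuning".split(),
    "flower garden petal".split(),
]
expanded = local_feedback(["car"], docs, top_n=2, n_expand=1)
```

Here the query "car" picks up "engine", the term that dominates the two top-ranked documents, while nothing from the unrelated third document is added.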
Expertise Recommender: A Flexible Recommendation System and Architecture
- IN: PROCEEDINGS OF THE 2000 ACM CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK
, 2000
Cited by 108 (5 self)
Locating the expertise necessary to solve difficult problems is a nuanced social and collaborative problem. In organizations, some people assist others in locating expertise by making referrals. People who make referrals fill key organizational roles that have been identified by CSCW and affiliated research. Expertise locating systems are not designed to replace people who fill these key organizational roles. Instead, expertise locating systems attempt to decrease workload and support people who have no other options. Recommendation systems are collaborative software that can be applied to expertise locating. This work describes a general recommendation architecture that is grounded in a field study of expertise locating. Our expertise recommendation system details the work necessary to fit expertise recommendation to a work setting. The architecture and implementation begin to tease apart the technical aspects of providing good recommendations from social and collaborative concerns.
Latent Semantic Indexing (LSI) and TREC-2
- The Second Text REtrieval Conference (TREC-2)
, 1994
Cited by 87 (2 self)
this paper. The "ltc" weights were computed on this matrix. 3.2 SVD analysis
Improving text retrieval for the routing problem using latent semantic indexing
- In Proc. of the 17th ACM-SIGIR Conference
, 1994
Cited by 83 (2 self)
Latent Semantic Indexing (LSI) is a novel approach to information retrieval that attempts to model the underlying structure of term associations by transforming the traditional representation of documents as vectors of weighted term frequencies to a new coordinate space where both documents and terms are represented as linear combinations of underlying semantic factors. In previous research, LSI has produced a small improvement in retrieval performance. In this paper, we apply LSI to the routing task, which operates under the assumption that a sample of relevant and non-relevant documents is available to use in constructing the query. Once again, LSI slightly improves performance. However, when LSI is used in conjunction with statistical classification, there is a dramatic improvement in performance.
Mining the Biomedical Literature in the Genomic Era: An Overview
- JOURNAL OF COMPUTATIONAL BIOLOGY
, 2003
Cited by 72 (2 self)
The past decade has seen a tremendous growth in the amount of experimental and computational biomedical data, specifically in the areas of Genomics and Proteomics. This growth is accompanied by an accelerated increase in the number of biomedical publications discussing the findings. In the last few years there has been a lot of interest within the scientific community in literature-mining tools to help sort through this abundance of literature and find the nuggets of information most relevant and useful for specific analysis tasks. This paper
Computational Methods for Intelligent Information Access
, 1995
Cited by 59 (0 self)
Currently, most approaches to retrieving textual materials from scientific databases depend on a lexical match between words in users' requests and those in or assigned to documents in a database. Because of the tremendous diversity in the words people use to describe the same document, lexical methods are necessarily incomplete and imprecise. Using the singular value decomposition (SVD), one can take advantage of the implicit higher-order structure in the association of terms with documents by determining the SVD of large sparse term by document matrices. Terms and documents represented by 200-300 of the largest singular vectors are then matched against user queries. We call this retrieval method Latent Semantic Indexing (LSI) because the subspace represents important associative relationships between terms and documents that are not evident in individual documents. LSI is a completely automatic yet intelligent indexing method, widely applicable, and a promising way to...
Dictionary Methods for Cross-Lingual Information Retrieval
- IN PROCEEDINGS OF THE 7TH INTERNATIONAL DEXA CONFERENCE ON DATABASE AND EXPERT SYSTEMS APPLICATIONS
, 1996
Cited by 57 (5 self)
Multi-lingual information retrieval (IR) has largely been limited to the development of systems for use with a specific foreign language. The explosion in the availability of electronic media in languages other than English makes the development of IR systems that can cross language boundaries increasingly important. In this paper, we present experiments that analyze the factors that affect dictionary based methods for cross-lingual retrieval and present methods that dramatically reduce the errors such an approach usually makes.
Automatic Cross-Linguistic Information Retrieval using Latent Semantic Indexing
, 1997
Cited by 52 (2 self)
this document as a bag of freely intermingled French and English words. A set of training documents like this is analyzed using LSI, and the result is a reduced dimension semantic space in which related terms are near each other. Because the documents contained both French and English terms, the LSI space will contain terms from both languages; this is what makes it possible for the CL-LSI method to avoid query translation. Words that are consistently paired in translation (e.g., Libya and Libye) will be given identical representations in the LSI space, whereas words that are frequently associated with one another (e.g., not and pas) will be given similar representations. The next step in the CL-LSI method is to add (or "fold in") documents in just French or English. As described above, this is done by locating a new document at the weighted vector sum of its constituent terms. The result of this process is that each document in the database has a language-independent representation in terms of numerical vectors. Users can now pose queries in either French or English and get back the most similar documents regardless of language. 3.2 Experimental Tests
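The "fold in" step described in this excerpt, placing a new monolingual document at the weighted vector sum of its constituent terms, can be sketched as below. This is a toy reconstruction under stated assumptions, not the paper's code: the six-word bilingual vocabulary, the three dual-language training documents, k = 2, and the helper names `fold_in` and `cosine` are invented for illustration, and the scaling by the inverse singular values follows the common LSI fold-in convention rather than anything stated in the excerpt.

```python
import numpy as np

# Rows = terms (English and French mixed), columns = dual-language training documents.
terms = ["libya", "libye", "oil", "petrole", "dog", "chien"]
A = np.array([
    [1, 0, 1],   # libya
    [1, 0, 1],   # libye
    [1, 0, 0],   # oil
    [1, 0, 0],   # petrole
    [0, 1, 0],   # dog
    [0, 1, 0],   # chien
], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]   # term vectors and the top-k singular values

def fold_in(d, U_k, s_k):
    """Locate a new document at the weighted sum of its term vectors: d^T U_k S_k^{-1}."""
    return (d @ U_k) / s_k

def cosine(x, y):
    """Cosine similarity between two vectors in the reduced space."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def doc_vector(words):
    """Term-count vector for a list of words drawn from the toy vocabulary."""
    return np.array([words.count(t) for t in terms], dtype=float)

english_doc = fold_in(doc_vector(["libya", "oil"]), U_k, s_k)
french_doc = fold_in(doc_vector(["libye", "petrole"]), U_k, s_k)
unrelated_doc = fold_in(doc_vector(["dog"]), U_k, s_k)

same_topic = cosine(english_doc, french_doc)
different_topic = cosine(english_doc, unrelated_doc)
```

Because each translation pair always co-occurs in the training documents, the paired terms get identical vectors, so the English-only and French-only documents land at the same point in the reduced space even though they share no words.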

