Results 1 - 10 of 64
Latent semantic indexing: A probabilistic analysis
1998
Cited by 210 (8 self)
Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the term-document matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. We also propose the technique of random projection as a way of speeding up LSI. We complement our theorems with encouraging experimental results. We also argue that our results may be viewed in a more general framework, as a theoretical basis for the use of spectral methods in a wider class of applications such as collaborative filtering.
A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems
ACM Transactions on Information Systems, 1994
Cited by 149 (28 self)
We present a probabilistic relational algebra (PRA) which is a generalization of standard relational algebra. Here tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Based on intensional semantics, the tuple weights of the result of a PRA expression always conform to the underlying probabilistic model. We also show for which expressions extensional semantics yields the same results. Furthermore, we discuss complexity issues and indicate possibilities for optimization. With regard to databases, the approach allows for representing imprecise attribute values, whereas for information retrieval, probabilistic document indexing and probabilistic search term weighting can be modelled. As an important extension, we introduce the concept of vague predicates which yields a probabilistic weight instead of a Boolean value, thus allowing for queries with vague selection conditions. So PRA implements uncertainty and vagueness in combination with the...
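The contrast between intensional and extensional semantics mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's algebra: it assumes tuples are annotated only with conjunctions of independent basic events, and the event names, probabilities, and function names are hypothetical.

```python
from math import prod

# Hypothetical basic events and their probabilities (illustrative values only).
EVENT_PROB = {"e1": 0.8, "e2": 0.5}

def intensional_join(events_a, events_b):
    """Conjunction of two tuples: keep the set of distinct basic events."""
    return frozenset(events_a) | frozenset(events_b)

def probability(events):
    """Probability of a conjunction of independent basic events."""
    return prod(EVENT_PROB[e] for e in events)

def extensional_join(p_a, p_b):
    """Multiply the weights directly, treating the operands as independent."""
    return p_a * p_b

# Joining a tuple with itself: the event e1 must be counted only once.
same_tuple_intensional = probability(intensional_join({"e1"}, {"e1"}))
same_tuple_extensional = extensional_join(EVENT_PROB["e1"], EVENT_PROB["e1"])

# Joining tuples built from different events: both semantics agree.
distinct_intensional = probability(intensional_join({"e1"}, {"e2"}))
distinct_extensional = extensional_join(EVENT_PROB["e1"], EVENT_PROB["e2"])
```

In this sketch the extensional result is too small when both operands depend on the same event, and matches the intensional result when the operands are independent.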
COMBINING APPROACHES TO INFORMATION RETRIEVAL
Cited by 76 (1 self)
The combination of different text representations and search strategies has become a standard technique for improving the effectiveness of information retrieval. Combination, for example, has been studied extensively in the TREC evaluations and is the basis of the “meta-search” engines used on the Web. This paper examines the development of this technique, including both experimental results and the retrieval models that have been proposed as formal frameworks for combination. We show that combining approaches for information retrieval can be modeled as combining the outputs of multiple classifiers based on one or more representations, and that this simple model can provide explanations for many of the experimental results. We also show that this view of combination is very similar to the inference net model, and that a new approach to retrieval based on language models supports combination and can be integrated with the inference net model.
"Is This Document Relevant? ...Probably": A Survey of Probabilistic Models in Information Retrieval
2001
Cited by 55 (12 self)
This article surveys probabilistic approaches to modeling information retrieval. The basic concepts of probabilistic approaches to information retrieval are outlined and the principles and assumptions upon which the approaches are based are presented. The various models proposed in the development of IR are described, classified, and compared using a common formalism. New approaches that constitute the basis of future research are described.
Probabilistic Relevance Models Based on Document and Query Generation
Language Modeling and Information Retrieval, 2002
Cited by 55 (11 self)
We give a unified account of the probabilistic semantics underlying the language modeling approach and the traditional probabilistic model for information retrieval, showing that the two approaches can be viewed as being equivalent probabilistically, since they are based on different factorizations of the same generative relevance model. We also discuss how the two approaches lead to different retrieval frameworks in practice, since they involve component models that are estimated quite differently.
Document and Passage Retrieval Based on Hidden Markov Models
In Proceedings of the Seventeenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1994
Cited by 49 (2 self)
Introduced is a new approach to Information Retrieval developed on the basis of Hidden Markov Models (HMMs). HMMs are shown to provide a mathematically sound framework for retrieving documents--documents with predefined boundaries and also entities of information that are of arbitrary lengths and formats (passage retrieval). Our retrieval model is shown to encompass promising capabilities: First, the position of occurrences of indexing features can be used for indexing. Positional information is essential, for instance, when considering phrases, negation, and the proximity of features. Second, from training collections we can derive automatically optimal weights for arbitrary features. Third, a query dependent structure can be determined for every document by segmenting the documents into passages that are either relevant or irrelevant to the query. The theoretical analysis of our retrieval model is complemented by the results of preliminary experiments.
A Formal Study of Information Retrieval Heuristics
SIGIR '04, 2004
Cited by 43 (11 self)
Empirical studies of information retrieval methods show that good retrieval performance is closely related to the use of various retrieval heuristics, such as TF-IDF weighting. One basic research question is thus what exactly are these "necessary" heuristics that seem to cause good retrieval performance. In this paper, we present a formal study of retrieval heuristics. We formally define a set of basic desirable constraints that any reasonable retrieval function should satisfy, and check these constraints on a variety of representative retrieval functions. We find that none of these retrieval functions satisfies all the constraints unconditionally. Empirical results show that when a constraint is not satisfied, it often indicates non-optimality of the method, and when a constraint is satisfied only for a certain range of parameter values, its performance tends to be poor when the parameter is out of the range. In general, we find that the empirical performance of a retrieval formula is tightly related to how well it satisfies these constraints. Thus the proposed constraints provide a good explanation of many empirical observations and make it possible to evaluate any existing or new retrieval formula analytically.
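As an illustration of what checking a retrieval constraint can look like, the sketch below tests one plausible constraint of this kind: with everything else fixed, a higher count of the query term should give a higher score. The abstract does not spell out its constraints, so this particular constraint, the scoring formula, its constants, and all function names are assumptions made for the example.

```python
import math

def tf_idf_score(tf, doc_len, df, n_docs, avg_len):
    """Hypothetical length-normalized TF-IDF score for a single query term."""
    if tf == 0:
        return 0.0
    tf_part = 1.0 + math.log(tf)              # sublinear term-frequency component
    norm = 0.8 + 0.2 * doc_len / avg_len      # illustrative length normalization
    idf = math.log((n_docs + 1) / df)         # inverse document frequency
    return tf_part / norm * idf

def binary_score(tf, doc_len, df, n_docs, avg_len):
    """Score that only records presence or absence of the term."""
    return 1.0 if tf > 0 else 0.0

def satisfies_tf_monotonicity(score, max_tf=20, **stats):
    """Return True if the score strictly increases with tf, all else fixed."""
    values = [score(tf, **stats) for tf in range(max_tf + 1)]
    return all(a < b for a, b in zip(values, values[1:]))

stats = {"doc_len": 100, "df": 10, "n_docs": 1000, "avg_len": 120}
tf_idf_ok = satisfies_tf_monotonicity(tf_idf_score, **stats)
binary_ok = satisfies_tf_monotonicity(binary_score, **stats)
```

Here the length-normalized TF-IDF function passes the check, while the presence-only score fails it because additional occurrences of the term never raise the score. Sampling a range of values like this is a numerical spot check, not the analytical evaluation the abstract describes.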
Dempster-Shafer's Theory of Evidence applied to Structured Documents: Modelling Uncertainty
SIGIR '97, 1997
Cited by 39 (9 self)
Documents often display a structure determined by the author, e.g., several chapters, each with several sub-chapters and so on. Taking into account the structure of a document allows the retrieval process to focus on those parts of the documents that are most relevant to an information need. Chiaramella et al. advanced a model for indexing and retrieving structured documents. Their aim was to express the model within a framework based on formal logics with associated theories. They developed the logical formalism of the model. This paper adds to this model a theory of uncertainty, the Dempster-Shafer theory of evidence. It is shown that the theory provides a rule, Dempster's combination rule, that allows the expression of the uncertainty with respect to parts of a document, and that is compatible with the logical model developed by Chiaramella et al.
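Dempster's combination rule itself is standard and can be sketched generically. The code below is not the paper's model of structured documents: the frame of discernment, the two example mass functions, and the function name are hypothetical and serve only to show how two bodies of evidence are merged.

```python
from itertools import product

def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset (a focal element) to its mass.
    The masses of each input are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass that falls on the empty set measures the conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Renormalize so that the combined masses again sum to 1.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}

# Hypothetical frame: a document part is about topic "t1" or topic "t2".
T1, T2 = frozenset({"t1"}), frozenset({"t2"})
EITHER = T1 | T2  # mass on the whole frame expresses ignorance

evidence_a = {T1: 0.6, EITHER: 0.4}
evidence_b = {T1: 0.5, T2: 0.2, EITHER: 0.3}
merged = combine(evidence_a, evidence_b)
```

With these inputs, the conflicting mass is 0.6 * 0.2 = 0.12, so the remaining products are divided by 0.88 before being reported.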
Probabilistic Datalog: Implementing Logical Information Retrieval for Advanced Applications
Journal of the American Society for Information Science, 1999
Cited by 36 (6 self)
In the logical approach to information retrieval (IR), retrieval is considered as uncertain inference.

