Results 1  10
of
70
Machine Learning in Automated Text Categorization
 ACM Computing Surveys
, 2002
"... The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this p ..."
Abstract

Cited by 1090 (20 self)
 Add to MetaCart
The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.
A Probabilistic Model of Information Retrieval: Development and Status
, 1998
"... The paper combines a comprehensive account of the probabilistic model of retrieval with new systematic experiments on TREC Programme material. It presents the model from its foundations through its logical development to cover more aspects of retrieval data and a wider range of system functions. Eac ..."
Abstract

Cited by 263 (19 self)
 Add to MetaCart
The paper combines a comprehensive account of the probabilistic model of retrieval with new systematic experiments on TREC Programme material. It presents the model from its foundations through its logical development to cover more aspects of retrieval data and a wider range of system functions. Each step in the argument is matched by comparative retrieval tests, to provide a single coherent account of a major line of research. The experiments demonstrate, for a large test collection, that the probabilistic model is effective and robust, and that it responds appropriately, with major improvements in performance, to key features of retrieval situations.
A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems
 ACM Transactions on Information Systems
, 1994
"... We present a probabilistic relational algebra (PRA) which is a generalization of standard relational algebra. Here tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Based on intensional semantics, the tuple weights of the result of a PRA expression ..."
Abstract

Cited by 173 (30 self)
 Add to MetaCart
We present a probabilistic relational algebra (PRA) which is a generalization of standard relational algebra. Here tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Based on intensional semantics, the tuple weights of the result of a PRA expression always confirm to the underlying probabilistic model. We also show for which expressions extensional semantics yields the same results. Furthermore, we discuss complexity issues and indicate possibilities for optimization. With regard to databases, the approach allows for representing imprecise attribute values, whereas for information retrieval, probabilistic document indexing and probabilistic search term weighting can be modelled. As an important extension, we introduce the concept of vague predicates which yields a probabilistic weight instead of a Boolean value, thus allowing for queries with vague selection conditions. So PRA implements uncertainty and vagueness in combination with the...
Automatic Combination of Multiple Ranked Retrieval Systems
, 1994
"... Retrieval performance can often be improved significantly by using a number of different retrieval algorithms and combining the results, in contrast to using just a single retrieval algorithm. This is because different retrieval algorithms, or retrieval experts, often emphasize different document an ..."
Abstract

Cited by 141 (5 self)
 Add to MetaCart
Retrieval performance can often be improved significantly by using a number of different retrieval algorithms and combining the results, in contrast to using just a single retrieval algorithm. This is because different retrieval algorithms, or retrieval experts, often emphasize different document and query features when determining relevance and therefore retrieve different sets of documents. However, it is unclear how the different experts are to be combined, in general, to yield a superior overall estimate. We propose a method by which the relevance estimates made by different experts can be automatically combined to result in superior retrieval performance. We apply the method to two expert combination tasks. The applications demonstrate that the method can identify high performance combinations of experts and also is a novel means for determining the combined effectiveness of experts. 1 Introduction In text retrieval, two heads are definitely better than one. Retrieval performanc...
Probabilistic Models in Information Retrieval
 The Computer Journal
, 1992
"... In this paper, an introduction and survey over probabilistic information retrieval (IR) is given. First, the basic concepts of this approach are described: the probability ranking principle shows that optimum retrieval quality can be achieved under certain assumptions; a conceptual model for IR alon ..."
Abstract

Cited by 104 (4 self)
 Add to MetaCart
In this paper, an introduction and survey over probabilistic information retrieval (IR) is given. First, the basic concepts of this approach are described: the probability ranking principle shows that optimum retrieval quality can be achieved under certain assumptions; a conceptual model for IR along with the corresponding event space clarify the interpretation of the probabilistic parameters involved. For the estimation of these parameters, three different learning strategies are distinguished, namely queryrelated, documentrelated and descriptionrelated learning. As a representative for each of these strategies, a specific model is described. A new approach regards IR as uncertain inference; here, imaging is used as a new technique for estimating the probabilistic parameters, and probabilistic inference networks support more complex forms of inference. Finally, the more general problems of parameter estimation, query expansion and the development of models for advanced document representations are discussed.
COMBINING APPROACHES TO INFORMATION RETRIEVAL
"... The combination of different text representations and search strategies has become a standard technique for improving the effectiveness of information retrieval. Combination, for example, has been studied extensively in the TREC evaluations and is the basis of the “metasearch” engines used on the W ..."
Abstract

Cited by 87 (2 self)
 Add to MetaCart
The combination of different text representations and search strategies has become a standard technique for improving the effectiveness of information retrieval. Combination, for example, has been studied extensively in the TREC evaluations and is the basis of the “metasearch” engines used on the Web. This paper examines the development of this technique, including both experimental results and the retrieval models that have been proposed as formal frameworks for combination. We show that combining approaches for information retrieval can be modeled as combining the outputs of multiple classifiers based on one or more representations, and that this simple model can provide explanations for many of the experimental results. We also show that this view of combination is very similar to the inference net model, and that a new approach to retrieval based on language models supports combination and can be integrated with the inference net model.
"Is This Document Relevant? ...Probably": A Survey of Probabilistic Models in Information Retrieval
, 2001
"... This article surveys probabilistic approaches to modeling information retrieval. The basic concepts of probabilistic approaches to information retrieval are outlined and the principles and assumptions upon which the approaches are based are presented. The various models proposed in the developmen ..."
Abstract

Cited by 63 (14 self)
 Add to MetaCart
This article surveys probabilistic approaches to modeling information retrieval. The basic concepts of probabilistic approaches to information retrieval are outlined and the principles and assumptions upon which the approaches are based are presented. The various models proposed in the development of IR are described, classified, and compared using a common formalism. New approaches that constitute the basis of future research are described
A Risk Minimization Framework for Information Retrieval
 IN PROCEEDINGS OF THE ACM SIGIR 2003 WORKSHOP ON MATHEMATICAL/FORMAL METHODS IN IR. ACM
, 2003
"... This paper presents a novel probabilistic information retrieval framework in which the retrieval problem is formally treated as a statistical decision problem. In this framework, queries and documents are modeled using statistical language models (i.e., probabilistic models of text), user preference ..."
Abstract

Cited by 47 (1 self)
 Add to MetaCart
This paper presents a novel probabilistic information retrieval framework in which the retrieval problem is formally treated as a statistical decision problem. In this framework, queries and documents are modeled using statistical language models (i.e., probabilistic models of text), user preferences are modeled through loss functions, and retrieval is cast as a risk minimization problem. We discuss how this framework can unify existing retrieval models and accommodate the systematic development of new retrieval models. As an example of using the framework to model nontraditional retrieval problems, we derive new retrieval models for subtopic retrieval, which is concerned with retrieving documents to cover many different subtopics of a general query topic. These new models differ from traditional retrieval models in that they go beyond independent topical relevance.
A model of multimedia information retrieval
 Journal of the ACM
, 2001
"... Abstract. Research on multimedia information retrieval (MIR) has recently witnessed a booming interest. A prominent feature of this research trend is its simultaneous but independent materialization within several fields of computer science. The resulting richness of paradigms, methods and systems m ..."
Abstract

Cited by 44 (12 self)
 Add to MetaCart
Abstract. Research on multimedia information retrieval (MIR) has recently witnessed a booming interest. A prominent feature of this research trend is its simultaneous but independent materialization within several fields of computer science. The resulting richness of paradigms, methods and systems may, on the long run, result in a fragmentation of efforts and slow down progress. The primary goal of this study is to promote an integration of methods and techniques for MIR by contributing a conceptual model that encompasses in a unified and coherent perspective the many efforts that are being produced under the label of MIR. The model offers a retrieval capability that spans two media, text and images, but also several dimensions: form, content and structure. In this way, it reconciles similaritybased methods with semanticsbased ones, providing the guidelines for the design of systems that are able to provide a generalized multimedia retrieval service, in which the existing forms of retrieval not only coexist, but can be combined in any desired manner. The model is formulated in terms of a fuzzy description logic, which plays a twofold role: (1) it directly models semanticsbased retrieval, and (2) it offers an ideal framework for the integration of the multimedia and multidimensional aspects of retrieval mentioned above. The model also accounts for relevance feedback in both text and image retrieval, integrating known techniques for taking into account user judgments. The implementation of
Probability kinematics in information retrieval
 ACM Transactions on Information Systems
, 1995
"... We analyse the kinematics of probabilistic term weights at retrieval time for di erent Information Retrieval models. We present four models based on di erent notions of probabilistic retrieval. Two of these models are based on classical probability theory and can be considered as prototypes of model ..."
Abstract

Cited by 37 (7 self)
 Add to MetaCart
We analyse the kinematics of probabilistic term weights at retrieval time for di erent Information Retrieval models. We present four models based on di erent notions of probabilistic retrieval. Two of these models are based on classical probability theory and can be considered as prototypes of models long in use in Information Retrieval, like the Vector Space Model and the Probabilistic Model. The two other models are based on a logical technique of evaluating the probability of a conditional called imaging, one is a generalisation of the other. We analyse the transfer of probabilities occurring in the term space at retrieval time for these four models, compare their retrieval performance using classical test collections, and discuss the results. We believe that our results provide useful suggestions on how to improve existing probabilistic models of Information Retrieval by taking into consideration termterm similarity.