Results 1–10 of 11
Text Categorization Based on Regularized Linear Classification Methods
Information Retrieval, 2000
Abstract

Cited by 83 (2 self)
A number of linear classification methods, such as the linear least squares fit (LLSF), logistic regression, and support vector machines (SVMs), have been applied to text categorization problems. These methods are similar in that they all find hyperplanes that approximately separate a class of document vectors from its complement. However, support vector machines are so far considered special in that they have been demonstrated to achieve state-of-the-art performance. It is therefore worthwhile to understand whether such good performance is unique to the SVM design, or whether it can also be achieved by other linear classification methods. In this paper, we compare a number of known linear classification methods, as well as some variants, in the framework of regularized linear systems. We discuss the statistical and numerical properties of these algorithms, with a focus on text categorization, and provide numerical experiments illustrating these algorithms on a number of datasets.
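The regularized-linear-systems framing in this abstract can be sketched concretely. The following is a minimal, hypothetical illustration (not the paper's implementation or data): an LLSF-style classifier with ridge regularization, solved in closed form as w = (XᵀX + λI)⁻¹Xᵀy and applied to toy term-frequency vectors.

```python
import numpy as np

def ridge_classifier_fit(X, y, lam=1.0):
    """Regularized linear least squares fit (LLSF-style):
    w = argmin ||Xw - y||^2 + lam * ||w||^2, solved in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def ridge_classifier_predict(X, w):
    # Documents scoring above zero are assigned to the class (+1),
    # all others to its complement (-1).
    return np.sign(X @ w)

# Toy term-frequency vectors: two "topics" separated along the features.
X = np.array([[3.0, 0.0], [2.0, 1.0], [0.0, 3.0], [1.0, 2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])  # +1 = in class, -1 = complement
w = ridge_classifier_fit(X, y, lam=0.1)
print(ridge_classifier_predict(X, w))  # recovers the training labels
```

The regularization term λ‖w‖² is what places LLSF, logistic regression, and SVMs in one family: they differ mainly in the loss function, not in this hyperplane-plus-penalty structure.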
"Is This Document Relevant? ...Probably": A Survey of Probabilistic Models in Information Retrieval
2001
Abstract

Cited by 62 (14 self)
This article surveys probabilistic approaches to modeling information retrieval. The basic concepts of probabilistic approaches to information retrieval are outlined, and the principles and assumptions upon which the approaches are based are presented. The various models proposed in the development of IR are described, classified, and compared using a common formalism. New approaches that constitute the basis of future research are described.
Probabilistic Information Retrieval as Combination of Abstraction, Inductive Learning and Probabilistic Assumptions
1994
Abstract

Cited by 27 (1 self)
We show that previous approaches in probabilistic information retrieval are based on only one or two of the three concepts of abstraction, inductive learning, and probabilistic assumptions, and we propose a new approach which combines all three. This approach is illustrated for the case of indexing with a controlled ...
From uncertain inference to probability of relevance for advanced IR applications
25th European Conference on Information Retrieval Research (ECIR 2003), 2003
Abstract

Cited by 12 (4 self)
Uncertain inference is a probabilistic generalisation of the logical view on databases, ranking documents by the probability that they logically imply the query. For tasks other than ad-hoc retrieval, estimates of the actual probability of relevance are required. In this paper, we investigate mapping functions between these two types of probability. For this purpose, we consider linear and logistic functions. The former have been proposed before, whereas we give a new theoretical justification for the latter. In a series of upper-bound experiments, we compare the goodness of fit of the two models. A second series of experiments investigates the effect on the resulting retrieval quality in the fusion step of distributed retrieval. These experiments show that good estimates of the actual probability of relevance can be achieved, and that the logistic model outperforms the linear one. However, retrieval quality for distributed retrieval (only merging, without resource selection) is only slightly improved by using the logistic function.
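The logistic mapping this abstract compares against the linear one can be sketched as follows. This is a minimal illustration on invented (score, relevance) pairs, not the paper's experimental setup: a function P(relevant | s) = 1 / (1 + e^(-(a + b·s))) fitted by plain gradient ascent on the log-likelihood.

```python
import math

def fit_logistic_mapping(scores, rels, lr=0.1, iters=5000):
    """Fit P(relevant | score) = 1 / (1 + exp(-(a + b*score)))
    by maximising the log-likelihood with plain gradient ascent."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(iters):
        ga = gb = 0.0
        for s, r in zip(scores, rels):
            p = 1.0 / (1.0 + math.exp(-(a + b * s)))
            ga += (r - p)          # gradient w.r.t. intercept a
            gb += (r - p) * s      # gradient w.r.t. slope b
        a += lr * ga / n
        b += lr * gb / n
    return a, b

# Hypothetical retrieval status values with binary relevance judgements:
# higher-scoring documents tend to be the relevant ones.
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
rels   = [0,   0,   0,   1,   1,   1]
a, b = fit_logistic_mapping(scores, rels)
p = lambda s: 1.0 / (1.0 + math.exp(-(a + b * s)))
print(p(0.2) < 0.5 < p(0.8))  # low scores map below 0.5, high above
```

Unlike a linear mapping, the fitted values are guaranteed to stay in [0, 1], which is one practical reason to prefer the logistic form when the outputs are treated as probabilities of relevance.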
From Retrieval Status Values to Probabilities of Relevance for Advanced IR Applications
Information Retrieval, 2003
Abstract

Cited by 9 (3 self)
In this paper, we explore the use of linear and logistic mapping functions for different retrieval methods. In a series of upper-bound experiments, we compare the approximation quality of the different mapping functions. We also investigate the effect on the resulting retrieval quality in distributed retrieval (only merging, without resource selection). These experiments show that good estimates of the actual probability of relevance can be achieved, and that the logistic model outperforms the linear one. Retrieval quality for distributed retrieval is only slightly improved by using the logistic function.
Automatic Indexing in Operation: The Rule-Based System AIR/X for Large Subject Fields
1993
Abstract

Cited by 3 (0 self)
AIR/X is a rule-based system for automatic indexing with a controlled vocabulary. The indexing process consists of several stages, with specific rule bases involved in each stage. Most of these rule bases are constructed automatically, especially the large number of term-descriptor rules. We describe the different stages and the overall architecture of the system. We then present a specific application, the AIR/PHYS system developed for a large physics database. We illustrate the system with a detailed example and present experimental results for different system parameter settings.
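The term-descriptor rules mentioned in this abstract can be given a toy flavour as follows. The rules, weights, and descriptors below are invented for illustration; AIR/X's actual rule bases are constructed automatically and are far larger.

```python
# Hypothetical term -> descriptor rules with association weights,
# in the spirit of AIR/X's automatically constructed rule bases.
RULES = {
    "laser":  [("OPTICS", 0.9), ("SPECTROSCOPY", 0.4)],
    "photon": [("OPTICS", 0.7), ("QUANTUM THEORY", 0.6)],
    "plasma": [("PLASMA PHYSICS", 0.9)],
}

def index_abstract(text, threshold=0.5):
    """Accumulate descriptor weights from every matching rule and keep
    descriptors whose combined evidence clears the threshold.  Evidence
    from multiple rules is combined as 1 - prod(1 - w_i), i.e. treating
    the rule firings as independent sources of support."""
    scores = {}
    for term in text.lower().split():
        for descriptor, w in RULES.get(term, []):
            prev = scores.get(descriptor, 0.0)
            scores[descriptor] = 1.0 - (1.0 - prev) * (1.0 - w)
    return sorted(d for d, s in scores.items() if s >= threshold)

print(index_abstract("laser photon interaction"))
# OPTICS is supported by two rules; SPECTROSCOPY stays below threshold
```

The thresholding step is where the "different system parameter settings" of the abstract would come in: raising the threshold trades indexing exhaustivity for precision.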
PIRE: An extensible IR engine based on probabilistic Datalog
Abstract

Cited by 2 (0 self)
This paper introduces PIRE, a probabilistic IR engine. For both document indexing and retrieval, PIRE makes heavy use of probabilistic Datalog, a probabilistic extension of predicate Horn logic. Using such a logical framework together with probability theory allows for defining and using data types (e.g. text, names, numbers), different weighting schemes (e.g. normalised tf, tf.idf or BM25) and retrieval functions (e.g. uncertain inference, language models). Extending the system is thus reduced to adding new rules. Furthermore, this logical framework provides a powerful tool for including additional background knowledge in the retrieval process.
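A toy flavour of the probabilistic-Datalog idea underlying PIRE (the facts and rule below are invented; this sketches only the simplest semantics, where a conjunctive rule's probability is the product of its goals' probabilities under an independence assumption):

```python
# Facts carry probabilities, e.g. "term 'retrieval' indexes document d1
# with probability 0.8" -- a probabilistic indexing weight.
FACTS = {
    ("term", "d1", "retrieval"): 0.8,
    ("term", "d1", "probabilistic"): 0.6,
}

def prob_conjunction(goals):
    """P(g1 AND g2 AND ...) = prod P(gi), assuming independent events.
    A goal absent from the fact base has probability 0."""
    p = 1.0
    for g in goals:
        p *= FACTS.get(g, 0.0)
    return p

# Rule: about(d1) :- term(d1, retrieval) AND term(d1, probabilistic).
print(prob_conjunction([("term", "d1", "retrieval"),
                        ("term", "d1", "probabilistic")]))  # 0.8 * 0.6
```

Adding a new weighting scheme or retrieval function in this style really does reduce to adding rules: the evaluation machinery stays unchanged, only the fact probabilities and rule bodies differ.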
Representations, Models and Abstractions in Probabilistic Information Retrieval
Proceedings of the 6th Annual Meeting on Information and Classification, 1993
Abstract

Cited by 2 (0 self)
We show that most approaches in probabilistic information retrieval can be regarded as a combination of the three concepts representation, model and abstraction. First, documents and queries have to be represented in a certain form, e.g. as sets of terms. Probabilistic models use certain assumptions about the distribution of the elements of the representation in relevant and non-relevant documents in order to estimate the probability of relevance of a document w.r.t. the query.
IST R&D Project
Abstract
This is the description of the MIND resource selection framework. It extends the decision-theoretic framework (for dissemination) formerly developed by UNIDO by using a logistic function for modelling the relationship between score and probability of relevance, and by approximating the indexing weights (and, as a consequence, the document scores) by a normal distribution.
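The combination described here, a logistic score-to-relevance mapping applied to normally distributed scores, can be sketched as follows. The parameters are invented for illustration, and the Monte Carlo estimate stands in for whatever closed-form or numerical integration the framework actually uses: it approximates E[logistic(a + b·S)] for S ~ Normal(μ, σ), the expected probability of relevance of a document drawn from a collection whose scores are summarised by a normal distribution.

```python
import math
import random

def expected_prob_relevance(mu, sigma, a, b, n=100000, seed=0):
    """Approximate E[ 1/(1+exp(-(a + b*S))) ] for S ~ Normal(mu, sigma)
    by Monte Carlo sampling -- the kind of quantity a resource-selection
    step needs when per-collection document scores are approximated by
    a normal distribution."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        s = rng.gauss(mu, sigma)
        total += 1.0 / (1.0 + math.exp(-(a + b * s)))
    return total / n

# Hypothetical collection: scores centred at 0.6 with spread 0.1,
# mapped through an assumed logistic with intercept -5 and slope 10.
print(round(expected_prob_relevance(0.6, 0.1, a=-5.0, b=10.0), 2))
```

Note that this expectation differs from simply plugging the mean score into the logistic function: because the logistic is nonlinear, the spread of the score distribution pulls the expected probability toward 0.5.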