Results 1–6 of 6
Text Categorization Based on Regularized Linear Classification Methods
 Information Retrieval
, 2000
Abstract

Cited by 81 (2 self)
A number of linear classification methods, such as the linear least squares fit (LLSF), logistic regression, and support vector machines (SVMs), have been applied to text categorization problems. These methods share the property of finding hyperplanes that approximately separate a class of document vectors from its complement. However, support vector machines are so far considered special in that they have been demonstrated to achieve state-of-the-art performance. It is therefore worthwhile to understand whether such good performance is unique to the SVM design, or whether it can also be achieved by other linear classification methods. In this paper, we compare a number of known linear classification methods, as well as some variants, in the framework of regularized linear systems. We discuss the statistical and numerical properties of these algorithms, with a focus on text categorization, and provide numerical experiments to illustrate these algorithms on a number of datasets.
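The unifying view in this abstract is that these methods minimize a regularized empirical risk and differ only in the loss applied to the margin. A minimal sketch of that idea (not the paper's code; the loss definitions are the standard textbook forms):

```python
import numpy as np

# Each method minimizes (1/n) * sum_i loss(y_i * f(x_i)) + lambda * ||w||^2;
# they differ only in the loss applied to the margin m = y * f(x).
def hinge_loss(m):      # SVM
    return np.maximum(0.0, 1.0 - m)

def squared_loss(m):    # regularized least squares (LLSF / ridge)
    return (1.0 - m) ** 2

def logistic_loss(m):   # regularized logistic regression
    return np.log(1.0 + np.exp(-m))

margins = np.array([-1.0, 0.0, 1.0, 2.0])
for loss in (hinge_loss, squared_loss, logistic_loss):
    print(loss.__name__, loss(margins))
```

All three losses penalize small or negative margins; the hinge loss is exactly zero past the margin of 1, which is what produces the SVM's sparse support-vector solution.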
Hierarchical Text Categorization Using Neural Networks
 Information Retrieval
, 2002
Abstract

Cited by 72 (0 self)
This paper presents the design and evaluation of a text categorization method based on the Hierarchical Mixture of Experts model. This model uses a divide-and-conquer principle to define smaller categorization problems based on a predefined hierarchical structure. The final classifier is a hierarchical array of neural networks. The method is evaluated using the UMLS Metathesaurus as the underlying hierarchical structure, and the OHSUMED test set of MEDLINE records. Comparisons with an optimized version of the traditional Rocchio algorithm adapted for text categorization, as well as with flat neural network classifiers, are provided. The results show that the use of the hierarchical structure improves text categorization performance with respect to an equivalent flat model. The optimized Rocchio algorithm achieves a performance comparable with that of the hierarchical neural networks.
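The divide-and-conquer routing described here can be sketched as a two-level hierarchy: a gating function picks a branch, and only that branch's experts compete for the final label. This is an illustrative toy (the category names and random weights are invented stand-ins, not trained networks):

```python
import numpy as np

# Hypothetical two-level hierarchy in the spirit of the Hierarchical Mixture
# of Experts: a gate scores the top-level branches, then the winning branch's
# experts score the leaf categories. Weights here are random placeholders.
rng = np.random.default_rng(0)
n_features = 8

hierarchy = {
    "cardiology": ["arrhythmia", "ischemia"],
    "oncology": ["lymphoma", "melanoma"],
}
gate_w = {branch: rng.normal(size=n_features) for branch in hierarchy}
expert_w = {leaf: rng.normal(size=n_features)
            for leaves in hierarchy.values() for leaf in leaves}

def classify(doc_vec):
    # Divide: pick the branch whose gate scores the document highest.
    branch = max(hierarchy, key=lambda b: gate_w[b] @ doc_vec)
    # Conquer: only that branch's experts compete for the final label.
    return max(hierarchy[branch], key=lambda leaf: expert_w[leaf] @ doc_vec)

doc = rng.normal(size=n_features)
print(classify(doc))
```

The benefit claimed in the abstract comes from each expert solving a smaller, easier discrimination problem than one flat classifier over all leaf categories.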
Robustness of Regularized Linear Classification Methods In Text Categorization
, 2003
Abstract

Cited by 23 (4 self)
Real-world applications often require the classification of documents under conditions of small feature sets, mislabeled documents, and rare positive examples. This paper investigates the robustness of three regularized linear classification methods (SVM, ridge regression, and logistic regression) under these conditions. We compare these methods in terms of their loss functions and score distributions, and establish the connection between their optimization problems and generalization error bounds. Several sets of controlled experiments on the Reuters-21578 corpus are conducted to investigate the robustness of these methods. Our results show that ridge regression seems to be the most promising candidate for rare class problems.
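Ridge regression, the method the abstract favors for rare classes, is the one of the three with a closed-form solution, which makes it cheap and numerically stable even with few positives. A minimal illustration (synthetic data; the 10% positive rate mimics a rare-class setting, not the paper's experiments):

```python
import numpy as np

# Ridge regression (regularized least squares): w = (X^T X + lambda*I)^{-1} X^T y.
# The lambda*I term keeps the system well-conditioned even when the positive
# class is rare or the feature count is small.
rng = np.random.default_rng(1)
n, d, lam = 50, 10, 1.0
X = rng.normal(size=(n, d))
y = np.where(rng.random(n) < 0.1, 1.0, -1.0)  # rare positive class

w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(w.shape)
```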
DSO at TREC8: A Hybrid Algorithm for the Routing Task
, 1999
Abstract

Cited by 2 (0 self)
In this paper, we describe a new hybrid algorithm that we used for the routing task at TREC8. The algorithm combines the use of Rocchio's formula for term selection and an improved variant of the perceptron learning algorithm for tuning the term weights. This algorithm is able to give good performance on TREC8 test data. We also achieved a slight improvement in average uninterpolated precision by using Dynamic Feedback Optimization (DFO) as another weight-tuning algorithm and combining the ranked list generated by DFO with that of the perceptron.

1 Introduction

DSO is a first-time participant in TREC. We only participated in the routing task at the TREC8 filtering track. Broadly speaking, there are two popular approaches to the routing task. The first approach uses the Rocchio algorithm (Rocchio, 1971), and has its roots in the information retrieval community. Recently, a number of extensions have been made to this approach. These include the use of better document representation ...
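The two-stage hybrid described above can be sketched as follows: Rocchio-style centroid differences rank terms for selection, then a plain perceptron tunes the weights of the selected terms. The data and the top-k cutoff are invented for illustration; this is not the DSO implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
n_docs, n_terms, k = 40, 30, 5
X = rng.random((n_docs, n_terms))              # term-weight matrix
y = np.where(rng.random(n_docs) < 0.5, 1, -1)  # relevance labels

# Stage 1 -- term selection: Rocchio score = mean weight in relevant docs
# minus mean weight in non-relevant docs; keep the top-k terms.
rocchio = X[y == 1].mean(axis=0) - X[y == -1].mean(axis=0)
selected = np.argsort(rocchio)[-k:]

# Stage 2 -- weight tuning: classic perceptron on the selected terms only.
w, b = np.zeros(k), 0.0
for _ in range(20):
    for xi, yi in zip(X[:, selected], y):
        if yi * (w @ xi + b) <= 0:  # misclassified: update weights
            w += yi * xi
            b += yi
print(selected, w)
```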
Training Context-Insensitive versus Context-Sensitive Text Classifiers using Small Data Sets
, 1998
Abstract
Recent studies of supervised learning algorithms for text classifiers employ (very) large data sets for training and evaluation. Nevertheless, there are many important situations that cannot be faithfully represented by such large data sets and are characterized mainly by inherently small training sets. In this paper we focus on three such "small" binary classification tasks: topical Web page classification, Internet newsgroup classification, and identification of authorship. For these tasks we study and compare two important families of learning algorithms: several variants of the classical Bayesian Probability Ratio Test (PRT) method and several variants of the recent sleeping experts (SE) algorithm. Among the SE variants are context-sensitive algorithms that attempt to model a text using higher-order statistics of the words. Our results indicate that for these three test cases our context-insensitive algorithms perform sufficiently well for practical applications. Moreover, thes...
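The sleeping experts idea mentioned here is that each expert (e.g. a word or phrase) only votes on documents in which it occurs; experts absent from a document "sleep" and neither vote nor get updated. A toy illustration with invented words, weights, and polarities (not the paper's algorithm or data):

```python
# Toy sleeping-experts prediction: each expert is a word with a learned
# weight and a class polarity; only experts whose word appears in the
# document are "awake" and allowed to vote.
experts = {"goal": 2.0, "match": 1.5, "election": 2.5, "vote": 1.8}
polarity = {"goal": +1, "match": +1, "election": -1, "vote": -1}

def predict(doc_words):
    awake = [w for w in experts if w in doc_words]
    score = sum(experts[w] * polarity[w] for w in awake)
    return +1 if score >= 0 else -1

print(predict({"goal", "match"}))      # -> 1
print(predict({"election", "vote"}))   # -> -1
```

In the full algorithm the awake experts' weights are also updated multiplicatively after each prediction; only the prediction step is sketched here.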
Dimensionality Reduction through Correspondence Analysis
, 1999
Abstract
Many learning algorithms make an implicit assumption that all the attributes of the presented data are relevant to a learning task. However, several studies on attribute selection have demonstrated that this assumption rarely holds. In addition, for many supervised learning algorithms such as nearest neighbour algorithms, the inclusion of irrelevant attributes can result in a degradation in the classification accuracy of the learning algorithm. Whilst a number of different methods for attribute selection exist, many of these are only appropriate for datasets which contain a small number of attributes (e.g. < 20). This paper presents an alternative approach to attribute selection, which can be applied to datasets with a greater number of attributes. We present an evaluation of the approach which contrasts its performance with one other attribute selection technique.
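Correspondence analysis, the technique named in the title, reduces dimensionality by taking the SVD of the standardized residuals of a contingency table; the leading singular vectors give the low-dimensional coordinates. A minimal sketch on an invented 3×3 table (standard CA formulation, not the paper's code):

```python
import numpy as np

# Correspondence analysis: SVD of the matrix of standardized residuals
# S = (P - r c^T) / sqrt(r c^T), where P is the table of proportions and
# r, c are the row and column masses.
N = np.array([[20.0,  5.0,  2.0],
              [ 3.0, 15.0,  4.0],
              [ 1.0,  2.0, 18.0]])
P = N / N.sum()
r = P.sum(axis=1)            # row masses
c = P.sum(axis=0)            # column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

U, s, Vt = np.linalg.svd(S, full_matrices=False)
print(s)  # squared singular values partition the table's total inertia
```

Keeping only the components with the largest singular values retains most of the inertia (association structure) while discarding the weakly associated directions, which is the attribute-reduction effect the abstract evaluates.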