Results 1-10 of 50
Transductive Inference for Text Classification using Support Vector Machines
1999
Cited by 682 (4 self)
This paper introduces Transductive Support Vector Machines (TSVMs) for text classification. While regular Support Vector Machines (SVMs) try to induce a general decision function for a learning task, Transductive Support Vector Machines take into account a particular test set and try to minimize misclassifications of just those particular examples. The paper presents an analysis of why TSVMs are well suited for text classification. These theoretical findings are supported by experiments on three test collections. The experiments show substantial improvements over inductive methods, especially for small training sets, cutting the number of labeled training examples down to a twentieth on some tasks. This work also proposes an algorithm for training TSVMs efficiently, handling 10,000 examples and more.
A tutorial on support vector regression
2004
Cited by 473 (2 self)
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from an SV perspective.
Soft Margins for AdaBoost
1998
Cited by 256 (22 self)
Recently, ensemble methods like AdaBoost were successfully applied to character recognition tasks, seemingly defying the problems of overfitting. This paper shows that although AdaBoost rarely overfits in the low-noise regime, it clearly does so for higher noise levels. Central to understanding this fact is the margin distribution: we find that AdaBoost, by doing gradient descent in an error function with respect to the margin, asymptotically achieves a hard margin distribution, i.e. the algorithm concentrates its resources on a few hard-to-learn patterns (here an interesting overlap with Support Vectors emerges). This is clearly a suboptimal strategy in the noisy case, and regularization, i.e. a mistrust in the data, must be introduced in the algorithm to alleviate the distortions that difficult patterns (e.g. outliers) can cause to the margin distribution. We propose several regularization methods and generalizations of the original AdaBoost algorithm to achieve a soft margin - a ...
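The margin distribution mentioned in this abstract can be illustrated with a small sketch. It runs plain AdaBoost (not the regularized variants the paper proposes) with threshold stumps on 1-D data and computes the normalized margin y_i * sum_t(alpha_t * h_t(x_i)) / sum_t(alpha_t) for each training point. The function names, the toy data, and the clamping constant 1e-10 are illustrative assumptions, not taken from the paper.

```python
import math

def make_stump(threshold, sign):
    # Weak learner on 1-D inputs: predicts `sign` if x > threshold, else -sign.
    return lambda x: sign if x > threshold else -sign

def adaboost(xs, ys, rounds):
    """Plain AdaBoost with threshold stumps; returns a list of (alpha, stump)."""
    n = len(xs)
    weights = [1.0 / n] * n
    candidates = [make_stump(t, s) for t in sorted(set(xs)) for s in (1, -1)]
    ensemble = []
    for _ in range(rounds):
        def weighted_error(h):
            return sum(w for w, x, y in zip(weights, xs, ys) if h(x) != y)
        h = min(candidates, key=weighted_error)
        err = weighted_error(h)
        if err >= 0.5:
            break  # no weak learner beats chance on the current weights
        perfect = err <= 1e-10
        err = max(err, 1e-10)  # assumed clamp to keep alpha finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, h))
        if perfect:
            break  # a single stump already separates the data
        # Increase the weight of misclassified points, then renormalize.
        weights = [w * math.exp(-alpha * y * h(x))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def margins(ensemble, xs, ys):
    """Normalized margins in [-1, 1]; negative means misclassified."""
    if not ensemble:
        raise ValueError("empty ensemble")
    total_alpha = sum(a for a, _ in ensemble)
    return [y * sum(a * h(x) for a, h in ensemble) / total_alpha
            for x, y in zip(xs, ys)]

# Toy data with one "noisy" label at x = 5 (hypothetical example).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [-1, -1, -1, 1, -1, 1, 1, 1]
for rounds in (1, 5, 50):
    m = margins(adaboost(xs, ys, rounds), xs, ys)
    print(rounds, "rounds, smallest margin:", min(m))
```

Watching how the smallest margin changes with the number of rounds gives a feel for how much of the ensemble's effort goes into the single hard pattern.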
Transductive Learning via Spectral Graph Partitioning
In ICML, 2003
Cited by 195 (0 self)
We present a new method for transductive learning, which can be seen as a transductive version of the k nearest-neighbor classifier.
Linear programming boosting via column generation
Machine Learning, 2002
Cited by 101 (3 self)
1 Introduction Recent papers [20] have shown that boosting, arcing, and related ensemble methods (hereafter summarized as boosting) can be viewed as margin maximization in function space. By changing the cost function, different
Dimensionality Reduction via Sparse Support Vector Machines
Journal of Machine Learning Research, 2003
Cited by 67 (13 self)
We describe a methodology for performing variable ranking and selection using support vector machines (SVMs). The method constructs a series of sparse linear SVMs to generate linear models that can generalize well, and uses a subset of nonzero-weighted variables found by the linear models to produce a final nonlinear model. The method exploits the fact that a linear SVM (no kernels) with ℓ1-norm regularization inherently performs variable selection as a side-effect of minimizing the capacity of the SVM model. The distribution of the linear model weights provides a mechanism for ranking and interpreting the effects of variables.
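One way to see why a 1-norm penalty zeroes out weights is the soft-thresholding step used by proximal-gradient solvers. The sketch below is only an illustration of that mechanism under stated assumptions: the abstract does not give the paper's own formulation, and the function names, the weight vector, and the penalty value 0.1 are hypothetical.

```python
def soft_threshold(w, lam):
    """Proximal operator of lam * |w|: shrink w toward zero by lam,
    mapping every weight with |w| <= lam to exactly 0.0."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

def rank_variables(weights):
    """Indices of the nonzero weights, ordered by decreasing |weight|."""
    nonzero = [i for i, w in enumerate(weights) if w != 0.0]
    return sorted(nonzero, key=lambda i: abs(weights[i]), reverse=True)

# Hypothetical dense weight vector of a linear model (illustrative values).
dense = [0.05, -1.6, 0.3, -0.08, 0.8]
sparse = [soft_threshold(w, 0.1) for w in dense]
selected = rank_variables(sparse)
print("selected variables, most important first:", selected)
```

Variables whose weight is driven to zero drop out of the model, and the remaining ones can be ranked by absolute weight, which mirrors the ranking idea described in the abstract.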
Massive Data Discrimination via Linear Support Vector Machines
Optimization Methods and Software, 1998
Cited by 48 (16 self)
A linear support vector machine formulation is used to generate a fast, finitely terminating linear-programming algorithm for discriminating between two massive sets in n-dimensional space, where the number of points can be orders of magnitude larger than n. The algorithm creates a succession of sufficiently small linear programs that separate chunks of the data at a time. The key idea is that a small number of support vectors, corresponding to linear programming constraints with positive dual variables, are carried over between the successive small linear programs, each of which contains a chunk of the data. We prove that this procedure is monotonic and terminates in a finite number of steps at an exact solution that leads to a globally optimal separating plane for the entire dataset. Numerical results on fully dense publicly available datasets, numbering 20,000 to 1 million points in 32-dimensional space, confirm the theoretical results and demonstrate the ability to handle very l...
Classification on proximity data with LP-machines
1999
Cited by 37 (10 self)
We provide a new linear program to deal with classification of data in the case of functions written in terms of pairwise proximities. This allows us to avoid the problems inherent in using feature spaces with indefinite metric in Support Vector Machines, since the notion of a margin is needed only in input space, where the classification actually occurs. Moreover, in our approach we can enforce sparsity in the proximity representation by sacrificing training error. This turns out to be favorable for proximity data. Similar to ν-SV methods, the only parameter needed in the algorithm is the (asymptotic) number of data points being classified with a margin. Finally, the algorithm is successfully compared with ν-SV learning in proximity space and K-nearest-neighbors on real-world data from neuroscience and molecular biology.
Co-EM Support Vector Learning
In Proceedings of the International Conference on Machine Learning, 2004
Cited by 34 (5 self)
Multi-view algorithms, such as co-training and co-EM, utilize unlabeled data when the available attributes can be split into independent and compatible subsets. Co-EM outperforms co-training for many problems, but it requires the underlying learner to estimate class probabilities, and to learn from probabilistically labeled data. Therefore, co-EM has so far only been studied with naive Bayesian learners. We cast linear classifiers into a probabilistic framework and develop a co-EM version of the Support Vector Machine.
A Column Generation Algorithm For Boosting
2000
Cited by 33 (7 self)
We examine linear program (LP) approaches to boosting and demonstrate their efficient solution using LPBoost, a column generation simplex method. We prove that minimizing the soft margin error function (equivalent to solving an LP) directly optimizes a generalization error bound. LPBoost can be used to solve any boosting LP by iteratively optimizing the dual classification costs in a restricted LP and dynamically generating weak learners to make new LP columns. Unlike gradient boosting algorithms, LPBoost converges finitely to a global solution using well-defined stopping criteria. Computationally, LPBoost finds very sparse solutions as good as or better than those found by AdaBoost using comparable computation.