Results 1-10 of 34
An introduction to boosting and leveraging
Advanced Lectures on Machine Learning, LNCS, 2003
Proximal support vector machine classifiers
Proceedings KDD2001: Knowledge Discovery and Data Mining, 2001
Cited by 109 (14 self)
A new approach to support vector machine (SVM) classification is proposed wherein each of two data sets is proximal to one of two distinct planes that are not parallel to each other. Each plane is generated such that it is closest to one of the two data sets and as far as possible from the other data set. Each of the two nonparallel proximal planes is obtained by a single MATLAB command as the eigenvector corresponding to a smallest eigenvalue of a generalized eigenvalue problem. Classification by proximity to two distinct nonlinear surfaces generated by a nonlinear kernel also leads to two simple generalized eigenvalue problems. The effectiveness of the proposed method is demonstrated by tests on simple examples as well as on a number of public data sets. These examples show the advantages of the proposed approach in both computation time and test set correctness. Index terms: support vector machines, proximal classification, generalized eigenvalues.
Boosting as a Regularized Path to a Maximum Margin Classifier
Journal of Machine Learning Research, 2004
Cited by 68 (18 self)
In this paper we study boosting methods from a new perspective. We build on recent work by Efron et al. to show that boosting approximately (and in some cases exactly) minimizes its loss criterion with an l1 constraint on the coefficient vector. This helps explain the success of boosting with early stopping as regularized fitting of the loss criterion. For the two most commonly used criteria (exponential and binomial log-likelihood), we further show that as the constraint is relaxed, or equivalently as the boosting iterations proceed, the solution converges (in the separable case) to an "l1-optimal" separating hyperplane. We prove that this l1-optimal separating hyperplane has the property of maximizing the minimal l1 margin of the training data, as defined in the boosting literature.
A Feature Selection Newton Method for Support Vector Machine Classification
Computational Optimization and Applications, 2002
Cited by 51 (3 self)
A fast Newton method that suppresses input space features is proposed for a linear programming formulation of support vector machine classifiers. The proposed stand-alone method can handle classification problems in very high dimensional spaces, such as 28,032 dimensions, and generates a classifier that depends on very few input features, such as 7 out of the original 28,032. The method can also handle problems with a large number of data points and requires no specialized linear programming packages but merely a linear equation solver. For nonlinear kernel classifiers, the method utilizes a minimal number of kernel functions in the classifier that it generates.
Massive Data Discrimination via Linear Support Vector Machines
Optimization Methods and Software, 1998
Cited by 48 (16 self)
A linear support vector machine formulation is used to generate a fast, finitely terminating linear-programming algorithm for discriminating between two massive sets in n-dimensional space, where the number of points can be orders of magnitude larger than n. The algorithm creates a succession of sufficiently small linear programs that separate chunks of the data at a time. The key idea is that a small number of support vectors, corresponding to linear programming constraints with positive dual variables, are carried over between the successive small linear programs, each of which contains a chunk of the data. We prove that this procedure is monotonic and terminates in a finite number of steps at an exact solution that leads to a globally optimal separating plane for the entire dataset. Numerical results on fully dense publicly available datasets, numbering 20,000 to 1 million points in 32-dimensional space, confirm the theoretical results and demonstrate the ability to handle very l...
Mathematical Programming for Data Mining: Formulations and Challenges
INFORMS Journal on Computing, 1998
Cited by 47 (0 self)
This paper is intended to serve as an overview of a rapidly emerging research and applications area. In addition to providing a general overview and motivating the importance of data mining problems within the area of knowledge discovery in databases, our aim is to list some of the pressing research challenges and outline opportunities for contributions by the optimization research communities. Towards these goals, we include formulations of the basic categories of data mining methods as optimization problems. We also provide examples of successful mathematical programming approaches to some data mining problems. Keywords: data analysis, data mining, mathematical programming methods, challenges for massive data sets, classification, clustering, prediction, optimization. To appear in INFORMS Journal on Computing, special issue on Data Mining, A. Basu and B. Golden (guest editors). Also appears as Mathematical Programming Technical Report 98-01, Computer Sciences Department, University of Wi...
k-Plane Clustering
Journal of Global Optimization, 2000
Cited by 42 (3 self)
A finite new algorithm is proposed for clustering m given points in n-dimensional real space into k clusters by generating k planes that constitute a local solution to the nonconvex problem of minimizing the sum of squares of the 2-norm distances between each point and a nearest plane. The key to the algorithm lies in a formulation that generates a plane in n-dimensional space that minimizes the sum of the squares of the 2-norm distances to each of m1 given points in the space. The plane is generated by an eigenvector corresponding to a smallest eigenvalue of an n × n simple matrix derived from the m1 points. The algorithm was tested on the publicly available Wisconsin Breast Prognosis Cancer database to generate well-separated patient survival curves. In contrast, the k-mean algorithm did not generate such well-separated survival curves. 1 Introduction There are many approaches to clustering such as statistical [2, 9, 6], machine learning [7, 8] and mathematical programming [15...
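The plane-fitting step described in this abstract can be illustrated with a minimal sketch. It assumes the "simple matrix" is the mean-centered scatter matrix of the points, which is the standard least-squares construction and is not spelled out in the excerpt; the function names `fit_plane` and `sum_sq_dist` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def fit_plane(points):
    """Fit a plane x.w = gamma (with ||w||_2 = 1) minimizing the sum of
    squared 2-norm distances to the given points.

    Assumption: the n x n matrix is the mean-centered scatter matrix.
    """
    A = np.asarray(points, dtype=float)
    mean = A.mean(axis=0)
    centered = A - mean
    scatter = centered.T @ centered          # n x n symmetric matrix
    # eigh returns eigenvalues in ascending order for symmetric input,
    # so column 0 is an eigenvector for a smallest eigenvalue.
    _, eigvecs = np.linalg.eigh(scatter)
    w = eigvecs[:, 0]
    gamma = float(mean @ w)                  # the plane passes through the mean
    return w, gamma

def sum_sq_dist(points, w, gamma):
    """Sum of squared distances from the points to the plane x.w = gamma."""
    A = np.asarray(points, dtype=float)
    return float(np.sum((A @ w - gamma) ** 2))

# Points lying exactly on the plane z = 1.
pts = [[0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1], [2, 3, 1]]
w, gamma = fit_plane(pts)
```

A full k-plane clustering loop would alternate between assigning each point to its nearest plane and refitting each plane this way; that loop is not shown here.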
Constructing Boosting Algorithms from SVMs: An Application to One-class Classification
2002
Classification on proximity data with LP-machines
1999
Cited by 37 (10 self)
We provide a new linear program to deal with classification of data in the case of functions written in terms of pairwise proximities. This allows us to avoid the problems inherent in using feature spaces with indefinite metric in Support Vector Machines, since the notion of a margin is needed only in input space, where the classification actually occurs. Moreover, in our approach we can enforce sparsity in the proximity representation by sacrificing training error. This turns out to be favorable for proximity data. Similar to ν-SV methods, the only parameter needed in the algorithm is the (asymptotical) number of data points being classified with a margin. Finally, the algorithm is successfully compared with ν-SV learning in proximity space and K-nearest-neighbors on real-world data from neuroscience and molecular biology.
Efficient Margin Maximizing with Boosting
2002
Cited by 35 (7 self)
AdaBoost produces a linear combination of base hypotheses and predicts with the sign of this linear combination. It has been observed that the generalization error of the algorithm continues to improve even after all examples are classified correctly by the current signed linear combination, which can be viewed as a hyperplane in feature space where the base hypotheses form the features.
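As a small illustration of the prediction rule in this abstract, the sketch below combines base hypotheses linearly and predicts with the sign. The l1-normalized margin shown is the usual definition from the boosting literature and is an assumption here, since the excerpt does not define it. The threshold stumps, the weights, and all function names are hypothetical.

```python
def combined_score(alphas, hypotheses, x):
    """Linear combination of base hypotheses, each returning +1 or -1."""
    return sum(a * h(x) for a, h in zip(alphas, hypotheses))

def predict(alphas, hypotheses, x):
    """Predict with the sign of the combination (a tie at 0 maps to +1 here)."""
    return 1 if combined_score(alphas, hypotheses, x) >= 0 else -1

def l1_margin(alphas, hypotheses, x, y):
    """Margin of a labeled example (x, y), normalized by the l1 norm of the weights."""
    norm = sum(abs(a) for a in alphas)
    return y * combined_score(alphas, hypotheses, x) / norm

# Two hypothetical threshold stumps on a scalar input.
hypotheses = [
    lambda x: 1 if x > 1 else -1,
    lambda x: 1 if x > 3 else -1,
]
alphas = [0.6, 0.4]

label = predict(alphas, hypotheses, 2)        # the stumps disagree, the heavier one wins
margin = l1_margin(alphas, hypotheses, 2, 1)  # small positive margin
```

An example can be classified correctly while its margin is still small, which is why the margin can keep growing after the training error reaches zero.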