Results 11 - 20
of
124
Support Vector Learning for Ordinal Regression
- In International Conference on Artificial Neural Networks
, 1999
"... We investigate the problem of predicting variables of ordinal scale. This task is referred to as ordinal regression and is complementary to the standard machine learning tasks of classification and metric regression. In contrast to statistical models we present a distribution independent formulatio ..."
Abstract
-
Cited by 55 (1 self)
- Add to MetaCart
We investigate the problem of predicting variables of ordinal scale. This task is referred to as ordinal regression and is complementary to the standard machine learning tasks of classification and metric regression. In contrast to statistical models we present a distribution independent formulation of the problem together with uniform bounds of the risk functional. The approach presented is based on a mapping from objects to scalar utility values. Similar to Support Vector methods we derive a new learning algorithm for the task of ordinal regression based on large margin rank boundaries. We give experimental results for an information retrieval task: learning the order of documents w.r.t. an initial query. Experimental results indicate that the presented algorithm outperforms more naive approaches to ordinal regression such as Support Vector classification and Support Vector regression in the case of more than two ranks. 1 Introduction Problems of ordinal regression arise in many fi...
Helly-type theorems and generalized linear programming
- Discrete Comput. Geom
, 1994
"... This thesis establishes a connection between the Helly theorems, a collection of results from combinatorial geometry, and the class of problems whichwe call Generalized Linear Programming, or GLP, which can be solved by combinatorial linear programming algorithms like the simplex method. We use thes ..."
Abstract
-
Cited by 50 (0 self)
- Add to MetaCart
This thesis establishes a connection between the Helly theorems, a collection of results from combinatorial geometry, and the class of problems whichwe call Generalized Linear Programming, or GLP, which can be solved by combinatorial linear programming algorithms like the simplex method. We use these results to explore the class GLP and show new applications to geometric optimization, and also to prove Helly theorems. In general, a GLP is a set...
Clustering via Concave Minimization
- Advances in Neural Information Processing Systems -9
, 1997
"... The problem of assigning m points in the n-dimensional real space R n to k clusters is formulated as that of determining k centers in R n such that the sum of distances of each point to the nearest center is minimized. If a polyhedral distance is used, the problem can be formulated as that of ..."
Abstract
-
Cited by 47 (17 self)
- Add to MetaCart
The problem of assigning m points in the n-dimensional real space R n to k clusters is formulated as that of determining k centers in R n such that the sum of distances of each point to the nearest center is minimized. If a polyhedral distance is used, the problem can be formulated as that of minimizing a piecewise-linear concave function on a polyhedral set which is shown to be equivalent to a bilinear program: minimizing a bilinear function on a polyhedral set. A fast finite k-Median Algorithm consisting of solving few linear programs in closed form leads to a stationary point of the bilinear program. Computational testing on a number of realworld databases was carried out. On the Wisconsin Diagnostic Breast Cancer (WDBC) database, k-Median training set correctness was comparable to that of the k-Mean Algorithm, however its testing set correctness was better. Additionally, on the Wisconsin Prognostic Breast Cancer (WPBC) database, distinct and clinically important survival curv...
Support vector machines for speech recognition
- Proceedings of the International Conference on Spoken Language Processing
, 1998
"... Statistical techniques based on hidden Markov Models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative informati ..."
Abstract
-
Cited by 47 (2 self)
- Add to MetaCart
Statistical techniques based on hidden Markov Models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and over-parameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Aphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.
Duality and Geometry in SVM Classifiers
- In Proc. 17th International Conf. on Machine Learning
, 2000
"... We develop an intuitive geometric interpretation of the standard support vector machine (SVM) for classification of both linearly separable and inseparable data and provide a rigorous derivation of the concepts behind the geometry. For the separable case finding the maximum margin between the ..."
Abstract
-
Cited by 45 (4 self)
- Add to MetaCart
We develop an intuitive geometric interpretation of the standard support vector machine (SVM) for classification of both linearly separable and inseparable data and provide a rigorous derivation of the concepts behind the geometry. For the separable case finding the maximum margin between the two sets is equivalent to finding the closest points in the smallest convex sets that contain each class (the convex hulls). We now extend this argument to the inseparable case by using a reduced convex hull reduced away from outliers. We prove that solving the reduced convex hull formulation is exactly equivalent to solving the standard inseparable SVM for appropriate choices of parameters. Some additional advantages of the new formulation are that the e#ect of the choice of parameters becomes geometrically clear and that the formulation may be solved by fast nearest point algorithms. By changing norms these arguments hold for both the standard 2-norm and 1-norm SVM. 1. Int...
Massive Data Discrimination via Linear Support Vector Machines
- Optimization Methods and Software
, 1998
"... A linear support vector machine formulation is used to generate a fast, finitely-terminating linear-programming algorithm for discriminating between two massive sets in n-dimensional space, where the number of points can be orders of magnitude larger than n. The algorithm creates a succession of su ..."
Abstract
-
Cited by 44 (15 self)
- Add to MetaCart
A linear support vector machine formulation is used to generate a fast, finitely-terminating linear-programming algorithm for discriminating between two massive sets in n-dimensional space, where the number of points can be orders of magnitude larger than n. The algorithm creates a succession of sufficiently small linear programs that separate chunks of the data at a time. The key idea is that a small number of support vectors, corresponding to linear programming constraints with positive dual variables, are carried over between the successive small linear programs, each of which containing a chunk of the data. We prove that this procedure is monotonic and terminates in a finite number of steps at an exact solution that leads to a globally optimal separating plane for the entire dataset. Numerical results on fully dense publicly available datasets, numbering 20,000 to 1 million points in 32-dimensional space, confirm the theoretical results and demonstrate the ability to handle very l...
Decomposition Algorithms for Stochastic Programming on a Computational Grid
- Computational Optimization and Applications
, 2001
"... . We describe algorithms for two-stage stochastic linear programming with recourse and their implementation on a grid computing platform. In particular, we examine serial and asynchronous versions of the L-shaped method and a trust-region method. The parallel platform of choice is the dynamic, heter ..."
Abstract
-
Cited by 44 (9 self)
- Add to MetaCart
. We describe algorithms for two-stage stochastic linear programming with recourse and their implementation on a grid computing platform. In particular, we examine serial and asynchronous versions of the L-shaped method and a trust-region method. The parallel platform of choice is the dynamic, heterogeneous, opportunistic platform provided by the Condor system. The algorithms are of master-worker type (with the workers being used to solve second-stage problems), and the MW runtime support library (which supports masterworker computations) is key to the implementation. Computational results are presented on large sample average approximations of problems from the literature. 1.
Mathematical Programming for Data Mining: Formulations and Challenges
- INFORMS Journal on Computing
, 1998
"... This paper is intended to serve as an overview of a rapidly emerging research and applications area. In addition to providing a general overview, motivating the importance of data mining problems within the area of knowledge discovery in databases, our aim is to list some of the pressing research ch ..."
Abstract
-
Cited by 40 (0 self)
- Add to MetaCart
This paper is intended to serve as an overview of a rapidly emerging research and applications area. In addition to providing a general overview, motivating the importance of data mining problems within the area of knowledge discovery in databases, our aim is to list some of the pressing research challenges, and outline opportunities for contributions by the optimization research communities. Towards these goals, we include formulations of the basic categories of data mining methods as optimization problems. We also provide examples of successful mathematical programming approaches to some data mining problems. keywords: data analysis, data mining, mathematical programming methods, challenges for massive data sets, classification, clustering, prediction, optimization. To appear: INFORMS: Journal of Compting, special issue on Data Mining, A. Basu and B. Golden (guest editors). Also appears as Mathematical Programming Technical Report 98-01, Computer Sciences Department, University of Wi...
Multicategory Classification by Support Vector Machines
- Computational Optimizations and Applications
, 1999
"... We examine the problem of how to discriminate between objects of three or more classes. Specifically, we investigate how two-class discrimination methods can be extended to the multiclass case. We show how the linear programming (LP) approaches based on the work of Mangasarian and quadratic programm ..."
Abstract
-
Cited by 39 (0 self)
- Add to MetaCart
We examine the problem of how to discriminate between objects of three or more classes. Specifically, we investigate how two-class discrimination methods can be extended to the multiclass case. We show how the linear programming (LP) approaches based on the work of Mangasarian and quadratic programming (QP) approaches based on Vapnik's Support Vector Machines (SVM) can be combined to yield two new approaches to the multiclass problem. In LP multiclass discrimination, a single linear program is used to construct a piecewise linear classification function. In our proposed multiclass SVM method, a single quadratic program is used to construct a piecewise nonlinear classification function. Each piece of this function can take the form of a polynomial, radial basis function, or even a neural network. For the k > 2 class problems, the SVM method as originally proposed required the construction of a two-class SVM to separate each class from the remaining classes. Similarily, k two-class linear programs can be used for the multiclass problem. We performed an empirical study of the original LP method, the proposed k LP method, the proposed single QP method and the original k QP methods. We discuss the advantages and disadvantages of each approach. 1 1
Misclassification Minimization
- JOURNAL OF GLOBAL OPTIMIZATION
, 1994
"... The problem of minimizing the number of misclassified points by a plane, attempting to separate two point sets with intersecting convex hulls in n-dimensional real space, is formulated as a linear program with equilibrium constraints (LPEC). This general LPEC can be converted to an exact penalty pro ..."
Abstract
-
Cited by 38 (13 self)
- Add to MetaCart
The problem of minimizing the number of misclassified points by a plane, attempting to separate two point sets with intersecting convex hulls in n-dimensional real space, is formulated as a linear program with equilibrium constraints (LPEC). This general LPEC can be converted to an exact penalty problem with a quadratic objective and linear constraints. A Frank-Wolfe-type algorithm is proposed for the penalty problem that terminates at a stationary point or a global solution. Novel aspects of the approach include: (i) A linear complementarity formulation of the step function that "counts" misclassifications, (ii) Exact penalty formulation without boundedness, nondegeneracy or constraint qualification assumptions, (iii) An exact solution extraction from the sequence of minimizers of the penalty function for a finite value of the penalty parameter for the general LPEC and an explicitly exact solution for the LPEC with uncoupled constraints, and (iv) A parametric quadratic programming form...

