Results 1 - 10 of 20
Sparse Bayesian Learning and the Relevance Vector Machine
, 2001
Cited by 380 (5 self)
This paper introduces a general Bayesian framework for obtaining sparse solutions to regression and classification tasks utilising models linear in the parameters. Although this framework is fully general, we illustrate our approach with a particular specialisation that we denote the 'relevance vector machine' (RVM), a model of identical functional form to the popular and state-of-the-art 'support vector machine' (SVM). We demonstrate that by exploiting a probabilistic Bayesian learning framework, we can derive accurate prediction models which typically utilise dramatically fewer basis functions than a comparable SVM while offering a number of additional advantages. These include the benefits of probabilistic predictions, automatic estimation of 'nuisance' parameters, and the facility to utilise arbitrary basis functions (e.g. non-'Mercer' kernels).
A tutorial on support vector regression
, 2004
Cited by 309 (1 self)
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.
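As a minimal illustration of one of those basic ideas, the sketch below computes the ε-insensitive loss used in standard SV regression, which ignores residuals smaller than ε. The function name, the default ε and the sample values are illustrative assumptions, not taken from the tutorial.

```python
def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Mean epsilon-insensitive loss: max(0, |y - f(x)| - eps), averaged over samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    # Residuals inside the eps-tube contribute nothing; larger ones grow linearly.
    losses = [max(0.0, abs(y - f) - eps) for y, f in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Residuals are 0.05, 0.5 and 0.0; only the second lies outside the tube.
    print(eps_insensitive_loss([1.0, 2.0, 3.0], [1.05, 2.5, 3.0], eps=0.1))
```

Because small residuals cost nothing, only points on or outside the tube end up influencing the fitted function, which is where the sparsity of the SV expansion comes from.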
Dimensionality Reduction via Sparse Support Vector Machines
- Journal of Machine Learning Research
, 2003
Cited by 45 (12 self)
We describe a methodology for performing variable ranking and selection using support vector machines (SVMs). The method constructs a series of sparse linear SVMs to generate linear models that can generalize well, and uses a subset of nonzero weighted variables found by the linear models to produce a final nonlinear model. The method exploits the fact that a linear SVM (no kernels) with ℓ1-norm regularization inherently performs variable selection as a side-effect of minimizing capacity of the SVM model. The distribution of the linear model weights provides a mechanism for ranking and interpreting the effects of variables.
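The ranking step can be sketched roughly as follows, taking the weight vectors of several sparse linear models as given (for example, from any 1-norm regularized linear SVM solver). The aggregation rule used here, mean absolute weight with a variable kept if at least one model gives it a nonzero weight, is an assumption made for illustration and not necessarily the rule used in the paper; the function name and tolerance are likewise hypothetical.

```python
def rank_variables(weight_vectors, tol=1e-8):
    """Rank variables by mean absolute weight across several sparse linear models.

    weight_vectors: list of equal-length weight lists, one per linear model.
    Returns (index, score) pairs, highest score first, for every variable
    whose weight is nonzero (|w| > tol) in at least one model.
    """
    if not weight_vectors:
        raise ValueError("need at least one weight vector")
    n_vars = len(weight_vectors[0])
    if any(len(w) != n_vars for w in weight_vectors):
        raise ValueError("all weight vectors must have the same length")
    scores = []
    for j in range(n_vars):
        column = [abs(w[j]) for w in weight_vectors]
        if max(column) > tol:  # selected by at least one sparse model
            scores.append((j, sum(column) / len(column)))
    # Descending score; ties are broken by variable index for determinism.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Two hypothetical models over three variables; variable 0 is never used.
    print(rank_variables([[0.0, 1.5, -0.2], [0.0, 1.0, 0.0]]))
```

The surviving variable indices would then be the inputs passed on to the final nonlinear model.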
Analysis of Sparse Bayesian Learning
- Advances in Neural Information Processing Systems 14
, 2001
Cited by 26 (1 self)
The recent introduction of the 'relevance vector machine' has effectively demonstrated how sparsity may be obtained in generalised linear models within a Bayesian framework. Using a particular form of Gaussian parameter prior, 'learning' is the maximisation, with respect to hyperparameters, of the marginal likelihood of the data. This paper studies the properties of that objective function, and demonstrates that conditioned on an individual hyperparameter, the marginal likelihood has a unique maximum which is computable in closed form. It is further shown that if a derived 'sparsity criterion' is satisfied, this maximum is exactly equivalent to 'pruning' the corresponding parameter from the model.
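The closed-form maximum can be sketched as below. The notation is an assumption introduced here rather than quoted from the abstract: for a single basis function, s > 0 is a 'sparsity factor' and q a 'quality factor', and the part of the log marginal likelihood that depends on the hyperparameter α is taken to be ℓ(α) = ½[log α − log(α + s) + q²/(α + s)]. Setting the derivative to zero gives α = s²/(q² − s) when q² > s; otherwise ℓ increases towards α = ∞, which corresponds to pruning the basis function.

```python
import math


def log_marginal_term(alpha, s, q):
    """Alpha-dependent part of the log marginal likelihood (assumed form)."""
    return 0.5 * (math.log(alpha) - math.log(alpha + s) + q * q / (alpha + s))


def optimal_alpha(s, q):
    """Maximiser of log_marginal_term over alpha > 0.

    Returns math.inf when q^2 <= s, i.e. when the basis function
    should be pruned from the model.
    """
    if s <= 0:
        raise ValueError("s must be positive")
    if q * q > s:
        return s * s / (q * q - s)
    return math.inf


if __name__ == "__main__":
    print(optimal_alpha(1.0, 2.0))  # finite: the basis function is kept
    print(optimal_alpha(2.0, 1.0))  # infinite: the basis function is pruned
```

Comparing `log_marginal_term` at the returned α with nearby values is a quick numerical check that the stationary point is a maximum.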
A sparse support vector machine approach to region-based image categorization
- In CVPR ’05
, 2005
Cited by 23 (0 self)
Automatic image categorization using low-level features is a challenging research topic in computer vision. In this paper, we formulate the image categorization problem as a multiple-instance learning (MIL) problem by viewing an image as a bag of instances, each corresponding to a region obtained from image segmentation. We propose a new solution to the resulting MIL problem. Unlike many existing MIL approaches that rely on the diverse density framework, our approach performs an effective feature mapping through a chosen metric distance function. Thus the MIL problem becomes solvable by a regular classification algorithm. Sparse SVM is adopted to dramatically reduce the number of regions that are needed to classify images. The regions selected by a sparse SVM approximate the target concepts in the traditional diverse density framework. The proposed approach is considerably more efficient in computation and less sensitive to class label uncertainty. Experimental results are included to demonstrate the effectiveness and robustness of the proposed method.
Barrier Boosting
Cited by 17 (7 self)
Boosting algorithms like AdaBoost and Arc-GV are iterative strategies to minimize a constrained objective function, equivalent to Barrier algorithms.
Support vector machine based multi-view face detection and recognition
, 2004
Cited by 15 (1 self)
Detecting faces across multiple views is more challenging than in a fixed view, e.g. frontal view, owing to the significant non-linear variation caused by rotation in depth, self-occlusion and self-shadowing. To address this problem, a novel approach is presented in this paper. The view sphere is separated into several small segments. On each segment, a face detector is constructed. We explicitly estimate the pose of an image regardless of whether or not it is a face. A pose estimator is constructed using Support Vector Regression. The pose information is used to choose the appropriate face detector to determine if it is a face. With this pose-estimation based method, considerable computational efficiency is achieved. Meanwhile, the detection accuracy is also improved since each detector is constructed on a small range of views. We developed a novel algorithm for face detection by combining the Eigenface and SVM methods which performs almost as fast as the Eigenface method but with a significant improved speed. Detailed experimental results are presented in this paper including tuning the parameters of the pose estimators and face detectors, performance evaluation, and applications to video based face detection and frontal-view face recognition.
Sparse Regression Ensembles in Infinite and Finite Hypothesis Spaces
, 2000
Cited by 11 (7 self)
We examine methods for constructing regression ensembles based on a linear program (LP). The ensemble regression function consists of linear combinations of base hypotheses generated by some boosting-type base learning algorithm. Unlike the classification case, for regression the set of possible hypotheses producible by the base learning algorithm may be infinite. We explicitly tackle the issue of how to define and solve ensemble regression when the hypothesis space is infinite. Our approach is based on a semi-infinite linear program that has an infinite number of constraints and a finite number of variables. We show that the regression problem is well posed for infinite hypothesis spaces in both the primal and dual spaces. Most importantly, we prove there exists an optimal solution to the infinite hypothesis-space problem consisting of a finite number of hypotheses. We propose two algorithms for solving the infinite and finite hypothesis problems. One uses a column generation simplex-type algorithm and the other adopts an exponential barrier approach. Furthermore, we give sufficient conditions for the base learning algorithm and the hypothesis set to be used for infinite regression ensembles. Computational results show that these methods are extremely promising.
Massive Support Vector Regression
- Data Mining Institute, Computer Sciences Department, University of Wisconsin
, 1999
Cited by 8 (4 self)
The problem of tolerant data fitting by a nonlinear surface, induced by a kernel-based support vector machine [19], is formulated as a linear program with fewer variables than other linear programming formulations [17]. A generalization of the linear programming chunking algorithm [1] for arbitrary kernels [10] is implemented for solving problems with very large datasets wherein chunking is performed on both data points and problem variables. The proposed approach tolerates a small error, which is adjusted parametrically, while fitting the given data. This leads to improved fitting of noisy data as demonstrated computationally. Comparative numerical results indicate an average time reduction as high as 26.0%, with a maximal time reduction of 79.7%. Additionally, linear programs with as many as 16,000 data points and more than a billion nonzero matrix elements are solved.
Data Mining Via Mathematical Programming And Machine Learning
, 2000
Cited by 2 (0 self)
This work explores solving large-scale data mining problems through the use of mathematical programming methods. In particular, algorithms are proposed for the support vector machine (SVM) classification problem, which consists of constructing a separating surface that can discriminate between points from one of two classes. An algorithm based on successive overrelaxation (SOR) is presented which can process very large datasets that need not reside in memory. Concepts from generalized SVMs are combined with SOR and with linear programming to find nonlinear separating surfaces. An "active set" strategy is used to generate a fast algorithm that consists of solving a finite number of linear equations of the order of the dimensionality of the original input space at each step. This ASVM active set algorithm requires no specialized quadratic or linear programming code, but merely a linear equation solver which is publicly available. An implicit Lagrangian for the dual of an SVM is used to lead to the simple linearly convergent Lagrangian SVM (LSVM) algorithm. LSVM requires the inversion at the outset of a single (typically small) matrix, and the full algorithm is given in 11 lines of MATLAB code.

