Sparse Bayesian Learning and the Relevance Vector Machine
, 2001
Abstract

Cited by 552 (5 self)
This paper introduces a general Bayesian framework for obtaining sparse solutions to regression and classification tasks utilising models linear in the parameters. Although this framework is fully general, we illustrate our approach with a particular specialisation that we denote the `relevance vector machine' (RVM), a model of identical functional form to the popular and state-of-the-art `support vector machine' (SVM). We demonstrate that by exploiting a probabilistic Bayesian learning framework, we can derive accurate prediction models which typically utilise dramatically fewer basis functions than a comparable SVM while offering a number of additional advantages. These include the benefits of probabilistic predictions, automatic estimation of `nuisance' parameters, and the facility to utilise arbitrary basis functions (e.g. non-`Mercer' kernels).
A tutorial on support vector regression
, 2004
Abstract

Cited by 473 (2 self)
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.
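The ε-insensitive loss at the heart of SV regression can be sketched in a few lines (a minimal illustration for this listing, not code from the tutorial itself):

```python
def epsilon_insensitive_loss(y_true, y_pred, eps=0.1):
    """Penalty is zero inside a tube of half-width eps around the
    target, and grows linearly with the residual outside it."""
    residual = abs(y_true - y_pred)
    return max(0.0, residual - eps)

# Points inside the tube incur no loss; outside, the loss is linear.
inside = epsilon_insensitive_loss(1.0, 1.05)   # residual 0.05 < eps
outside = epsilon_insensitive_loss(1.0, 1.30)  # residual 0.30 > eps
```

It is this flat region of the loss that makes the SV regression solution depend only on a subset of the training points (the support vectors).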
Dimensionality Reduction via Sparse Support Vector Machines
 Journal of Machine Learning Research
, 2003
Abstract

Cited by 67 (13 self)
We describe a methodology for performing variable ranking and selection using support vector machines (SVMs). The method constructs a series of sparse linear SVMs to generate linear models that can generalize well, and uses a subset of nonzero weighted variables found by the linear models to produce a final nonlinear model. The method exploits the fact that a linear SVM (no kernels) with ℓ1-norm regularization inherently performs variable selection as a side effect of minimizing the capacity of the SVM model. The distribution of the linear model weights provides a mechanism for ranking and interpreting the effects of variables.
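The paper solves a linear program; the proximal (soft-thresholding) operator below is a different technique, included only to illustrate why an ℓ1 penalty drives weights exactly to zero and so selects variables as a side effect:

```python
def soft_threshold(weights, lam):
    """One proximal step for an l1 penalty of strength lam: shrink each
    weight toward zero by lam, and set it exactly to zero when its
    magnitude falls below lam. This exact-zeroing is the mechanism by
    which l1-regularized linear models perform variable selection."""
    out = []
    for w in weights:
        if w > lam:
            out.append(w - lam)
        elif w < -lam:
            out.append(w + lam)
        else:
            out.append(0.0)
    return out

w = soft_threshold([0.9, -0.05, 0.3, -0.6], lam=0.2)
selected = [i for i, wi in enumerate(w) if wi != 0.0]  # surviving variables
```

Variables whose weights survive thresholding can then be ranked by magnitude, mirroring the ranking-by-weight idea in the abstract.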
Analysis of Sparse Bayesian Learning
 Advances in Neural Information Processing Systems 14
, 2001
Abstract

Cited by 40 (1 self)
The recent introduction of the `relevance vector machine' has effectively demonstrated how sparsity may be obtained in generalised linear models within a Bayesian framework. Using a particular form of Gaussian parameter prior, `learning' is the maximisation, with respect to hyperparameters, of the marginal likelihood of the data. This paper studies the properties of that objective function, and demonstrates that conditioned on an individual hyperparameter, the marginal likelihood has a unique maximum which is computable in closed form. It is further shown that if a derived `sparsity criterion' is satisfied, this maximum is exactly equivalent to `pruning' the corresponding parameter from the model.
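In the usual notation for this analysis, each basis function has a "sparsity" factor s_i and a "quality" factor q_i; assuming those are precomputed, the closed-form per-hyperparameter maximum and the pruning criterion can be sketched as:

```python
def alpha_star(s_i, q_i):
    """Closed-form maximizer of the marginal likelihood with respect to
    a single hyperparameter alpha_i, given the sparsity factor s_i and
    quality factor q_i of basis function i. When the sparsity criterion
    q_i**2 > s_i fails, the maximum lies at alpha_i = infinity, which is
    exactly equivalent to pruning the basis function from the model."""
    if q_i ** 2 > s_i:
        return s_i ** 2 / (q_i ** 2 - s_i)  # finite maximum: keep the basis
    return float("inf")                     # prune the basis function
```

Sweeping this update over the hyperparameters one at a time is what later fast marginal-likelihood algorithms for the RVM build on.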
A sparse support vector machine approach to region-based image categorization
 In CVPR ’05
, 2005
Abstract

Cited by 27 (0 self)
Automatic image categorization using low-level features is a challenging research topic in computer vision. In this paper, we formulate the image categorization problem as a multiple-instance learning (MIL) problem by viewing an image as a bag of instances, each corresponding to a region obtained from image segmentation. We propose a new solution to the resulting MIL problem. Unlike many existing MIL approaches that rely on the diverse density framework, our approach performs an effective feature mapping through a chosen metric distance function. Thus the MIL problem becomes solvable by a regular classification algorithm. A sparse SVM is adopted to dramatically reduce the number of regions needed to classify images. The regions selected by the sparse SVM approximate the target concepts of the traditional diverse density framework. The proposed approach is far more efficient in computation and less sensitive to class label uncertainty. Experimental results are included to demonstrate the effectiveness and robustness of the proposed method.
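A distance-based feature mapping of this kind can be sketched as follows (a hypothetical illustration with Euclidean distance and made-up prototypes; the paper's actual metric and prototype choice differ):

```python
def bag_to_features(bag, prototypes):
    """Map a bag of region descriptors to a fixed-length vector: one
    coordinate per prototype, holding the distance from that prototype
    to the closest instance in the bag. A standard single-instance
    classifier (e.g. a sparse SVM) can then be trained on these vectors,
    turning the MIL problem into a regular classification problem."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return [min(dist(inst, p) for inst in bag) for p in prototypes]

bag = [(0.0, 0.0), (3.0, 4.0)]         # two region descriptors (toy data)
prototypes = [(0.0, 1.0), (3.0, 0.0)]  # hypothetical prototype regions
features = bag_to_features(bag, prototypes)
```

Because a sparse classifier zeroes out most feature weights, it effectively discards most prototypes, which is how the approach reduces the regions needed for classification.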
Barrier Boosting
Abstract

Cited by 19 (7 self)
Boosting algorithms such as AdaBoost and Arc-GV are iterative strategies for minimizing a constrained objective function, and are equivalent to barrier algorithms.
Support vector machine based multi-view face detection and recognition
, 2004
Abstract

Cited by 19 (1 self)
Detecting faces across multiple views is more challenging than in a fixed view, e.g. the frontal view, owing to the significant nonlinear variation caused by rotation in depth, self-occlusion and self-shadowing. To address this problem, a novel approach is presented in this paper. The view sphere is separated into several small segments, and on each segment a face detector is constructed. We explicitly estimate the pose of an image regardless of whether or not it is a face; a pose estimator is constructed using Support Vector Regression, and the pose information is used to choose the appropriate face detector to determine if it is a face. With this pose-estimation based method, considerable computational efficiency is achieved. Meanwhile, the detection accuracy is also improved since each detector is constructed on a small range of views. We developed a novel algorithm for face detection by combining the Eigenface and SVM methods, which performs almost as fast as the Eigenface method but with significantly improved accuracy. Detailed experimental results are presented in this paper, including tuning the parameters of the pose estimators and face detectors, performance evaluation, and applications to video-based face detection and frontal-view face recognition.
Sparse Regression Ensembles in Infinite and Finite Hypothesis Spaces
, 2000
Abstract

Cited by 18 (9 self)
We examine methods for constructing regression ensembles based on a linear program (LP). The ensemble regression function consists of linear combinations of base hypotheses generated by some boosting-type base learning algorithm. Unlike the classification case, for regression the set of possible hypotheses producible by the base learning algorithm may be infinite. We explicitly tackle the issue of how to define and solve ensemble regression when the hypothesis space is infinite. Our approach is based on a semi-infinite linear program that has an infinite number of constraints and a finite number of variables. We show that the regression problem is well posed for infinite hypothesis spaces in both the primal and dual spaces. Most importantly, we prove there exists an optimal solution to the infinite hypothesis-space problem consisting of a finite number of hypotheses. We propose two algorithms for solving the infinite and finite hypothesis problems: one uses a column generation simplex-type algorithm and the other adopts an exponential barrier approach. Furthermore, we give sufficient conditions on the base learning algorithm and the hypothesis set for use with infinite regression ensembles. Computational results show that these methods are extremely promising.
Large scale kernel regression via linear programming
 Machine Learning
Abstract

Cited by 15 (0 self)
The problem of tolerant data fitting by a nonlinear surface, induced by a kernel-based support vector machine [24], is formulated as a linear program with fewer variables than other linear programming formulations [21]. A generalization of the linear programming chunking algorithm [2] for arbitrary kernels [13] is implemented for solving problems with very large datasets, wherein chunking is performed on both data points and problem variables. The proposed approach tolerates a small error, which is adjusted parametrically, while fitting the given data. This leads to improved fitting of noisy data (over ordinary least-error solutions), as demonstrated computationally. Comparative numerical results indicate an average time reduction as high as 26.0% over other formulations, with a maximal time reduction of 79.7%. Additionally, linear programs with as many as 16,000 data points and more than a billion nonzero matrix elements are solved.
Linear Dependency Between ε and the Input Noise in ε-Support Vector Regression
, 2003
Abstract

Cited by 9 (0 self)
In using the ε-support vector regression (ε-SVR) algorithm, one has to choose a suitable value for the insensitivity parameter ε. Smola et al. considered its “optimal” choice by studying the statistical efficiency in a location parameter estimation problem. While they successfully predicted a linear scaling between the optimal ε and the noise in the data, their theoretically optimal value does not closely match its experimentally observed counterpart in the case of Gaussian noise. In this paper, we attempt to better explain their experimental results by studying the regression problem itself. Our resulting predicted choice of ε is much closer to the experimentally observed optimal value, while again demonstrating a linear trend with the input noise.
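One way to see why a linear rule ε = c·σ is natural for Gaussian noise: the fraction of noise samples falling inside the ε-tube depends only on the ratio ε/σ, not on σ itself (the constant 0.6 below is purely illustrative, not the value either paper derives):

```python
import random

def tube_fraction(sigma, eps, n=100_000, seed=0):
    """Fraction of zero-mean Gaussian noise samples with |noise| <= eps."""
    rng = random.Random(seed)
    return sum(abs(rng.gauss(0.0, sigma)) <= eps for _ in range(n)) / n

# Scaling eps linearly with the noise level keeps the tube coverage fixed:
# with the same seed the underlying standard normals are identical, so the
# two fractions agree exactly despite the 4x difference in noise level.
f_small = tube_fraction(sigma=0.5, eps=0.6 * 0.5)
f_large = tube_fraction(sigma=2.0, eps=0.6 * 2.0)
```

Any criterion that fixes the tube coverage (or any other scale-free property of the residual distribution) therefore yields an ε proportional to the noise level; the two papers differ in which criterion, and hence which constant c, they argue for.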