Results 1–10 of 13
Sparse Bayesian Learning and the Relevance Vector Machine
, 2001
"... This paper introduces a general Bayesian framework for obtaining sparse solutions to regression and classication tasks utilising models linear in the parameters. Although this framework is fully general, we illustrate our approach with a particular specialisation that we denote the `relevance vec ..."
Abstract

Cited by 958 (5 self)
This paper introduces a general Bayesian framework for obtaining sparse solutions to regression and classification tasks utilising models linear in the parameters. Although this framework is fully general, we illustrate our approach with a particular specialisation that we denote the `relevance vector machine' (RVM), a model of identical functional form to the popular and state-of-the-art `support vector machine' (SVM). We demonstrate that by exploiting a probabilistic Bayesian learning framework, we can derive accurate prediction models which typically utilise dramatically fewer basis functions than a comparable SVM while offering a number of additional advantages. These include the benefits of probabilistic predictions, automatic estimation of `nuisance' parameters, and the facility to utilise arbitrary basis functions (e.g. non-`Mercer' kernels).
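The "identical functional form" shared by the RVM and SVM is a sparse kernel expansion. As a minimal sketch (the Gaussian kernel choice, weights, and data below are invented for illustration, not taken from the paper), a prediction is a weighted sum of basis functions centred on a few "relevance vectors":

```python
import numpy as np

def gaussian_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) basis function centred on z. The kernel and
    gamma value are illustrative assumptions, not the paper's setup."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def rvm_predict(x, relevance_vectors, weights, bias=0.0):
    """Prediction from a sparse kernel expansion: only the 'relevance
    vectors' (a small subset of the training inputs) carry non-zero
    weights, so the sum is short."""
    return bias + sum(w * gaussian_kernel(x, z)
                      for w, z in zip(weights, relevance_vectors))

# Toy example: two relevance vectors in one dimension.
rvs = [np.array([0.0]), np.array([2.0])]
ws = [1.5, -0.5]
print(rvm_predict(np.array([0.0]), rvs, ws))  # ≈ 1.4908
```

The sparsity claim in the abstract is that Bayesian learning prunes most candidate weights to zero, leaving far fewer terms in this sum than an SVM's support-vector expansion.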
The Relevance Vector Machine
, 2000
"... The support vector machine (SVM) is a stateoftheart technique for regression and classification, combining excellent generalisation properties with a sparse kernel representation. However, it does suffer from a number of disadvantages, notably the absence of probabilistic outputs, the requirement ..."
Abstract

Cited by 288 (6 self)
The support vector machine (SVM) is a state-of-the-art technique for regression and classification, combining excellent generalisation properties with a sparse kernel representation. However, it does suffer from a number of disadvantages, notably the absence of probabilistic outputs, the requirement to estimate a trade-off parameter and the need to utilise `Mercer' kernel functions. In this paper we introduce the Relevance Vector Machine (RVM), a Bayesian treatment of a generalised linear model of identical functional form to the SVM. The RVM suffers from none of the above disadvantages, and examples demonstrate that for comparable generalisation performance, the RVM requires dramatically fewer kernel functions.
Fast Marginal Likelihood Maximisation for Sparse Bayesian Models
 Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics
, 2003
"... The 'sparse Bayesian' modelling approach, as exemplified by the 'relevance vector machine ', enables sparse classification and regression functions to be obtained by linearlyweighting a small nmnber of fixed basis functions from a large dictionary of potential candidates. S ..."
Abstract

Cited by 115 (0 self)
The 'sparse Bayesian' modelling approach, as exemplified by the 'relevance vector machine', enables sparse classification and regression functions to be obtained by linearly weighting a small number of fixed basis functions from a large dictionary of potential candidates. Such a model conveys a number of advantages over the related and very popular 'support vector machine', but the necessary 'training' procedure (optimisation of the marginal likelihood function) is typically much slower. We describe a new and highly accelerated algorithm which exploits recently elucidated properties of the marginal likelihood function to enable maximisation via a principled and efficient sequential addition and deletion of candidate basis functions.
Committee-Based Sample Selection for Probabilistic Classifiers
 Journal of Artificial Intelligence Research
, 1999
"... In many realworld learning tasks it is expensive to acquire a sufficient number of labeled examples for training. This paper investigates methods for reducing annotation cost by sample selection. In this approach, during training the learning program examines many unlabeled examples and selects for ..."
Abstract

Cited by 65 (0 self)
In many real-world learning tasks it is expensive to acquire a sufficient number of labeled examples for training. This paper investigates methods for reducing annotation cost by sample selection. In this approach, during training the learning program examines many unlabeled examples and selects for labeling only those that are most informative at each stage. This avoids redundantly labeling examples that contribute little new information. Our work follows on previous research on Query By Committee, and extends the committee-based paradigm to the context of probabilistic classification. We describe a family of empirical methods for committee-based sample selection in probabilistic classification models, which evaluate the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a probability distribution conditioned by the training set labeled so far. The method was applied to...
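The committee-based selection idea can be sketched with a simple disagreement measure. Vote entropy is one common choice in this paradigm; the paper's own empirical measures may differ, so treat the function below as an illustrative assumption rather than the authors' method:

```python
import math

def vote_entropy(votes, n_labels):
    """Disagreement of a committee: entropy (in bits) of the label-vote
    distribution. `votes` holds one hard label per committee member.
    Examples with high entropy are the informative ones worth labeling."""
    k = len(votes)
    ent = 0.0
    for label in range(n_labels):
        p = votes.count(label) / k
        if p > 0:
            ent -= p * math.log(p, 2)
    return ent

# A unanimous committee shows zero disagreement; an even split is maximal.
print(vote_entropy([1, 1, 1, 1], 2))  # 0.0
print(vote_entropy([0, 1, 0, 1], 2))  # 1.0
```

A selective sampler would query the human annotator only for examples whose committee disagreement exceeds some threshold, skipping the redundant unanimous ones.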
Moderating the Outputs of Support Vector Machine Classifiers
 IEEE Transactions on Neural Networks
, 1999
"...  In this paper, we extend the use of moderated outputs to the support vector machine (SVM) by making use of a relationship between SVM and the evidence framework. The moderated output is more in line with the Bayesian idea that the posterior weight distribution should be taken into account upon pre ..."
Abstract

Cited by 55 (3 self)
In this paper, we extend the use of moderated outputs to the support vector machine (SVM) by making use of a relationship between the SVM and the evidence framework. The moderated output is more in line with the Bayesian idea that the posterior weight distribution should be taken into account upon prediction, and it also alleviates the usual tendency of assigning overly high confidence to the estimated class memberships of the test patterns. Moreover, the moderated output derived here can be taken as an approximation to the posterior class probability. Hence, meaningful rejection thresholds can be assigned and outputs from several networks can be directly compared. Experimental results on both artificial and real-world data are also discussed. Keywords: Support vector machine, evidence framework, moderated output, Bayesian. I. Introduction. In recent years, there has been a lot of interest in studying the support vector machine (SVM) [1], [2], [3], [4], [5], [6], [7]. The SVM is based on the i...
Bayesian Neural Networks and Density Networks
 Nuclear Instruments and Methods in Physics Research, A
, 1994
"... This paper reviews the Bayesian approach to learning in neural networks, then introduces a new adaptive model, the density network. This is a neural network for which target outputs are provided, but the inputs are unspecied. When a probability distribution is placed on the unknown inputs, a latent ..."
Abstract

Cited by 47 (7 self)
This paper reviews the Bayesian approach to learning in neural networks, then introduces a new adaptive model, the density network. This is a neural network for which target outputs are provided, but the inputs are unspecified. When a probability distribution is placed on the unknown inputs, a latent variable model is defined that is capable of discovering the underlying dimensionality of a data set. A Bayesian learning algorithm for these networks is derived and demonstrated. 1 Introduction to the Bayesian view of learning. A binary classifier is a parameterized mapping from an input x to an output y ∈ [0, 1]; when its parameters w are specified, the classifier states the probability that an input x belongs to class t = 1, rather than the alternative t = 0. Consider a binary classifier which models the probability as a sigmoid function of x: P(t = 1 | x, w, H) = y(x; w, H) = 1 / (1 + e^(-w·x)) (1). This form of model is known to statisticians as a linear logistic model, and in the neural networks ...
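The linear logistic model of equation (1) transcribes directly into code. The weights and inputs below are invented toy values for the sketch:

```python
import math

def logistic_prob(w, x):
    """P(t = 1 | x, w, H) = 1 / (1 + exp(-w.x)), the sigmoid of the
    linear activation w.x, as in equation (1)."""
    activation = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-activation))

# Zero weights express no preference between the two classes.
print(logistic_prob([0.0, 0.0], [1.0, 2.0]))  # 0.5
# A positive activation pushes the probability towards class t = 1.
print(logistic_prob([2.0, 1.0], [1.0, 2.0]) > 0.5)  # True
```

The Bayesian treatment the abstract describes places a distribution over w (and, for density networks, over the unknown inputs x) rather than fitting a single weight vector.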
The Relevance Vector Machine Technique for Channel Equalization Application
 IEEE Trans. Neural Networks
, 2001
"... The recently introduced relevance vector machine (RVM) technique is applied to communication channel equalization. It is demonstrated that the RVM equalizer can closely match the optimal performance of the Bayesian equalizer, with a much sparser kernel representation than that is achievable by the s ..."
Abstract

Cited by 20 (6 self)
The recently introduced relevance vector machine (RVM) technique is applied to communication channel equalization. It is demonstrated that the RVM equalizer can closely match the optimal performance of the Bayesian equalizer, with a much sparser kernel representation than is achievable by the state-of-the-art support vector machine (SVM) technique. Keywords: Support vector machines, relevance vector machines, Bayesian classification, equalization. I.
Neural Networks: A Pattern Recognition Perspective
, 1996
"... Introduction Neural networks have been exploited in a wide variety of applications, the majority of which are concerned with pattern recognition in one form or another. However, it has become widely acknowledged that the effective solution of all but the simplest of such problems requires a princip ..."
Abstract

Cited by 2 (0 self)
Introduction. Neural networks have been exploited in a wide variety of applications, the majority of which are concerned with pattern recognition in one form or another. However, it has become widely acknowledged that the effective solution of all but the simplest of such problems requires a principled treatment, in other words one based on a sound theoretical framework. From the perspective of pattern recognition, neural networks can be regarded as an extension of the many conventional techniques which have been developed over several decades. Lack of understanding of the basic principles of statistical pattern recognition lies at the heart of many of the common mistakes in the application of neural networks. In this chapter we aim to show that the `black box' stigma of neural networks is largely unjustified, and that there is actually considerable insight available into the way in which neural networks operate, and how to use them effectively. Some of the ke...
Adaptive Near Minimum Error Rate Training for Neural Networks with Application to Multiuser Detection in CDMA Communication Systems
 IEEE Trans. Neural Networks
, 2002
"... Adaptive training of neural networks is typically done using some stochastic gradient algorithm that tries to minimize the mean square error (MSE). For many applications, such as channel equalization and codedivision multipleaccess (CDMA) multiuser detection, the goal is to minimize the error prob ..."
Abstract

Cited by 2 (1 self)
Adaptive training of neural networks is typically done using some stochastic gradient algorithm that tries to minimize the mean square error (MSE). For many applications, such as channel equalization and code-division multiple-access (CDMA) multiuser detection, the goal is to minimize the error probability. For these applications, adopting the MSE criterion may lead to poor performance. A novel adaptive near-minimum-error-rate algorithm called the least bit error rate (LBER) algorithm is developed for training neural networks for these kinds of applications. The proposed method is applied to multiuser detection in CDMA communication systems. Simulation results show that the LBER algorithm has a good convergence speed and that a small radial basis function (RBF) network trained by this adaptive algorithm can closely match the performance of the optimal Bayesian multiuser detector. The results also confirm that training the neural network multiuser detector using the least mean square (LMS) algorithm, although converging well in the MSE, can produce poor error rate performance.
Bayesian Classifiers are Large Margin Hyperplanes in a Hilbert Space
 Machine Learning: Proceedings of the Fifteenth International Conference
, 1998
"... Bayesian algorithms for Neural Networks are known to produce classiers which are very resistent to overtting. It is often claimed that one of the main distinctive features of Bayesian Learning Algorithms is that they don't simply output one hypothesis, but rather an entire distribution of pro ..."
Abstract

Cited by 1 (1 self)
Bayesian algorithms for Neural Networks are known to produce classifiers which are very resistant to overfitting. It is often claimed that one of the main distinctive features of Bayesian learning algorithms is that they don't simply output one hypothesis, but rather an entire probability distribution over a hypothesis set: the Bayes posterior. An alternative perspective is that they output a linear combination of classifiers, whose coefficients are given by Bayes' theorem. One of the concepts used to deal with thresholded convex combinations is the `margin' of the hyperplane with respect to the training sample, which is correlated with the predictive power of the hypothesis itself. We provide a novel theoretical analysis of such classifiers, based on data-dependent VC theory, proving that they can be expected to be large margin hyperplanes in a Hilbert space. We then present experimental evidence that the predictions of our model are correct, i.e. that Bayesian classifiers really find hypotheses which have large margin on the training examples. This not only explains the remarkable resistance to overfitting exhibited by such classifiers, but also places them in the same class as other systems, like Support Vector Machines and AdaBoost, which have similar performance.