An introduction to kernel-based learning algorithms
IEEE TRANSACTIONS ON NEURAL NETWORKS, 2001
Cited by 373 (48 self)
This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and
Query by Committee
1992
Cited by 318 (3 self)
We propose an algorithm called query by committee, in which a committee of students is trained on the same data set. The next query is chosen according to the principle of maximal disagreement. The algorithm is studied for two toy models: the high-low game and perceptron learning of another perceptron. As the number of queries goes to infinity, the committee algorithm yields asymptotically finite information gain. This leads to generalization error that decreases exponentially with the number of examples. This is in marked contrast to learning from randomly chosen inputs, for which the information gain approaches zero and the generalization error decreases with a relatively slow inverse power law. We suggest that asymptotically finite information gain may be an important characteristic of good query algorithms.
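The query rule in this abstract can be illustrated with a minimal sketch for the perceptron-learning-a-perceptron setting. Several details are assumptions made for illustration and are not taken from the paper: the committee has exactly two students, each student is an ordinary perceptron trained from a random start on the shared data (the abstract does not say how students are trained), candidate inputs are Gaussian, and all function names are hypothetical. With two students, "maximal disagreement" reduces to the two students predicting opposite labels.

```python
import random

def sign(x):
    return 1 if x >= 0 else -1

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_perceptron(data, dim, rng, epochs=100):
    """Train one student from a random start on the shared data set."""
    w = [rng.gauss(0, 1) for _ in range(dim)]
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if sign(dot(w, x)) != y:
                # Standard perceptron update on a misclassified example.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return w

def query_by_committee(teacher, dim, n_queries, rng, max_tries=1000):
    """Collect n_queries labelled examples, each chosen where two students disagree."""
    data = []
    for _ in range(n_queries):
        # Both students are trained on the same data (assumed: different random starts).
        a = train_perceptron(data, dim, rng)
        b = train_perceptron(data, dim, rng)
        for _ in range(max_tries):
            x = [rng.gauss(0, 1) for _ in range(dim)]
            if sign(dot(a, x)) != sign(dot(b, x)):
                break
        # If no disagreement was found, the last candidate is used as the query.
        data.append((x, sign(dot(teacher, x))))
    return data, train_perceptron(data, dim, rng)
```

A call such as query_by_committee([1.0, -2.0, 0.5], dim=3, n_queries=15, rng=random.Random(0)) returns the queried examples with their teacher labels and one student trained on them.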
Theory Refinement on Bayesian Networks
1991
Cited by 184 (5 self)
Theory refinement is the task of updating a domain theory in the light of new cases, to be done automatically or with some expert assistance. The problem of theory refinement under uncertainty is reviewed here in the context of Bayesian statistics, a theory of belief revision. The problem is reduced to an incremental learning task as follows: the learning system is initially primed with a partial theory supplied by a domain expert, and thereafter maintains its own internal representation of alternative theories which is able to be interrogated by the domain expert and able to be incrementally refined from data. Algorithms for refinement of Bayesian networks are presented to illustrate what is meant by "partial theory", "alternative theory representation", etc. The algorithms are an incremental variant of batch learning algorithms from the literature so can work well in batch and incremental mode.
1 Introduction Theory refinement is the task of updating a domain theory in the light of...
Online Bayes Point Machines
Cited by 69 (3 self)
We present a new and simple algorithm for learning large margin classifiers that works in a truly online manner. The algorithm generates a linear classifier by averaging the weights associated with several perceptron-like algorithms run in parallel in order to approximate the Bayes point. A random subsample of the incoming data stream is used to ensure diversity in the perceptron solutions. We experimentally study the algorithm's performance on online and batch learning settings.
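The averaging scheme described in this abstract might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: each perceptron is assumed to see each incoming example independently with probability keep_prob, the plain perceptron update is used as the "perceptron-like" rule, raw (unnormalised) weights are averaged, and the names online_bpm, n_perceptrons and keep_prob are hypothetical.

```python
import random

def online_bpm(stream, dim, n_perceptrons=5, keep_prob=0.5, seed=0):
    """Average the weights of several perceptrons run in parallel on one stream."""
    rng = random.Random(seed)
    weights = [[0.0] * dim for _ in range(n_perceptrons)]
    for x, y in stream:
        for w in weights:
            # Assumed subsampling rule: each perceptron sees the example
            # with probability keep_prob, which keeps the solutions diverse.
            if rng.random() < keep_prob:
                margin = y * sum(wi * xi for wi, xi in zip(w, x))
                if margin <= 0:
                    # Plain perceptron update on a mistake.
                    for i in range(dim):
                        w[i] += y * x[i]
    # The averaged weight vector is the approximation to the Bayes point.
    return [sum(w[i] for w in weights) / n_perceptrons for i in range(dim)]
```

The function makes a single pass over the stream, so it can be fed examples one at a time in the online setting or a full data set in the batch setting.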
The Relationship between PAC, the Statistical Physics framework, the Bayesian framework, and the VC framework
Cited by 40 (7 self)
This paper discusses the intimate relationships between the supervised learning frameworks mentioned in the title. In particular, it shows how all those frameworks can be viewed as particular instances of a single overarching formalism. In doing this, many commonly misunderstood aspects of those frameworks are explored. In addition the strengths and weaknesses of those frameworks are compared, and some novel frameworks are suggested (resulting, for example, in a "correction" to the familiar bias-plus-variance formula).
Bayes Point Machines: Estimating the Bayes Point in Kernel Space
IJCAI WORKSHOP SVMS, 1999
Cited by 27 (2 self)
From a Bayesian perspective Support Vector Machines choose the hypothesis corresponding to the largest possible hypersphere that can be inscribed in version space, i.e. in the space of all consistent hypotheses given a training set. Those boundaries of version space which are tangent to the hypersphere define the support vectors. An alternative and potentially better approach is to construct the hypothesis using the whole of version space. This is achieved by using a Bayes Point Machine which finds the midpoint of the region of intersection of all hyperplanes bisecting version space into two halves of equal volume (the Bayes point). It is known that the center of mass of version space approximates the Bayes point [Watkin, 1993]. We suggest estimating the center of mass by averaging over the trajectory of a billiard ball bouncing in version space. Experimental results are presented indicating that Bayes Point Machines consistently outperform Support Vector Machines.
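To make the "center of mass of version space" concrete, the sketch below estimates it for linear classifiers through the origin in plain input space. Note that it deliberately swaps the billiard trajectory of the abstract for naive rejection sampling, which is much simpler but only practical in low dimensions; it also ignores kernels. The function name and parameters are hypothetical.

```python
import math
import random

def centre_of_mass_estimate(data, dim, n_samples=2000, seed=0):
    """Average random unit weight vectors that classify every example correctly.

    This is rejection sampling, not the billiard method: directions are drawn
    uniformly on the unit sphere and kept only if they lie in version space.
    """
    rng = random.Random(seed)
    total = [0.0] * dim
    accepted = 0
    for _ in range(n_samples):
        w = [rng.gauss(0, 1) for _ in range(dim)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]
        # Keep w only if it is consistent with all training examples.
        if all(y * sum(wi * xi for wi, xi in zip(w, x)) > 0 for x, y in data):
            total = [t + wi for t, wi in zip(total, w)]
            accepted += 1
    if accepted == 0:
        raise ValueError("no sampled hypothesis was consistent with the data")
    return [t / accepted for t in total]
```

Because version space for such classifiers is a convex cone, the average of accepted vectors is itself consistent with the training set.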
Kernel Methods: A Survey of Current Techniques
Neurocomputing, 2000
Cited by 25 (1 self)
Kernel Methods have become an increasingly popular tool for machine learning tasks involving classification, regression or novelty detection. They exhibit good generalisation performance on many real-life datasets and the approach is properly motivated theoretically. There are relatively few free parameters to adjust and the architecture of the learning machine does not need to be found by experimentation. In this tutorial we survey this subject with a principal focus on the most well-known models based on kernel substitution, namely, Support Vector Machines.
1 Introduction. Support Vector Machines (SVMs) have been successfully applied to a number of applications ranging from particle identification, face identification and text categorisation to engine knock detection, bioinformatics and database marketing [9]. The approach is systematic and properly motivated by statistical learning theory [42]. Training involves optimisation of a convex cost function: there are no false local mi...
Playing Billiard in Version Space
1997
Cited by 21 (0 self)
A ray-tracing method inspired by ergodic billiards is used to estimate the theoretically best decision rule for a given set of linearly separable examples. For randomly distributed examples the billiard estimate of the single Perceptron with best average generalization probability agrees with known analytic results, while for real-life classification problems the generalization probability is consistently enhanced when compared to the maximal stability Perceptron.
1 Introduction Neural networks can be used for both concept learning (classification) and for function interpolation and/or extrapolation. Two basic mathematical methods seem to be particularly adequate for studying neural networks: geometry (especially combinatorial geometry) and probability theory (statistical physics). Geometry is illuminating and probability theory is powerful. In this paper I consider the perhaps simplest neural network, the venerable Perceptron [1]: given a set of examples falling in two classes,...
Bayesian Learning in Reproducing Kernel Hilbert Spaces
MACHINE LEARNING, 1999
Cited by 19 (10 self)
Support Vector Machines find the hypothesis that corresponds to the centre of the largest hypersphere that can be placed inside version space, i.e. the space of all consistent hypotheses given a training set. The boundaries of version space touched by this hypersphere define the support vectors. An even more promising approach is to construct the hypothesis using the whole of version space. This is achieved by the Bayes point: the midpoint of the region of intersection of all hyperplanes bisecting version space into two volumes of equal magnitude. It is known that the centre of mass of version space approximates the Bayes point [31]. The centre of mass is estimated by averaging over the trajectory of a billiard in version space. We derive bounds on the generalisation error of Bayesian classifiers in terms of the volume ratio of version space and parameter space. This ratio serves as an effective VC dimension and greatly influences generalisation. We present experimental results indicating that Bayes Point Machines consistently outperform Support Vector Machines. Moreover, we show theoretically and experimentally how Bayes Point Machines can easily be extended to admit training errors.
How Well do Bayes Methods Work for On-Line Prediction of {±1} values?
In Proceedings of the Third NEC Symposium on Computation and Cognition. SIAM, 1992
Cited by 18 (11 self)
We look at sequential classification and regression problems in which {±1}-labeled instances are given online, one at a time, and for each new instance, before seeing the label, the learning system must either predict the label, or estimate the probability that the label is +1. We look at the performance of Bayes method for this task, as measured by the total number of mistakes for the classification problem, and by the total log loss (or information gain) for the regression problem. Our results are given by comparing the performance of Bayes method to the performance of a hypothetical "omniscient scientist" who is able to use extra information about the labeling process that would not be available in the standard learning protocol. The results show that Bayes methods perform only slightly worse than the omniscient scientist in many cases. These results generalize previous results of Haussler, Kearns and Schapire, and Opper and Haussler.
1 Introduction Several recent papers in...
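The online protocol in this abstract can be sketched for the simplest possible case. The assumptions here go beyond the abstract: the hypothesis class is finite, labels are noise-free, and the target is in the class, so the posterior is just the prior restricted to hypotheses consistent so far. The function name bayes_online and the threshold hypotheses in the usage note are hypothetical. Log loss is measured in bits.

```python
import math

def bayes_online(hypotheses, prior, sequence):
    """Run Bayes prediction on a {±1}-labeled sequence.

    Returns (number of mistakes, total log loss in bits).
    """
    post = list(prior)  # unnormalised posterior weights
    mistakes = 0
    log_loss = 0.0
    for x, y in sequence:
        total = sum(post)
        # Posterior probability that the next label is +1.
        p_plus = sum(p for h, p in zip(hypotheses, post) if h(x) == 1) / total
        # Classification: predict the more probable label.
        if (1 if p_plus >= 0.5 else -1) != y:
            mistakes += 1
        # Regression: pay log loss on the probability given to the true label.
        p_y = p_plus if y == 1 else 1.0 - p_plus
        log_loss += -math.log2(p_y)
        # Noise-free update: discard hypotheses that got this label wrong.
        post = [p if h(x) == y else 0.0 for h, p in zip(hypotheses, post)]
    return mistakes, log_loss
```

For example, with eight threshold hypotheses (label +1 when x >= k, for k = 0..7), a uniform prior, and a sequence labelled by k = 3 that includes both x = 2 and x = 3, only one hypothesis survives, and the total log loss telescopes to -log2(1/8) = 3 bits.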