Results 1-10 of 34
Generalization Performance of Support Vector Machines and Other Pattern Classifiers
, 1998
Cited by 120 (19 self)
... this paper has been twofold. Firstly, we have stated the known results for high confidence bounds on the generalization error of SVMs in terms of the margin and number of support vectors. Secondly, we wanted to highlight that these results can only be obtained from a data-dependent analysis, relying as they do on using some measure to estimate how favourable the input distribution is in relation to the target function. This type of analysis is relatively novel [9], but we feel that its potential for motivating algorithms that are able to take advantage of collusions between distribution and target is far from being exhausted. Indeed, we believe that this is frequently an ingredient in successful learning systems which has been exploited by accident. By more careful analysis of this phenomenon it may well be possible to motivate key ingredients in the Support Vector arsenal, such as choice of kernel function, the bound used in the soft-margin approach and so on. We have also given examples to show that the style of analysis is not limited to SVMs but applies to many other learning machines, including two of the most effective techniques, boosting and Bayesian methods. Acknowledgements: John Shawe-Taylor was supported in part by the EPSRC research grant number GR/K70366. Peter Bartlett was supported by the Australian Research Council.
Efficient Agnostic Learning of Neural Networks with Bounded Fan-in
, 1996
Cited by 68 (18 self)
We show that the class of two-layer neural networks with bounded fan-in is efficiently learnable in a realistic extension to the Probably Approximately Correct (PAC) learning model. In this model, a joint probability distribution is assumed to exist on the observations and the learner is required to approximate the neural network which minimizes the expected quadratic error. As special cases, the model allows learning real-valued functions with bounded noise, learning probabilistic concepts and learning the best approximation to a target function that cannot be well approximated by the neural network. The networks we consider have real-valued inputs and outputs, an unlimited number of threshold hidden units with bounded fan-in, and a bound on the sum of the absolute values of the output weights. The number of computation... This work was supported by the Australian Research Council and the Australian Telecommunications and Electronics Research Board. The material in this paper was pres...
Genetic Programming Using a Minimum Description Length Principle
 Advances in Genetic Programming
, 1994
Cited by 47 (1 self)
This paper introduces a Minimum Description Length (MDL) principle to define fitness functions in Genetic Programming (GP). In traditional (Koza-style) GP, the size of trees was usually controlled by user-defined parameters, such as the maximum number of nodes and maximum tree depth. Large tree sizes meant that the time necessary to measure their fitness often dominated total processing time. To overcome this difficulty, we introduce a method for controlling tree growth, which uses an MDL principle. Initially we choose a "decision tree" representation for the GP chromosomes, and then show how an MDL principle can be used to define GP fitness functions. Thereafter we apply the MDL-based fitness functions to some practical problems. Using our implemented system "STROGANOFF", we show how MDL-based fitness functions can be applied successfully to problems of pattern recognition. The results demonstrate that our approach is superior to usual neural networks in terms of general...
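The idea of an MDL-based fitness function can be illustrated with a minimal sketch. This is not the coding scheme used in STROGANOFF; the function names, the uniform per-node symbol code, and the binomial code for misclassified samples are all assumptions chosen for illustration. The fitness of a tree is taken to be the number of bits needed to describe the tree plus the number of bits needed to describe the samples it gets wrong, so larger trees are penalized unless they explain the data better.

```python
import math


def tree_bits(num_nodes: int, num_symbols: int) -> float:
    """Bits to encode a tree, assuming each node names one of num_symbols symbols."""
    return num_nodes * math.log2(num_symbols)


def exception_bits(num_errors: int, num_samples: int) -> float:
    """Bits to state how many samples are misclassified and which ones they are."""
    count_bits = math.log2(num_samples + 1)  # the error count is in 0..num_samples
    which_bits = math.log2(math.comb(num_samples, num_errors))
    return count_bits + which_bits


def mdl_fitness(num_nodes: int, num_symbols: int,
                num_errors: int, num_samples: int) -> float:
    """Total description length in bits (hypothetical coding scheme); lower is better."""
    return tree_bits(num_nodes, num_symbols) + exception_bits(num_errors, num_samples)


if __name__ == "__main__":
    small_tree = mdl_fitness(num_nodes=5, num_symbols=8, num_errors=3, num_samples=100)
    large_tree = mdl_fitness(num_nodes=41, num_symbols=8, num_errors=1, num_samples=100)
    print(f"small tree: {small_tree:.1f} bits, large tree: {large_tree:.1f} bits")
```

Under this coding the 5-node tree with three errors has a shorter total description than the 41-node tree with one error, which is the kind of trade-off that limits tree growth without a hand-set size cap.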
General Bounds on Statistical Query Learning and PAC Learning with Noise via Hypothesis Boosting
 in Proceedings of the 34th Annual Symposium on Foundations of Computer Science
, 1993
Cited by 45 (5 self)
We derive general bounds on the complexity of learning in the Statistical Query model and in the PAC model with classification noise. We do so by considering the problem of boosting the accuracy of weak learning algorithms which fall within the Statistical Query model. This new model was introduced by Kearns [12] to provide a general framework for efficient PAC learning in the presence of classification noise. We first show a general scheme for boosting the accuracy of weak SQ learning algorithms, proving that weak SQ learning is equivalent to strong SQ learning. The boosting is efficient and is used to show our main result of the first general upper bounds on the complexity of strong SQ learning. Specifically, we derive simultaneous upper bounds with respect to $\epsilon$ on the number of queries, $O(\log^2(1/\epsilon))$, the Vapnik-Chervonenkis dimension of the query space, $O(\log(1/\epsilon) \log\log(1/\epsilon))$, and the inverse of the minimum tolerance, $O((1/\epsilon) \log(1/\epsilon))$. In addition, we show that these general upper bounds are nearly optimal by describing a class of learning problems for which we simultaneously lower bound the number of queries by $\Omega(\log(1/\epsilon))$ and the inverse of the minimum tolerance by $\Omega(1/\epsilon)$. We further apply our boosting results in the SQ model to learning in the PAC model with classification noise. Since nearly all PAC learning algorithms can be cast in the SQ model, we can apply our boosting techniques to convert these PAC algorithms into highly efficient SQ algorithms. By simulating these efficient SQ algorithms in the PAC model with classification noise, we show that nearly all PAC algorithms can be converted into highly efficient PAC algorithms which ... *Author was supported by DARPA Contract N0001487K825 and by NSF Grant CCR8914428. Author's net address: jaa@theory.lca.rit.edu †Author was supported by an NDSEG Fellowship and
On the Complexity of Learning for a Spiking Neuron
, 1997
Cited by 16 (7 self)
Wolfgang Maass and Michael Schmitt. Spiking neurons are models for the computational units in biological neural systems where information is considered to be encoded mainly in the temporal patterns of their activity. They provide a way of analyzing neural computation that is not captured by the traditional neuron models such as sigmoidal and threshold gates (or "Perceptrons"). We introduce a simple model of a spiking neuron that, in addition to the weights that model the plasticity of synaptic strength, also has variable transmission delays between neurons as programmable parameters. For coding of input and output values two modes are taken into account: binary coding for the Boolean and analog coding for the real-valued domain. We investigate the complexity of learning for a single spiking neuron within the framework of PAC-learnability. With regard to sample complexity, we prove that the VC-dimension is \Theta(n log n) and, hence, strictly larger than that of a thresho...
The Iterative Learning of Phonological Constraints
 Computational Linguistics
, 1991
Cited by 16 (0 self)
This paper presents a simplicity measure for violable phonological constraints based on the minimum message length method. This measure captures the intuitive desiderata of conciseness, accuracy and precision. A family of constraints can be specified by parameterising a specific constraint, and so forming a template. The combination of this measure with a search algorithm is a powerful learning method for finding the best constraint matching a template and fitting a corpus. This method may be applied iteratively, using the same template, to learn a number of different constraints. Five applications of an implementation show some of the successes of this learning method: from learning consonant cluster constraints to vowel harmony.
On Efficient Agnostic Learning of Linear Combinations of Basis Functions
 In Proceedings of the Eighth Annual Conference on Computational Learning Theory
, 1995
Cited by 14 (3 self)
We consider efficient agnostic learning of linear combinations of basis functions when the sum of absolute values of the weights of the linear combinations is bounded. With the quadratic loss function, we show that the class of linear combinations of a set of basis functions is efficiently agnostically learnable if and only if the class of basis functions is efficiently agnostically learnable. We also show that the sample complexity for learning the linear combinations grows polynomially if and only if a combinatorial property of the class of basis functions, called the fat-shattering function, grows at most polynomially. We also relate the problem to agnostic learning of {0, 1}-valued function classes by showing that if a class of {0, 1}-valued functions is efficiently agnostically learnable (using the same function class) with the discrete loss function, then the class of linear combinations of functions from the class is efficiently agnostically learnable with the quadratic loss fun...
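The object studied in this abstract, a linear combination of basis functions whose weights have a bounded sum of absolute values and which is scored by quadratic loss, can be written down in a few lines. The sketch below only evaluates such a hypothesis on a sample; it is not the learning algorithm from the paper, and the names `l1_norm`, `predict`, and `empirical_quadratic_loss` as well as the example basis are assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

Basis = Sequence[Callable[[float], float]]


def l1_norm(weights: Sequence[float]) -> float:
    """Sum of the absolute values of the weights."""
    return sum(abs(w) for w in weights)


def predict(weights: Sequence[float], basis: Basis, x: float) -> float:
    """Value of the linear combination sum_i w_i * f_i(x)."""
    return sum(w * f(x) for w, f in zip(weights, basis))


def empirical_quadratic_loss(weights: Sequence[float], basis: Basis,
                             samples: Sequence[Tuple[float, float]],
                             bound: float) -> float:
    """Mean squared error on the sample; rejects weights that break the L1 bound."""
    if l1_norm(weights) > bound:
        raise ValueError("sum of absolute weights exceeds the bound")
    return sum((predict(weights, basis, x) - y) ** 2 for x, y in samples) / len(samples)


if __name__ == "__main__":
    basis = [lambda x: 1.0, lambda x: x]  # hypothetical basis: constant and identity
    samples = [(0.0, 0.5), (1.0, 1.0), (2.0, 2.5)]
    print(empirical_quadratic_loss([0.5, 0.5], basis, samples, bound=1.0))
```

An agnostic learner in this setting would search over weight vectors inside the L1 ball for one whose expected quadratic loss is close to the best achievable; the sketch shows only the constraint and the loss being minimized.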
Learning by Canonical Smooth Estimation, Part II: Learning and Choice of Model Complexity
 IEEE Transactions on Automatic Control
Cited by 13 (2 self)
In this paper, we analyze the properties of a procedure for learning from examples. This "canonical learner" is based on a canonical error estimator developed in a companion paper. In learning problems, we observe data that consists of labeled sample points, and the goal is to find a model, or "hypothesis," from a set of candidates that will accurately predict the labels of new sample points. The expected mismatch between a hypothesis' prediction and the actual label of a new sample point is called the hypothesis' "generalization error." We compare the canonical learner with the traditional technique of finding hypotheses that minimize the relative frequency-based empirical error estimate. We show that, for a broad class of learning problems, the set of cases for which such empirical error minimization works is a proper subset of the cases for which the canonical learner works. We derive bounds to show that the number of samples required by these two methods is comparable. We also add...
Learning by Canonical Smooth Estimation, Part I: Simultaneous Estimation
 IEEE Transactions on Automatic Control
, 1996
Cited by 12 (2 self)
This paper examines the problem of learning from examples in a framework that is based on, but more general than, Valiant's Probably Approximately Correct (PAC) model for learning. In our framework, the learner observes examples that consist of sample points drawn and labeled according to a fixed, unknown probability distribution. Based on this empirical data, the learner must select, from a set of candidate functions, a particular function, or "hypothesis," that will accurately predict the labels of future sample points. The expected mismatch between a hypothesis' prediction and the label of a new sample point is called the hypothesis' "generalization error." Following the pioneering work of Vapnik and Chervonenkis, others have attacked this sort of learning problem by finding hypotheses that minimize the relative frequency-based empirical error estimate. We generalize this approach by examining the "simultaneous estimation" problem: When does some procedure exist for estimating the g...
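The baseline technique this abstract refers to, choosing the hypothesis that minimizes the relative-frequency empirical error, can be sketched as follows. This illustrates only that baseline, not the canonical estimator of the paper; the one-dimensional threshold hypotheses, the sample, and all function names are assumptions made for the example.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, int]  # (sample point, label in {0, 1})


def threshold_predict(threshold: float, x: float) -> int:
    """Hypothesis h_t(x) = 1 if x >= t, else 0."""
    return 1 if x >= threshold else 0


def empirical_error(threshold: float, samples: Sequence[Sample]) -> float:
    """Relative frequency of samples that the hypothesis mislabels."""
    mistakes = sum(1 for x, y in samples if threshold_predict(threshold, x) != y)
    return mistakes / len(samples)


def minimize_empirical_error(candidates: Sequence[float],
                             samples: Sequence[Sample]) -> float:
    """Return the candidate threshold with the smallest empirical error."""
    return min(candidates, key=lambda t: empirical_error(t, samples))


if __name__ == "__main__":
    samples = [(0.1, 0), (0.4, 0), (0.5, 1), (0.9, 1), (0.3, 1)]
    candidates = [0.0, 0.45, 0.75]
    best = minimize_empirical_error(candidates, samples)
    print(best, empirical_error(best, samples))
```

The empirical error computed here is only an estimate of the generalization error; the question raised in the abstract is when estimates of this kind are accurate for all candidates at once.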
Some Discriminant-Based PAC Algorithms
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Cited by 7 (3 self)
A classical approach in multiclass pattern classification is the following. Estimate the probability distributions that generated the observations for each label class, and then label new instances by applying the Bayes classifier to the estimated distributions. That approach provides more useful information than just a class label; it also provides estimates of the conditional distribution of class labels, in situations where there is class overlap. We would