Results 1 - 10 of 27
Generalization Performance of Support Vector Machines and Other Pattern Classifiers
, 1998
Cited by 105 (16 self)
... this paper has been twofold. Firstly, we have stated the known results for high confidence bounds on the generalization error of SVMs in terms of the margin and number of support vectors. Secondly, we wanted to highlight that these results can only be obtained from a data-dependent analysis, relying as they do on using some measure to estimate how favourable the input distribution is in relation to the target function. This type of analysis is relatively novel [9], but we feel that its potential for motivating algorithms that are able to take advantage of collusions between distribution and target is far from being exhausted. Indeed, we believe that this is frequently an ingredient in successful learning systems which has been exploited by accident. By more careful analysis of this phenomenon it may well be possible to motivate key ingredients in the Support Vector arsenal, such as choice of kernel function, the bound used in the soft-margin approach and so on. We have also given examples to show that the style of analysis is not limited to SVMs but applies to many other learning machines including two of the most effective techniques, boosting and Bayesian methods. Acknowledgements: John Shawe-Taylor was supported in part by the EPSRC research grant number GR/K70366. Peter Bartlett was supported by the Australian Research Council.
Efficient Agnostic Learning of Neural Networks with Bounded Fan-in
, 1996
Cited by 57 (18 self)
We show that the class of two layer neural networks with bounded fan-in is efficiently learnable in a realistic extension to the Probably Approximately Correct (PAC) learning model. In this model, a joint probability distribution is assumed to exist on the observations and the learner is required to approximate the neural network which minimizes the expected quadratic error. As special cases, the model allows learning real-valued functions with bounded noise, learning probabilistic concepts and learning the best approximation to a target function that cannot be well approximated by the neural network. The networks we consider have real-valued inputs and outputs, an unlimited number of threshold hidden units with bounded fan-in, and a bound on the sum of the absolute values of the output weights. The number of computation ... (This work was supported by the Australian Research Council and the Australian Telecommunications and Electronics Research Board. The material in this paper was pres...)
Genetic Programming Using a Minimum Description Length Principle
- Advances in Genetic Programming
, 1994
Cited by 44 (1 self)
This paper introduces a Minimum Description Length (MDL) principle to define fitness functions in Genetic Programming (GP). In traditional (Koza-style) GP, the size of trees was usually controlled by user-defined parameters, such as the maximum number of nodes and maximum tree depth. Large tree sizes meant that the time needed to measure their fitness often dominated total processing time. To overcome this difficulty, we introduce a method for controlling tree growth which uses an MDL principle. Initially we choose a "decision tree" representation for the GP chromosomes, and then show how an MDL principle can be used to define GP fitness functions. Thereafter we apply the MDL-based fitness functions to some practical problems. Using our implemented system "STROGANOFF", we show how MDL-based fitness functions can be applied successfully to pattern recognition problems. The results demonstrate that our approach is superior to usual neural networks in terms of general...
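The idea of an MDL-based fitness can be illustrated with a minimal sketch. The abstract does not give the coding scheme, so everything below is an assumption made for illustration: the function names are hypothetical, each tree node is charged a fixed number of bits, and misclassified examples are charged the bits needed to say which examples they are. It is not the STROGANOFF implementation.

```python
import math

def tree_bits(n_nodes, n_symbols):
    # Assumed coding: each node names one symbol out of n_symbols.
    return n_nodes * math.log2(n_symbols)

def exception_bits(n_examples, n_errors):
    # Assumed coding: identify which n_errors of the n_examples are
    # misclassified, costing log2 of the number of possible subsets.
    return math.log2(math.comb(n_examples, n_errors))

def mdl_fitness(n_nodes, n_symbols, n_examples, n_errors):
    # Lower is better: model cost plus cost of the data the model gets wrong.
    return tree_bits(n_nodes, n_symbols) + exception_bits(n_examples, n_errors)

# A small tree with one error versus a large tree with none.
small = mdl_fitness(n_nodes=3, n_symbols=4, n_examples=10, n_errors=1)
large = mdl_fitness(n_nodes=15, n_symbols=4, n_examples=10, n_errors=0)
print(small < large)
```

Under this assumed coding, a large tree is preferred only when the errors it removes save more bits than its extra nodes cost, which is how such a fitness discourages uncontrolled tree growth.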
General Bounds on Statistical Query Learning and PAC Learning with Noise via Hypothesis Boosting
- in Proceedings of the 34th Annual Symposium on Foundations of Computer Science
, 1993
Cited by 41 (5 self)
We derive general bounds on the complexity of learning in the Statistical Query model and in the PAC model with classification noise. We do so by considering the problem of boosting the accuracy of weak learning algorithms which fall within the Statistical Query model. This new model was introduced by Kearns [12] to provide a general framework for efficient PAC learning in the presence of classification noise. We first show a general scheme for boosting the accuracy of weak SQ learning algorithms, proving that weak SQ learning is equivalent to strong SQ learning. The boosting is efficient and is used to show our main result of the first general upper bounds on the complexity of strong SQ learning. Specifically, we derive simultaneous upper bounds with respect to $\epsilon$ on the number of queries, $O(\log^2(1/\epsilon))$, the Vapnik-Chervonenkis dimension of the query space, $O(\log(1/\epsilon) \log\log(1/\epsilon))$, and the inverse of the minimum tolerance, $O((1/\epsilon) \log(1/\epsilon))$. In addition, we show that these general upper bounds are nearly optimal by describing a class of learning problems for which we simultaneously lower bound the number of queries by $\Omega(\log(1/\epsilon))$ and the inverse of the minimum tolerance by $\Omega(1/\epsilon)$. We further apply our boosting results in the SQ model to learning in the PAC model with classification noise. Since nearly all PAC learning algorithms can be cast in the SQ model, we can apply our boosting techniques to convert these PAC algorithms into highly efficient SQ algorithms. By simulating these efficient SQ algorithms in the PAC model with classification noise, we show that nearly all PAC algorithms can be converted into highly efficient PAC algorithms which ... (*Author was supported by DARPA Contract N00014-87-K-825 and by NSF Grant CCR-89-14428. Author's net address: jaa@theory.lcs.mit.edu. †Author was supported by an NDSEG Fellowship and ...)
On the Complexity of Learning for a Spiking Neuron
, 1997
Cited by 14 (7 self)
Wolfgang Maass and Michael Schmitt. Abstract: Spiking neurons are models for the computational units in biological neural systems where information is considered to be encoded mainly in the temporal patterns of their activity. They provide a way of analyzing neural computation that is not captured by the traditional neuron models such as sigmoidal and threshold gates (or "Perceptrons"). We introduce a simple model of a spiking neuron that, in addition to the weights that model the plasticity of synaptic strength, also has variable transmission delays between neurons as programmable parameters. For coding of input and output values two modes are taken into account: binary coding for the Boolean and analog coding for the real-valued domain. We investigate the complexity of learning for a single spiking neuron within the framework of PAC-learnability. With regard to sample complexity, we prove that the VC-dimension is $\Theta(n \log n)$ and, hence, strictly larger than that of a thresho...
The Iterative Learning of Phonological Constraints
- Computational Linguistics
, 1991
Cited by 14 (0 self)
This paper presents a simplicity measure for violable phonological constraints based on the minimum message length method. This measure captures the intuitive desiderata of conciseness, accuracy and precision. A family of constraints can be specified by parameterising a specific constraint, and so forming a template. The combination of this measure with a search algorithm is a powerful learning method for finding the best constraint matching a template and fitting a corpus. This method may be applied iteratively, using the same template, to learn a number of different constraints. Five applications of an implementation show some of the successes of this learning method: from learning consonant cluster constraints to vowel harmony.
On Efficient Agnostic Learning of Linear Combinations of Basis Functions
- In Proceedings of the Eighth Annual Conference on Computational Learning Theory
, 1995
Cited by 12 (3 self)
We consider efficient agnostic learning of linear combinations of basis functions when the sum of absolute values of the weights of the linear combinations is bounded. With the quadratic loss function, we show that the class of linear combinations of a set of basis functions is efficiently agnostically learnable if and only if the class of basis functions is efficiently agnostically learnable. We also show that the sample complexity for learning the linear combinations grows polynomially if and only if a combinatorial property of the class of basis functions, called the fat-shattering function, grows at most polynomially. We also relate the problem to agnostic learning of {0, 1}-valued function classes by showing that if a class of {0, 1}-valued functions is efficiently agnostically learnable (using the same function class) with the discrete loss function, then the class of linear combinations of functions from the class is efficiently agnostically learnable with the quadratic loss fun...
Learning by Canonical Smooth Estimation, Part II: Learning and Choice of Model Complexity
- IEEE Transactions on Automatic Control
Cited by 12 (2 self)
In this paper, we analyze the properties of a procedure for learning from examples. This "canonical learner" is based on a canonical error estimator developed in a companion paper. In learning problems, we observe data that consists of labeled sample points, and the goal is to find a model, or "hypothesis," from a set of candidates that will accurately predict the labels of new sample points. The expected mismatch between a hypothesis' prediction and the actual label of a new sample point is called the hypothesis' "generalization error." We compare the canonical learner with the traditional technique of finding hypotheses that minimize the relative frequency-based empirical error estimate. We show that, for a broad class of learning problems, the set of cases for which such empirical error minimization works is a proper subset of the cases for which the canonical learner works. We derive bounds to show that the number of samples required by these two methods is comparable. We also add...
Learning by Canonical Smooth Estimation, Part I: Simultaneous Estimation
- IEEE Transactions on Automatic Control
, 1996
Cited by 11 (2 self)
This paper examines the problem of learning from examples in a framework that is based on, but more general than, Valiant's Probably Approximately Correct (PAC) model for learning. In our framework, the learner observes examples that consist of sample points drawn and labeled according to a fixed, unknown probability distribution. Based on this empirical data, the learner must select, from a set of candidate functions, a particular function, or "hypothesis," that will accurately predict the labels of future sample points. The expected mismatch between a hypothesis' prediction and the label of a new sample point is called the hypothesis' "generalization error." Following the pioneering work of Vapnik and Chervonenkis, others have attacked this sort of learning problem by finding hypotheses that minimize the relative frequency-based empirical error estimate. We generalize this approach by examining the "simultaneous estimation" problem: When does some procedure exist for estimating the g...
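The traditional technique mentioned above, choosing the hypothesis that minimizes the relative frequency-based empirical error, can be sketched as follows. This is an illustrative sketch only: the threshold hypotheses, the sample, and the function names are assumptions and do not come from the paper, and the sketch does not implement the paper's simultaneous estimation procedure.

```python
def empirical_error(hypothesis, sample):
    # Relative frequency of mismatches between prediction and label.
    return sum(hypothesis(x) != y for x, y in sample) / len(sample)

def minimize_empirical_error(hypotheses, sample):
    # Pick the candidate with the smallest empirical error on the sample.
    return min(hypotheses, key=lambda h: empirical_error(h, sample))

def make_threshold(t):
    # Hypothetical candidate class: predict 1 when x is at least t.
    return lambda x: int(x >= t)

thresholds = [0.0, 0.5, 1.0]
candidates = [make_threshold(t) for t in thresholds]

# Hypothetical labeled sample, consistent with the threshold 0.5.
sample = [(0.1, 0), (0.3, 0), (0.6, 1), (0.9, 1)]

errors = [empirical_error(h, sample) for h in candidates]
best = minimize_empirical_error(candidates, sample)
print(errors, empirical_error(best, sample))
```

The empirical error is only an estimate of the generalization error, which is why the question of when such estimates are simultaneously accurate over the whole candidate set matters.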
Efficient Learning from Faulty Data
, 1995
Cited by 6 (1 self)
Learning systems are often provided with imperfect or noisy data. Therefore, researchers have formalized various models of learning with noisy data, and have attempted to delineate the boundaries of learnability in these models. In this thesis, we describe a general framework for the construction of efficient learning algorithms in noise tolerant variants of Valiant's PAC learning model. By applying this framework, we also obtain many new results for specific learning problems in various settings with faulty data. The central tool used in this thesis is the specification of learning algorithms in Kearns' Statistical Query (SQ) learning model, in which statistics, as opposed to labelled examples, are requested by the learner. These SQ learning algorithms are then converted into PAC algorithms which tolerate various types of faulty data. We develop this framework in three major parts: 1. We design automatic compilations of SQ algorithms into PAC algorithms which tolerate various types of data errors. These results include improvements to Kearns' classification noise compilation, and the first such compilations for malicious errors, attribute noise and new classes of "hybrid" noise composed of multiple noise types. 2. We prove nearly tight bounds on the required complexity of SQ algorithms. The upper bounds are based on a constructive technique which allows one to achieve this complexity even when it is not initially achieved by a given SQ algorithm. 3. We define and employ an improved model of SQ learning which yields noise tolerant PAC algorithms that are more efficient than those derived from standard SQ algorithms. Together, these results provide a unified and intuitive framework for noise tolerant learning that allows the algorithm designer to achieve efficient, and often optimal, fault tolerant learning.
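The notion of requesting statistics rather than labelled examples can be sketched for the noise-free case. In this sketch a statistical query is a 0/1-valued predicate on (example, label) pairs together with a tolerance tau, and it is answered by averaging over enough random examples that Hoeffding's inequality puts the estimate within tau with probability at least 1 - delta. The function names and the toy distribution are assumptions for illustration, and the sketch does not implement the thesis's noise-tolerant compilations.

```python
import math
import random

def hoeffding_sample_size(tau, delta):
    # m >= ln(2/delta) / (2 tau^2) suffices for a [0, 1]-valued statistic.
    return math.ceil(math.log(2 / delta) / (2 * tau ** 2))

def simulate_statistical_query(chi, draw_example, tau, delta=0.05):
    # Estimate E[chi(x, label)] by an empirical average over m fresh examples.
    m = hoeffding_sample_size(tau, delta)
    return sum(chi(*draw_example()) for _ in range(m)) / m

# Hypothetical target: label is 1 exactly when x > 0.5, x uniform on [0, 1].
def draw_example():
    x = random.random()
    return x, int(x > 0.5)

# Query: how often does the hypothesis "x > 0.4" disagree with the label?
disagreement = simulate_statistical_query(
    lambda x, label: int((x > 0.4) != bool(label)), draw_example, tau=0.1
)
print(hoeffding_sample_size(0.1, 0.05), disagreement)
```

Because the learner only ever sees such averages, an algorithm written against this interface can be run on top of a different answering procedure, which is the property the noise-tolerant conversions rely on.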

