Results 1 - 10
of
15
Learning polynomials with queries: The highly noisy case
, 1995
"... Given a function f mapping n-variate inputs from a finite Kearns et. al. [21] (see also [27, 28, 22]). In the setting of ag-fieldFintoF, we consider the task of reconstructing a list nostic learning, the learner is to make no assumptions regarding of alln-variate degreedpolynomials which agree withf ..."
Abstract
-
Cited by 76 (16 self)
- Add to MetaCart
Given a function f mapping n-variate inputs from a finite Kearns et. al. [21] (see also [27, 28, 22]). In the setting of ag-fieldFintoF, we consider the task of reconstructing a list nostic learning, the learner is to make no assumptions regarding of alln-variate degreedpolynomials which agree withfon a the natural phenomena underlying the input/output relationship tiny but non-negligible fraction, , of the input space. We give a of the function, and the goal of the learner is to come up with a randomized algorithm for solving this task which accessesfas a simple explanation which best fits the examples. Therefore the black box and runs in time polynomial in1;nand exponential in best explanation may account for only part of the phenomena. d, provided is(pd=jFj). For the special case whend=1, In some situations, when the phenomena appears very irregular, we solve this problem for jFj>0. In this case the providing an explanation which fits only part of it is better than nothing. Interestingly, Kearns et. al. did not consider the use of running time of our algorithm is bounded by a polynomial queries (but rather examples drawn from an arbitrary distribu-and exponential ind. Our algorithm generalizes a previously tion) as they were skeptical that queries could be of any help. known algorithm, due to Goldreich and Levin, that solves this We show that queries do seem to help (see below). task for the case whenF=GF(2)(andd=1).
Theory and Applications of Agnostic PAC-Learning with Small Decision Trees
, 1995
"... We exhibit a theoretically founded algorithm T2 for agnostic PAC-learning of decision trees of at most 2 levels, whose computation time is almost linear in the size of the training set. We evaluate the performance of this learning algorithm T2 on 15 common "real-world" datasets, and show that for mo ..."
Abstract
-
Cited by 69 (2 self)
- Add to MetaCart
We exhibit a theoretically founded algorithm T2 for agnostic PAC-learning of decision trees of at most 2 levels, whose computation time is almost linear in the size of the training set. We evaluate the performance of this learning algorithm T2 on 15 common "real-world" datasets, and show that for most of these datasets T2 provides simple decision trees with little or no loss in predictive power (compared with C4.5). In fact, for datasets with continuous attributes its error rate tends to be lower than that of C4.5. To the best of our knowledge this is the first time that a PAC-learning algorithm is shown to be applicable to "real-world" classification problems. Since one can prove that T2 is an agnostic PAClearning algorithm, T2 is guaranteed to produce close to optimal 2-level decision trees from sufficiently large training sets for any (!) distribution of data. In this regard T2 differs strongly from all other learning algorithms that are considered in applied machine learning, for w...
Bounds for the Computational Power and Learning Complexity of Analog Neural Nets
- Proc. of the 25th ACM Symp. Theory of Computing
, 1993
"... . It is shown that high order feedforward neural nets of constant depth with piecewise polynomial activation functions and arbitrary real weights can be simulated for boolean inputs and outputs by neural nets of a somewhat larger size and depth with heaviside gates and weights from f\Gamma1; 0; 1g. ..."
Abstract
-
Cited by 59 (12 self)
- Add to MetaCart
. It is shown that high order feedforward neural nets of constant depth with piecewise polynomial activation functions and arbitrary real weights can be simulated for boolean inputs and outputs by neural nets of a somewhat larger size and depth with heaviside gates and weights from f\Gamma1; 0; 1g. This provides the first known upper bound for the computational power of the former type of neural nets. It is also shown that in the case of first order nets with piecewise linear activation functions one can replace arbitrary real weights by rational numbers with polynomially many bits, without changing the boolean function that is computed by the neural net. In order to prove these results we introduce two new methods for reducing nonlinear problems about weights in multi-layer neural nets to linear problems for a transformed set of parameters. These transformed parameters can be interpreted as weights in a somewhat larger neural net. As another application of our new proof technique we s...
Efficient Agnostic Learning of Neural Networks with Bounded Fan-in
, 1996
"... We show that the class of two layer neural networks with bounded fan-in is efficiently learnable in a realistic extension to the Probably Approximately Correct (PAC) learning model. In this model, a joint probability distribution is assumed to exist on the observations and the learner is required to ..."
Abstract
-
Cited by 57 (18 self)
- Add to MetaCart
We show that the class of two layer neural networks with bounded fan-in is efficiently learnable in a realistic extension to the Probably Approximately Correct (PAC) learning model. In this model, a joint probability distribution is assumed to exist on the observations and the learner is required to approximate the neural network which minimizes the expected quadratic error. As special cases, the model allows learning real-valued functions with bounded noise, learning probabilistic concepts and learning the best approximation to a target function that cannot be well approximated by the neural network. The networks we consider have real-valued inputs and outputs, an unlimited number of threshold hidden units with bounded fan-in, and a bound on the sum of the absolute values of the output weights. The number of computation This work was supported by the Australian Research Council and the Australian Telecommunications and Electronics Research Board. The material in this paper was pres...
On the Complexity of Computing and Learning with Multiplicative Neural Networks
- NEURAL COMPUTATION
"... In a great variety of neuron models neural inputs are combined using the summing operation. We introduce the concept of multiplicative neural networks that contain units which multiply their inputs instead of summing them and, thus, allow inputs to interact nonlinearly. The class of multiplicative n ..."
Abstract
-
Cited by 19 (3 self)
- Add to MetaCart
In a great variety of neuron models neural inputs are combined using the summing operation. We introduce the concept of multiplicative neural networks that contain units which multiply their inputs instead of summing them and, thus, allow inputs to interact nonlinearly. The class of multiplicative neural networks comprises such widely known and well studied network types as higher-order networks and product unit networks. We investigate the complexity of computing and learning for multiplicative neural networks. In particular, we derive upper and lower bounds on the Vapnik-Chervonenkis (VC) dimension and the pseudo dimension for various types of networks with multiplicative units. As the most general case, we consider feedforward networks consisting of product and sigmoidal units, showing that their pseudo dimension is bounded from above by a polynomial with the same order of magnitude as the currently best known bound for purely sigmoidal networks. Moreover, we show that this bound holds even in the case when the unit type, product or sigmoidal, may be learned. Crucial for these results are calculations of solution set components bounds for new network classes. As to lower bounds we construct product unit networks of fixed depth with superlinear VC dimension. For sigmoidal networks of higher order we establish polynomial bounds that, in contrast to previous results, do not involve any restriction of the network order. We further consider various classes of higher-order units, also known as sigma-pi units, that are characterized by connectivity constraints. In terms of these we derive some asymptotically tight bounds.
Perspectives of Current Research about the Complexity of Learning on Neural Nets
, 1994
"... This paper discusses within the framework of computational learning theory the current state of knowledge and some open problems in three areas of research about learning on feedforward neural nets: -- Neural nets that learn from mistakes -- Bounds for the Vapnik-Chervonenkis dimension of neural net ..."
Abstract
-
Cited by 18 (1 self)
- Add to MetaCart
This paper discusses within the framework of computational learning theory the current state of knowledge and some open problems in three areas of research about learning on feedforward neural nets: -- Neural nets that learn from mistakes -- Bounds for the Vapnik-Chervonenkis dimension of neural nets -- Agnostic PAC-learning of functions on neural nets. All relevant definitions are given in this paper, and no previous knowledge about computational learning theory or neural nets is required. We refer to [RSO] for further introductory material and survey papers about the complexity of learning on neural nets. Throughout this paper we consider the following rather general notion of a (feedforward) neural net.
Probabilistic Analysis of Learning in Artificial Neural Networks: The PAC Model and its Variants
, 1994
"... There are a number of mathematical approaches to the study of learning and generalization in artificial neural networks. Here we survey the `probably approximately correct' (PAC) model of learning and some of its variants. These models, much-studied since the introduction of the basic PAC model ..."
Abstract
-
Cited by 16 (4 self)
- Add to MetaCart
There are a number of mathematical approaches to the study of learning and generalization in artificial neural networks. Here we survey the `probably approximately correct' (PAC) model of learning and some of its variants. These models, much-studied since the introduction of the basic PAC model by Valiant in 1984, provide a probabilistic framework for the discussion of generalization and learning. CONTENTS 3 Contents 1 Introduction 4 2 The Basic PAC Model of Learning 5 3 VC-Dimension and Growth Function 8 4 VC-Dimension and Linear Dimension 10 5 A Useful Probability Theorem 12 6 PAC Learning and the VC-Dimension 16 7 VC-Dimension of Binary-Output Networks 19 7.1 Introduction 19 7.2 Linearly weighted neural networks 21 7.3 Linear threshold networks 22 7.4 Other activation functions 26 7.5 The effect of weight restrictions 29 8 Computational Complexity of Learning 30 9 Stochastic Concepts 36 10 Distribution-Specific Learning 39 11 Graph Dimension and Multiple-Output Nets 42 11.1 T...
On Efficient Agnostic Learning of Linear Combinations of Basis Functions
- In Proceedings of the Eighth Annual Conference on Computational Learning Theory
, 1995
"... We consider efficient agnostic learning of linear combinations of basis functions when the sum of absolute values of the weights of the linear combinations is bounded. With the quadratic loss function, we show that the class of linear combinations of a set of basis functions is efficiently agnostica ..."
Abstract
-
Cited by 12 (3 self)
- Add to MetaCart
We consider efficient agnostic learning of linear combinations of basis functions when the sum of absolute values of the weights of the linear combinations is bounded. With the quadratic loss function, we show that the class of linear combinations of a set of basis functions is efficiently agnostically learnable if and only if the class of basis functions is efficiently agnostically learnable. We also show that the sample complexity for learning the linear combinations grows polynomially if and only if a combinatorial property of the class of basis functions, called the fat-shattering function, grows at most polynomially. We also relate the problem to agnostic learning of f0; 1g-valued function classes by showing that if a class of f0; 1g-valued functions is efficiently agnostically learnable (using the same function class) with the discrete loss function, then the class of linear combinations of functions from the class is efficiently agnostically learnable with the quadratic loss fun...
On the complexity of learning on feedforward neural nets
- in Proc. EATCS Advanced School on Computational Learning and Cryptography, Vietri sul Mare
, 1993
"... This paper discusses within the framework of computational learning theory the current state of knowledge and some open problems in three areas of research about learning on feedforward neural nets:-- Neural nets that learn from mistakes-- Bounds for the Vapnik-Chervonenkis dimension of neural nets- ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
This paper discusses within the framework of computational learning theory the current state of knowledge and some open problems in three areas of research about learning on feedforward neural nets:-- Neural nets that learn from mistakes-- Bounds for the Vapnik-Chervonenkis dimension of neural nets-- Agnostic PAC-learning of functions on neural nets. All relevant definitions are given in this paper, and no previous knowledge about computational learning theory or neural nets is required. We refer to [RSO] for further introductory material and survey papers about the complexity of learning on neural nets. Throughout this paper we consider the following rather general notion of a (feedforward) neural net. Definition 1.1 A network architecture (or "neural net") N is a labeled acyclic directed graph. Its nodes of fan-in 0 ( " input nodes"), as well as its nodes of fan-out 0 ( " output nodes") are labeled by natural numbers. A node g in N with fan-in r? 0 is called a computation node (or gate), and it is labeled by some activation function fl g: R! R, some polynomial Q g (y 1; : : : ; y r), and a subset P g of the coefficients of this polynomial (if P g is not separately specified we assume that P g consists of all coefficients of Q g). One says that N is of order v if all polynomials Q g in N are of degree v. The coefficients in the sets P g for the gates g in N are called the programmable parameters of N. Assume that N has w programmable parameters, that some numbering of these has been fixed, and that values for all non-programmable parameters have been assigned. Furthermore assume that N has d input nodes and l output nodes. Then each assignment ff 2 R w of reals to the programmable parameters in N defines an analog circuit N ff, which computes a function x 7! N ff
Lower Bounds on the VC-Dimension of Smoothly Parametrized Function Classes
- Neural Computaion
, 1995
"... We examine the relationship between the VCdimension and the number of parameters of a smoothly parametrized function class. We show that the VC-dimension of such a function class is at least k if there exists a k-dimensional differentiable manifold in the parameter space such that each member ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
We examine the relationship between the VCdimension and the number of parameters of a smoothly parametrized function class. We show that the VC-dimension of such a function class is at least k if there exists a k-dimensional differentiable manifold in the parameter space such that each member of the manifold corresponds to a different decision boundary. Using this result, we are able to obtain lower bounds on the VC-dimension proportional to the number of parameters for several function classes including two-layer neural networks with certain smooth activation functions and radial basis functions with a gaussian basis. These lower bounds hold even if the magnitudes of the parameters are restricted to be arbitrarily small. In Valiant's probably approximately correct learning framework, this implies that the number of examples necessary for learning these function classes is at least linear in the number of parameters. 1 INTRODUCTION Smoothly parametrized functions ar...

