Learning polynomials with queries: The highly noisy case
1995
Cited by 88 (18 self)
Given a function $f$ mapping $n$-variate inputs from a finite field $F$ into $F$, we consider the task of reconstructing a list of all $n$-variate degree-$d$ polynomials which agree with $f$ on a tiny but non-negligible fraction, $\delta$, of the input space. We give a randomized algorithm for solving this task which accesses $f$ as a black box and runs in time polynomial in $1/\delta$, $n$ and exponential in $d$, provided $\delta$ is $\Omega(\sqrt{d/|F|})$. For the special case when $d = 1$, we solve this problem for all $\epsilon = \delta - 1/|F| > 0$. In this case the running time of our algorithm is bounded by a polynomial in $1/\epsilon$, $n$ and exponential in $d$. Our algorithm generalizes a previously known algorithm, due to Goldreich and Levin, that solves this task for the case when $F = \mathrm{GF}(2)$ (and $d = 1$).
... Kearns et al. [21] (see also [27, 28, 22]). In the setting of agnostic learning, the learner is to make no assumptions regarding the natural phenomena underlying the input/output relationship of the function, and the goal of the learner is to come up with a simple explanation which best fits the examples. Therefore the best explanation may account for only part of the phenomena. In some situations, when the phenomena appear very irregular, providing an explanation which fits only part of it is better than nothing. Interestingly, Kearns et al. did not consider the use of queries (but rather examples drawn from an arbitrary distribution) as they were skeptical that queries could be of any help. We show that queries do seem to help (see below).
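To make the reconstruction task concrete, here is a minimal brute-force sketch for the degree-1 case over a tiny prime field. It is not the randomized algorithm of the paper (it enumerates every candidate polynomial and every input, so it is exponential in n); the function name `agreeing_linear_polys`, the example function `noisy`, and the chosen parameters are all illustrative assumptions.

```python
from itertools import product

def agreeing_linear_polys(f, p, n, delta):
    """Brute-force reference, NOT the paper's algorithm.

    Lists every degree-1 polynomial a.x + b over GF(p)^n (p assumed prime)
    that agrees with f on at least a delta fraction of all p**n inputs.
    """
    points = list(product(range(p), repeat=n))
    result = []
    for coeffs in product(range(p), repeat=n + 1):
        a, b = coeffs[:n], coeffs[n]
        agree = sum(
            1 for x in points
            if (sum(ai * xi for ai, xi in zip(a, x)) + b) % p == f(x)
        )
        if agree >= delta * len(points):
            result.append((a, b))
    return result

# Hypothetical example: x0 + 2*x1 + 1 over GF(3)^2, corrupted on two inputs.
def noisy(x):
    clean = (x[0] + 2 * x[1] + 1) % 3
    return (clean + 1) % 3 if x in {(0, 0), (1, 1)} else clean

found = agreeing_linear_polys(noisy, p=3, n=2, delta=0.6)
print(found)
```

Two distinct degree-1 polynomials agree on at most a 1/|F| fraction of inputs, which is why a high agreement threshold leaves only one candidate in this example; for smaller thresholds the list can contain several polynomials.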
Efficient Agnostic Learning of Neural Networks with Bounded Fan-in
1996
Cited by 68 (18 self)
We show that the class of two-layer neural networks with bounded fan-in is efficiently learnable in a realistic extension to the Probably Approximately Correct (PAC) learning model. In this model, a joint probability distribution is assumed to exist on the observations and the learner is required to approximate the neural network which minimizes the expected quadratic error. As special cases, the model allows learning real-valued functions with bounded noise, learning probabilistic concepts and learning the best approximation to a target function that cannot be well approximated by the neural network. The networks we consider have real-valued inputs and outputs, an unlimited number of threshold hidden units with bounded fan-in, and a bound on the sum of the absolute values of the output weights. The number of computation ...
This work was supported by the Australian Research Council and the Australian Telecommunications and Electronics Research Board. The material in this paper was pres...
Bounds for the Computational Power and Learning Complexity of Analog Neural Nets
Proc. of the 25th ACM Symp. Theory of Computing, 1993
Cited by 60 (12 self)
It is shown that high-order feedforward neural nets of constant depth with piecewise polynomial activation functions and arbitrary real weights can be simulated for Boolean inputs and outputs by neural nets of a somewhat larger size and depth with Heaviside gates and weights from $\{-1, 0, 1\}$. This provides the first known upper bound for the computational power of the former type of neural nets. It is also shown that in the case of first-order nets with piecewise linear activation functions one can replace arbitrary real weights by rational numbers with polynomially many bits, without changing the Boolean function that is computed by the neural net. In order to prove these results we introduce two new methods for reducing nonlinear problems about weights in multilayer neural nets to linear problems for a transformed set of parameters. These transformed parameters can be interpreted as weights in a somewhat larger neural net. As another application of our new proof technique we s...
Hardness Results for Neural Network Approximation Problems
1999
Cited by 20 (2 self)
Introduction. Previous negative results for learning two-layer neural network classifiers show that it is difficult to find a network that correctly classifies all examples in a training set. However, for learning to a particular accuracy it is only necessary to approximately solve this problem, that is, to find a network that correctly classifies most examples in a training set. In this paper, we show that this approximation problem is hard for several neural network classes. The hardness of PAC-style learning is a very natural question that has been addressed from a variety of viewpoints. The strongest non-learnability conclusions are those stating that no matter what type of algorithm a learner may use, as long as its computational resources are limited, it would not be able to predict a previously unseen label (with probability significantly better than that of a random guess). Such results have been derived by noticing that, in some precise sense, learning
On Efficient Agnostic Learning of Linear Combinations of Basis Functions
In Proceedings of the Eighth Annual Conference on Computational Learning Theory, 1995
Cited by 14 (3 self)
We consider efficient agnostic learning of linear combinations of basis functions when the sum of absolute values of the weights of the linear combinations is bounded. With the quadratic loss function, we show that the class of linear combinations of a set of basis functions is efficiently agnostically learnable if and only if the class of basis functions is efficiently agnostically learnable. We also show that the sample complexity for learning the linear combinations grows polynomially if and only if a combinatorial property of the class of basis functions, called the fat-shattering function, grows at most polynomially. We also relate the problem to agnostic learning of $\{0, 1\}$-valued function classes by showing that if a class of $\{0, 1\}$-valued functions is efficiently agnostically learnable (using the same function class) with the discrete loss function, then the class of linear combinations of functions from the class is efficiently agnostically learnable with the quadratic loss fun...
Sequential PAC Learning
In Proceedings of COLT95, 1995
Cited by 14 (5 self)
We consider the use of "on-line" stopping rules to reduce the number of training examples needed to pac-learn. Rather than collect a large training sample that can be proved sufficient to eliminate all bad hypotheses a priori, the idea is instead to observe training examples one-at-a-time and decide "on-line" whether to stop and return a hypothesis, or continue training. The primary benefit of this approach is that we can detect when a hypothesizer has actually "converged," and halt training before the standard fixed-sample-size bounds. This paper presents a series of such sequential learning procedures for: distribution-free pac-learning, "mistake-bounded to pac" conversion, and distribution-specific pac-learning, respectively. We analyze the worst case expected training sample size of these procedures, and show that this is often smaller than existing fixed sample size bounds, while providing the exact same worst case pac-guarantees. We also provide lower bounds that show these r...
Learning of Depth Two Neural Networks with Constant Fan-in at the Hidden Nodes (Extended Abstract)
In Proc. 9th Annu. Conf. on Comput. Learning Theory, 1996
Cited by 9 (1 self)
We present algorithms for learning depth two neural networks where the hidden nodes are threshold gates with constant fan-in. The transfer function of the output node might be more general: we have results for the cases when the threshold function, the logistic function or the identity function is used as the transfer function at the output node. We give batch and on-line learning algorithms for these classes of neural networks and prove bounds on the performance of our algorithms. The batch algorithms work for real-valued inputs whereas the on-line algorithms assume that the inputs are discretized. The hypotheses of our algorithms are essentially also neural networks of depth two. However, their number of hidden nodes might be much larger than the number of hidden nodes of the neural network that has to be learned. Our algorithms can handle such a large number of hidden nodes since they rely on multiplicative weight updates at the output node, and the performance of these algorithms s...
Agnostic Learning and Single Hidden Layer Neural Networks
1996
Cited by 5 (0 self)
This thesis is concerned with some theoretical aspects of supervised learning of real-valued functions. We study a formal model of learning called agnostic learning. The agnostic learning model assumes a joint probability distribution on the observations (inputs and outputs) and requires the learning algorithm to produce a hypothesis with performance close to that of the best function within a specified class of functions. It is a very general model of learning which includes function learning, learning with additive noise and learning the best approximation in a class of functions as special cases. Within the agnostic learning model, we concentrate on learning functions which can be well approximated by single hidden layer neural networks. Artificial neural networks are often used as black-box models for modelling phenomena for which very little prior knowledge is available. Agnostic learning is a natural model for such learning problems. The class of single hidden layer neural netwo...
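The requirement that the hypothesis perform "close to that of the best function within a specified class" can be written out as an inequality. The following is a sketch in assumed notation that does not appear in the abstract: $P$ is the joint distribution on observations $(x, y)$, $\mathcal{F}$ the comparison class, $\ell$ a loss function, $h$ the returned hypothesis, and $\epsilon$, $\delta$ the accuracy and confidence parameters.

```latex
\Pr\Big[\, \mathbb{E}_{(x,y)\sim P}\,\ell\big(h(x), y\big)
  \;\le\; \inf_{f \in \mathcal{F}} \mathbb{E}_{(x,y)\sim P}\,\ell\big(f(x), y\big) + \epsilon \,\Big]
  \;\ge\; 1 - \delta
```

The probability is taken over the random training sample used to produce $h$. No assumption is made that any $f \in \mathcal{F}$ fits the data well, which is what distinguishes this from learning with a guaranteed target in the class.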
unknown title
dale@cs.toronto.edu
We consider the use of "on-line" stopping rules to reduce the number of training examples needed to pac-learn. Rather than collect a large training sample that can be proved sufficient to eliminate all bad hypotheses a priori, the idea is instead to observe training examples one-at-a-time and decide "on-line" whether to stop and return a hypothesis, or continue training. The primary benefit of this approach is that we can detect when a hypothesizer has actually "converged," and halt training before the standard fixed-sample-size bounds. This paper presents a series of such sequential learning procedures for: distribution-free pac-learning, "mistake-bounded to pac" conversion, and distribution-specific pac-learning, respectively. We analyze the worst case expected training sample size of these procedures, and show that this is often smaller than existing fixed sample size bounds, while providing the exact same worst case pac-guarantees. We also provide lower bounds that show these reductions can at best involve constant (and possibly log) factors. However, empirical studies show that these sequential learning procedures actually use many times fewer training examples in practice.
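The idea of observing examples one at a time and deciding when to stop can be illustrated with a textbook-style conversion from a mistake-driven learner to a pac-learner. This is a sketch under stated assumptions and not necessarily any of the procedures analyzed in the paper: the i-th hypothesis is returned once it has survived ceil(ln(i(i+1)/delta)/eps) consecutive examples, so that a hypothesis with error above eps survives with probability at most delta/(i(i+1)), and these bounds sum to at most delta. The names `survive_count`, `sequential_pac`, and the threshold-learning example are hypothetical.

```python
import math
import random

def survive_count(i, eps, delta):
    """Consecutive correct predictions required of the i-th hypothesis (i >= 1)."""
    return math.ceil(math.log(i * (i + 1) / delta) / eps)

def sequential_pac(update, predict, h0, draw, eps, delta, max_examples=1_000_000):
    """Draw examples one at a time; stop when the current hypothesis has
    survived long enough. Returns (hypothesis, number of examples used)."""
    h, i, streak, used = h0, 1, 0, 0
    while used < max_examples:
        x, y = draw()
        used += 1
        if predict(h, x) == y:
            streak += 1
            if streak >= survive_count(i, eps, delta):
                return h, used
        else:
            # Mistake: move to the next hypothesis and restart the streak.
            h, i, streak = update(h, x, y), i + 1, 0
    raise RuntimeError("example budget exhausted")

# Hypothetical example: learn the threshold 0.3 on uniform [0, 1).
rng = random.Random(0)

def draw():
    x = rng.random()
    return x, x >= 0.3

def predict(h, x):
    lo, hi = h
    return x >= (lo + hi) / 2

def update(h, x, y):
    lo, hi = h
    # Positive examples lower the upper bound, negatives raise the lower bound.
    return (lo, min(hi, x)) if y else (max(lo, x), hi)

(lo, hi), used = sequential_pac(update, predict, (0.0, 1.0), draw, eps=0.1, delta=0.05)
print(f"threshold estimate {(lo + hi) / 2:.3f} after {used} examples")
```

The number of examples used depends on how quickly the learner stops making mistakes, rather than on a sample size fixed in advance.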