Results 1–10 of 88
PEGASUS: A policy search method for large MDPs and POMDPs
In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, 2000
Cited by 245 (8 self)
We propose a new approach to the problem of searching a space of policies for a Markov decision process (MDP) or a partially observable Markov decision process (POMDP), given a model. Our approach is based on the following observation: Any (PO)MDP can be transformed into an "equivalent" POMDP in which all state transitions (given the current state and action) are deterministic. This reduces the general problem of policy search to one in which we need only consider POMDPs with deterministic transitions. We give a natural way of estimating the value of all policies in these transformed POMDPs. Policy search is then simply performed by searching for a policy with high estimated value. We also establish conditions under which our value estimates will be good, recovering theoretical results similar to those of Kearns, Mansour and Ng [7], but with "sample complexity" bounds that have only a polynomial rather than exponential dependence on the horizon time. Our method appl...
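The core observation can be illustrated with a toy sketch. Write the stochastic simulator as a deterministic function of state, action, and a uniform random number. Then draw m sequences of random numbers ("scenarios") once and evaluate every candidate policy on those same scenarios. The value estimate becomes a deterministic function of the policy, so policy search reduces to ordinary optimization. Everything concrete here is a hypothetical assumption, not taken from the paper: a 10-state chain, a 0.8 move-success probability, reward 1 at the rightmost state, and a one-parameter threshold policy class searched exhaustively.

```python
import random

# Hypothetical toy MDP (not from the paper): states 0..N-1 on a chain,
# actions -1 (left) / +1 (right); a move succeeds with probability 0.8.
N = 10
GOAL = N - 1


def step(s, a, p):
    """Deterministic transition: the randomness is supplied as p in [0, 1)."""
    if p < 0.8:
        s = min(max(s + a, 0), N - 1)
    return s, (1.0 if s == GOAL else 0.0)


def make_scenarios(m, horizon, seed=0):
    """Draw m fixed sequences of random numbers, shared by all policies."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(horizon)] for _ in range(m)]


def estimate_value(policy, scenarios, s0=0, gamma=0.95):
    """Average discounted return of `policy` over the fixed scenarios."""
    total = 0.0
    for ps in scenarios:
        s, ret, disc = s0, 0.0, 1.0
        for p in ps:
            s, r = step(s, policy(s), p)
            ret += disc * r
            disc *= gamma
        total += ret
    return total / len(scenarios)


def threshold_policy(theta):
    """Hypothetical policy class: move right while s < theta, else left."""
    return lambda s: 1 if s < theta else -1


scenarios = make_scenarios(m=50, horizon=30)
# Policy search: maximize the (now deterministic) value estimate.
best_theta = max(range(N + 1),
                 key=lambda th: estimate_value(threshold_policy(th), scenarios))
print(best_theta, estimate_value(threshold_policy(best_theta), scenarios))
```

Because every policy is scored on the same scenarios, differences between estimates reflect the policies rather than fresh sampling noise. How many scenarios are needed for the estimates to be uniformly good is what the paper's sample complexity conditions address.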
The Sample Complexity of Pattern Classification With Neural Networks: The Size of the Weights is More Important Than the Size of the Network
1997
Cited by 203 (16 self)
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a two-layer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by $A$ and the input dimension is $n$. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus $A^3\sqrt{(\log n)/m}$ (ignori...
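To make the quantities in the bound concrete, the sketch below computes $A$ for a small two-layer network and evaluates the leading term $A^3\sqrt{(\log n)/m}$. It rests on assumptions not spelled out above: "the weights associated with each unit" is read as that unit's incoming weights (no bias), $m$ is taken to be the number of training examples, and the constants and logarithmic factors the abstract elides are dropped. The weight values are made up for illustration.

```python
import math


def weight_bound(layers):
    """A = largest sum of |incoming weights| over all units.

    `layers` is a list of weight matrices; each matrix is a list of rows,
    one row per unit, holding that unit's incoming weights.
    """
    return max(sum(abs(w) for w in row) for layer in layers for row in layer)


def leading_term(A, n, m):
    """A^3 * sqrt(log(n) / m), with constants and log A, log m factors dropped."""
    return A ** 3 * math.sqrt(math.log(n) / m)


# Hypothetical two-layer net: input dimension n = 4, two hidden units, one output.
hidden = [[0.5, -0.25, 0.125, 0.125],
          [0.25, 0.25, -0.25, 0.25]]
output = [[0.75, -0.5]]
A = weight_bound([hidden, output])
print(A, leading_term(A, n=4, m=10_000))

# Doubling the hidden layer while keeping every per-unit weight sum the same
# leaves A unchanged: the term depends on the size of the weights, not their number.
wider_hidden = hidden + hidden
wider_output = [[0.375, -0.25, 0.375, -0.25]]
print(weight_bound([wider_hidden, wider_output]))
```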
Networks of Spiking Neurons: The Third Generation of Neural Network Models
Neural Networks, 1997
Cited by 180 (16 self)
The computational power of formal models for networks of spiking neurons is compared with that of other neural network models based on McCulloch-Pitts neurons (i.e., threshold gates) or sigmoidal gates. In particular, it is shown that networks of spiking neurons are computationally more powerful than these other neural network models. A concrete, biologically relevant function is exhibited that can be computed by a single spiking neuron (for biologically reasonable values of its parameters) but requires hundreds of hidden units in a sigmoidal neural net. This article does not assume prior knowledge about spiking neurons, and it contains an extensive list of references to the currently available literature on computations in networks of spiking neurons and relevant results from neurobiology.
1 Definitions and Motivations
If one classifies neural network models according to their computational units, one can distinguish three different generations. The first generation i...
Theory of classification: A survey of some recent advances
2005
Cited by 76 (3 self)
The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results.
Bounds for the Computational Power and Learning Complexity of Analog Neural Nets
Proc. of the 25th ACM Symp. Theory of Computing, 1993
Cited by 62 (17 self)
It is shown that high-order feedforward neural nets of constant depth with piecewise polynomial activation functions and arbitrary real weights can be simulated for Boolean inputs and outputs by neural nets of a somewhat larger size and depth with Heaviside gates and weights from $\{-1, 0, 1\}$. This provides the first known upper bound for the computational power of the former type of neural nets. It is also shown that in the case of first-order nets with piecewise linear activation functions one can replace arbitrary real weights by rational numbers with polynomially many bits, without changing the Boolean function that is computed by the neural net. In order to prove these results we introduce two new methods for reducing nonlinear problems about weights in multilayer neural nets to linear problems for a transformed set of parameters. These transformed parameters can be interpreted as weights in a somewhat larger neural net. As another application of our new proof technique we s...
Learning with Matrix Factorization
2004
Cited by 62 (4 self)
Matrices that can be factored into a product of two simpler matrices can serve as a useful and often natural model in the analysis of tabulated or high-dimensional data. Models based on matrix factorization (Factor Analysis, PCA) have been extensively used in statistical analysis and machine learning for over a century, with many new formulations and models suggested in recent...
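As a concrete instance of this kind of model, the sketch below factors a matrix $X$ into $U V^\top$ with $k$ columns using a truncated SVD, the least-squares low-rank factorization that underlies PCA (without mean-centering here). It is a self-contained pure-Python sketch using power iteration with deflation, chosen for readability rather than efficiency, and is not a method taken from the cited work. It assumes $k$ does not exceed the rank of $X$ and that the leading singular values are well separated.

```python
import math
import random


def top_singular(X, iters=200, seed=0):
    """Leading singular triple (sigma, u, v) of X by power iteration on X^T X."""
    rng = random.Random(seed)
    rows, cols = len(X), len(X[0])
    v = [rng.random() + 0.1 for _ in range(cols)]
    for _ in range(iters):
        u = [sum(X[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = X v
        v = [sum(X[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # v = X^T u
        norm = math.sqrt(sum(t * t for t in v))
        if norm == 0.0:
            break
        v = [t / norm for t in v]
    u = [sum(X[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(t * t for t in u))
    if sigma > 0.0:
        u = [t / sigma for t in u]
    return sigma, u, v


def factorize(X, k):
    """Return U (rows x k) and V (cols x k) with X ~= U V^T (truncated SVD by deflation)."""
    R = [row[:] for row in X]
    U_cols, V_cols = [], []
    for _ in range(k):
        sigma, u, v = top_singular(R)
        U_cols.append([sigma * ui for ui in u])
        V_cols.append(v)
        # Subtract the component just found before searching for the next one.
        R = [[R[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
             for i in range(len(u))]
    U = [list(r) for r in zip(*U_cols)]
    V = [list(r) for r in zip(*V_cols)]
    return U, V


def reconstruct(U, V):
    """Compute U V^T."""
    return [[sum(a * b for a, b in zip(urow, vrow)) for vrow in V] for urow in U]


# A 4 x 3 matrix of rank 2 (row 1 = 2 * row 0; row 0 = row 2 + 2 * row 3).
X = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
U, V = factorize(X, k=2)
err = max(abs(a - b) for ra, rb in zip(X, reconstruct(U, V)) for a, b in zip(ra, rb))
print(err)
```

Since $X$ has rank 2, two factors reproduce it up to floating-point error; with $k = 1$ the reconstruction would only be the best rank-1 approximation.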
Polynomial Bounds for VC Dimension of Sigmoidal and General Pfaffian Neural Networks
Journal of Computer and System Sciences, 1995
Cited by 52 (0 self)
We introduce a new method for proving explicit upper bounds on the VC dimension of general functional basis networks and prove, as an application and for the first time, that the VC dimension of analog neural networks with the sigmoidal activation function $\sigma(y) = 1/(1+e^{-y})$ is bounded by a quadratic polynomial $O((lm)^2)$ in both the number $l$ of programmable parameters and the number $m$ of nodes. The proof method of this paper generalizes to a much wider class of Pfaffian activation functions and formulas, and also gives, for the first time, polynomial bounds on their VC dimension. We also present some other applications of our method.
Superpolynomial lower bounds for monotone span programs
1996
Cited by 48 (6 self)
In this paper we obtain the first superpolynomial lower bounds for monotone span programs computing explicit functions. The best previous lower bound was $\Omega(n^{5/2})$ by Beimel, Gál, and Paterson [BGP]; our proof exploits a general combinatorial lower bound criterion from that paper. Our lower bounds are based on an analysis of Paley-type bipartite graphs via Weil’s character sum estimates. We prove an $n^{\Omega(\log n/\log\log n)}$ lower bound for the size of monotone span programs for the clique problem. Our results give the first superpolynomial lower bounds for linear secret sharing schemes. We demonstrate the surprising power of monotone span programs by exhibiting a function computable in this model in linear size while requiring superpolynomial-size monotone circuits and exponential-size monotone formulae. We also show that the perfect matching function can be computed by polynomial-size (non-monotone) span programs over arbitrary fields.
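For readers new to the model: a monotone span program assigns each row of a matrix to an input variable and accepts an input $x$ exactly when a fixed target vector lies in the span of the rows whose variables are set to 1; its size is the number of rows. The sketch below evaluates such a program by Gaussian elimination. The choice of GF(2), the target vector $(1, 0, \dots, 0)$, and the two tiny example programs are assumptions made for illustration; none of the paper's constructions or lower-bound arguments are reproduced.

```python
def in_span_gf2(rows, target):
    """True iff `target` is a GF(2) sum of some subset of `rows` (0/1 lists)."""
    basis = []  # (pivot, vector) pairs; each pivot is 0 in every other basis vector
    for r in rows:
        r = list(r)
        for pivot, b in basis:
            if r[pivot]:
                r = [x ^ y for x, y in zip(r, b)]
        if any(r):
            p = r.index(1)
            # Keep the basis fully reduced by clearing the new pivot elsewhere.
            basis = [(q, [x ^ y for x, y in zip(b, r)] if b[p] else b) for q, b in basis]
            basis.append((p, r))
    t = list(target)
    for pivot, b in basis:
        if t[pivot]:
            t = [x ^ y for x, y in zip(t, b)]
    return not any(t)


def msp_accepts(labeled_rows, x, target):
    """Monotone span program: rows are labeled by a variable index and are
    usable only when that variable is 1 (no negated literals)."""
    active = [row for var, row in labeled_rows if x[var]]
    return in_span_gf2(active, target)


# Hypothetical size-2 programs for AND(x0, x1) and OR(x0, x1).
AND = [(0, [1, 1]), (1, [0, 1])]   # target (1, 0) = row0 + row1, needs both rows
OR = [(0, [1]), (1, [1])]          # target (1) equals either row

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, msp_accepts(AND, x, [1, 0]), msp_accepts(OR, x, [1]))
```

Monotonicity is built in: setting more variables to 1 only adds rows, so the span can only grow and an accepted input stays accepted.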
Neural Networks with Quadratic VC Dimension
1996
Cited by 46 (6 self)
This paper shows that neural networks which use continuous activation functions have VC dimension at least as large as the square of the number of weights $w$. This result settles a long-standing open question, namely whether the well-known $O(w \log w)$ bound, known for hard-threshold nets, also held for more general sigmoidal nets. Implications for the number of samples needed for valid generalization are discussed.
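The VC dimension of a class of classifiers is the size of the largest point set it shatters, i.e., on which it realizes all $2^n$ possible labelings. The brute-force check below illustrates only this definition, on a deliberately simple class (closed intervals on the real line, whose VC dimension is 2) rather than on the sigmoidal networks the paper studies; the class and the candidate points are assumptions chosen for illustration.

```python
from itertools import combinations


def interval_labelings(points):
    """All labelings of `points` realizable by indicators of closed intervals [a, b]."""
    labelings = {tuple(0 for _ in points)}  # an interval containing none of the points
    # Endpoints at the points themselves suffice: an interval's labeling is fixed
    # by the smallest and largest point it contains.
    for a in points:
        for b in points:
            if a <= b:
                labelings.add(tuple(1 if a <= p <= b else 0 for p in points))
    return labelings


def is_shattered(points):
    """True iff intervals realize every one of the 2^n labelings of `points`."""
    return len(interval_labelings(points)) == 2 ** len(points)


def vc_dimension_on(candidates):
    """Largest n such that some n-subset of `candidates` is shattered (brute force)."""
    best = 0
    for n in range(1, len(candidates) + 1):
        if any(is_shattered(list(s)) for s in combinations(candidates, n)):
            best = n
    return best


print(is_shattered([1.0, 2.0]), is_shattered([1.0, 2.0, 3.0]), vc_dimension_on([0, 1, 2, 3, 4]))
```

Three points fail because the labeling (1, 0, 1) would need an interval that skips the middle point. For the networks in the paper, exhaustive search like this is infeasible; the result instead bounds the shattering capacity analytically in terms of the number of weights $w$.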