Results 1–10 of 128
Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm
 Machine Learning
, 1988
Abstract

Cited by 680 (5 self)
Keywords: learning Boolean functions, linear-threshold algorithms. Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in which the learner responds to each example according to a current hypothesis. Then the learner updates the hypothesis, if necessary, based on the correct classification of the example. One natural measure of the quality of learning in this setting is the number of mistakes the learner makes. For suitable classes of functions, learning algorithms are available that make a bounded number of mistakes, with the bound independent of the number of examples seen by the learner. We present one such algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions. The basic method can be expressed as a linear-threshold algorithm. A primary advantage of this algorithm is that the number of mistakes grows only logarithmically with the number of irrelevant attributes in the examples. At the same time, the algorithm is computationally efficient in both time and space.
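The mistake-driven setting described in this abstract can be made concrete with a short sketch. The update rule below follows the classic Winnow1 variant for monotone disjunctions (double the active weights on a false negative, zero them on a false positive); the function name, the 0/1 label encoding, and the default threshold of n are illustrative choices, not a transcription of the paper's pseudocode.

```python
def winnow1(stream, n, threshold=None):
    """Mistake-driven learning of a monotone disjunction (Winnow1-style sketch).

    stream: iterable of (x, y) where x is a 0/1 tuple of length n and y is 0 or 1.
    Returns the final weight vector and the number of mistakes made.
    """
    if threshold is None:
        threshold = n          # a standard choice of threshold
    w = [1.0] * n
    mistakes = 0
    for x, y in stream:
        # Current hypothesis: predict 1 iff the weights of active attributes
        # reach the threshold (a linear-threshold function).
        pred = 1 if sum(w[i] for i in range(n) if x[i]) >= threshold else 0
        if pred != y:
            mistakes += 1
            for i in range(n):
                if x[i]:
                    # Promotion on a false negative, elimination on a false positive.
                    w[i] = w[i] * 2 if y == 1 else 0.0
    return w, mistakes

# Learning x0 OR x1 over n = 4 attributes from a small example stream:
stream = [((1, 0, 0, 0), 1), ((1, 0, 1, 0), 1), ((0, 0, 1, 1), 0),
          ((1, 0, 0, 0), 1), ((0, 0, 1, 1), 0)]
w, m = winnow1(stream, 4)   # m == 2: two promotion mistakes, then no errors
```

The multiplicative update is what yields the logarithmic dependence on irrelevant attributes claimed in the abstract: for a k-literal disjunction over n attributes, the mistake bound is O(k log n).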
Analog Computation via Neural Networks
 THEORETICAL COMPUTER SCIENCE
, 1994
Abstract

Cited by 87 (8 self)
We pursue a particular approach to analog computation, based on dynamical systems of the type used in neural networks research. Our systems have a fixed structure, invariant in time, corresponding to an unchanging number of "neurons". If allowed exponential time for computation, they turn out to have unbounded power. However, under polynomial-time constraints there are limits on their capabilities, though they remain more powerful than Turing machines. (A similar but more restricted model was shown to be polynomial-time equivalent to classical digital computation in the previous work [20].) Moreover, there is a precise correspondence between nets and standard nonuniform circuits with equivalent resources, and as a consequence one has lower bound constraints on what they can compute. This relationship is perhaps surprising since our analog devices do not change in any manner with input size. We note that these networks are not likely to solve NP-hard problems in polynomial time, as the equality ...
Bounds for the Computational Power and Learning Complexity of Analog Neural Nets
 Proc. of the 25th ACM Symp. Theory of Computing
, 1993
Abstract

Cited by 60 (12 self)
It is shown that high order feedforward neural nets of constant depth with piecewise polynomial activation functions and arbitrary real weights can be simulated for boolean inputs and outputs by neural nets of a somewhat larger size and depth with Heaviside gates and weights from {−1, 0, 1}. This provides the first known upper bound for the computational power of the former type of neural nets. It is also shown that in the case of first order nets with piecewise linear activation functions one can replace arbitrary real weights by rational numbers with polynomially many bits, without changing the boolean function that is computed by the neural net. In order to prove these results we introduce two new methods for reducing nonlinear problems about weights in multilayer neural nets to linear problems for a transformed set of parameters. These transformed parameters can be interpreted as weights in a somewhat larger neural net. As another application of our new proof technique we s...
A Capacitive Threshold-Logic Gate
, 1996
Abstract

Cited by 26 (2 self)
A dense and fast threshold-logic gate with a very high fan-in capacity is described. The gate performs sum-of-product and thresholding operations in an architecture comprising a poly-to-poly capacitor array and an inverter chain. The Boolean function performed by the gate is soft programmable. This is accomplished by adjusting the threshold with a dc voltage. Essentially, the operation is dynamic and thus requires periodic reset. However, the gate can evaluate multiple input vectors in between two successive reset phases because evaluation is nondestructive. Asynchronous operation is, therefore, possible. The paper presents an electrical analysis of the gate, identifies its limitations, and describes a test chip containing four different gates of fan-in 30, 62, 127, and 255. Experimental results confirming proper functionality in all these gates are given, and applications in arithmetic and logic function blocks are described.
I. INTRODUCTION
Threshold logic (TL) originally emerged ...
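A behavioral model of such a gate is tiny. The sketch below ignores all the electrical detail in the paper (capacitor ratios, reset phases, the inverter chain) and models only the sum-and-threshold function, with the programmable threshold parameter standing in for the dc adjustment voltage; the function names are illustrative.

```python
def tl_gate(inputs, weights, theta):
    """Behavioral model of a threshold-logic gate: weighted sum, then hard threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= theta else 0

# The same wiring realizes different Boolean functions as theta is adjusted,
# mirroring the gate's "soft programmability" via its threshold voltage:
AND3 = lambda a, b, c: tl_gate((a, b, c), (1, 1, 1), 3)
MAJ3 = lambda a, b, c: tl_gate((a, b, c), (1, 1, 1), 2)
OR3  = lambda a, b, c: tl_gate((a, b, c), (1, 1, 1), 1)
```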
On the Complexity of Training Neural Networks with Continuous Activation Functions
, 1993
Abstract

Cited by 23 (3 self)
We deal with computational issues of loading a fixed-architecture neural network with a set of positive and negative examples. This is the first result on the hardness of loading networks which do not consist of binary-threshold neurons, but rather utilize a particular continuous activation function, commonly used in the neural network literature. We observe that the loading problem is solvable in polynomial time if the input dimension is constant. Otherwise, however, any possible learning algorithm based on particular fixed architectures faces severe computational barriers. Similar theorems have already been proved by Megiddo and by Blum and Rivest, but only for the case of binary-threshold networks. Our theoretical results lend further justification to the use of incremental (architecture-changing) techniques for training networks rather than fixed architectures. Furthermore, they imply hardness of learnability in the probably-approximately-correct sense as well.
Computational Complexity Of Neural Networks: A Survey
, 1994
Abstract

Cited by 22 (6 self)
We survey some of the central results in the complexity theory of discrete neural networks, with pointers to the literature. Our main emphasis is on the computational power of various acyclic and cyclic network models, but we also discuss briefly the complexity aspects of synthesizing networks from examples of their behavior. CR Classification: F.1.1 [Computation by Abstract Devices]: Models of Computation: neural networks, circuits; F.1.3 [Computation by Abstract Devices]: Complexity Classes: complexity hierarchies. Key words: Neural networks, computational complexity, threshold circuits, associative memory. 1. Introduction. The currently again very active field of computation by "neural" networks has opened up a wealth of fascinating research topics in the computational complexity analysis of the models considered. While much of the general appeal of the field stems not so much from new computational possibilities, but from the possibility of "learning", or synthesizing networks...
On PAC Learning using Winnow, Perceptron, and a Perceptron-Like Algorithm
Abstract

Cited by 20 (9 self)
In this paper we analyze the PAC learning abilities of several simple iterative algorithms for learning linear threshold functions, obtaining both positive and negative results. We show that Littlestone’s Winnow algorithm is not an efficient PAC learning algorithm for the class of positive linear threshold functions. We also prove that the Perceptron algorithm cannot efficiently learn the unrestricted class of linear threshold functions even under the uniform distribution on boolean examples. However, we show that the Perceptron algorithm can efficiently PAC learn the class of nested functions (a concept class known to be hard for Perceptron under arbitrary distributions) under the uniform distribution on boolean examples. Finally, we give a very simple Perceptron-like algorithm for learning origin-centered halfspaces under the uniform distribution on the unit sphere in R^n. Unlike the Perceptron algorithm, which cannot learn in the presence of classification noise, the new algorithm can learn in the presence of monotonic noise (a generalization of classification noise). The new algorithm is significantly faster than previous algorithms in both the classification and monotonic noise settings.
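For contrast with the multiplicative Winnow update, the Perceptron algorithm analyzed above updates additively on each mistake. This is the textbook Perceptron with a bias term and ±1 labels; the function name, the pass limit, and the assumption that the data are linearly separable are choices made for this sketch, not details from the paper.

```python
def perceptron_train(examples, dim, max_passes=100):
    """Classic Perceptron: add y*x to the weights on each misclassified example.

    examples: list of (x, y) with x a length-dim tuple of reals and y in {-1, +1}.
    Returns (w, b); stops after a full pass with no mistakes (assumes separable data).
    """
    w, b = [0.0] * dim, 0.0
    for _ in range(max_passes):
        mistakes = 0
        for x, y in examples:
            # A point is misclassified when the margin y*(w.x + b) is not positive.
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b

# Four separable points in the plane:
examples = [((1, 1), 1), ((-1, -1), -1), ((1, 0), 1), ((-1, 0), -1)]
w, b = perceptron_train(examples, 2)
```

The contrast between this additive rule and Winnow's multiplicative rule is exactly what drives the paper's split results: the two algorithms succeed and fail on different subclasses of linear threshold functions.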
Every linear threshold function has a low-weight approximator
 In Proceedings of the 21st Conference on Computational Complexity (CCC)
, 2006
Abstract

Cited by 20 (6 self)
Given any linear threshold function f on n Boolean variables, we construct a linear threshold function g which disagrees with f on at most an ɛ fraction of inputs and has integer weights each of magnitude at most √n · 2^Õ(1/ɛ²). We show that the construction is optimal in terms of its dependence on n by proving a lower bound of Ω(√n) on the weights required to approximate a particular linear threshold function. We give two applications. The first is a deterministic algorithm for approximately counting the fraction of satisfying assignments to an instance of the zero-one knapsack problem to within an additive ±ɛ. The algorithm runs in time polynomial in n (but exponential in 1/ɛ²). In our second application, we show that any linear threshold function f is specified to within error ɛ by estimates of its Chow parameters (degree 0 and 1 Fourier coefficients) which are accurate to within an additive ±1/(n · 2^Õ(1/ɛ²)). This is the first such accuracy bound which is inverse polynomial in n (previous work of Goldberg [12] gave a 1/quasipoly(n) bound), and gives the first polynomial bound (in terms of n) on the number of examples required for learning linear threshold functions in the “restricted focus of attention” framework.
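The Chow parameters mentioned in this abstract are simply the degree-0 and degree-1 Fourier coefficients of f, i.e. E[f(x)] and E[f(x)·x_i] under uniform x. For small n they can be computed by brute-force enumeration, as in this sketch; the 3-bit majority example is illustrative and not taken from the paper.

```python
from itertools import product

def chow_parameters(f, n):
    """Chow parameters of f: {-1,+1}^n -> {-1,+1}: the degree-0 Fourier
    coefficient E[f(x)] and the degree-1 coefficients E[f(x)*x_i],
    computed exactly by enumerating the uniform distribution."""
    pts = list(product((-1, 1), repeat=n))
    c0 = sum(f(x) for x in pts) / len(pts)
    c1 = [sum(f(x) * x[i] for x in pts) / len(pts) for i in range(n)]
    return c0, c1

# Example: 3-bit majority, a linear threshold function with weights (1, 1, 1).
maj3 = lambda x: 1 if sum(x) > 0 else -1
c0, c1 = chow_parameters(maj3, 3)   # c0 = 0 (balanced), each c1[i] = 0.5
```

A classical theorem of Chow states that these n+1 numbers determine a linear threshold function exactly among all Boolean functions; the result above quantifies how accurately they must be estimated to pin f down to within error ɛ.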
Bounded Independence Fools Halfspaces
 In Proc. 50th Annual Symposium on Foundations of Computer Science (FOCS), 2009
Abstract

Cited by 19 (8 self)
We show that any distribution on {−1, +1}^n that is k-wise independent fools any halfspace (a.k.a. linear threshold function) h: {−1, +1}^n → {−1, +1}, i.e., any function of the form h(x) = sign(∑_{i=1}^{n} w_i x_i − θ) where w_1, ..., w_n, θ are arbitrary real numbers, with error ɛ for k = O(ɛ⁻² log²(1/ɛ)). Our result is tight up to log(1/ɛ) factors. Using standard constructions of k-wise independent distributions, we obtain the first explicit pseudorandom generators G: {−1, +1}^s → {−1, +1}^n that fool halfspaces. Specifically, we fool halfspaces with error ɛ and seed length s = k · log n = O(log n · ɛ⁻² log²(1/ɛ)). Our approach combines classical tools from real approximation theory with structural results on halfspaces by Servedio (Comput. Complexity 2007).