Results 1–10 of 137
Almost Optimal Lower Bounds for Small Depth Circuits
Randomness and Computation, 1989
"... We give improved lower bounds for the size of small depth circuits computing several functions. In particular we prove almost optimal lower bounds for the size of parity circuits. Further we show that there are functions computable in polynomial size and depth k but requires exponential size when ..."
Cited by 280 (8 self)

Abstract:
We give improved lower bounds for the size of small-depth circuits computing several functions. In particular we prove almost optimal lower bounds for the size of parity circuits. Further we show that there are functions computable in polynomial size and depth k but which require exponential size when the depth is restricted to k − 1. Our main lemma, which is of independent interest, states that by using a random restriction we can convert an AND of small ORs to an OR of small ANDs and conversely.
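As a rough illustration of the random-restriction idea (a minimal sketch, not the paper's construction; the DNF representation and the keep-probability p are our own assumptions), the following Python snippet draws a restriction that leaves each variable free with probability p and fixes the rest uniformly at random, then simplifies a DNF given as a list of terms:

    import random

    def random_restriction(n, p):
        # For each of n variables: None = left free (prob p), else fixed to 0 or 1.
        return [None if random.random() < p else random.randint(0, 1)
                for _ in range(n)]

    def restrict_dnf(terms, rho):
        # terms: list of terms; each term is a list of (variable index, sign),
        # where sign True means the positive literal x_i.
        new_terms = []
        for term in terms:
            new_term, killed = [], False
            for var, sign in term:
                if rho[var] is None:
                    new_term.append((var, sign))   # literal stays free
                elif (rho[var] == 1) == sign:
                    continue                       # literal satisfied: drop it
                else:
                    killed = True                  # literal falsified: term is 0
                    break
            if killed:
                continue
            if not new_term:
                return True    # a term became empty: the DNF is identically 1
            new_terms.append(new_term)
        return new_terms       # empty list means identically 0

The switching lemma then says that, for a suitable p, the restricted OR of small ANDs has, with high probability, an equivalent AND of small ORs (and conversely), which is what allows the depth of a circuit to be peeled off level by level.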
Learning Decision Trees using the Fourier Spectrum
1991
"... This work gives a polynomial time algorithm for learning decision trees with respect to the uniform distribution. (This algorithm uses membership queries.) The decision tree model that is considered is an extension of the traditional boolean decision tree model that allows linear operations in each ..."
Cited by 212 (10 self)

Abstract:
This work gives a polynomial-time algorithm for learning decision trees with respect to the uniform distribution. (This algorithm uses membership queries.) The decision tree model that is considered is an extension of the traditional boolean decision tree model that allows linear operations in each node (i.e., summation of a subset of the input variables over GF(2)). This paper shows how to learn in polynomial time any function that can be approximated (in the L_2 norm) by a polynomially sparse function (i.e., a function with only polynomially many nonzero Fourier coefficients). The authors demonstrate that any function f whose L_1 norm (i.e., the sum of absolute values of the Fourier coefficients) is polynomial can be approximated by a polynomially sparse function, and prove that boolean decision trees with linear operations are a subset of this class of functions. Moreover, it is shown that the functions with polynomial L_1 norm can be learned deterministically. The algorithm can a...
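For intuition, the Fourier coefficient of f : {0,1}^n → {−1,+1} at a set S ⊆ {1,…,n} is \hat{f}(S) = E_x[f(x)χ_S(x)], where χ_S(x) = (−1)^{Σ_{i∈S} x_i}; under the uniform distribution it can be estimated by sampling, and membership queries make the sampling straightforward. A minimal Python sketch of this estimator (only the sampling step, not the paper's search over coefficients):

    import random

    def chi(S, x):
        # Parity character: (-1) raised to the sum of x[i] over i in S.
        return -1 if sum(x[i] for i in S) % 2 else 1

    def estimate_coefficient(f, n, S, samples=10000):
        # Monte Carlo estimate of E[f(x) * chi_S(x)] under the uniform distribution.
        total = 0
        for _ in range(samples):
            x = [random.randint(0, 1) for _ in range(n)]
            total += f(x) * chi(S, x)
        return total / samples

    # Example: f is the parity of bits 0 and 2, so the coefficient at S = {0, 2} is 1.
    f = lambda x: chi({0, 2}, x)
    print(estimate_coefficient(f, n=4, S={0, 2}))   # approximately 1.0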
On the Power of Small-Depth Threshold Circuits
1990
"... We investigate the power of threshold circuits of small depth. In particular we give functions which require exponential size unweigted threshold circuits of depth 3 when we restrict the bottom fanin. We also prove that there are mone tone functions fk which can be computed in depth k and linear s ..."
Cited by 124 (2 self)

Abstract:
We investigate the power of threshold circuits of small depth. In particular we give functions which require exponential-size unweighted threshold circuits of depth 3 when we restrict the bottom fan-in. We also prove that there are monotone functions f_k which can be computed in depth k and linear size by ∧, ∨ circuits but require exponential size to compute by a depth k − 1 monotone weighted threshold circuit.
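For reference, the gate definitions involved (standard notation, not quoted from the paper):

    T_θ(x_1, ..., x_n) = 1  ⇔  Σ_{i=1}^n x_i ≥ θ           (unweighted threshold gate)
    T_{w,θ}(x_1, ..., x_n) = 1  ⇔  Σ_{i=1}^n w_i x_i ≥ θ    (weighted, integer weights w_i)

The bottom fan-in restriction bounds how many inputs the gates closest to the variables may read.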
Scaling learning algorithms towards AI
2007
"... One longterm goal of machine learning research is to produce methods that are applicable to highly complex tasks, such as perception (vision, audition), reasoning, intelligent control, and other artificially intelligent behaviors. We argue that in order to progress toward this goal, the Machine Le ..."
Cited by 121 (23 self)

Abstract:
One long-term goal of machine learning research is to produce methods that are applicable to highly complex tasks, such as perception (vision, audition), reasoning, intelligent control, and other artificially intelligent behaviors. We argue that in order to progress toward this goal, the Machine Learning community must endeavor to discover algorithms that can learn highly complex functions, with minimal need for prior knowledge, and with minimal human intervention. We present mathematical and empirical evidence suggesting that many popular approaches to nonparametric learning, particularly kernel methods, are fundamentally limited in their ability to learn complex high-dimensional functions. Our analysis focuses on two problems. First, kernel machines are shallow architectures, in which one large layer of simple template matchers is followed by a single layer of trainable coefficients. We argue that shallow architectures can be very inefficient in terms of the required number of computational elements and examples. Second, we analyze a limitation of kernel machines with a local kernel, linked to the curse of dimensionality, that applies to supervised, unsupervised (manifold learning), and semi-supervised kernel machines. Using empirical results on invariant image recognition tasks, kernel methods are compared with deep architectures, in which lower-level features or concepts are progressively combined into more abstract and higher-level representations. We argue that deep architectures have the potential to generalize in non-local ways, i.e., beyond immediate neighbors, and that this is crucial in order to make progress on the kind of complex tasks required for artificial intelligence.
Lower Bounds for the Size of Circuits of Bounded Depth in Basis {∧, ⊕}
1986
"... this paper, we consider circuits of bounded depth in the basis f; \Phig. ..."
Cited by 86 (0 self)

Abstract:
In this paper, we consider circuits of bounded depth in the basis {∧, ⊕}.
DynFO: A Parallel, Dynamic Complexity Class
Journal of Computer and System Sciences, 1994
"... Traditionally, computational complexity has considered only static problems. Classical Complexity Classes such as NC, P, and NP are defined in terms of the complexity of checking  upon presentation of an entire input  whether the input satisfies a certain property. For many applications of compu ..."
Cited by 56 (4 self)

Abstract:
Traditionally, computational complexity has considered only static problems. Classical complexity classes such as NC, P, and NP are defined in terms of the complexity of checking, upon presentation of an entire input, whether the input satisfies a certain property. For many applications of computers it is more appropriate to model the process as a dynamic one. There is a fairly large object being worked on over a period of time. The object is repeatedly modified by users and computations are performed. We develop a theory of Dynamic Complexity. We study the new complexity class Dynamic First-Order Logic (DynFO). This is the set of properties that can be maintained and queried in first-order logic, i.e., relational calculus, on a relational database. We show that many interesting properties are in DynFO, including multiplication, graph connectivity, bipartiteness, and the computation of minimum spanning trees. Note that none of these problems is in static FO, and this f...
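As a concrete (if much weaker) illustration of maintaining connectivity dynamically, here is a Python sketch using union-find; this is ordinary imperative code and handles insertions only, whereas the paper's point is that both the update and the query can themselves be expressed in first-order logic over a relational store:

    class UnionFind:
        # Maintains connectivity of an n-vertex graph under edge insertions.
        def __init__(self, n):
            self.parent = list(range(n))

        def find(self, u):
            while self.parent[u] != u:
                self.parent[u] = self.parent[self.parent[u]]  # path halving
                u = self.parent[u]
            return u

        def insert_edge(self, u, v):
            ru, rv = self.find(u), self.find(v)
            if ru != rv:
                self.parent[ru] = rv   # merge the two components

        def connected(self, u, v):
            return self.find(u) == self.find(v)

    g = UnionFind(5)
    g.insert_edge(0, 1)
    g.insert_edge(1, 2)
    print(g.connected(0, 2), g.connected(0, 4))   # True False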
Unprovability of Lower Bounds on the Circuit Size in Certain Fragments of Bounded Arithmetic
Izvestiya of the Russian Academy of Science, Mathematics, 1995
"... We show that if strong pseudorandom generators exist then the statement “α encodes a circuit of size n (log ∗ n) for SATISFIABILITY ” is not refutable in S2 2 (α). For refutation in S1 2 (α), this is proven under the weaker assumption of the existence of generators secure against the attack by smal ..."
Cited by 55 (6 self)

Abstract:
We show that if strong pseudorandom generators exist then the statement “α encodes a circuit of size n^{log* n} for SATISFIABILITY” is not refutable in S^2_2(α). For refutation in S^1_2(α), this is proven under the weaker assumption of the existence of generators secure against attack by small-depth circuits, and for another system which is strong enough to prove exponential lower bounds for constant-depth circuits, this is shown without using any unproven hardness assumptions. These results can also be viewed as direct corollaries of interpolation-like theorems for certain “split versions” of classical systems of Bounded Arithmetic introduced in this paper.
Non-uniform ACC Circuit Lower Bounds
2010
"... The class ACC consists of circuit families with constant depth over unbounded fanin AND, OR, NOT, and MODm gates, where m> 1 is an arbitrary constant. We prove: • NTIME[2 n] does not have nonuniform ACC circuits of polynomial size. The size lower bound can be slightly strengthened to quasipoly ..."
Cited by 50 (8 self)

Abstract:
The class ACC consists of circuit families with constant depth over unbounded fan-in AND, OR, NOT, and MOD_m gates, where m > 1 is an arbitrary constant. We prove:
• NTIME[2^n] does not have non-uniform ACC circuits of polynomial size. The size lower bound can be slightly strengthened to quasi-polynomials and other less natural functions.
• E^NP, the class of languages recognized in 2^{O(n)} time with an NP oracle, does not have non-uniform ACC circuits of size 2^{n^{o(1)}}. The lower bound gives an exponential size-depth tradeoff: for every d there is a δ > 0 such that E^NP does not have depth-d ACC circuits of size 2^{n^δ}.
Previously, it was not known whether EXP^NP had depth-3 polynomial-size circuits made out of only MOD_6 gates. The high-level strategy is to design faster algorithms for the circuit satisfiability problem over ACC circuits, then prove that such algorithms entail the above lower bounds. The algorithm combines known properties of ACC with fast rectangular matrix multiplication and dynamic programming, while the second step requires a subtle strengthening of the author’s prior work [STOC’10].
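The link to satisfiability can be made concrete with the trivial baseline: deciding SAT for a circuit on n inputs by trying all 2^n assignments. The paper's argument is that beating this baseline even slightly for ACC circuits already yields the lower bounds. A toy Python sketch of the baseline for one illustrative circuit shape, a MOD_m gate over AND gates (our own toy class and MOD convention, not the paper's algorithm):

    from itertools import product

    def mod_gate(m, bits):
        # One common convention: MOD_m outputs 1 iff the number of 1s
        # among its inputs is NOT divisible by m.
        return int(sum(bits) % m != 0)

    def brute_force_sat(n, and_terms, m):
        # Circuit: a MOD_m gate fed by AND gates; each AND gate is a
        # tuple of input indices. Tries all 2^n assignments.
        for x in product([0, 1], repeat=n):
            ands = [int(all(x[i] for i in term)) for term in and_terms]
            if mod_gate(m, ands):
                return x          # satisfying assignment found
        return None               # unsatisfiable

    # Example: MOD_6 over AND(x0, x1) and AND(x2).
    print(brute_force_sat(3, [(0, 1), (2,)], 6))   # finds (0, 0, 1)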
An O(n^(log log n)) Learning Algorithm for DNF under the Uniform Distribution
Journal of Computer and System Sciences, 1998
"... We show that a DNF with terms of size at most d can be approximated by a function with at most d O(d log 1=") non zero Fourier coefficients such that the expected error squared, with respect to the uniform distribution, is at most ". This property is used to derive a learning algorithm for ..."
Cited by 46 (1 self)

Abstract:
We show that a DNF with terms of size at most d can be approximated by a function with at most d^{O(d log(1/ε))} nonzero Fourier coefficients such that the expected squared error, with respect to the uniform distribution, is at most ε. This property is used to derive a learning algorithm for DNF under the uniform distribution. The learning algorithm uses queries and learns, with respect to the uniform distribution, a DNF with terms of size at most d in time polynomial in n and d^{O(d log(1/ε))}. The interesting implications are for the case when ε is constant. In this case our algorithm learns a DNF with a polynomial number of terms in time n^{O(log log n)}, and a DNF with terms of size at most O(log n / log log n) in polynomial time.
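To see where the n^{O(log log n)} time bound comes from in the constant-ε case (a standard calculation, spelled out here rather than taken from the paper): a DNF with polynomially many terms can be truncated to terms of size d = O(log n) at negligible additional error, since longer terms are almost never satisfied under the uniform distribution, and then

    d^{O(d log(1/ε))} = (log n)^{O(log n)} = 2^{O(log n · log log n)} = n^{O(log log n)}.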
Representational power of restricted Boltzmann machines and deep belief networks
2007
"... Deep Belief Networks (DBN) are generative neural network models with many layers of hidden explanatory factors, recently introduced by Hinton et al., along with a greedy layerwise unsupervised learning algorithm. The building block of a DBN is a probabilistic model called a Restricted Boltzmann Mac ..."
Cited by 45 (7 self)

Abstract:
Deep Belief Networks (DBN) are generative neural network models with many layers of hidden explanatory factors, recently introduced by Hinton et al., along with a greedy layer-wise unsupervised learning algorithm. The building block of a DBN is a probabilistic model called a Restricted Boltzmann Machine (RBM), used to represent one layer of the model. Restricted Boltzmann Machines are interesting because inference is easy in them, and because they have been successfully used as building blocks for training deeper models. We first prove that adding hidden units yields strictly improved modeling power, while a second theorem shows that RBMs are universal approximators of discrete distributions. We then study the question of whether DBNs with more layers are strictly more powerful in terms of representational power. This suggests a new and less greedy criterion for training RBMs within DBNs.
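For context, the standard RBM definition (standard notation, not quoted from the paper): with visible units v ∈ {0,1}^m, hidden units h ∈ {0,1}^k, weight matrix W, and bias vectors b, c, an RBM assigns

    E(v, h) = −b⊤v − c⊤h − h⊤Wv,        p(v, h) = e^{−E(v,h)} / Z,

where Z sums e^{−E} over all configurations. Inference is easy because the bipartite structure makes the conditionals factorize: p(h_j = 1 | v) = σ(c_j + W_j·v), with σ the logistic function, and symmetrically for p(v_i = 1 | h).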