Results 1-10 of 186
Hardness vs. randomness
Journal of Computer and System Sciences, 1994
Cited by 282 (28 self)
We present a simple new construction of a pseudorandom bit generator, based on the constant depth generators of [N]. It stretches a short string of truly random bits into a long string that looks random to any algorithm from a complexity class C (e.g., P, NC, PSPACE, ...) using an arbitrary function that is hard for C. This construction reveals an equivalence between the problem of proving lower bounds and the problem of generating good pseudorandom sequences. Our construction has many consequences. The most direct one is that efficient deterministic simulation of randomized algorithms is possible under much weaker assumptions than previously known. The efficiency of the simulations depends on the strength of the assumptions, and may achieve P = BPP. We believe that our results are very strong evidence that the gap between randomized and deterministic complexity is not large. Using the known lower bounds for constant depth circuits, our construction yields an unconditionally proven pseudorandom generator for constant depth circuits. As an application of this generator we characterize the power of NP with a random oracle.
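A toy sketch may help make the "stretching" idea in this abstract concrete. It is an illustration under stated assumptions, not the paper's construction as written: the names `nw_design` and `nw_generator` are hypothetical, the design is the standard polynomial-based family of sets with small pairwise intersections, and parity stands in for the hard function (the abstract only says the function must be hard for the class C).

```python
from itertools import product


def nw_design(q, k):
    """Build q**k subsets of {0, ..., q*q - 1}, each of size q.

    Assumes q is prime. Each subset is the graph {(a, p(a))} of a polynomial p
    of degree < k over Z_q, with the pair (a, b) encoded as the index a*q + b.
    Two distinct polynomials agree on at most k - 1 points, so any two subsets
    share at most k - 1 indices.
    """
    sets = []
    for coeffs in product(range(q), repeat=k):
        subset = [
            a * q + sum(c * pow(a, j, q) for j, c in enumerate(coeffs)) % q
            for a in range(q)
        ]
        sets.append(subset)
    return sets


def parity(bits):
    """Stand-in for the hard function (an assumption made for illustration)."""
    return sum(bits) % 2


def nw_generator(seed, q, k, hard_fn=parity):
    """Stretch a seed of q*q bits into q**k bits.

    Output bit i is hard_fn applied to the seed bits selected by subset i.
    """
    if len(seed) != q * q:
        raise ValueError("seed must have exactly q*q bits")
    return [hard_fn([seed[i] for i in subset]) for subset in nw_design(q, k)]
```

With `q = 5` and `k = 3`, a 25-bit seed yields 125 output bits. The small intersections between subsets are what keep the output bits nearly independent from the point of view of a weak observer.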
Almost Optimal Lower Bounds for Small Depth Circuits
RANDOMNESS AND COMPUTATION, 1989
Cited by 235 (8 self)
We give improved lower bounds for the size of small depth circuits computing several functions. In particular, we prove almost optimal lower bounds for the size of parity circuits. Further, we show that there are functions computable in polynomial size and depth k that require exponential size when the depth is restricted to k − 1. Our main lemma, which is of independent interest, states that by using a random restriction we can convert an AND of small ORs to an OR of small ANDs and conversely.
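The random restriction mentioned in the main lemma can be sketched in a few lines. This only shows what a restriction does to an AND of small ORs (a CNF); it does not implement or verify the lemma itself. The function names and the clause encoding, a list of `(variable, negated)` pairs, are assumptions made for illustration.

```python
import random


def random_restriction(n, p, rng):
    """Leave each of n variables free (None) with probability p;
    otherwise fix it to 0 or 1 uniformly at random."""
    return [None if rng.random() < p else rng.randint(0, 1) for _ in range(n)]


def restrict_cnf(cnf, rho):
    """Apply restriction rho to a CNF and simplify.

    cnf is a list of clauses; a clause is a list of (var, negated) literals.
    Returns the simplified clause list, where an empty list means the formula
    became constant True, or None if some clause was falsified (constant False).
    """
    simplified = []
    for clause in cnf:
        remaining = []
        satisfied = False
        for var, negated in clause:
            value = rho[var]
            if value is None:
                remaining.append((var, negated))  # literal is still free
            elif bool(value) != negated:
                satisfied = True  # literal is true, so the clause disappears
                break
        if satisfied:
            continue
        if not remaining:
            return None  # every literal was false
        simplified.append(remaining)
    return simplified
```

For example, `restrict_cnf(cnf, random_restriction(n, 0.1, random.Random(0)))` fixes roughly 90% of the variables, after which most clauses are either satisfied and dropped or shortened.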
Greedy layer-wise training of deep networks
In NIPS, 2007
Cited by 205 (34 self)
Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multilayer neural networks have many levels of nonlinearities allowing them to compactly represent highly nonlinear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
Learning Decision Trees using the Fourier Spectrum
1991
Cited by 189 (10 self)
This work gives a polynomial time algorithm for learning decision trees with respect to the uniform distribution. (This algorithm uses membership queries.) The decision tree model that is considered is an extension of the traditional boolean decision tree model that allows linear operations in each node (i.e., summation of a subset of the input variables over GF(2)). This paper shows how to learn in polynomial time any function that can be approximated (in the L2 norm) by a polynomially sparse function (i.e., a function with only polynomially many nonzero Fourier coefficients). The authors demonstrate that any function f whose L1 norm (i.e., the sum of the absolute values of the Fourier coefficients) is polynomial can be approximated by a polynomially sparse function, and prove that boolean decision trees with linear operations are a subset of this class of functions. Moreover, it is shown that the functions with polynomial L1 norm can be learned deterministically. The algorithm can a...
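The Fourier quantities this abstract refers to can be computed by brute force for small n. The sketch below is illustrative only and is not the paper's learning algorithm. It assumes functions map {0,1}^n to {-1,+1} and uses the usual characters χ_S(x) = (-1)^(sum of x_i for i in S); the function names are hypothetical.

```python
from itertools import product


def fourier_coefficients(f, n):
    """Return {S: coefficient} for f: {0,1}^n -> {-1, +1}.

    Each subset S is represented by its indicator tuple s, and the coefficient
    is the average over all x of f(x) * (-1)**(s . x).
    Brute force: takes time 4**n, so only suitable for small n.
    """
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for s in points:
        total = sum(
            f(x) * (-1) ** sum(si * xi for si, xi in zip(s, x)) for x in points
        )
        coeffs[s] = total / len(points)
    return coeffs


def l1_norm(coeffs):
    """Sum of the absolute values of the Fourier coefficients."""
    return sum(abs(c) for c in coeffs.values())


def parity3(x):
    """A single GF(2) sum of three variables, in the +1/-1 convention."""
    return (-1) ** sum(x)


def and2(x):
    """AND of two variables, with True encoded as -1."""
    return -1 if x == (1, 1) else 1
```

A single parity node, such as `parity3`, has exactly one nonzero coefficient and L1 norm 1, which gives some intuition for why trees with linear operations at the nodes keep the L1 norm small. `and2` has four coefficients of magnitude 1/2 and L1 norm 2.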
On the power of small-depth threshold circuits
Proceedings 31st Annual IEEE Symposium on Foundations of Computer Science, 1990
Cited by 104 (2 self)
We investigate the power of threshold circuits of small depth. In particular, we give functions that require exponential size unweighted threshold circuits of depth 3 when we restrict the bottom fan-in. We also prove that there are monotone functions f_k that can be computed in depth k and linear size ∧, ∨-circuits but require exponential size to compute by a depth k − 1 monotone weighted threshold circuit. Key words. Circuit complexity, monotone circuits, threshold circuits, lower bounds. Subject classifications. 68Q15, 68Q99.
Scaling learning algorithms towards AI
2007
Cited by 85 (20 self)
One long-term goal of machine learning research is to produce methods that are applicable to highly complex tasks, such as perception (vision, audition), reasoning, intelligent control, and other artificially intelligent behaviors. We argue that in order to progress toward this goal, the Machine Learning community must endeavor to discover algorithms that can learn highly complex functions, with minimal need for prior knowledge, and with minimal human intervention. We present mathematical and empirical evidence suggesting that many popular approaches to nonparametric learning, particularly kernel methods, are fundamentally limited in their ability to learn complex high-dimensional functions. Our analysis focuses on two problems. First, kernel machines are shallow architectures, in which one large layer of simple template matchers is followed by a single layer of trainable coefficients. We argue that shallow architectures can be very inefficient in terms of required number of computational elements and examples. Second, we analyze a limitation of kernel machines with a local kernel, linked to the curse of dimensionality, that applies to supervised, unsupervised (manifold learning) and semi-supervised kernel machines. Using empirical results on invariant image recognition tasks, kernel methods are compared with deep architectures, in which lower-level features or concepts are progressively combined into more abstract and higher-level representations. We argue that deep architectures have the potential to generalize in non-local ways, i.e., beyond immediate neighbors, and that this is crucial in order to make progress on the kind of complex tasks required for artificial intelligence.
Unprovability of Lower Bounds on the Circuit Size in Certain Fragments of Bounded Arithmetic
IN IZVESTIYA OF THE RUSSIAN ACADEMY OF SCIENCE, MATHEMATICS, 1995
Cited by 54 (6 self)
We show that if strong pseudorandom generators exist then the statement “α encodes a circuit of size n^(log* n) for SATISFIABILITY” is not refutable in S^2_2(α). For refutation in S^1_2(α), this is proven under the weaker assumption of the existence of generators secure against the attack by small depth circuits, and for another system which is strong enough to prove exponential lower bounds for constant-depth circuits, this is shown without using any unproven hardness assumptions. These results can also be viewed as direct corollaries of interpolation-like theorems for certain “split versions” of classical systems of Bounded Arithmetic introduced in this paper.
Complexity Classes Defined By Counting Quantifiers
1991
Cited by 54 (0 self)
We study the polynomial time counting hierarchy, a hierarchy of complexity classes related to the notion of counting. We investigate some of their structural properties, settling many open questions dealing with oracle characterizations, closure under boolean operations, and relations with other complexity classes. We develop a new combinatorial technique to obtain relativized separations for some of the studied classes, which imply absolute separations for some logarithmic time bounded complexity classes.
An O(n^(log log n)) Learning Algorithm for DNF under the Uniform Distribution
Journal of Computer and System Sciences, 1998
Cited by 48 (1 self)
We show that a DNF with terms of size at most d can be approximated by a function with at most d^O(d log(1/ε)) nonzero Fourier coefficients such that the expected error squared, with respect to the uniform distribution, is at most ε. This property is used to derive a learning algorithm for DNF, under the uniform distribution. The learning algorithm uses queries and learns, with respect to the uniform distribution, a DNF with terms of size at most d in time polynomial in n and d^O(d log(1/ε)). The interesting implications are for the case when ε is constant. In this case our algorithm learns a DNF with a polynomial number of terms in time n^O(log log n), and a DNF with terms of size at most O(log n / log log n) in polynomial time.
Bounded Arithmetic and Lower Bounds in Boolean Complexity
Feasible Mathematics II, 1993
Cited by 44 (5 self)
We study the question of provability of lower bounds on the complexity of explicitly given Boolean functions in weak fragments of Peano Arithmetic. To that end, we analyze what is the right fragment capturing the kind of techniques existing in Boolean complexity at present. We give both formal and informal arguments supporting the claim that a conceivable answer is V 1 (which, in view of the RSUV isomorphism, is equivalent to S 2 ), although some major results about the complexity of Boolean functions can be proved in (presumably) weaker subsystems like U 1 . As a byproduct of this analysis, we give a more constructive version of the proof of Håstad's Switching Lemma, which is probably interesting in its own right.