Results 1–10 of 87
Majority Gates vs. General Weighted Threshold Gates
 Computational Complexity, 1992
"... . In this paper we study small depth circuits that contain threshold gates (with or without weights) and parity gates. All circuits we consider are of polynomial size. We prove several results which complete the work on characterizing possible inclusions between many classes defined by small depth c ..."
Abstract

Cited by 89 (7 self)
In this paper we study small depth circuits that contain threshold gates (with or without weights) and parity gates. All circuits we consider are of polynomial size. We prove several results which complete the work on characterizing possible inclusions between many classes defined by small depth circuits. These results are the following:
1. A single threshold gate with weights cannot in general be replaced by a polynomial-fanin unweighted threshold gate of parity gates.
2. On the other hand, it can be replaced by a depth-2 unweighted threshold circuit of polynomial size. An extension of this construction is used to prove that whatever can be computed by a depth-d polynomial-size threshold circuit with weights can be computed by a depth-(d + 1) polynomial-size unweighted threshold circuit, where d is an arbitrary fixed integer.
3. A polynomial-fanin threshold gate (with weights) of parity gates cannot in general be replaced by a depth-2 unweighted threshold circuit of polynomial size...
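The gate types contrasted in this abstract are easy to state concretely. The sketch below (illustrative definitions only, not the paper's separation construction) shows a weighted threshold gate and its unweighted special case, the majority gate; the input, weights, and threshold values are arbitrary examples:

```python
def weighted_threshold(x, w, t):
    """Weighted threshold gate: outputs 1 iff sum_i w[i]*x[i] >= t."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= t else 0

def majority(x):
    """Unweighted threshold (majority) gate: 1 iff at least half the inputs are 1."""
    return 1 if 2 * sum(x) >= len(x) else 0

x = [1, 0, 1]
w = [4, 2, 1]                        # exponentially growing weights
print(weighted_threshold(x, w, 5))   # 4*1 + 2*0 + 1*1 = 5 >= 5 -> 1
print(majority(x))                   # two of three inputs are 1 -> 1
```

The paper's results concern when a single gate of the first kind can, or cannot, be replaced by small-depth circuits built from gates of the second kind.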
Noise sensitivity of Boolean functions and applications to percolation, Inst. Hautes Études, 1999
"... It is shown that a large class of events in a product probability space are highly sensitive to noise, in the sense that with high probability, the configuration with an arbitrary small percent of random errors gives almost no prediction whether the event occurs. On the other hand, weighted majority ..."
Abstract

Cited by 71 (15 self)
It is shown that a large class of events in a product probability space are highly sensitive to noise, in the sense that with high probability, the configuration with an arbitrarily small percent of random errors gives almost no prediction whether the event occurs. On the other hand, weighted majority functions are shown to be noise-stable. Several necessary and sufficient conditions for noise sensitivity and stability are given. Consider, for example, bond percolation on an n + 1 by n grid. A configuration is a function that assigns to every edge the value 0 or 1. Let ω be a random configuration, selected according to the uniform measure. A crossing is a path that joins the left and right sides of the rectangle, and consists entirely of edges e with ω(e) = 1. By duality, the probability for having a crossing is 1/2. Fix an ε ∈ (0, 1). For each edge e, let ω′(e) = ω(e) with probability 1 − ε, and ω′(e) = 1 − ω(e)
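The noise-stability contrast in this abstract can be observed directly by Monte Carlo simulation. The sketch below (an illustrative experiment, not from the paper; the parameters n = 101, ε = 0.1, and trial count are arbitrary choices) resamples each input bit with probability ε and estimates how often the function's output survives the noise, comparing majority with parity:

```python
import random

def majority(x):
    """Majority gate: 1 iff more than half the inputs are 1 (n odd)."""
    return 1 if 2 * sum(x) > len(x) else 0

def parity(x):
    """Parity gate: XOR of all inputs."""
    return sum(x) % 2

def noisy_copy(x, eps, rng):
    """Flip each coordinate independently with probability eps."""
    return [1 - b if rng.random() < eps else b for b in x]

def agreement(f, n, eps, trials=5000, seed=0):
    """Monte Carlo estimate of P[f(x) == f(x')] for uniform x, eps-noisy x'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        hits += f(x) == f(noisy_copy(x, eps, rng))
    return hits / trials

# Majority stays predictable under 10% noise; parity is essentially a coin flip.
print(agreement(majority, 101, 0.1))  # noticeably above 1/2
print(agreement(parity, 101, 0.1))    # close to 1/2
```

Parity behaves like the noise-sensitive events of the abstract, while majority illustrates the noise-stable side.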
Curriculum Learning
"... Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Here, we formalize such training strategies in the context of machine learning, and call them “curriculum ..."
Abstract

Cited by 48 (6 self)
Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Here, we formalize such training strategies in the context of machine learning, and call them “curriculum learning”. In the context of recent research studying the difficulty of training in the presence of nonconvex training criteria (for deep deterministic and stochastic neural networks), we explore curriculum learning in various setups. The experiments show that significant improvements in generalization can be achieved. We hypothesize that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and, in the case of nonconvex criteria, on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of nonconvex functions).
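The ordering strategy described here can be sketched as a scheduling mechanism. The snippet below is a minimal illustration, not the paper's experimental protocol: the difficulty heuristic (margin from a 1-D class boundary) and the three-stage widening schedule are assumptions chosen for the toy example.

```python
import random

def difficulty(example):
    """Hypothetical difficulty score: distance from the class boundary at 0
    (larger margin = easier). Any task-specific heuristic could be used."""
    x, y = example
    return -abs(x)  # smaller |x| (closer to the boundary) = harder

def curriculum_order(dataset, n_stages=3):
    """Yield examples easy-to-hard: start from an easy subset and gradually
    widen it to the full dataset, the core idea of curriculum learning."""
    ranked = sorted(dataset, key=difficulty)       # easiest first
    for stage in range(1, n_stages + 1):
        cutoff = len(ranked) * stage // n_stages
        pool = ranked[:cutoff]                     # easy pool, then grow it
        random.shuffle(pool)                       # shuffle within a stage
        for ex in pool:
            yield ex

rng = random.Random(0)
data = [(x, 1 if x > 0 else 0) for x in (rng.uniform(-1, 1) for _ in range(9))]
schedule = list(curriculum_order(data))            # feed this to a trainer
```

Easy examples are revisited in every stage, so later stages mix them with progressively harder ones rather than abandoning them.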
Why does unsupervised pretraining help deep learning?
2010
"... Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of autoencoder variants with impressive results being obtained in several areas, mostly on vision and language datasets. The best results obtained on supervised learning tasks ..."
Abstract

Cited by 45 (11 self)
Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of autoencoder variants, with impressive results being obtained in several areas, mostly on vision and language datasets. The best results obtained on supervised learning tasks often involve an unsupervised learning component, usually in an unsupervised pretraining phase. The main question investigated here is the following: why does unsupervised pretraining work so well? Through extensive experimentation, we explore several possible explanations discussed in the literature, including its action as a regularizer (Erhan et al., 2009b) and as an aid to optimization (Bengio et al., 2007). Our results build on the work of Erhan et al. (2009b), showing that unsupervised pretraining appears to play predominantly a regularization role in subsequent supervised training. However, our results in an online setting, with a virtually unlimited data stream, point to a somewhat more nuanced interpretation of the roles of optimization and regularization in the unsupervised pretraining effect.
Exploring strategies for training deep neural networks
 Journal of Machine Learning Research
"... Département d’informatique et de recherche opérationnelle ..."
Abstract

Cited by 41 (8 self)
Département d’informatique et de recherche opérationnelle (Department of Computer Science and Operations Research)
Simulating Threshold Circuits by Majority Circuits
 SIAM Journal on Computing, 1994
"... We prove that a single threshold gate with arbitrary weights can be simulated by an explicit polynomialsize depth 2 majority circuit. In general we show that a depth d threshold circuit can be simulated uniformly by a majority circuit of depth d + 1. Goldmann, Hastad, and Razborov showed in [10 ..."
Abstract

Cited by 36 (0 self)
We prove that a single threshold gate with arbitrary weights can be simulated by an explicit polynomial-size depth-2 majority circuit. In general we show that a depth-d threshold circuit can be simulated uniformly by a majority circuit of depth d + 1. Goldmann, Håstad, and Razborov showed in [10] that a nonuniform simulation exists. Our construction answers two open questions posed in [10]: we give an explicit construction whereas [10] uses a randomized existence argument, and we show that such a simulation is possible even if the depth d grows with the number of variables n (the simulation in [10] gives polynomial-size circuits only when d is constant).
1 A preliminary version of this paper appeared in Proc. 25th ACM STOC (1993), pp. 551–560.
2 Laboratory for Computer Science, MIT, Cambridge MA 02139. Email: migo@theory.lcs.mit.edu. This author's work was done at the Royal Institute of Technology in Stockholm, and while visiting the University of Bonn.
3 Department of Com...
The BNS-Chung Criterion for Multi-Party Communication Complexity
 Computational Complexity, 2000
"... The "Number on the Forehead" model of multiparty communication complexity was first suggested by Chandra, Furst and Lipton. The best known lower bound, for an explicit function (in this model), is a lower bound of \Omega\Gamma n=2 k ), where n is the size of the input of each player, and k is the ..."
Abstract

Cited by 31 (0 self)
The "Number on the Forehead" model of multiparty communication complexity was first suggested by Chandra, Furst and Lipton. The best known lower bound, for an explicit function (in this model), is a lower bound of \Omega\Gamma n=2 k ), where n is the size of the input of each player, and k is the number of players (first proved by Babai, Nisan and Szegedy). This lower bound has many applications in complexity theory. Proving a better lower bound, for an explicit function, is a major open problem. Based on the result of BNS, Chung gave a sufficient criterion for a function to have large multipartycommunication complexity (up to \Omega\Gamma n=2 k )). In this paper, we use some of the ideas of BNS, and Chung, together with some new ideas, resulting in a new (easier and more modular) proof for the results of BNS and Chung. This gives a simpler way to prove lower bounds for the multipartycommunicationcomplexity of a function. 1 MultiParty Communication Complexity Multiparty co...
Circuit Complexity before the Dawn of the New Millennium
1997
"... The 1980's saw rapid and exciting development of techniques for proving lower bounds in circuit complexity. This pace has slowed recently, and there has even been work indicating that quite different proof techniques must be employed to advance beyond the current frontier of circuit lower bounds. Al ..."
Abstract

Cited by 30 (3 self)
The 1980s saw rapid and exciting development of techniques for proving lower bounds in circuit complexity. This pace has slowed recently, and there has even been work indicating that quite different proof techniques must be employed to advance beyond the current frontier of circuit lower bounds. Although this has engendered pessimism in some quarters, there have in fact been many positive developments in the past few years showing that significant progress is possible on many fronts. This paper is a (necessarily incomplete) survey of the state of circuit complexity as we await the dawn of the new millennium.
The Communication Complexity of Threshold Gates
 In Proceedings of “Combinatorics, Paul Erdős is Eighty”, 1994
"... We prove upper bounds on the randomized communication complexity of evaluating a threshold gate (with arbitrary weights). For linear threshold gates this is done in the usual 2 party communication model, and for degreed threshold gates this is done in the multiparty model. We then use these upp ..."
Abstract

Cited by 29 (1 self)
We prove upper bounds on the randomized communication complexity of evaluating a threshold gate (with arbitrary weights). For linear threshold gates this is done in the usual 2-party communication model, and for degree-d threshold gates this is done in the multi-party model. We then use these upper bounds together with known lower bounds for communication complexity in order to give very easy proofs for lower bounds in various models of computation involving threshold gates. This generalizes several known bounds and answers several open problems.
Lower bounds for approximations by low degree polynomials over Z_m, 2001
"... Abstract We use a Ramseytheoretic argument to obtain the firstlower bounds for approximations over Zm by nonlinearpolynomials: ffl A degree2 polynomial over Zm (m odd) mustdiffer from the parity function on at least a ..."
Abstract

Cited by 28 (0 self)
We use a Ramsey-theoretic argument to obtain the first lower bounds for approximations over Z_m by nonlinear polynomials:
• A degree-2 polynomial over Z_m (m odd) must differ from the parity function on at least a