Results 1  10
of
127
Multiparty Communication Complexity
, 1989
"... A given Boolean function has its input distributed among many parties. The aim is to determine which parties to tMk to and what information to exchange with each of them in order to evaluate the function while minimizing the total communication. This paper shows that it is possible to obtain the Boo ..."
Abstract

Cited by 764 (22 self)
 Add to MetaCart
A given Boolean function has its input distributed among many parties. The aim is to determine which parties to tMk to and what information to exchange with each of them in order to evaluate the function while minimizing the total communication. This paper shows that it is possible to obtain the Boolean answer deterministically with only a polynomial increase in communication with respect to the information lower bound given by the nondeterministic communication complexity of the function.
Learning Deep Architectures for AI
"... Theoretical results suggest that in order to learn the kind of complicated functions that can represent highlevel abstractions (e.g. in vision, language, and other AIlevel tasks), one may need deep architectures. Deep architectures are composed of multiple levels of nonlinear operations, such as i ..."
Abstract

Cited by 182 (32 self)
 Add to MetaCart
Theoretical results suggest that in order to learn the kind of complicated functions that can represent highlevel abstractions (e.g. in vision, language, and other AIlevel tasks), one may need deep architectures. Deep architectures are composed of multiple levels of nonlinear operations, such as in neural nets with many hidden layers or in complicated propositional formulae reusing many subformulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the stateoftheart in certain areas. This paper discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of singlelayer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.
Representation Learning: A Review and New Perspectives
, 2012
"... The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to ..."
Abstract

Cited by 152 (4 self)
 Add to MetaCart
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representationlearning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and joint training of deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep architectures. This motivates longerterm unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.
Why does unsupervised pretraining help deep learning?
, 2010
"... Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of autoencoder variants with impressive results being obtained in several areas, mostly on vision and language datasets. The best results obtained on supervised learning tasks ..."
Abstract

Cited by 143 (20 self)
 Add to MetaCart
Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of autoencoder variants with impressive results being obtained in several areas, mostly on vision and language datasets. The best results obtained on supervised learning tasks often involve an unsupervised learning component, usually in an unsupervised pretraining phase. The main question investigated here is the following: why does unsupervised pretraining work so well? Through extensive experimentation, we explore several possible explanations discussed in the literature including its action as a regularizer (Erhan et al., 2009b) and as an aid to optimization (Bengio et al., 2007). Our results build on the work of Erhan et al. (2009b), showing that unsupervised pretraining appears to play predominantly a regularization role in subsequent supervised training. However our results in an online setting, with a virtually unlimited data stream, point to a somewhat more nuanced interpretation of the roles of optimization and regularization in the unsupervised pretraining effect.
Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion
, 2010
"... ..."
Curriculum Learning
"... Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Here, we formalize such training strategies in the context of machine learning, and call them “curriculum ..."
Abstract

Cited by 118 (12 self)
 Add to MetaCart
Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Here, we formalize such training strategies in the context of machine learning, and call them “curriculum learning”. In the context of recent research studying the difficulty of training in the presence of nonconvex training criteria (for deep deterministic and stochastic neural networks), we explore curriculum learning in various setups. The experiments show that significant improvements in generalization can be achieved. We hypothesize that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and, in the case of nonconvex criteria, on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of nonconvex functions). 1.
Noise sensitivity of Boolean functions and applications to percolation
, 2008
"... It is shown that a large class of events in a product probability space are highly sensitive to noise, in the sense that with high probability, the configuration with an arbitrary small percent of random errors gives almost no prediction whether the event occurs. On the other hand, weighted majority ..."
Abstract

Cited by 102 (19 self)
 Add to MetaCart
(Show Context)
It is shown that a large class of events in a product probability space are highly sensitive to noise, in the sense that with high probability, the configuration with an arbitrary small percent of random errors gives almost no prediction whether the event occurs. On the other hand, weighted majority functions are shown to be noisestable. Several necessary and sufficient conditions for noise sensitivity and stability are given. Consider, for example, bond percolation on an n + 1 by n grid. A configuration is a function that assigns to every edge the value 0 or 1. Let ω be a random configuration, selected according to the uniform measure. A crossing is a path that joins the left and right sides of the rectangle, and consists entirely of edges e with ω(e) = 1. By duality, the probability for having a crossing is 1/2. Fix an ǫ ∈ (0,1). For each edge e, let ω ′ (e) = ω(e) with probability 1 − ǫ, and ω ′ (e) = 1 − ω(e)
Majority Gates vs. General Weighted Threshold Gates
 Computational Complexity
, 1992
"... . In this paper we study small depth circuits that contain threshold gates (with or without weights) and parity gates. All circuits we consider are of polynomial size. We prove several results which complete the work on characterizing possible inclusions between many classes defined by small depth c ..."
Abstract

Cited by 101 (6 self)
 Add to MetaCart
(Show Context)
. In this paper we study small depth circuits that contain threshold gates (with or without weights) and parity gates. All circuits we consider are of polynomial size. We prove several results which complete the work on characterizing possible inclusions between many classes defined by small depth circuits. These results are the following: 1. A single threshold gate with weights cannot in general be replaced by a polynomial fanin unweighted threshold gate of parity gates. 2. On the other hand it can be replaced by a depth 2 unweighted threshold circuit of polynomial size. An extension of this construction is used to prove that whatever can be computed by a depth d polynomial size threshold circuit with weights can be computed by a depth d + 1 polynomial size unweighted threshold circuit, where d is an arbitrary fixed integer. 3. A polynomial fanin threshold gate (with weights) of parity gates cannot in general be replaced by a depth 2 unweighted threshold circuit of polynomial size...
Exploring strategies for training deep neural networks
 Journal of Machine Learning Research
"... Département d’informatique et de recherche opérationnelle ..."
Abstract

Cited by 88 (12 self)
 Add to MetaCart
(Show Context)
Département d’informatique et de recherche opérationnelle