Results 1 - 10 of 24
Greedy layer-wise training of deep networks
In NIPS, 2007
Abstract

Cited by 384 (48 self)
Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multilayer neural networks have many levels of nonlinearities allowing them to compactly represent highly nonlinear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
On Contrastive Divergence Learning
Abstract

Cited by 128 (15 self)
Maximum-likelihood (ML) learning of Markov random fields is challenging because it requires estimates of averages that have an exponential number of terms. Markov chain Monte Carlo methods typically take a long time to converge on unbiased estimates, but Hinton (2002) showed that if the Markov chain is only run for a few steps, the learning can still work well and it approximately minimizes a different function called "contrastive divergence" (CD). CD learning has been successfully applied to various types of random fields. Here, we study the properties of CD learning and show that it provides biased estimates in general, but that the bias is typically very small. Fast CD learning can therefore be used to get close to an ML solution and slow ML learning can then be used to fine-tune the CD solution.
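The idea of running the Markov chain for only a few steps can be illustrated with a minimal sketch of a single CD-1 update. It assumes a small binary restricted Boltzmann machine, which is only one of the random fields the abstract covers; the function names, the learning rate and the toy patterns are hypothetical and not taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_hidden(v, W, c, rng):
    """Return (probabilities, binary samples) of the hidden units given visible vector v."""
    probs = [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(c))]
    samples = [1.0 if rng.random() < p else 0.0 for p in probs]
    return probs, samples


def sample_visible(h, W, b, rng):
    """Return (probabilities, binary samples) of the visible units given hidden vector h."""
    probs = [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(b))]
    samples = [1.0 if rng.random() < p else 0.0 for p in probs]
    return probs, samples


def cd1_update(v0, W, b, c, lr, rng):
    """Apply one CD-1 step for training vector v0, updating W, b and c in place."""
    # Positive phase: hidden units driven by the data.
    h0_probs, h0 = sample_hidden(v0, W, c, rng)
    # A single Gibbs step instead of running the chain to equilibrium.
    _, v1 = sample_visible(h0, W, b, rng)
    h1_probs, _ = sample_hidden(v1, W, c, rng)
    # Difference of data statistics and one-step reconstruction statistics.
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * h0_probs[j] - v1[i] * h1_probs[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (h0_probs[j] - h1_probs[j])


if __name__ == "__main__":
    rng = random.Random(0)
    n_visible, n_hidden = 4, 2  # assumed toy sizes
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
    b = [0.0] * n_visible
    c = [0.0] * n_hidden
    patterns = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]  # assumed toy data
    for _ in range(200):
        for v in patterns:
            cd1_update(v, W, b, c, lr=0.1, rng=rng)
    print(W)
```

Because the negative statistics come from a one-step reconstruction rather than from equilibrium samples, the update is a biased estimate of the ML gradient, which is the bias the abstract analyzes.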
Exploring strategies for training deep neural networks
Journal of Machine Learning Research
Abstract

Cited by 88 (12 self)
Département d’informatique et de recherche opérationnelle
Stacks of Convolutional Restricted Boltzmann Machines for Shift-Invariant Feature Learning
Abstract

Cited by 35 (1 self)
In this paper we present a method for learning class-specific features for recognition. Recently a greedy layer-wise procedure was proposed to initialize weights of deep belief networks, by viewing each layer as a separate Restricted Boltzmann Machine (RBM). We develop the Convolutional RBM (CRBM), a variant of the RBM model in which weights are shared to respect the spatial structure of images. This framework learns a set of features that can generate the images of a specific object class. Our feature extraction model is a four layer hierarchy of alternating filtering and maximum subsampling. We learn feature parameters of the first and third layers viewing them as separate CRBMs. The outputs of our feature extraction hierarchy are then fed as input to a discriminative classifier. It is experimentally demonstrated that the extracted features are effective for object detection, using them to obtain performance comparable to the state-of-the-art on handwritten digit recognition and pedestrian detection.
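As a rough illustration of one "filtering, then maximum subsampling" pair from the hierarchy described above, the sketch below applies a fixed 2x2 filter and then takes the maximum over non-overlapping blocks. It assumes plain linear filtering with a hand-chosen edge kernel; in the paper the filters are learned as CRBMs, and the function names here are hypothetical.

```python
def filter2d(image, kernel):
    """Valid cross-correlation of a 2D list `image` with a 2D list `kernel`."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def max_subsample(fmap, size):
    """Maximum over non-overlapping size x size blocks.

    Trailing rows or columns that do not fill a whole block are dropped.
    """
    out_h = len(fmap) // size
    out_w = len(fmap[0]) // size
    return [[max(fmap[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(out_w)]
            for r in range(out_h)]


if __name__ == "__main__":
    # Assumed toy input: a 5x5 image with a vertical edge between columns 1 and 2.
    image = [[0, 0, 1, 1, 1] for _ in range(5)]
    # Assumed hand-chosen kernel that responds to dark-to-bright vertical edges.
    kernel = [[-1, 1], [-1, 1]]
    feature_map = filter2d(image, kernel)
    pooled = max_subsample(feature_map, 2)
    print(pooled)  # [[2, 0], [2, 0]]
```

Taking the maximum within each block means the pooled response stays the same if the edge shifts by a pixel inside the block, which is the kind of shift invariance the title refers to.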
Implementing neural models in silicon
2004
Abstract

Cited by 1 (0 self)
Neural models are used in both computational neuroscience and in pattern recognition. The aim of the first is understanding of real neural systems, and of the second is gaining better, possibly brain-like performance for systems being built. In both cases, the highly parallel nature of the neural system contrasts with the sequential nature of computer systems, resulting in slow and complex simulation software. More direct implementation in hardware (whether digital or analogue) holds out the promise of faster emulation both because hardware implementation is inherently faster than software, and because the operation is much more parallel. There are costs to this: modifying the system (for example to test out variants of the system) is much harder when a full application-specific integrated circuit has been built. Fast emulation can permit direct incorporation of a neural model into a system, permitting real-time input and output. Appropriate selection of implementation technology can help to make interfacing the system to external devices simpler. We review the technologies involved, and discuss some example systems. 1 Why implement neural models in silicon? There are two primary reasons for implementing neural models: one is to attempt to gain better, and possibly
Learning Fingerprint Orientation Fields Using Continuous Restricted Boltzmann Machines
Abstract

Cited by 1 (1 self)
We aim to learn local orientation field patterns in fingerprints and correct distorted field patterns in noisy fingerprint images. This is formulated as a learning problem and achieved using two continuous restricted Boltzmann machines. The learnt orientation fields are then used in conjunction with traditional Gabor-based algorithms for fingerprint enhancement. Orientation fields extracted by gradient-based methods are local, and do not consider neighboring orientations. If some amount of noise is present in a fingerprint, then these methods perform poorly when enhancing the image, affecting fingerprint matching. This paper presents a method to correct the resulting noisy regions over patches of the fingerprint by training two continuous restricted Boltzmann machines. The continuous RBMs are trained with clean fingerprint images and applied to overlapping patches of the input fingerprint. Experimental results show that one can successfully restore patches of noisy fingerprint images.
Extracting Propositional Rules from Feedforward Neural Networks by Means of Binary Decision Diagrams
"... Symbol emergence in design ..."
The prototypical Restricted Boltzmann Machine
Abstract
We introduce the spike and slab Restricted Boltzmann Machine, characterized by having both a real-valued vector, the slab, and a binary variable, the spike, associated with each unit in the hidden layer. The model possesses some practical properties such as being amenable to Block Gibbs sampling as well as being capable of generating similar latent representations of the data to the recently introduced mean and covariance Restricted Boltzmann Machine. We illustrate how the spike and slab Restricted Boltzmann Machine achieves competitive performance on the CIFAR-10 object recognition task.
A Neural-Symbolic Cognitive Agent for Online Learning and Reasoning
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence
Abstract
In real-world applications, the effective integration of learning and reasoning in a cognitive agent model is a difficult task. However, such integration may lead to a better understanding, use and construction of more realistic models. Unfortunately, existing models are either oversimplified or require much processing time, which is unsuitable for online learning and reasoning. Currently, controlled environments like training simulators do not effectively integrate learning and reasoning. In particular, higher-order concepts and cognitive abilities have many unknown temporal relations with the data, making it impossible to represent such relationships by hand. We introduce a novel cognitive agent model and architecture for online learning and reasoning that seeks to effectively represent, learn and reason in complex training environments. The agent architecture of the model combines neural learning with symbolic knowledge representation. It is capable of learning new hypotheses from observed data, and of inferring new beliefs based on these hypotheses. Furthermore, it deals with uncertainty and errors in the data using a Bayesian inference model. The validation of the model on real-time simulations and the results presented here indicate the promise of the approach when performing online learning and reasoning in real-world scenarios, with possible applications in a range of areas.
Université de Montréal
Abstract
Learning to recognize or predict sequences using long-term context has many applications. However, practical and theoretical problems are found in training recurrent neural networks to perform tasks in which input/output dependencies span long intervals. Starting from a mathematical analysis of the problem, we consider and compare alternative algorithms and architectures on tasks for which the span of the input/output dependencies can be controlled. Results on the new algorithms show performance qualitatively superior to that obtained with backpropagation.