Results 1–10 of 13
Greedy layer-wise training of deep networks. In NIPS, 2007.
Cited by 184 (32 self)

Abstract
Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of the computational elements required to represent some functions. Deep multilayer neural networks have many levels of nonlinearities, allowing them to compactly represent highly nonlinear and highly varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization often appears to get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and to extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input and bringing better generalization.
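The greedy layer-wise strategy this abstract describes can be sketched in a few lines: each layer is trained as a Restricted Boltzmann Machine on the representation produced by the layers below it, then frozen, and its hidden activations become the next layer's input. The sketch below is a minimal toy illustration, not the paper's code; the names (`train_rbm`, the layer sizes, the toy data) are our own, and CD-1 is used as the per-layer learner.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hid, epochs=200, lr=0.1):
    """Fit one binary RBM with CD-1; return (weights, hidden biases)."""
    n_vis = data.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hid))
    b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)
    for _ in range(epochs):
        for v0 in data:
            ph0 = sigmoid(v0 @ W + b_h)                       # positive phase
            h0 = (rng.random(n_hid) < ph0).astype(float)
            pv1 = sigmoid(h0 @ W.T + b_v)                     # one Gibbs step
            v1 = (rng.random(n_vis) < pv1).astype(float)
            ph1 = sigmoid(v1 @ W + b_h)
            W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1)) # CD-1 update
            b_v += lr * (v0 - v1)
            b_h += lr * (ph0 - ph1)
    return W, b_h

# Toy binary data: two prototype patterns.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]], dtype=float)

# Greedy layer-wise stacking: train layer 1 on the data, freeze it,
# then train layer 2 on layer 1's hidden activations, and so on.
layer_sizes = [4, 3]   # illustrative sizes, not from the paper
inputs, stack = data, []
for n_hid in layer_sizes:
    W, b_h = train_rbm(inputs, n_hid)
    stack.append((W, b_h))
    inputs = sigmoid(inputs @ W + b_h)   # deterministic "up" pass

print([W.shape for W, _ in stack])  # [(6, 4), (4, 3)]
```

In the paper's setting, the stacked weights would then initialize a deep network that is fine-tuned on the supervised task; the pretraining's role, per the abstract, is to place the weights near a good solution.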
On Contrastive Divergence Learning
Cited by 82 (15 self)

Abstract
Maximum-likelihood (ML) learning of Markov random fields is challenging because it requires estimates of averages that have an exponential number of terms. Markov chain Monte Carlo methods typically take a long time to converge on unbiased estimates, but Hinton (2002) showed that if the Markov chain is only run for a few steps, the learning can still work well and it approximately minimizes a different function called "contrastive divergence" (CD). CD learning has been successfully applied to various types of random fields. Here, we study the properties of CD learning and show that it provides biased estimates in general, but that the bias is typically very small. Fast CD learning can therefore be used to get close to an ML solution, and slow ML learning can then be used to fine-tune the CD solution.
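A single CD update of the kind the abstract refers to can be sketched as follows: instead of running the Markov chain to equilibrium, the Gibbs chain is run for one step from the data, and the difference between the data statistics and the one-step reconstruction statistics replaces the intractable ML gradient (this is the source of the small bias the paper studies). This is a minimal toy sketch assuming a small binary RBM; the variable names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Tiny binary RBM: 6 visible units, 4 hidden units (illustrative sizes).
n_vis, n_hid = 6, 4
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_vis = np.zeros(n_vis)
b_hid = np.zeros(n_hid)

def cd1_update(v0, lr=0.1):
    """One CD-1 step: run the Gibbs chain for a single full step and use
    <v h>_data - <v h>_recon as the (biased) gradient estimate."""
    global W, b_vis, b_hid
    # Positive phase: hidden probabilities given the data.
    ph0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # One step of Gibbs sampling: the "reconstruction".
    pv1 = sigmoid(h0 @ W.T + b_vis)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b_hid)
    # Parameter updates from the difference of pairwise statistics.
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b_vis += lr * (v0 - v1)
    b_hid += lr * (ph0 - ph1)

# Train on a single repeated pattern; despite the bias, the RBM should
# come to reconstruct the pattern it was trained on.
pattern = np.array([1., 1., 1., 0., 0., 0.])
for _ in range(500):
    cd1_update(pattern)

recon = sigmoid(sigmoid(pattern @ W + b_hid) @ W.T + b_vis)
print(np.round(recon, 2))
```

Running the chain for k > 1 steps before taking the statistics (CD-k) reduces the bias at the cost of more computation, which is the fast-CD-then-slow-ML trade-off the abstract describes.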
Exploring strategies for training deep neural networks. Journal of Machine Learning Research.
Cited by 41 (8 self)
Département d’informatique et de recherche opérationnelle
Stacks of Convolutional Restricted Boltzmann Machines for Shift-Invariant Feature Learning
Cited by 17 (0 self)

Abstract
In this paper we present a method for learning class-specific features for recognition. Recently a greedy layer-wise procedure was proposed to initialize the weights of deep belief networks, by viewing each layer as a separate Restricted Boltzmann Machine (RBM). We develop the Convolutional RBM (CRBM), a variant of the RBM model in which weights are shared to respect the spatial structure of images. This framework learns a set of features that can generate the images of a specific object class. Our feature-extraction model is a four-layer hierarchy of alternating filtering and maximum subsampling. We learn the feature parameters of the first and third layers by viewing them as separate CRBMs. The outputs of our feature-extraction hierarchy are then fed as input to a discriminative classifier. It is experimentally demonstrated that the extracted features are effective for object detection, using them to obtain performance comparable to the state of the art on handwritten digit recognition and pedestrian detection.
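The "alternating filtering and maximum subsampling" stages this abstract describes can be sketched as a shared-weight convolution followed by non-overlapping max pooling; the weight sharing is what makes the response shift-invariant. The sketch below is illustrative only: the hand-picked 3×3 edge filter stands in for filters that the paper learns by training each stage as a CRBM, and the image and sizes are toy assumptions.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D correlation with a single shared filter -- the weight
    sharing that respects the spatial structure of the image."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Maximum subsampling over non-overlapping size x size windows."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

# One filtering + max-subsampling stage of the four-layer hierarchy.
image = np.zeros((8, 8)); image[:, 4] = 1.0        # toy image: vertical bar
kernel = np.array([[-1., 2., -1.]] * 3) / 3.0      # hand-picked edge filter
pooled = max_pool(conv2d_valid(image, kernel))
print(pooled.shape)  # (3, 3)
```

In the paper's hierarchy, two such stages are stacked (layers one and three learned as CRBMs), and the final pooled maps are fed to a discriminative classifier.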
Probabilistic Computing with Future Deep Sub-Micrometer Devices: A Modelling Approach
Cited by 1 (0 self)

Abstract
An approach is described that investigates the potential of probabilistic “neural” architectures for computation with Deep Sub-Micrometer (DSM) MOSFETs. Initially, noisy MOSFET models are based upon those for a 0.35µm MOS technology with an exaggerated 1/f characteristic. We explore the manifestation of the 1/f characteristic at the output of a 2-quadrant multiplier when the key n-channel MOSFETs are replaced by “noisy” MOSFETs. The stochastic behavior of this noisy multiplier has been mapped onto a software (Matlab) model of a Continuous Restricted Boltzmann Machine (CRBM), an analogue-input stochastic computing structure. Simulation of this DSM CRBM implementation shows little degradation from that of a “perfect” CRBM. This paper thus introduces a methodology for a form of “technology-downstreaming” and highlights the potential of probabilistic architectures for DSM computation.
Implementing neural models in silicon, 2004.
Cited by 1 (0 self)

Abstract
Neural models are used both in computational neuroscience and in pattern recognition. The aim of the first is understanding real neural systems; the aim of the second is gaining better, possibly brain-like, performance for the systems being built. In both cases, the highly parallel nature of the neural system contrasts with the sequential nature of computer systems, resulting in slow and complex simulation software. More direct implementation in hardware (whether digital or analogue) holds out the promise of faster emulation, both because hardware implementation is inherently faster than software and because the operation is much more parallel. There are costs to this: modifying the system (for example, to test variants of the system) is much harder once a full application-specific integrated circuit has been built. Fast emulation can permit direct incorporation of a neural model into a system, permitting real-time input and output. Appropriate selection of the implementation technology can help to make interfacing the system to external devices simpler. We review the technologies involved and discuss some example systems.
Extracting Propositional Rules from Feedforward Neural Networks by Means of Binary Decision Diagrams
"... Symbol emergence in design ..."
The prototypical Restricted Boltzmann Machine
Abstract
We introduce the spike-and-slab Restricted Boltzmann Machine, characterized by having both a real-valued vector, the slab, and a binary variable, the spike, associated with each unit in the hidden layer. The model possesses some practical properties, such as being amenable to block Gibbs sampling, as well as being capable of generating latent representations of the data similar to those of the recently introduced mean and covariance Restricted Boltzmann Machine. We illustrate how the spike-and-slab Restricted Boltzmann Machine achieves competitive performance on the CIFAR-10 object recognition task.
A Neural-Symbolic Cognitive Agent for Online Learning and Reasoning. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence.
Abstract
In real-world applications, the effective integration of learning and reasoning in a cognitive agent model is a difficult task. However, such integration may lead to a better understanding, use, and construction of more realistic models. Unfortunately, existing models are either oversimplified or require much processing time, which is unsuitable for online learning and reasoning. Currently, controlled environments such as training simulators do not effectively integrate learning and reasoning. In particular, higher-order concepts and cognitive abilities have many unknown temporal relations with the data, making it impossible to represent such relationships by hand. We introduce a novel cognitive agent model and architecture for online learning and reasoning that seeks to effectively represent, learn, and reason in complex training environments. The agent architecture of the model combines neural learning with symbolic knowledge representation. It is capable of learning new hypotheses from observed data, and of inferring new beliefs based on these hypotheses. Furthermore, it deals with uncertainty and errors in the data using a Bayesian inference model. The validation of the model on real-time simulations and the results presented here indicate the promise of the approach for performing online learning and reasoning in real-world scenarios, with possible applications in a range of areas.
A Log-Domain Implementation of the Diffusion Network in Very Large Scale Integration
Abstract
The Diffusion Network (DN) is a stochastic recurrent network which has been shown capable of modeling the distributions of continuous-valued, continuous-time paths. However, the dynamics of the DN are governed by stochastic differential equations, making the DN unfavourable for simulation on a digital computer. This paper presents the implementation of the DN in analogue Very Large Scale Integration, enabling the DN to be simulated in real time. Moreover, a log-domain representation is applied to the DN, allowing the supply voltage, and thus the power consumption, to be reduced without limiting the dynamic ranges of the diffusion processes. A VLSI chip containing a DN with two stochastic units has been designed and fabricated. The design of the component circuits is described, and the simulation of the full system is presented. The simulation results demonstrate that the DN in VLSI is able to regenerate various types of continuous paths in real time.