Results 1  10
of
69
A fast learning algorithm for deep belief nets
 Neural Computation
, 2006
"... We show how to use “complementary priors ” to eliminate the explaining away effects that make inference difficult in denselyconnected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a ..."
Abstract

Cited by 970 (49 self)
 Add to MetaCart
We show how to use “complementary priors ” to eliminate the explaining away effects that make inference difficult in denselyconnected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that finetunes the weights using a contrastive version of the wakesleep algorithm. After finetuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The lowdimensional manifolds on which the digits lie are modelled by long ravines in the freeenergy landscape of the toplevel associative memory and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind. 1
Efficient learning of sparse representations with an energybased model
 ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS (NIPS 2006
, 2006
"... We describe a novel unsupervised method for learning sparse, overcomplete features. The model uses a linear encoder, and a linear decoder preceded by a sparsifying nonlinearity that turns a code vector into a quasibinary sparse code vector. Given an input, the optimal code minimizes the distance b ..."
Abstract

Cited by 219 (15 self)
 Add to MetaCart
(Show Context)
We describe a novel unsupervised method for learning sparse, overcomplete features. The model uses a linear encoder, and a linear decoder preceded by a sparsifying nonlinearity that turns a code vector into a quasibinary sparse code vector. Given an input, the optimal code minimizes the distance between the output of the decoder and the input patch while being as similar as possible to the encoder output. Learning proceeds in a twophase EMlike fashion: (1) compute the minimumenergy code vector, (2) adjust the parameters of the encoder and decoder so as to decrease the energy. The model produces “stroke detectors ” when trained on handwritten numerals, and Gaborlike filters when trained on natural image patches. Inference and learning are very fast, requiring no preprocessing, and no expensive sampling. Using the proposed unsupervised method to initialize the first layer of a convolutional network, we achieved an error rate slightly lower than the best reported result on the MNIST dataset. Finally, an extension of the method is described to learn topographical filter maps. 1
Learning Deep Architectures for AI
"... Theoretical results suggest that in order to learn the kind of complicated functions that can represent highlevel abstractions (e.g. in vision, language, and other AIlevel tasks), one may need deep architectures. Deep architectures are composed of multiple levels of nonlinear operations, such as i ..."
Abstract

Cited by 183 (30 self)
 Add to MetaCart
Theoretical results suggest that in order to learn the kind of complicated functions that can represent highlevel abstractions (e.g. in vision, language, and other AIlevel tasks), one may need deep architectures. Deep architectures are composed of multiple levels of nonlinear operations, such as in neural nets with many hidden layers or in complicated propositional formulae reusing many subformulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the stateoftheart in certain areas. This paper discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of singlelayer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.
Learning a similarity metric discriminatively, with application to face verification
 In Proc. of Computer Vision and Pattern Recognition Conference
, 2005
"... ..."
(Show Context)
Y.: Sparse feature learning for deep belief networks
 In: Advances in Neural Information Processing Systems (NIPS 2007
, 2007
"... Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the ..."
Abstract

Cited by 130 (14 self)
 Add to MetaCart
(Show Context)
Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the representation to have certain desirable properties (e.g. low dimension, sparsity, etc). Others are based on approximating density by stochastically reconstructing the input from the representation. We describe a novel and efficient algorithm to learn sparse representations, and compare it theoretically and experimentally with a similar machine trained probabilistically, namely a Restricted Boltzmann Machine. We propose a simple criterion to compare and select different unsupervised machines based on the tradeoff between the reconstruction error and the information content of the representation. We demonstrate this method by extracting features from a dataset of handwritten numerals, and from a dataset of natural image patches. We show that by stacking multiple levels of such machines and by training sequentially, highorder dependencies between the input observed variables can be captured. 1
On the Spatial Statistics of Optical Flow
 In ICCV
, 2005
"... We develop a method for learning the spatial statistics of optical flow fields from a novel training database. Training flow fields are constructed using range images of natural scenes and 3D camera motions recovered from handheld and carmounted video sequences. A detailed analysis of optical flow ..."
Abstract

Cited by 102 (8 self)
 Add to MetaCart
(Show Context)
We develop a method for learning the spatial statistics of optical flow fields from a novel training database. Training flow fields are constructed using range images of natural scenes and 3D camera motions recovered from handheld and carmounted video sequences. A detailed analysis of optical flow statistics in natural scenes is presented and machine learning methods are developed to learn a Markov random field model of optical flow. The prior probability of a flow field is formulated as a FieldofExperts model that captures the higher order spatial statistics in overlapping patches and is trained using contrastive divergence. This new optical flow prior is compared with previous robust priors and is incorporated into a recent, accurate algorithm for dense optical flow computation. Experiments with natural and synthetic sequences illustrate how the learned optical flow prior quantitatively improves flow accuracy and how it captures the rich spatial structure found in natural scene motion.
Topographic product models applied to natural scene statistics
 Neural Computation
, 2005
"... We present an energybased model that uses a product of generalised Studentt distributions to capture the statistical structure in datasets. This model is inspired by and particularly applicable to “natural ” datasets such as images. We begin by providing the mathematical framework, where we discus ..."
Abstract

Cited by 58 (7 self)
 Add to MetaCart
(Show Context)
We present an energybased model that uses a product of generalised Studentt distributions to capture the statistical structure in datasets. This model is inspired by and particularly applicable to “natural ” datasets such as images. We begin by providing the mathematical framework, where we discuss complete and overcomplete models, and provide algorithms for training these models from data. Using patches of natural scenes we demonstrate that our approach represents a viable alternative to “independent components analysis ” as an interpretive model of biological visual systems. Although the two approaches are similar in flavor there are also important differences, particularly when the representations are overcomplete. By constraining the interactions within our model we are also able to study the topographic organization of Gaborlike receptive fields that are learned by our model. Finally, we discuss the relation of our new approach to previous work — in particular Gaussian Scale Mixture models, and variants of independent components analysis. 1
A tutorial on energybased learning
 PREDICTING STRUCTURED DATA
, 2006
"... EnergyBased Models (EBMs) capture dependencies between variables by associating a scalar energy to each configuration of the variables. Inference consists in clamping the value of observed variables and finding configurations of the remaining variables that minimize the energy. Learning consists in ..."
Abstract

Cited by 57 (6 self)
 Add to MetaCart
EnergyBased Models (EBMs) capture dependencies between variables by associating a scalar energy to each configuration of the variables. Inference consists in clamping the value of observed variables and finding configurations of the remaining variables that minimize the energy. Learning consists in finding an energy function in which observed configurations of the variables are given lower energies than unobserved ones. The EBM approach provides a common theoretical framework for many learning models, including traditional discriminative and generative approaches, as well as graphtransformer networks, conditional random fields, maximum margin Markov networks, and several manifold learning methods. Probabilistic models must be properly normalized, which sometimes requires evaluating intractable integrals over the space of all possible variable configurations. Since EBMs have no requirement for proper normalization, this problem is naturally circumvented. EBMs can be viewed as a form of nonprobabilistic factor graphs, and they provide considerably more flexibility in the design of architectures and training criteria than probabilistic approaches.
Factored 3Way Restricted Boltzmann Machines For Modeling Natural Images
, 2010
"... Deep belief nets have been successful in modeling handwritten characters, but it has proved more difficult to apply them to real images. The problem lies in the restricted Boltzmann machine (RBM) which is used as a module for learning deep belief nets one layer at a time. The GaussianBinary RBMs th ..."
Abstract

Cited by 48 (4 self)
 Add to MetaCart
Deep belief nets have been successful in modeling handwritten characters, but it has proved more difficult to apply them to real images. The problem lies in the restricted Boltzmann machine (RBM) which is used as a module for learning deep belief nets one layer at a time. The GaussianBinary RBMs that have been used to model realvalued data are not a good way to model the covariance structure of natural images. We propose a factored 3way RBM that uses the states of its hidden units to represent abnormalities in the local covariance structure of an image. This provides a probabilistic framework for the widely used simple/complex cell architecture. Our model learns binary features that work very well for object recognition on the “tiny images” data set. Even better features are obtained by then using standard binary RBM’s to learn a deeper model.
A comment on contrastive divergence
 Proc. of NIPS
, 2004
"... This paper analyses the Contrastive Divergence algorithm for learning statistical parameters. We relate the algorithm to the stochastic approximation literature. This enables us to specify conditions under which the algorithm is guaranteed to converge to the optimal solution (with probability 1). Th ..."
Abstract

Cited by 42 (0 self)
 Add to MetaCart
(Show Context)
This paper analyses the Contrastive Divergence algorithm for learning statistical parameters. We relate the algorithm to the stochastic approximation literature. This enables us to specify conditions under which the algorithm is guaranteed to converge to the optimal solution (with probability 1). This includes necessary and sufficient conditions for the solution to be unbiased. 1