Results 1 - 10
of
224
The Helmholtz Machine
, 1995
"... Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative model ..."
Abstract
-
Cited by 165 (22 self)
- Add to MetaCart
Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative models, each pattern can be generated in exponentially many ways. It is thus intractable to adjust the parameters to maximize the probability of the observed patterns. We describe a way of finessing this combinatorial explosion by maximizing an easily computed lower bound on the probability of the observations. Our method can be viewed as a form of hierarchical self-supervised learning that may relate to the function of bottom-up and top-down cortical processing pathways.
Greedy layer-wise training of deep networks
- In NIPS
, 2007
"... Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allow ..."
Abstract
-
Cited by 105 (18 self)
- Add to MetaCart
Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations
"... There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks. Scaling such models to full-sized, high-dimensional images remains a difficult problem. To address this problem, we present the convolutional deep belief network, a hierarchical gene ..."
Abstract
-
Cited by 93 (14 self)
- Add to MetaCart
There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks. Scaling such models to full-sized, high-dimensional images remains a difficult problem. To address this problem, we present the convolutional deep belief network, a hierarchical generative model which scales to realistic image sizes. This model is translation-invariant and supports efficient bottom-up and top-down probabilistic inference. Key to our approach is probabilistic max-pooling, a novel technique which shrinks the representations of higher layers in a probabilistically sound way. Our experiments show that the algorithm learns useful high-level visual features, such as object parts, from unlabeled images of objects and natural scenes. We demonstrate excellent performance on several visual recognition tasks and show that our model can perform hierarchical (bottom-up and top-down) inference over full-sized images. 1.
Restricted Boltzmann machines for collaborative filtering
- In Machine Learning, Proceedings of the Twenty-fourth International Conference (ICML 2004). ACM
, 2007
"... Most of the existing approaches to collaborative filtering cannot handle very large data sets. In this paper we show how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBM’s), can be used to model tabular data, such as user’s ratings of movies. We present eff ..."
Abstract
-
Cited by 74 (11 self)
- Add to MetaCart
Most of the existing approaches to collaborative filtering cannot handle very large data sets. In this paper we show how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBM’s), can be used to model tabular data, such as user’s ratings of movies. We present efficient learning and inference procedures for this class of models and demonstrate that RBM’s can be successfully applied to the Netflix data set, containing over 100 million user/movie ratings. We also show that RBM’s slightly outperform carefully-tuned SVD models. When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6 % better than the score of Netflix’s own system. 1.
Efficient learning of sparse representations with an energy-based model
- ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS (NIPS 2006
, 2006
"... We describe a novel unsupervised method for learning sparse, overcomplete features. The model uses a linear encoder, and a linear decoder preceded by a sparsifying non-linearity that turns a code vector into a quasi-binary sparse code vector. Given an input, the optimal code minimizes the distance b ..."
Abstract
-
Cited by 71 (13 self)
- Add to MetaCart
We describe a novel unsupervised method for learning sparse, overcomplete features. The model uses a linear encoder, and a linear decoder preceded by a sparsifying non-linearity that turns a code vector into a quasi-binary sparse code vector. Given an input, the optimal code minimizes the distance between the output of the decoder and the input patch while being as similar as possible to the encoder output. Learning proceeds in a two-phase EM-like fashion: (1) compute the minimum-energy code vector, (2) adjust the parameters of the encoder and decoder so as to decrease the energy. The model produces “stroke detectors ” when trained on handwritten numerals, and Gabor-like filters when trained on natural image patches. Inference and learning are very fast, requiring no preprocessing, and no expensive sampling. Using the proposed unsupervised method to initialize the first layer of a convolutional network, we achieved an error rate slightly lower than the best reported result on the MNIST dataset. Finally, an extension of the method is described to learn topographical filter maps. 1
Unsupervised learning of invariant feature hierarchies with application to object recognition.” CVPR, 2007. 1 Data Driven HMC Algorithm. DDHMC (motion-based proposals) 1: Initialize chain with τo 2: for i = 1 to nsamples do 3: // 1. Data-Driven: Get Propo
- Initialize the Acceptance, H(qo, po), and the Proposal, H ′ (qo, po ) Hamiltonians , τq) 14: po = DMotion(τ ′ i , τq) 15: qo = DF orm(τ ′ i , τq) 16: draw po ∼ N (0, 1) 17: // 2. Perturbation on H ′ using Leapfrog 18: for j=1 to l do 13: qo = DF orm(τ ′ i
"... We present an unsupervised method for learning a hierarchy of sparse feature detectors that are invariant to small shifts and distortions. The resulting feature extractor consists of multiple convolution filters, followed by a pointwise sigmoid non-linearity, and a feature-pooling layer that compute ..."
Abstract
-
Cited by 65 (11 self)
- Add to MetaCart
We present an unsupervised method for learning a hierarchy of sparse feature detectors that are invariant to small shifts and distortions. The resulting feature extractor consists of multiple convolution filters, followed by a pointwise sigmoid non-linearity, and a feature-pooling layer that computes the max of each filter output within adjacent windows. A second level of larger and more invariant features is obtained by training the same algorithm on patches of features from the first level. Training a supervised classifier on these features yields 0.64 % error on MNIST, and 54 % average recognition rate on Caltech 101
Training restricted Boltzmann machines using approximations to the likelihood gradient
- Proceedings of the 25th international conference on Machine learning
, 2008
"... A new algorithm for training Restricted Boltzmann Machines is introduced. The algorithm, named Persistent Contrastive Divergence, is different from the standard Contrastive Divergence algorithms in that it aims to draw samples from almost exactly the model distribution. It is compared to some standa ..."
Abstract
-
Cited by 58 (1 self)
- Add to MetaCart
A new algorithm for training Restricted Boltzmann Machines is introduced. The algorithm, named Persistent Contrastive Divergence, is different from the standard Contrastive Divergence algorithms in that it aims to draw samples from almost exactly the model distribution. It is compared to some standard Contrastive Divergence and Pseudo-Likelihood algorithms on the tasks of modeling and classifying various types of data. The Persistent Contrastive Divergence algorithm outperforms the other algorithms, and is equally fast and simple.
Modeling human motion using binary latent variables
- Advances in Neural Information Processing Systems
, 2006
"... We propose a non-linear generative model for human motion data that uses an undirected model with binary latent variables and real-valued “visible ” variables that represent joint angles. The latent and visible variables at each time step receive directed connections from the visible variables at th ..."
Abstract
-
Cited by 51 (15 self)
- Add to MetaCart
We propose a non-linear generative model for human motion data that uses an undirected model with binary latent variables and real-valued “visible ” variables that represent joint angles. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time-steps. Such an architecture makes on-line inference efficient and allows us to use a simple approximate learning procedure. After training, the model finds a single set of parameters that simultaneously capture several different kinds of motion. We demonstrate the power of our approach by synthesizing various motion sequences and by performing on-line filling in of data lost during motion capture. Website:
An empirical evaluation of deep architectures on problems with many factors of variation
- In ICML
, 2007
"... Recently, several learning algorithms relying on models with deep architectures have been proposed. Though they have demonstrated impressive performance, to date, they have only been evaluated on relatively simple problems such as digit recognition in a controlled environment, for which many machine ..."
Abstract
-
Cited by 45 (10 self)
- Add to MetaCart
Recently, several learning algorithms relying on models with deep architectures have been proposed. Though they have demonstrated impressive performance, to date, they have only been evaluated on relatively simple problems such as digit recognition in a controlled environment, for which many machine learning algorithms already report reasonable results. Here, we present a series of experiments which indicate that these models show promise in solving harder learning problems that exhibit many factors of variation. These models are compared with well-established algorithms such as Support Vector Machines and single hidden-layer feed-forward neural networks. 1.

