Results 1  10
of
109
A fast learning algorithm for deep belief nets
 Neural Computation
, 2006
"... We show how to use “complementary priors ” to eliminate the explaining away effects that make inference difficult in denselyconnected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a ..."
Abstract

Cited by 774 (53 self)
 Add to MetaCart
We show how to use “complementary priors ” to eliminate the explaining away effects that make inference difficult in denselyconnected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that finetunes the weights using a contrastive version of the wakesleep algorithm. After finetuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The lowdimensional manifolds on which the digits lie are modelled by long ravines in the freeenergy landscape of the toplevel associative memory and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind. 1
The Helmholtz Machine
, 1995
"... Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative model ..."
Abstract

Cited by 218 (23 self)
 Add to MetaCart
(Show Context)
Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative models, each pattern can be generated in exponentially many ways. It is thus intractable to adjust the parameters to maximize the probability of the observed patterns. We describe a way of finessing this combinatorial explosion by maximizing an easily computed lower bound on the probability of the observations. Our method can be viewed as a form of hierarchical selfsupervised learning that may relate to the function of bottomup and topdown cortical processing pathways.
Restricted Boltzmann machines for collaborative filtering
 In Machine Learning, Proceedings of the Twentyfourth International Conference (ICML 2004). ACM
, 2007
"... Most of the existing approaches to collaborative filtering cannot handle very large data sets. In this paper we show how a class of twolayer undirected graphical models, called Restricted Boltzmann Machines (RBM’s), can be used to model tabular data, such as user’s ratings of movies. We present eff ..."
Abstract

Cited by 185 (13 self)
 Add to MetaCart
(Show Context)
Most of the existing approaches to collaborative filtering cannot handle very large data sets. In this paper we show how a class of twolayer undirected graphical models, called Restricted Boltzmann Machines (RBM’s), can be used to model tabular data, such as user’s ratings of movies. We present efficient learning and inference procedures for this class of models and demonstrate that RBM’s can be successfully applied to the Netflix data set, containing over 100 million user/movie ratings. We also show that RBM’s slightly outperform carefullytuned SVD models. When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6 % better than the score of Netflix’s own system. 1.
Acoustic modeling using deep belief networks
 IEEE Trans. Audio, Speech, Lang. Process
, 2012
"... Abstract—Gaussian mixture models are currently the dominant technique for modeling the emission distribution of hidden Markov models for speech recognition. We show that better phone recognition on the TIMIT dataset can be achieved by replacing Gaussian mixture models by deep neural networks that co ..."
Abstract

Cited by 126 (17 self)
 Add to MetaCart
(Show Context)
Abstract—Gaussian mixture models are currently the dominant technique for modeling the emission distribution of hidden Markov models for speech recognition. We show that better phone recognition on the TIMIT dataset can be achieved by replacing Gaussian mixture models by deep neural networks that contain many layers of features and a very large number of parameters. These networks are first pretrained as a multilayer generative model of a window of spectral feature vectors without making use of any discriminative information. Once the generative pretraining has designed the features, we perform discriminative finetuning using backpropagation to adjust the features slightly to make them better at predicting a probability distribution over the states of monophone hidden Markov models. Index Terms—Acoustic modeling, deep belief networks (DBNs), neural networks, phone recognition. I.
Training restricted Boltzmann machines using approximations to the likelihood gradient
 Proceedings of the 25th international conference on Machine learning
, 2008
"... A new algorithm for training Restricted Boltzmann Machines is introduced. The algorithm, named Persistent Contrastive Divergence, is different from the standard Contrastive Divergence algorithms in that it aims to draw samples from almost exactly the model distribution. It is compared to some standa ..."
Abstract

Cited by 123 (3 self)
 Add to MetaCart
(Show Context)
A new algorithm for training Restricted Boltzmann Machines is introduced. The algorithm, named Persistent Contrastive Divergence, is different from the standard Contrastive Divergence algorithms in that it aims to draw samples from almost exactly the model distribution. It is compared to some standard Contrastive Divergence and PseudoLikelihood algorithms on the tasks of modeling and classifying various types of data. The Persistent Contrastive Divergence algorithm outperforms the other algorithms, and is equally fast and simple.
Representation Learning: A Review and New Perspectives
, 2012
"... The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to ..."
Abstract

Cited by 98 (2 self)
 Add to MetaCart
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representationlearning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and joint training of deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep architectures. This motivates longerterm unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.
Classification using discriminative restricted boltzmann machines
 In ICML ’08: Proceedings of the 25th international conference on Machine learning. ACM
, 2008
"... Recently, many applications for Restricted Boltzmann Machines (RBMs) have been developed for a large variety of learning problems. However, RBMs are usually used as feature extractors for another learning algorithm or to provide a good initialization for deep feedforward neural network classifiers, ..."
Abstract

Cited by 77 (11 self)
 Add to MetaCart
Recently, many applications for Restricted Boltzmann Machines (RBMs) have been developed for a large variety of learning problems. However, RBMs are usually used as feature extractors for another learning algorithm or to provide a good initialization for deep feedforward neural network classifiers, and are not considered as a standalone solution to classification problems. In this paper, we argue that RBMs provide a selfcontained framework for deriving competitive nonlinear classifiers. We present an evaluation of different learning algorithms for RBMs which aim at introducing a discriminative component to RBM training and improve their performance as classifiers. This approach is simple in that RBMs are used directly to build a classifier, rather than as a stepping stone. Finally, we demonstrate how discriminative RBMs can also be successfully employed in a semisupervised setting.
Exploring strategies for training deep neural networks
 Journal of Machine Learning Research
"... Département d’informatique et de recherche opérationnelle ..."
Abstract

Cited by 66 (8 self)
 Add to MetaCart
(Show Context)
Département d’informatique et de recherche opérationnelle
On the Quantitative Analysis of Deep Belief Networks
"... Deep Belief Networks (DBN’s) are generative models that contain many layers of hidden variables. Efficient greedy algorithms for learning and approximate inference have allowed these models to be applied successfully in many application domains. The main building block of a DBN is a bipartite undire ..."
Abstract

Cited by 62 (17 self)
 Add to MetaCart
(Show Context)
Deep Belief Networks (DBN’s) are generative models that contain many layers of hidden variables. Efficient greedy algorithms for learning and approximate inference have allowed these models to be applied successfully in many application domains. The main building block of a DBN is a bipartite undirected graphical model called a restricted Boltzmann machine (RBM). Due to the presence of the partition function, model selection, complexity control, and exact maximum likelihood learning in RBM’s are intractable. We show that Annealed Importance Sampling (AIS) can be used to efficiently estimate the partition function of an RBM, and we present a novel AIS scheme for comparing RBM’s with different architectures. We further show how an AIS estimator, along with approximate inference, can be used to estimate a lower bound on the logprobability that a DBN model with multiple hidden layers assigns to the test data. This is, to our knowledge, the first step towards obtaining quantitative results that would allow us to directly assess the performance of Deep Belief Networks as generative models of data. 1.
Factored Conditional Restricted Boltzmann Machines for Modeling Motion Style
"... The Conditional Restricted Boltzmann Machine (CRBM) is a recently proposed model for time series that has a rich, distributed hidden state and permits simple, exact inference. We present a new model, based on the CRBM that preserves its most important computational properties and includes multiplica ..."
Abstract

Cited by 52 (10 self)
 Add to MetaCart
(Show Context)
The Conditional Restricted Boltzmann Machine (CRBM) is a recently proposed model for time series that has a rich, distributed hidden state and permits simple, exact inference. We present a new model, based on the CRBM that preserves its most important computational properties and includes multiplicative threeway interactions that allow the effective interaction weight between two units to be modulated by the dynamic state of a third unit. We factorize the threeway weight tensor implied by the multiplicative model, reducing the number of parameters from O(N 3) to O(N 2). The result is an efficient, compact model whose effectiveness we demonstrate by modeling human motion. Like the CRBM, our model can capture diverse styles of motion with a single set of parameters, and the threeway interactions greatly improve the model’s ability to blend motion styles or to transition smoothly between them. 1.