The Helmholtz Machine
1995
Cited by 194 (22 self)
Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative models, each pattern can be generated in exponentially many ways. It is thus intractable to adjust the parameters to maximize the probability of the observed patterns. We describe a way of finessing this combinatorial explosion by maximizing an easily computed lower bound on the probability of the observations. Our method can be viewed as a form of hierarchical self-supervised learning that may relate to the function of bottom-up and top-down cortical processing pathways.
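The lower bound mentioned in this abstract can be written as a short derivation. This is a sketch under assumed notation that the listing itself does not give: d is an observed pattern, α ranges over the ways of generating it, θ denotes the generative parameters, and Q(α | d) is any distribution over those ways.

```latex
\begin{align*}
\log p(d \mid \theta)
  &= \log \sum_{\alpha} p(\alpha, d \mid \theta) \\
  &= \log \sum_{\alpha} Q(\alpha \mid d)\,
       \frac{p(\alpha, d \mid \theta)}{Q(\alpha \mid d)} \\
  &\ge \sum_{\alpha} Q(\alpha \mid d)\,
       \log \frac{p(\alpha, d \mid \theta)}{Q(\alpha \mid d)}
       \qquad \text{(Jensen's inequality)} \\
  &= \log p(d \mid \theta)
     - \mathrm{KL}\big(Q(\cdot \mid d)\,\|\,p(\cdot \mid d, \theta)\big).
\end{align*}
```

Because the KL term is non-negative, the third line never exceeds the true log probability, and the bound is tight exactly when Q equals the posterior over α.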
Learning multiple layers of features from tiny images
2009
Cited by 104 (4 self)
April 8, 2009. Groups at MIT and NYU have collected a dataset of millions of tiny colour images from the web. It is, in principle, an excellent dataset for unsupervised training of deep generative models, but previous researchers who have tried this have found it difficult to learn a good set of filters from the images. We show how to train a multilayer generative model that learns to extract meaningful features which resemble those found in the human visual cortex. Using a novel parallelization algorithm to distribute the work among multiple machines connected on a network, we show how training such a model can be done in reasonable time. A second problematic aspect of the tiny images dataset is that there are no reliable class labels, which makes it hard to use for object recognition experiments. We created two sets of reliable labels. The CIFAR-10 set has 6000 examples of each of 10 classes and the CIFAR-100 set has 600 examples of each of 100 non-overlapping classes. Using these labels, we show that object recognition is significantly
Modeling human motion using binary latent variables
Advances in Neural Information Processing Systems, 2006
Cited by 90 (20 self)
We propose a nonlinear generative model for human motion data that uses an undirected model with binary latent variables and real-valued “visible” variables that represent joint angles. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time steps. Such an architecture makes online inference efficient and allows us to use a simple approximate learning procedure. After training, the model finds a single set of parameters that simultaneously capture several different kinds of motion. We demonstrate the power of our approach by synthesizing various motion sequences and by performing online filling in of data lost during motion capture. Website:
Topographic product models applied to natural scene statistics
Neural Computation, 2005
Cited by 50 (7 self)
We present an energy-based model that uses a product of generalised Student-t distributions to capture the statistical structure in datasets. This model is inspired by and particularly applicable to “natural” datasets such as images. We begin by providing the mathematical framework, where we discuss complete and overcomplete models, and provide algorithms for training these models from data. Using patches of natural scenes we demonstrate that our approach represents a viable alternative to “independent components analysis” as an interpretive model of biological visual systems. Although the two approaches are similar in flavor there are also important differences, particularly when the representations are overcomplete. By constraining the interactions within our model we are also able to study the topographic organization of Gabor-like receptive fields that are learned by our model. Finally, we discuss the relation of our new approach to previous work, in particular Gaussian Scale Mixture models and variants of independent components analysis.
Curriculum Learning
Cited by 48 (6 self)
Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Here, we formalize such training strategies in the context of machine learning, and call them “curriculum learning”. In the context of recent research studying the difficulty of training in the presence of non-convex training criteria (for deep deterministic and stochastic neural networks), we explore curriculum learning in various setups. The experiments show that significant improvements in generalization can be achieved. We hypothesize that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and, in the case of non-convex criteria, on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of non-convex functions).
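The idea of presenting examples in a meaningful order can be illustrated with a toy schedule. This is only a sketch under assumptions of our own: the difficulty measure, the staged growth of the training pool, and the name `curriculum_stages` are hypothetical and are not taken from the paper.

```python
def curriculum_stages(examples, difficulty, n_stages):
    """Return n_stages training pools that grow from easy to hard.

    `difficulty` is a caller-supplied scoring function (an assumption here);
    lower scores are treated as easier examples.
    """
    ordered = sorted(examples, key=difficulty)
    stages = []
    for stage in range(1, n_stages + 1):
        # Ceiling of len(ordered) * stage / n_stages, in integer arithmetic.
        cutoff = (len(ordered) * stage + n_stages - 1) // n_stages
        stages.append(ordered[:cutoff])
    return stages


# Toy usage: treat longer strings as "harder" examples.
words = ["a", "abcd", "ab", "abc"]
stages = curriculum_stages(words, difficulty=len, n_stages=2)
for index, pool in enumerate(stages, start=1):
    print(f"stage {index}: {pool}")
```

A learner would be trained on each pool in turn, so the final stage covers the full training set.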
Classification using discriminative restricted Boltzmann machines
In ICML ’08: Proceedings of the 25th International Conference on Machine Learning. ACM, 2008
Cited by 43 (7 self)
Recently, many applications for Restricted Boltzmann Machines (RBMs) have been developed for a large variety of learning problems. However, RBMs are usually used as feature extractors for another learning algorithm or to provide a good initialization for deep feedforward neural network classifiers, and are not considered as a standalone solution to classification problems. In this paper, we argue that RBMs provide a self-contained framework for deriving competitive nonlinear classifiers. We present an evaluation of different learning algorithms for RBMs which aim at introducing a discriminative component to RBM training and improve their performance as classifiers. This approach is simple in that RBMs are used directly to build a classifier, rather than as a stepping stone. Finally, we demonstrate how discriminative RBMs can also be successfully employed in a semi-supervised setting.
Modeling Pixel Means and Covariances Using Factorized Third-Order Boltzmann Machines
2010
Cited by 34 (1 self)
Learning a generative model of natural images is a useful way of extracting features that capture interesting regularities. Previous work on learning such models has focused on methods in which the latent features are used to determine the mean and variance of each pixel independently, or on methods in which the hidden units determine the covariance matrix of a zero-mean Gaussian distribution. In this work, we propose a probabilistic model that combines these two approaches into a single framework. We represent each image using one set of binary latent features that model the image-specific covariance and a separate set that model the mean. We show that this approach provides a probabilistic framework for the widely used simple-cell complex-cell architecture, that it produces very realistic samples of natural images, and that it extracts features that yield state-of-the-art recognition accuracy on the challenging CIFAR-10 dataset.
Unsupervised Learning by Convex and Conic Coding
Advances in Neural Information Processing Systems 9, 1997
Cited by 33 (6 self)
Unsupervised learning algorithms based on convex and conic encoders are proposed. The encoders find the closest convex or conic combination of basis vectors to the input. The learning algorithms produce basis vectors that minimize the reconstruction error of the encoders. The convex algorithm develops locally linear models of the input, while the conic algorithm discovers features. Both algorithms are used to model handwritten digits and compared with vector quantization and principal component analysis. The neural network implementations involve feedback connections that project a reconstruction back to the input layer. 1 Introduction Vector quantization (VQ) and principal component analysis (PCA) are two widely used unsupervised learning algorithms, based on two fundamentally different ways of encoding data. In VQ, the input is encoded as the index of the closest prototype stored in memory. In PCA, the input is encoded as the coefficients of a linear superposition of a set of basis ...
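The conic encoder described in this abstract finds the closest non-negative combination of basis vectors to the input. A minimal sketch in Python, assuming NumPy and using projected gradient descent as a stand-in solver, since the listing does not say how the authors solve this problem; `conic_encode`, its arguments, and the iteration count are hypothetical names and values chosen for illustration.

```python
import numpy as np


def conic_encode(basis, x, n_iter=500):
    """Approximate argmin_w ||x - basis @ w||^2 subject to w >= 0.

    Columns of `basis` are the basis vectors. Projected gradient descent
    is an assumed solver here, not necessarily the paper's method.
    """
    # Step size 1/L, where L is the largest eigenvalue of basis.T @ basis.
    lipschitz = np.linalg.norm(basis, 2) ** 2
    w = np.zeros(basis.shape[1])
    for _ in range(n_iter):
        grad = basis.T @ (basis @ w - x)
        # Gradient step followed by projection onto the cone w >= 0.
        w = np.maximum(0.0, w - grad / lipschitz)
    return w


# Toy usage: with an identity basis, the negative coordinate is clipped.
basis = np.eye(2)
x = np.array([1.0, -1.0])
code = conic_encode(basis, x)
reconstruction = basis @ code
print(code, reconstruction)
```

A convex encoder would add the constraint that the coefficients sum to one, which this sketch does not implement.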
Recognizing handwritten digits using hierarchical products of experts
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001
Cited by 30 (5 self)
The product of experts learning procedure [1] can discover a set of stochastic binary features that constitute a nonlinear generative model of handwritten images of digits. The quality of generative models learned in this way can be assessed by learning a separate model for each class of digit and then comparing the unnormalized probabilities of test images under the 10 different class-specific models. To improve discriminative performance, a hierarchy of separate models can be learned for each digit class. Each model in the hierarchy learns a layer of binary feature detectors that model the probability distribution of vectors of activity of feature detectors in the layer below. The models in the hierarchy are trained sequentially and each model uses a layer of binary feature detectors to learn a generative model of the patterns of feature activities in the preceding layer. After training, each layer of feature detectors produces a separate, unnormalized log probability score. With three layers of feature detectors for each of the 10 digit classes, a test image produces 30 scores which can be used as inputs to a supervised, logistic classification network that is trained on separate data. On the MNIST database, our system is comparable with current state-of-the-art discriminative methods, demonstrating that the product of experts learning procedure can produce effective hierarchies of generative models of high-dimensional data. Index Terms: Neural networks, products of experts, handwriting recognition, feature extraction, shape recognition, Boltzmann machines, model-based recognition, generative models.
Factored Conditional Restricted Boltzmann Machines for Modeling Motion Style
Cited by 29 (8 self)
The Conditional Restricted Boltzmann Machine (CRBM) is a recently proposed model for time series that has a rich, distributed hidden state and permits simple, exact inference. We present a new model, based on the CRBM, that preserves its most important computational properties and includes multiplicative three-way interactions that allow the effective interaction weight between two units to be modulated by the dynamic state of a third unit. We factorize the three-way weight tensor implied by the multiplicative model, reducing the number of parameters from O(N^3) to O(N^2). The result is an efficient, compact model whose effectiveness we demonstrate by modeling human motion. Like the CRBM, our model can capture diverse styles of motion with a single set of parameters, and the three-way interactions greatly improve the model’s ability to blend motion styles or to transition smoothly between them.
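The parameter saving from factorizing a three-way weight tensor can be checked with a small count. A sketch in Python, assuming NumPy, N units in each of the three groups, and F factors with F proportional to N; the factor matrices A, B, C and the function names are hypothetical, and the paper's actual parameterization may differ in detail.

```python
import numpy as np


def factored_three_way(A, B, C):
    """Build W[i, j, k] = sum_f A[i, f] * B[j, f] * C[k, f]."""
    return np.einsum("if,jf,kf->ijk", A, B, C)


def parameter_counts(n_units, n_factors):
    """Return (full tensor parameters, factored parameters)."""
    return n_units ** 3, 3 * n_units * n_factors


rng = np.random.default_rng(0)
n, f = 8, 8  # assumed sizes; f is chosen equal to n for illustration
A, B, C = (rng.standard_normal((n, f)) for _ in range(3))

W = factored_three_way(A, B, C)
full, factored = parameter_counts(n, f)
print(W.shape, full, factored)
```

With F equal to N, the factored form stores 3N^2 numbers instead of N^3, which is the reduction from cubic to quadratic growth that the abstract describes.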