Results 11–20 of 150
Exploring strategies for training deep neural networks
 Journal of Machine Learning Research
Cited by 90 (12 self)
Deep Belief Networks for phone recognition
Abstract

Cited by 89 (12 self)
Hidden Markov Models (HMMs) have been the state-of-the-art techniques for acoustic modeling despite their unrealistic independence assumptions and the very limited representational capacity of their hidden states. There are many proposals in the research community for deeper models that are capable of modeling the many types of variability present in the speech generation process. Deep Belief Networks (DBNs) have recently proved to be very effective for a variety of machine learning problems, and this paper applies DBNs to acoustic modeling. On the standard TIMIT corpus, DBNs consistently outperform other techniques, and the best DBN achieves a phone error rate (PER) of 23.0% on the TIMIT core test set.
Multimodal learning with deep boltzmann machines
 In NIPS’2012
, 2012
Abstract

Cited by 77 (2 self)
Data often consists of multiple diverse modalities. For example, images are tagged with textual information and videos are accompanied by audio. Each modality is characterized by having distinct statistical properties. We propose a Deep Boltzmann Machine for learning a generative model of such multimodal data. We show that the model can be used to create fused representations by combining features across modalities. These learned representations are useful for classification and information retrieval. By sampling from the conditional distributions over each data modality, it is possible to create these representations even when some data modalities are missing. We conduct experiments on bimodal image-text and audio-video data. The fused representation achieves good classification results on the MIR Flickr data set, matching or outperforming other deep models as well as SVM-based models that use Multiple Kernel Learning. We further demonstrate that this multimodal model helps classification and retrieval even when only unimodal data is available at test time.
Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines
, 2010
Abstract

Cited by 75 (18 self)
To allow the hidden units of a restricted Boltzmann machine to model the transformation between two successive images, Memisevic and Hinton (2007) introduced three-way multiplicative interactions that use the intensity of a pixel in the first image as a multiplicative gain on a learned, symmetric weight between a pixel in the second image and a hidden unit. This creates cubically many parameters, which form a three-dimensional interaction tensor. We describe a low-rank approximation to this interaction tensor that uses a sum of factors, each of which is a three-way outer product. This approximation allows efficient learning of transformations between larger image patches. Since each factor can be viewed as an image filter, the model as a whole learns optimal filter pairs for efficiently representing transformations. We demonstrate the learning of optimal filter pairs from various synthetic and real image sequences. We also show how learning about image transformations allows the model to perform a simple visual analogy task, and we show how a completely unsupervised network trained on transformations perceives multiple motions of transparent dot patterns in the same way as humans.
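The factorization described in this abstract replaces the cubic interaction tensor with a sum of three-way outer products. A minimal sketch of that identity, with invented toy dimensions (not the paper's actual image-patch or filter sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
nx, ny, nh, nf = 8, 8, 6, 4  # pixels in image 1/2, hidden units, factors (toy sizes)

# Hypothetical factor matrices: one filter set per "way" of the interaction.
wx = rng.standard_normal((nx, nf))
wy = rng.standard_normal((ny, nf))
wh = rng.standard_normal((nh, nf))

# The full three-way tensor is a sum of nf three-way outer products.
W = np.einsum('if,jf,kf->ijk', wx, wy, wh)

# Energy contribution for an image pair (x, y) and hidden vector h:
x = rng.standard_normal(nx)
y = rng.standard_normal(ny)
h = rng.integers(0, 2, nh).astype(float)

# Naive cubic contraction vs. factored evaluation: they agree, but the
# factored form never materializes the nx*ny*nh tensor.
e_full = np.einsum('ijk,i,j,k->', W, x, y, h)
e_fact = np.sum((x @ wx) * (y @ wy) * (h @ wh))
assert np.allclose(e_full, e_fact)
```

The factored form needs only (nx + ny + nh) · nf parameters instead of nx · ny · nh, which is what makes learning on larger patches feasible.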
3D object recognition with deep belief nets
 Advances in Neural Information Processing Systems 22
, 2009
Abstract

Cited by 63 (8 self)
We introduce a new type of top-level model for Deep Belief Nets and evaluate it on a 3D object recognition task. The top-level model is a third-order Boltzmann machine, trained using a hybrid algorithm that combines both generative and discriminative gradients. Performance is evaluated on the NORB database (normalized-uniform version), which contains stereo-pair images of objects under different lighting conditions and viewpoints. Our model achieves 6.5% error on the test set, which is close to the best published result for NORB (5.9%) using a convolutional neural net that has built-in knowledge of translation invariance. It substantially outperforms shallow models such as SVMs (11.6%). DBNs are especially suited for semi-supervised learning, and to demonstrate this we consider a modified version of the NORB recognition task in which additional unlabeled images are created by applying small translations to the images in the database. With the extra unlabeled data (and the same amount of labeled data as before), our model achieves 5.2% error.
A Unified View of Matrix Factorization Models
Abstract

Cited by 58 (0 self)
We present a unified view of matrix factorization that frames the differences among popular methods, such as NMF, Weighted SVD, E-PCA, MMMF, pLSI, pLSI-pHITS, Bregman co-clustering, and many others, in terms of a small number of modeling choices. Many of these approaches can be viewed as minimizing a generalized Bregman divergence, and we show that (i) a straightforward alternating projection algorithm can be applied to almost any model in our unified view; (ii) the Hessian for each projection has special structure that makes a Newton projection feasible, even when there are equality constraints on the factors, which allows for matrix co-clustering; and (iii) alternating projections can be generalized to simultaneously factor a set of matrices that share dimensions. These observations immediately yield new optimization algorithms for the above factorization methods, and suggest novel generalizations of these methods such as incorporating row and column biases, and adding or relaxing clustering constraints.
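As one concrete instance of the alternating-projection scheme this abstract describes: under squared loss (the Gaussian Bregman divergence) with no constraints, each projection is a least-squares solve and the procedure reduces to alternating least squares, whose optimum is the truncated SVD. A sketch under those assumptions; the matrix size, rank, and iteration count are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 15))  # toy data matrix (assumed)
k = 5                              # factorization rank (assumed)

# Alternating projections: each step solves one factor with the other
# fixed, minimizing the squared (Gaussian-Bregman) divergence ||X - U V^T||.
U = rng.standard_normal((20, k))
V = rng.standard_normal((15, k))
for _ in range(50):
    U = X @ V @ np.linalg.pinv(V.T @ V)    # project onto the row factor
    V = X.T @ U @ np.linalg.pinv(U.T @ U)  # project onto the column factor

err = np.linalg.norm(X - U @ V.T)

# For this unconstrained squared-loss case, the best rank-k error is the
# truncated-SVD error, which alternating projections should approach.
Us, s, Vt = np.linalg.svd(X, full_matrices=False)
svd_err = np.linalg.norm(X - (Us[:, :k] * s[:k]) @ Vt[:k])
assert err <= svd_err * 1.05
```

Swapping the squared loss for another Bregman divergence (e.g. generalized KL for NMF/pLSI-style models) changes each projection step but not the overall alternating structure.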
Learning multiple layers of representations
 Trends in Cognitive Sciences 11:428–434
, 2007
Abstract

Cited by 55 (3 self)
To achieve its impressive performance at tasks such as speech or object recognition, the brain extracts multiple levels of representation from the sensory input. Backpropagation was the first computationally efficient model of how neural networks could learn multiple layers of representation, but it required labeled training data and it did not work well in deep networks. The limitations of backpropagation learning can now be overcome by using multilayer neural networks that contain top-down connections and training them to generate sensory data rather than to classify it. Learning multilayer generative models appears to be difficult, but a recent discovery makes it easy to learn nonlinear, distributed representations one layer at a time. The multiple layers of representation learned in this way can subsequently be fine-tuned to produce generative or discriminative models that work much better than previous approaches.
Unsupervised learning of image transformations
 In Computer Vision and Pattern Recognition. IEEE Computer Society
, 2007
Abstract

Cited by 50 (18 self)
We describe a probabilistic model for learning rich, distributed representations of image transformations. The basic model is defined as a gated conditional random field that is trained to predict transformations of its inputs using a factorial set of latent variables. Inference in the model consists of extracting the transformation, given a pair of images, and can be performed exactly and efficiently. We show that, when trained on natural videos, the model develops domain-specific motion features, in the form of fields of locally transformed edge filters. When trained on affine, or more general, transformations of still images, the model develops codes for these transformations, and can subsequently perform recognition tasks that are invariant under these transformations. It can also fantasize new transformations on previously unseen images. We describe several variations of the basic model and provide experimental results that demonstrate its applicability to a variety of tasks.
The recurrent temporal restricted Boltzmann machine
 In NIPS’2008
, 2009
Abstract

Cited by 48 (2 self)
The Temporal Restricted Boltzmann Machine (TRBM) is a probabilistic model for sequences that is able to successfully model (i.e., generate nice-looking samples of) several very high dimensional sequences, such as motion capture data and the pixels of low resolution videos of balls bouncing in a box. The major disadvantage of the TRBM is that exact inference is extremely hard, since even computing a Gibbs update for a single variable of the posterior is exponentially expensive. This difficulty has necessitated the use of a heuristic inference procedure that nonetheless was accurate enough for successful learning. In this paper we introduce the Recurrent TRBM, which is a very slight modification of the TRBM for which exact inference is very easy and exact gradient learning is almost tractable. We demonstrate that the RTRBM is better than an analogous TRBM at generating motion capture data and videos of bouncing balls.
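The "very easy" exact inference claimed for the RTRBM comes from its defining change: the hidden state carried across time steps is the deterministic mean-field vector h_t = σ(c + Wᵀv_t + U h_{t−1}), so filtering is a single forward pass. A toy sketch of that recurrence with invented sizes and random (untrained) weights:

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

nv, nh, T = 5, 3, 4                 # toy sizes and sequence length (assumed)
W = rng.standard_normal((nv, nh))   # visible-hidden weights
U = rng.standard_normal((nh, nh))   # hidden-to-hidden recurrence
c = np.zeros(nh)                    # hidden biases

def infer(v_seq):
    """Exact RTRBM inference: one deterministic forward pass."""
    h = np.zeros(nh)
    hs = []
    for v in v_seq:
        h = sigmoid(c + v @ W + h @ U)  # mean-field hidden state
        hs.append(h)
    return np.array(hs)

v_seq = rng.integers(0, 2, (T, nv)).astype(float)
H = infer(v_seq)
assert H.shape == (T, nh)
assert np.all((H > 0) & (H < 1))  # expectations, not binary samples
```

In the TRBM, by contrast, the previous hidden units are stochastic parents of the current ones, so computing the posterior requires summing over exponentially many hidden configurations; making the carried state deterministic is what restores tractability.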
Representational power of restricted Boltzmann machines and deep belief networks
 Neural Computation,
, 2008
Abstract

Cited by 48 (7 self)
Deep Belief Networks (DBN) are generative neural network models with many layers of hidden explanatory factors, recently introduced by Hinton et al., along with a greedy layer-wise unsupervised learning algorithm. The building block of a DBN is a probabilistic model called a Restricted Boltzmann Machine (RBM), used to represent one layer of the model. Restricted Boltzmann Machines are interesting because inference is easy in them, and because they have been successfully used as building blocks for training deeper models. We first prove that adding hidden units yields strictly improved modeling power, while a second theorem shows that RBMs are universal approximators of discrete distributions. We then study the question of whether DBNs with more layers are strictly more powerful in terms of representational power. This suggests a new and less greedy criterion for training RBMs within DBNs.
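The greedy layer-wise algorithm mentioned above trains each RBM layer with contrastive divergence. A minimal CD-1 sketch on invented two-pattern binary data; the sizes, learning rate, and epoch count are illustrative assumptions, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

nv, nh = 6, 4  # visible and hidden units (toy sizes)
W = 0.01 * rng.standard_normal((nv, nh))
b, c = np.zeros(nv), np.zeros(nh)

# Toy data: binary vectors concentrated on two complementary patterns.
data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 50, dtype=float)

lr = 0.1
for epoch in range(100):
    for v0 in data:
        ph0 = sigmoid(c + v0 @ W)                 # positive phase
        h0 = (rng.random(nh) < ph0).astype(float)
        pv1 = sigmoid(b + h0 @ W.T)               # one Gibbs step back (CD-1)
        v1 = (rng.random(nv) < pv1).astype(float)
        ph1 = sigmoid(c + v1 @ W)                 # negative phase
        W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
        b += lr * (v0 - v1)
        c += lr * (ph0 - ph1)

# After training, a mean-field up-down pass should reconstruct the data.
v = data[0]
recon = sigmoid(b + sigmoid(c + v @ W) @ W.T)
assert np.mean((recon - v) ** 2) < 0.1
```

In DBN pre-training, the hidden activations of one trained RBM become the "visible" data for the next RBM, one layer at a time.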