Results 1 - 10
of
25
Using Fast Weights to Improve Persistent Contrastive Divergence
"... The most commonly used learning algorithm for restricted Boltzmann machines is contrastive divergence which starts a Markov chain at a data point and runs the chain for only a few iterations to get a cheap, low variance estimate of the sufficient statistics under the model. Tieleman (2008) showed th ..."
Abstract
-
Cited by 25 (12 self)
- Add to MetaCart
The most commonly used learning algorithm for restricted Boltzmann machines is contrastive divergence which starts a Markov chain at a data point and runs the chain for only a few iterations to get a cheap, low variance estimate of the sufficient statistics under the model. Tieleman (2008) showed that better learning can be achieved by estimating the model’s statistics using a small set of persistent ”fantasy particles ” that are not reinitialized to data points after each weight update. With sufficiently small weight updates, the fantasy particles represent the equilibrium distribution accurately but to explain why the method works with much larger weight updates it is necessary to consider the interaction between the weight updates and the Markov chain. We show that the weight updates force the Markov chain to mix fast, and using this insight we develop an even faster mixing chain that uses an auxiliary set of ”fast weights ” to implement a temporary overlay on the energy landscape. The fast weights learn rapidly but also decay rapidly and do not contribute to the normal energy landscape that defines the model. 1.
3-d object recognition with deep belief nets
- Advances in Neural Information Processing Systems 22
, 2009
"... We introduce a new type of top-level model for Deep Belief Nets and evaluate it on a 3D object recognition task. The top-level model is a third-order Boltzmann machine, trained using a hybrid algorithm that combines both generative and discriminative gradients. Performance is evaluated on the NORB d ..."
Abstract
-
Cited by 23 (7 self)
- Add to MetaCart
We introduce a new type of top-level model for Deep Belief Nets and evaluate it on a 3D object recognition task. The top-level model is a third-order Boltzmann machine, trained using a hybrid algorithm that combines both generative and discriminative gradients. Performance is evaluated on the NORB database (normalized-uniform version), which contains stereo-pair images of objects under different lighting conditions and viewpoints. Our model achieves 6.5 % error on the test set, which is close to the best published result for NORB (5.9%) using a convolutional neural net that has built-in knowledge of translation invariance. It substantially outperforms shallow models such as SVMs (11.6%). DBNs are especially suited for semi-supervised learning, and to demonstrate this we consider a modified version of the NORB recognition task in which additional unlabeled images are created by applying small translations to the images in the database. With the extra unlabeled data (and the same amount of labeled data as before), our model achieves 5.2 % error. 1
Stacks of Convolutional Restricted Boltzmann Machines for Shift-Invariant Feature Learning
"... In this paper we present a method for learning classspecific features for recognition. Recently a greedy layerwise procedure was proposed to initialize weights of deep belief networks, by viewing each layer as a separate Restricted Boltzmann Machine (RBM). We develop the Convolutional RBM (C-RBM), a ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
In this paper we present a method for learning classspecific features for recognition. Recently a greedy layerwise procedure was proposed to initialize weights of deep belief networks, by viewing each layer as a separate Restricted Boltzmann Machine (RBM). We develop the Convolutional RBM (C-RBM), a variant of the RBM model in which weights are shared to respect the spatial structure of images. This framework learns a set of features that can generate the images of a specific object class. Our feature extraction model is a four layer hierarchy of alternating filtering and maximum subsampling. We learn feature parameters of the first and third layers viewing them as separate C-RBMs. The outputs of our feature extraction hierarchy are then fed as input to a discriminative classifier. It is experimentally demonstrated that the extracted features are effective for object detection, using them to obtain performance comparable to the state-of-the-art on handwritten digit recognition and pedestrian detection. 1.
On Optimization Methods for Deep Learning
"... The predominant methodology in training deep learning advocates the use of stochastic gradient descent methods (SGDs). Despite its ease of implementation, SGDs are difficult to tune and parallelize. These problems make it challenging to develop, debug and scale up deep learning algorithms with SGDs. ..."
Abstract
-
Cited by 6 (3 self)
- Add to MetaCart
The predominant methodology in training deep learning advocates the use of stochastic gradient descent methods (SGDs). Despite its ease of implementation, SGDs are difficult to tune and parallelize. These problems make it challenging to develop, debug and scale up deep learning algorithms with SGDs. In this paper, we show that more sophisticated off-the-shelf optimization methods such as Limited memory BFGS (L-BFGS) and Conjugate gradient (CG) with line search can significantly simplify and speed up the process of pretraining deep algorithms. In our experiments, the difference between L-BFGS/CG and SGDs are more pronounced if we consider algorithmic extensions (e.g., sparsity regularization) and hardware extensions (e.g., GPUs or computer clusters). Our experiments with distributed optimization support the use of L-BFGS with locally connected networks and convolutional neural networks. Using L-BFGS, our convolutional network model achieves 0.69 % on the standard MNIST dataset. This is a state-of-theart result on MNIST among algorithms that do not use distortions or pretraining. 1.
On Deep Generative Models with Applications to Recognition
"... The most popular way to use probabilistic models in vision is first to extract some descriptors of small image patches or object parts using well-engineered features, and then to use statistical learning tools to model the dependencies among these features and eventual labels. Learning probabilistic ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
The most popular way to use probabilistic models in vision is first to extract some descriptors of small image patches or object parts using well-engineered features, and then to use statistical learning tools to model the dependencies among these features and eventual labels. Learning probabilistic models directly on the raw pixel values has proved to be much more difficult and is typically only used for regularizing discriminative methods. In this work, we use one of the best, pixel-level, generative models of natural images – a gated MRF – as the lowest level of a deep belief network (DBN) that has several hidden layers. We show that the resulting DBN is very good at coping with occlusion when predicting expression categories from face images, and it can produce features that perform comparably to SIFT descriptors for discriminating different types of scene. The generative ability of the model also makes it easy to see what information is captured and what is lost at each level of representation. 1. Introduction and Previous
Online heuristic selection in constraint programming
- In Symposiumon Combinatorial Search (SoCS
, 2009
"... This paper presents our first attempt to apply Support Vector Machines to the problem of automatically tuning CP search algorithms. More precisely, we exploit instances features to dynamically adapt the search strategy of a CP solver in order to more efficiently solve a given instance. In these prel ..."
Abstract
-
Cited by 3 (3 self)
- Add to MetaCart
This paper presents our first attempt to apply Support Vector Machines to the problem of automatically tuning CP search algorithms. More precisely, we exploit instances features to dynamically adapt the search strategy of a CP solver in order to more efficiently solve a given instance. In these preliminary results, adaptation is restricted to restart points, and the number of times the strategy changes is also restricted. We report very encouraging results where our adaptation outperforms what is currently considered as one of the state of the art dynamic variable selection strategy. 1
Learning to combine foveal glimpses with a third-order Boltzmann machine
"... We describe a model based on a Boltzmann machine with third-order connections that can learn how to accumulate information about a shape over several fixations. The model uses a retina that only has enough high resolution pixels to cover a small area of the image, so it must decide on a sequence of ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
We describe a model based on a Boltzmann machine with third-order connections that can learn how to accumulate information about a shape over several fixations. The model uses a retina that only has enough high resolution pixels to cover a small area of the image, so it must decide on a sequence of fixations and it must combine the “glimpse ” at each fixation with the location of the fixation before integrating the information with information from other glimpses of the same object. We evaluate this model on a synthetic dataset and two image classification datasets, showing that it can perform at least as well as a model trained on whole images. 1
Discriminative k-metrics
"... The kq-flats algorithm is a generalization of the popular k-means algorithm where q dimensional best fit affine sets replace centroids as the cluster prototypes. In this work, a modification of the k q-flats framework for pattern classification is introduced. The basic idea is to replace the origina ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
The kq-flats algorithm is a generalization of the popular k-means algorithm where q dimensional best fit affine sets replace centroids as the cluster prototypes. In this work, a modification of the k q-flats framework for pattern classification is introduced. The basic idea is to replace the original reconstruction only energy, which is optimized to obtain the k affine spaces, by a new energy that incorporates discriminative terms. This way, the actual classification task is introduced as part of the design and optimization. The presentation of the proposed framework is complemented with experimental results, showing that the method is computationally very efficient and gives excellent results on standard supervised learning benchmarks.
Parallelizable Sampling of Markov Random Fields
"... Markov Random Fields (MRFs) are an important class of probabilistic models which are used for density estimation, classification, denoising, and for constructing Deep Belief Networks. Every application of an MRF requires addressing its inference problem, which can be done using deterministic inferen ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Markov Random Fields (MRFs) are an important class of probabilistic models which are used for density estimation, classification, denoising, and for constructing Deep Belief Networks. Every application of an MRF requires addressing its inference problem, which can be done using deterministic inference methods or using stochastic Markov Chain Monte Carlo methods. In this paper we introduce a new Markov Chain transition operator that updates all the variables of a pairwise MRF in parallel by using auxiliary Gaussian variables. The proposed MCMC operator is extremely simple to implement and to parallelize. This is achieved by a formal equivalence result between arbitrary pairwise MRFs and a particular type of Restricted Boltzmann Machine. This result also implies that the later can be learned in place of the former without any loss of modeling power, a possibility we explore in experiments. 1

