A fast learning algorithm for deep belief nets
- Neural Computation, 2006
Cited by 241 (40 self)
We show how to use “complementary priors” to eliminate the explaining away effects that make inference difficult in densely-connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modelled by long ravines in the free-energy landscape of the top-level associative memory and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
Fields of experts: A framework for learning image priors
- In CVPR, 2005
Cited by 153 (3 self)
We develop a framework for learning generic, expressive image priors that capture the statistics of natural scenes and can be used for a variety of machine vision tasks. The approach extends traditional Markov Random Field (MRF) models by learning potential functions over extended pixel neighborhoods. Field potentials are modeled using a Products-of-Experts framework that exploits nonlinear functions of many linear filter responses. In contrast to previous MRF approaches, all parameters, including the linear filters themselves, are learned from training data. We demonstrate the capabilities of this Field of Experts model with two example applications, image denoising and image inpainting, which are implemented using a simple, approximate inference scheme. While the model is trained on a generic image database and is not tuned toward a specific application, we obtain results that compete with and even outperform specialized techniques.
Energy-based models for sparse overcomplete representations
- Journal of Machine Learning Research, 2003
Cited by 43 (13 self)
We present a new way of extending independent components analysis (ICA) to overcomplete representations. In contrast to the causal generative extensions of ICA which maintain marginal independence of sources, we define features as deterministic (linear) functions of the inputs. This assumption results in marginal dependencies among the features, but conditional independence of the features given the inputs. By assigning energies to the features a probability distribution over the input states is defined through the Boltzmann distribution. Free parameters of this model are trained using the contrastive divergence objective (Hinton, 2002). When the number of features is equal to the number of input dimensions this energy-based model reduces to noiseless ICA and we show experimentally that the proposed learning algorithm is able to perform blind source separation on speech data. In additional experiments we train overcomplete energy-based models to extract features from various standard data-sets containing speech, natural images, hand-written digits and faces.
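As a reading aid for the abstract above, the Boltzmann construction it describes can be written out as follows. The notation is assumed here rather than taken from the listing: $x$ is the input, $w_i$ the $i$-th linear filter, $s_i$ the resulting feature, $E_i$ the energy assigned to that feature, $M$ the number of features, and $Z$ the normalizing constant.

```latex
s_i = w_i^{\top} x, \qquad
E(x) = \sum_{i=1}^{M} E_i(s_i), \qquad
p(x) = \frac{1}{Z} \exp\bigl(-E(x)\bigr), \qquad
Z = \int \exp\bigl(-E(x)\bigr)\, dx
```

The features $s_i$ are deterministic given $x$, which is the conditional independence the abstract refers to; the overcomplete case corresponds to $M$ larger than the input dimension.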
Topographic product models applied to natural scene statistics
- Neural Computation, 2005
Cited by 39 (7 self)
We present an energy-based model that uses a product of generalised Student-t distributions to capture the statistical structure in datasets. This model is inspired by and particularly applicable to “natural” datasets such as images. We begin by providing the mathematical framework, where we discuss complete and overcomplete models, and provide algorithms for training these models from data. Using patches of natural scenes we demonstrate that our approach represents a viable alternative to “independent components analysis” as an interpretive model of biological visual systems. Although the two approaches are similar in flavor there are also important differences, particularly when the representations are overcomplete. By constraining the interactions within our model we are also able to study the topographic organization of Gabor-like receptive fields that are learned by our model. Finally, we discuss the relation of our new approach to previous work, in particular Gaussian Scale Mixture models and variants of independent components analysis.
The rate adapting poisson model for information retrieval and object recognition
- In Proceedings of the 23rd International Conference on Machine Learning (ICML’06), 2006
Cited by 15 (1 self)
Probabilistic modelling of text data in the bag-of-words representation has been dominated by directed graphical models such as pLSI, LDA, NMF, and discrete PCA. Recently, state-of-the-art performance on visual object recognition has also been reported using variants of these models. We introduce an alternative undirected graphical model suitable for modelling count data. This “Rate Adapting Poisson” (RAP) model is shown to generate superior dimensionally reduced representations for subsequent retrieval or classification. Models are trained using contrastive divergence while inference of latent topical representations is efficiently achieved through a simple matrix multiplication.
A Computational Model of the Cerebral Cortex
- 2005
Cited by 14 (3 self)
Our current understanding of the primate cerebral cortex (neocortex) and in particular the posterior, sensory association cortex has matured to a point where it is possible to develop a family of graphical models that capture the structure, scale and power of the neocortex for purposes of associative recall, sequence prediction and pattern completion among other functions. Implementing such models using readily available computing clusters is now within the grasp of many labs and would provide scientists with the opportunity to experiment with both hard-wired connection schemes and structure-learning algorithms inspired by animal learning and developmental studies. While neural circuits involving structures external to the neocortex such as the thalamic nuclei are less well understood, the availability of a computational model on which to test hypotheses would likely accelerate our understanding of these circuits. Furthermore, the existence of an agreed-upon cortical substrate would not only facilitate our understanding of the brain but enable researchers to combine lessons learned from biology with state-of-the-art graphical-model and machine-learning techniques to design hybrid systems that combine the best of biological and traditional computing approaches.
Stacks of Convolutional Restricted Boltzmann Machines for Shift-Invariant Feature Learning
Cited by 11 (0 self)
In this paper we present a method for learning class-specific features for recognition. Recently a greedy layerwise procedure was proposed to initialize weights of deep belief networks, by viewing each layer as a separate Restricted Boltzmann Machine (RBM). We develop the Convolutional RBM (C-RBM), a variant of the RBM model in which weights are shared to respect the spatial structure of images. This framework learns a set of features that can generate the images of a specific object class. Our feature extraction model is a four-layer hierarchy of alternating filtering and maximum subsampling. We learn feature parameters of the first and third layers viewing them as separate C-RBMs. The outputs of our feature extraction hierarchy are then fed as input to a discriminative classifier. It is experimentally demonstrated that the extracted features are effective for object detection, using them to obtain performance comparable to the state-of-the-art on handwritten digit recognition and pedestrian detection.
Fast Image Deconvolution using Hyper-Laplacian
Cited by 9 (0 self)
The heavy-tailed distributions of gradients in natural scenes have proven effective priors for a range of problems such as denoising, deblurring and super-resolution. These distributions are well modeled by a hyper-Laplacian ($p(x) \propto e^{-k|x|^{\alpha}}$), typically with $0.5 \le \alpha \le 0.8$. However, the use of sparse distributions makes the problem non-convex and impractically slow to solve for multi-megapixel images. In this paper we describe a deconvolution approach that is several orders of magnitude faster than existing techniques that use hyper-Laplacian priors. We adopt an alternating minimization scheme where one of the two phases is a non-convex problem that is separable over pixels. This per-pixel sub-problem may be solved with a lookup table (LUT). Alternatively, for two specific values of $\alpha$, 1/2 and 2/3, an analytic solution can be found by finding the roots of a cubic and quartic polynomial, respectively. Our approach (using either LUTs or analytic formulae) is able to deconvolve a 1 megapixel image in less than ∼3 seconds, achieving comparable quality to existing methods such as iteratively reweighted least squares (IRLS) that take ∼20 minutes. Furthermore, our method is quite general and can easily be extended to related image processing problems, beyond the deconvolution application demonstrated.
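The lookup-table idea in the abstract above can be illustrated with a small sketch. It assumes the per-pixel sub-problem has the form min over w of |w|^α + (β/2)(w − v)^2, which the abstract itself does not state; the function names, the grid range, and the β values are hypothetical, and a brute-force search stands in for the analytic cubic and quartic solutions.

```python
import bisect


def solve_pixel(v, alpha, beta, steps=2000):
    """Brute-force minimizer of |w|**alpha + (beta / 2) * (w - v)**2 over w.

    ASSUMPTION: this objective is an illustrative form of the per-pixel
    sub-problem. Its minimizer always lies between 0 and v, so only that
    interval is searched.
    """
    best_w = 0.0
    best_cost = (beta / 2.0) * v * v  # cost at w = 0
    for i in range(1, steps + 1):
        w = v * i / steps
        cost = abs(w) ** alpha + (beta / 2.0) * (w - v) ** 2
        if cost < best_cost:
            best_w, best_cost = w, cost
    return best_w


def build_lut(alpha, beta, v_max=1.0, size=257):
    """Tabulate the minimizer on a uniform grid of v in [-v_max, v_max]."""
    grid = [-v_max + 2.0 * v_max * i / (size - 1) for i in range(size)]
    table = [solve_pixel(v, alpha, beta) for v in grid]
    return grid, table


def lookup(grid, table, v):
    """Nearest-grid-point lookup, clamped to the ends of the table."""
    i = bisect.bisect_left(grid, v)
    if i == 0:
        return table[0]
    if i >= len(grid):
        return table[-1]
    # Choose whichever neighbouring grid point is closer to v.
    if grid[i] - v < v - grid[i - 1]:
        return table[i]
    return table[i - 1]
```

Because the table depends only on α and β, it can be built once and reused for every pixel, which is the source of the speed-up such a scheme relies on. Small inputs are mapped to exactly zero, while large inputs are shrunk only slightly.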
Learning high-order MRF priors of color images
- In ICML ’06: Proceedings of the 23rd International Conference on Machine Learning, 2006
Cited by 7 (1 self)
In this paper, we use large neighborhood Markov random fields to learn rich prior models of color images. Our approach extends the monochromatic Fields of Experts model (Roth & Black, 2005a) to color images. In the Fields of Experts model, the curse of dimensionality due to very large clique sizes is circumvented by parameterizing the potential functions according to a product of experts. We introduce simplifications to the original approach by Roth and Black which allow us to cope with the increased clique size (typically 3x3x3 or 5x5x3 pixels) of color images. Experimental results are presented for image denoising which evidence improvements over state-of-the-art monochromatic image priors.
Fields of experts for image-based rendering
- In Proc. BMVC, 2006
Cited by 5 (1 self)
Image priors for novel view synthesis have traditionally been non-parametric models based on large libraries of image patch exemplars, producing high-quality results but making inference very slow. Recently a parametric framework, called Fields of Experts, has been proposed for image restoration that promises to speed up inference dramatically. In this paper we apply Fields of Experts for the first time to the problem of novel view synthesis, posed as a Markov random field labelling problem with very large cliques. Additionally, we introduce to computer vision for the first time a new optimization algorithm from statistical physics which reaches better minima than the ICM and simulated annealing algorithms to which such large-clique problems have previously been restricted.

