Results 1–10 of 50
A fast learning algorithm for deep belief nets
Neural Computation, 2006
Cited by 445 (48 self)
Abstract
We show how to use “complementary priors” to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modelled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
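The greedy procedure trains each layer as a restricted Boltzmann machine with contrastive divergence before moving up. A minimal CD-1 sketch in NumPy; the toy sizes, variable names, and learning rate are illustrative, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.1):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    v0 : (batch, n_visible) data batch
    W  : (n_visible, n_hidden) weights; b, c are visible/hidden biases.
    A sketch of the per-layer training used in greedy layer-wise learning.
    """
    # Positive phase: hidden probabilities given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to a reconstruction.
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + c)
    # Update: data correlations minus reconstruction correlations.
    W = W + lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    b = b + lr * (v0 - pv1).mean(axis=0)
    c = c + lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

# Toy run: a few CD-1 steps on random binary data.
v = (rng.random((16, 6)) < 0.5).astype(float)
W = 0.01 * rng.standard_normal((6, 4))
b = np.zeros(6)
c = np.zeros(4)
for _ in range(5):
    W, b, c = cd1_step(v, W, b, c)
```

To build a deep network, the hidden activations of a trained layer would serve as the visible data for the next RBM up the stack.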
Fields of experts: A framework for learning image priors
In CVPR, 2005
Cited by 229 (3 self)
Abstract
We develop a framework for learning generic, expressive image priors that capture the statistics of natural scenes and can be used for a variety of machine vision tasks. The approach extends traditional Markov Random Field (MRF) models by learning potential functions over extended pixel neighborhoods. Field potentials are modeled using a Products-of-Experts framework that exploits nonlinear functions of many linear filter responses. In contrast to previous MRF approaches, all parameters, including the linear filters themselves, are learned from training data. We demonstrate the capabilities of this Field of Experts model with two example applications, image denoising and image inpainting, which are implemented using a simple, approximate inference scheme. While the model is trained on a generic image database and is not tuned toward a specific application, we obtain results that compete with and even outperform specialized techniques.
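The energy of an image under such a prior is a sum, over every clique, of expert potentials applied to linear filter responses. A brute-force sketch with Student-t style potentials; the random filters and unit expert weights are stand-ins for parameters the paper learns from data:

```python
import numpy as np

def foe_energy(image, filters, alphas):
    """Energy of an image under a Fields-of-Experts style prior:
    a sum over cliques (all filter-sized windows) of Student-t expert
    potentials applied to linear filter responses.
    """
    fh, fw = filters.shape[1:]
    H, W = image.shape
    energy = 0.0
    for i in range(H - fh + 1):
        for j in range(W - fw + 1):
            patch = image[i:i + fh, j:j + fw]
            for J, a in zip(filters, alphas):
                r = float((J * patch).sum())          # linear filter response
                energy += a * np.log1p(0.5 * r * r)   # Student-t potential
    return energy

# Toy example: random image and random stand-in filters.
rng = np.random.default_rng(1)
img = rng.standard_normal((8, 8))
filts = rng.standard_normal((3, 3, 3))
energy = foe_energy(img, filts, alphas=np.ones(3))
```

In practice the windowed sums would be computed with convolutions rather than explicit loops; the loop form just makes the clique structure explicit.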
Energybased models for sparse overcomplete representations
Journal of Machine Learning Research, 2003
Cited by 51 (14 self)
Abstract
We present a new way of extending independent components analysis (ICA) to overcomplete representations. In contrast to the causal generative extensions of ICA which maintain marginal independence of sources, we define features as deterministic (linear) functions of the inputs. This assumption results in marginal dependencies among the features, but conditional independence of the features given the inputs. By assigning energies to the features, a probability distribution over the input states is defined through the Boltzmann distribution. Free parameters of this model are trained using the contrastive divergence objective (Hinton, 2002). When the number of features is equal to the number of input dimensions this energy-based model reduces to noiseless ICA, and we show experimentally that the proposed learning algorithm is able to perform blind source separation on speech data. In additional experiments we train overcomplete energy-based models to extract features from various standard datasets containing speech, natural images, handwritten digits and faces.
Topographic product models applied to natural scene statistics
Neural Computation, 2005
Cited by 50 (7 self)
Abstract
We present an energy-based model that uses a product of generalised Student-t distributions to capture the statistical structure in datasets. This model is inspired by and particularly applicable to “natural” datasets such as images. We begin by providing the mathematical framework, where we discuss complete and overcomplete models, and provide algorithms for training these models from data. Using patches of natural scenes we demonstrate that our approach represents a viable alternative to “independent components analysis” as an interpretive model of biological visual systems. Although the two approaches are similar in flavor there are also important differences, particularly when the representations are overcomplete. By constraining the interactions within our model we are also able to study the topographic organization of Gabor-like receptive fields that are learned by our model. Finally, we discuss the relation of our new approach to previous work, in particular Gaussian Scale Mixture models and variants of independent components analysis.
Fast Image Deconvolution using Hyper-Laplacian
Cited by 38 (1 self)
Abstract
The heavy-tailed distributions of gradients in natural scenes have proven to be effective priors for a range of problems such as denoising, deblurring and super-resolution. These distributions are well modeled by a hyper-Laplacian, p(x) ∝ e^(−k|x|^α), typically with 0.5 ≤ α ≤ 0.8. However, the use of sparse distributions makes the problem non-convex and impractically slow to solve for multi-megapixel images. In this paper we describe a deconvolution approach that is several orders of magnitude faster than existing techniques that use hyper-Laplacian priors. We adopt an alternating minimization scheme in which one of the two phases is a non-convex problem that is separable over pixels. This per-pixel subproblem may be solved with a lookup table (LUT). Alternatively, for two specific values of α, 1/2 and 2/3, an analytic solution can be found by finding the roots of a cubic and quartic polynomial, respectively. Our approach (using either LUTs or analytic formulae) is able to deconvolve a 1-megapixel image in less than ∼3 seconds, achieving quality comparable to existing methods such as iteratively reweighted least squares (IRLS) that take ∼20 minutes. Furthermore, our method is quite general and can easily be extended to related image processing problems beyond the deconvolution application demonstrated.
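The per-pixel subproblem minimizes |w|^α plus a quadratic term coupling w to its target value, and the LUT variant simply tabulates the minimizer over a grid of inputs. A brute-force sketch of that idea; the grid sizes and the penalty weight β are arbitrary choices, not the paper's settings, and the paper's analytic cubic/quartic solutions are not reproduced here:

```python
import numpy as np

def build_lut(alpha=2/3, beta=10.0, vmax=4.0, n=801, m=1601):
    """Tabulate w*(v) = argmin_w |w|**alpha + (beta/2) * (w - v)**2
    by brute-force search over a grid of candidate w values.
    Returns the grid of inputs vs and the minimizer for each one.
    """
    vs = np.linspace(-vmax, vmax, n)
    ws = np.linspace(-vmax, vmax, m)
    cost = np.abs(ws)[None, :]**alpha \
        + 0.5 * beta * (ws[None, :] - vs[:, None])**2
    return vs, ws[np.argmin(cost, axis=1)]

vs, lut = build_lut()

# Apply the LUT to a vector of values by nearest-grid-point lookup.
v = np.array([-2.0, -0.1, 0.0, 0.1, 2.0])
idx = np.clip(np.round((v + 4.0) / 8.0 * (len(vs) - 1)).astype(int),
              0, len(vs) - 1)
w = lut[idx]
```

The tabulated map behaves like a shrinkage operator: it leaves zero fixed and pulls every input toward zero, which is what makes it usable as the sparse half of the alternating minimization.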
A computational model of the cerebral cortex
In Proceedings of AAAI-05, 938–943, 2005
Cited by 20 (4 self)
Abstract
Our current understanding of the primate cerebral cortex (neocortex) and in particular the posterior, sensory association cortex has matured to a point where it is possible to develop a family of graphical models that capture the structure, scale and power of the neocortex for purposes of associative recall, sequence prediction and pattern completion among other functions. Implementing such models using readily available computing clusters is now within the grasp of many labs and would provide scientists with the opportunity to experiment with both hard-wired connection schemes and structure-learning algorithms inspired by animal learning and developmental studies. While neural circuits involving structures external to the neocortex such as the thalamic nuclei are less well understood, the availability of a computational model on which to test hypotheses would likely accelerate our understanding of these circuits. Furthermore, the existence of an agreed-upon cortical substrate would not only facilitate our understanding of the brain but enable researchers to combine lessons learned from biology with state-of-the-art graphical-model and machine-learning techniques to design hybrid systems that combine the best of biological and traditional computing approaches.
The rate adapting Poisson model for information retrieval and object recognition
In Proceedings of the 23rd International Conference on Machine Learning (ICML’06), 2006
Cited by 19 (1 self)
Abstract
Probabilistic modelling of text data in the bag-of-words representation has been dominated by directed graphical models such as pLSI, LDA, NMF, and discrete PCA. Recently, state-of-the-art performance on visual object recognition has also been reported using variants of these models. We introduce an alternative undirected graphical model suitable for modelling count data. This “Rate Adapting Poisson” (RAP) model is shown to generate superior dimensionally reduced representations for subsequent retrieval or classification. Models are trained using contrastive divergence while inference of latent topical representations is efficiently achieved through a simple matrix multiplication.
Stacks of Convolutional Restricted Boltzmann Machines for Shift-Invariant Feature Learning
Cited by 17 (0 self)
Abstract
In this paper we present a method for learning class-specific features for recognition. Recently a greedy layer-wise procedure was proposed to initialize weights of deep belief networks, by viewing each layer as a separate Restricted Boltzmann Machine (RBM). We develop the Convolutional RBM (CRBM), a variant of the RBM model in which weights are shared to respect the spatial structure of images. This framework learns a set of features that can generate the images of a specific object class. Our feature extraction model is a four-layer hierarchy of alternating filtering and maximum subsampling. We learn feature parameters of the first and third layers, viewing them as separate CRBMs. The outputs of our feature extraction hierarchy are then fed as input to a discriminative classifier. It is experimentally demonstrated that the extracted features are effective for object detection, using them to obtain performance comparable to the state-of-the-art on handwritten digit recognition and pedestrian detection.
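The alternating filtering / maximum-subsampling hierarchy can be sketched with plain NumPy operations. Random kernels stand in for the filters a trained CRBM would supply, and for brevity the second stage here filters only one feature map rather than combining all channels:

```python
import numpy as np

def filter_layer(img, kernels):
    """Valid-mode 2-D correlation of one image with a bank of kernels."""
    kh, kw = kernels.shape[1:]
    H, W = img.shape
    out = np.empty((len(kernels), H - kh + 1, W - kw + 1))
    for k, K in enumerate(kernels):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[k, i, j] = (img[i:i + kh, j:j + kw] * K).sum()
    return out

def max_subsample(maps, s=2):
    """Non-overlapping s-by-s maximum subsampling of each feature map."""
    C, H, W = maps.shape
    H2, W2 = H // s, W // s
    m = maps[:, :H2 * s, :W2 * s].reshape(C, H2, s, W2, s)
    return m.max(axis=(2, 4))

# Two filtering + subsampling stages, mirroring the four-layer hierarchy.
rng = np.random.default_rng(2)
x = rng.standard_normal((28, 28))                       # e.g. a digit image
f1 = max_subsample(filter_layer(x, rng.standard_normal((4, 5, 5))))
f2 = max_subsample(filter_layer(f1[0], rng.standard_normal((4, 3, 3))))
```

The flattened output of the final stage is what would be handed to the discriminative classifier.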
Modeling multiscale subbands of photographic . . .
2009
Cited by 16 (3 self)
Abstract
The local statistical properties of photographic images, when represented in a multiscale basis, have been described using Gaussian scale mixtures. Here, we use this local description as a substrate for constructing a global field of Gaussian scale mixtures (FoGSM). Specifically, we model multiscale subbands as a product of an exponentiated homogeneous Gaussian Markov random field (hGMRF) and a second independent hGMRF. We show that parameter estimation for this model is feasible and that samples drawn from a FoGSM model have marginal and joint statistics similar to those of the subband coefficients of photographic images. We develop an algorithm for removing additive white Gaussian noise based on the FoGSM model and demonstrate denoising performance comparable with state-of-the-art methods.
Fields of experts for image-based rendering
In Proc. BMVC, 2006
Cited by 13 (2 self)
Abstract
Image priors for novel view synthesis have traditionally been nonparametric models based on large libraries of image patch exemplars, producing high-quality results but making inference very slow. Recently a parametric framework, called Fields of Experts, has been proposed for image restoration that promises to speed up inference dramatically. In this paper we apply Fields of Experts for the first time to the problem of novel view synthesis, posed as a Markov random field labelling problem with very large cliques. Additionally, we introduce to computer vision for the first time a new optimization algorithm from statistical physics which reaches better minima than the ICM and simulated annealing algorithms to which such large-clique problems have previously been restricted.