Results 1 - 10 of 24
Deep Generative Stochastic Networks Trainable by Backprop
2013
Abstract

Cited by 27 (7 self)
We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood. The proposed Generative Stochastic Networks (GSN) framework is based on learning the transition operator of a Markov chain whose stationary distribution estimates the data distribution. Because the transition distribution is a conditional distribution generally involving a small move, it has fewer dominant modes, being unimodal in the limit of small moves. Thus, it is easier to learn, more like learning to perform supervised function approximation, with gradients that can be obtained by backprop. The theorems provided here generalize recent work on the probabilistic interpretation of denoising autoencoders and provide an interesting justification for dependency networks and generalized pseudolikelihood (along with defining an appropriate joint distribution and sampling mechanism, even when the conditionals are not consistent). GSNs can be used with missing inputs and can be used to sample subsets of variables given the rest. Successful experiments are conducted, validating these theoretical results, on two image datasets and with a particular architecture that mimics the Deep Boltzmann Machine Gibbs sampler but allows training to proceed with backprop, without the need for layerwise pretraining.
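The corrupt-then-denoise Markov chain at the heart of this framework can be sketched in a few lines. Everything below is a toy illustration of ours, not the paper's trained model: the "learned" denoiser is replaced by a stand-in that snaps to the nearest of two hypothetical data modes, and the chain's stationary distribution then approximates that two-mode data distribution.

```python
import random

MODES = [-2.0, 2.0]  # hypothetical data modes, for illustration only

def corrupt(x, sigma=2.0):
    # C(x_tilde | x): a local stochastic move away from x
    return x + random.gauss(0.0, sigma)

def denoise(x_tilde):
    # Stand-in for the learned conditional P_theta(x | x_tilde):
    # snap the corrupted value back to the nearest data mode.
    return min(MODES, key=lambda m: abs(m - x_tilde))

def gsn_chain(n_steps=1000, seed=0):
    # Alternate corruption and denoising; the visited states are
    # (approximate) samples from the implied data distribution.
    random.seed(seed)
    x, samples = 0.0, []
    for _ in range(n_steps):
        x = denoise(corrupt(x))
        samples.append(x)
    return samples

samples = gsn_chain()
```

Because the corruption is a small local move, each denoising step only has to solve a near-unimodal prediction problem, which is the property the abstract emphasizes.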
Generative adversarial nets
In NIPS, 2014
Abstract

Cited by 8 (1 self)
Département d’informatique et de recherche opérationnelle
Learning to Generate Chairs with Convolutional Neural Networks
Abstract

Cited by 7 (1 self)
We train a generative convolutional neural network which is able to generate images of objects given object type, viewpoint, and color. We train the network in a supervised manner on a dataset of rendered 3D chair models. Our experiments show that the network does not merely learn all images by heart, but rather finds a meaningful representation of a 3D chair model allowing it to assess the similarity of different chairs, interpolate between given viewpoints to generate the missing ones, or invent new chair styles by interpolating between chairs from the training set. We show that the network can be used to find correspondences between different chairs from the dataset, outperforming existing approaches on this task.
DRAW: A recurrent neural network for image generation
In CoRR, 2015
Abstract

Cited by 7 (0 self)
This paper introduces the Deep Recurrent Attentive Writer (DRAW) neural network architecture for image generation. DRAW networks combine a novel spatial attention mechanism that mimics the foveation of the human eye, with a sequential variational autoencoding framework that allows for the iterative construction of complex images. The system substantially improves on the state of the art for generative models on MNIST, and, when trained on the Street View House Numbers dataset, it generates images that cannot be distinguished from real data with the naked eye.
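The "iterative construction" idea can be illustrated without any of the paper's attention or variational machinery. The sketch below is an assumed toy setup of ours: instead of emitting the image in one pass, a canvas accumulates a sequence of small writes, with the learned write operation replaced by a step that moves each pixel a fraction of the way toward its final value.

```python
# Toy illustration of DRAW-style iterative construction (not the paper's model).
def draw_canvas(target, steps=10, step_size=0.5):
    canvas = [0.0] * len(target)
    for _ in range(steps):
        # Stand-in for the learned write: partial move toward the target image.
        canvas = [c + step_size * (t - c) for c, t in zip(canvas, target)]
    return canvas

final = draw_canvas([1.0, 0.0, 0.5])  # a hypothetical 3-pixel "image"
```

Each write is small and conditioned on the current canvas, so the per-step prediction problem is easier than generating the whole image at once, which is the architectural point the abstract makes.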
Doubly stochastic variational Bayes for nonconjugate inference
Abstract

Cited by 5 (0 self)
We propose a simple and effective variational inference algorithm based on stochastic optimisation that can be widely applied for Bayesian nonconjugate inference in continuous parameter spaces. This algorithm is based on stochastic approximation and allows for efficient use of gradient information from the model joint density. We demonstrate these properties using illustrative examples as well as in challenging and diverse Bayesian inference problems such as variable selection in logistic regression and fully Bayesian inference over kernel hyperparameters in Gaussian process regression.
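A minimal sketch of the stochastic-optimisation idea, under a toy setup of our own choosing (not the paper's algorithm): fit the mean of q(z) = N(mu, 1) to a target density proportional to N(target, 1) by stochastic gradient descent on the negative ELBO, using the reparameterisation z = mu + eps with eps ~ N(0, 1), so the gradient of the model's log joint density flows through the sample.

```python
import random

def fit_variational_mean(target=3.0, steps=2000, lr=0.05, seed=1):
    # Doubly stochastic VI sketch: noise in the gradient comes from
    # sampling z from q; here there is no data subsampling.
    random.seed(seed)
    mu = 0.0
    for _ in range(steps):
        eps = random.gauss(0.0, 1.0)
        z = mu + eps          # reparameterised draw from q(z) = N(mu, 1)
        grad = z - target     # d(-log p(z))/dmu; the entropy of q is constant in mu
        mu -= lr * grad       # one noisy gradient step
    return mu

mu = fit_variational_mean()   # converges near the target mean
```

In this Gaussian toy case the fixed point is known analytically (mu = target), which makes the convergence of the stochastic iterates easy to check.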
Generative Moment Matching Networks
Abstract

Cited by 3 (0 self)
We consider the problem of learning deep generative models from data. We formulate a method that generates an independent sample via a single feedforward pass through a multilayer perceptron, as in the recently proposed generative adversarial networks (Goodfellow et al., 2014). Training a generative adversarial network, however, requires careful optimization of a difficult minimax program. Instead, we utilize a technique from statistical hypothesis testing known as maximum mean discrepancy (MMD), which leads to a simple objective that can be interpreted as matching all orders of statistics between a dataset and samples from the model, and can be trained by backpropagation. We further boost the performance of this approach by combining our generative network with an autoencoder network, using MMD to learn to generate codes that can then be decoded to produce samples. We show that the combination of these techniques yields excellent generative models compared to baseline approaches as measured on MNIST and the Toronto Face Database.
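The MMD objective itself is short enough to sketch. Below is the standard biased squared-MMD estimator with a single Gaussian kernel on scalar data; the fixed bandwidth and the 1-D toy data are our assumptions for illustration (in practice one would use minibatches and, e.g., a mixture of bandwidths).

```python
import math
import random

def mmd_sq(xs, ys, bw=1.0):
    # Biased estimator of MMD^2 = E[k(x,x')] - 2 E[k(x,y)] + E[k(y,y')]
    # with a Gaussian kernel of bandwidth bw.
    def k(a, b):
        return math.exp(-((a - b) ** 2) / (2.0 * bw * bw))
    k_xx = sum(k(a, b) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(k(a, b) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(k(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx - 2.0 * k_xy + k_yy

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(100)]
good = [random.gauss(0.0, 1.0) for _ in range(100)]  # "model" matching the data
bad  = [random.gauss(3.0, 1.0) for _ in range(100)]  # mismatched "model"
```

Samples from a matching distribution give a near-zero MMD, while mismatched samples do not; training the generator amounts to minimizing this quantity by backpropagation through the model samples.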
Markov Chain Monte Carlo and Variational Inference: Bridging the Gap. arXiv preprint arXiv:1410.6460
2014
Abstract

Cited by 2 (0 self)
Recent advances in stochastic gradient variational inference have made it possible to perform variational Bayesian inference with posterior approximations containing auxiliary random variables. This enables us to explore a new synthesis of variational inference and Monte Carlo methods where we incorporate one or more steps of MCMC into our variational approximation. By doing so we obtain a rich class of inference algorithms bridging the gap between variational methods and MCMC, and offering the best of both worlds: fast posterior approximation through the maximization of an explicit objective, with the option of trading off additional computation for additional accuracy. We describe the theoretical foundations that make this possible and show some promising first results.

1. MCMC and Variational Inference
Bayesian analysis gives us a very simple recipe for learning from data: given a set of unknown parameters or latent variables z that are of interest, we specify a prior distribution p(z) quantifying what we know about z before observing any data. Then we quantify how the observed data x relates to z by specifying a likelihood function p(x|z). Finally, we apply Bayes' rule p(z|x) = p(z)p(x|z) / ∫ p(z)p(x|z) dz to give the posterior distribution, which quantifies what we know about z after seeing the data. Although this recipe is very simple conceptually, the implied computation is often intractable. We therefore need to resort to approximation methods in order to perform Bayesian inference in practice. The two most popular ap ...
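The Bayes recipe described in that excerpt becomes fully concrete when z ranges over a small discrete set, where the normalising integral is a plain sum. The numbers below (a hypothetical diagnostic-test scenario) are illustrative only:

```python
def posterior(prior, likelihood):
    # Bayes' rule on a discrete grid:
    # p(z|x) = p(z) p(x|z) / sum_z' p(z') p(x|z')
    joint = {z: prior[z] * likelihood[z] for z in prior}
    evidence = sum(joint.values())
    return {z: p / evidence for z, p in joint.items()}

prior      = {"sick": 0.01, "healthy": 0.99}   # p(z): base rates (made up)
likelihood = {"sick": 0.90, "healthy": 0.05}   # p(x = positive test | z) (made up)
post = posterior(prior, likelihood)
```

Even in this two-state example the posterior probability of "sick" given a positive test stays well below the 0.90 test sensitivity, because the prior is heavily skewed. The intractability the abstract refers to arises when z is continuous and high-dimensional, so that the `evidence` sum becomes an integral with no closed form.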
How autoencoders could provide credit assignment in deep networks via target propagation
2014
Abstract

Cited by 2 (2 self)
We propose to exploit reconstruction as a layer-local training signal for deep learning. Reconstructions can be propagated in a form of target propagation playing a role similar to backpropagation but helping to reduce the reliance on derivatives in order to perform credit assignment across many levels of possibly strong nonlinearities (which is difficult for backpropagation). A regularized autoencoder tends to produce a reconstruction that is a more likely version of its input, i.e., a small move in the direction of higher likelihood. By generalizing gradients, target propagation may also allow the training of deep networks with discrete hidden units. If the autoencoder takes both a representation of the input and the target (or of any side information) as input, then its reconstruction of the input representation provides a target towards a representation that is more likely, conditioned on all the side information. A deep autoencoder decoding path generalizes gradient propagation in a learned way that could thus handle not just infinitesimal changes but larger, discrete changes, hopefully allowing credit assignment through a long chain of nonlinear operations. In addition to each layer being a good autoencoder, the encoder also learns to please the upper layers by transforming the data into a space where it is easier for them to model, flattening manifolds and disentangling factors. The motivations and theoretical justifications for this approach are laid down in this paper, along with conjectures that will have to be verified either mathematically or experimentally, including a hypothesis stating that such autoencoder-mediated target propagation could play, in brains, the role of credit assignment through many nonlinear, noisy and discrete transformations.
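The core mechanism can be sketched on a scalar two-layer "network". This is our illustration, not the paper's architecture: the learned inverse (decoder) of the top layer is replaced by an exact inverse, and the lower layer is trained by layer-local regression toward the propagated target, with no derivative taken through the top layer.

```python
def target_prop_step(x, y_target, w1, w2, lr=0.1):
    # Network: h = w1 * x (lower layer), y = w2 * h (top layer).
    h = w1 * x
    # Propagate the output target backward through a stand-in for the
    # learned inverse/decoder of the top layer.
    h_target = y_target / w2
    # Layer-local update: regress h toward its target (no backprop
    # through w2 is needed).
    w1 += lr * (h_target - h) * x
    return w1

w1, w2, x, y_target = 0.0, 2.0, 1.0, 6.0
for _ in range(200):
    w1 = target_prop_step(x, y_target, w1, w2)
output = w2 * w1 * x
```

Because each layer only chases a target handed down by the layer above, the same scheme can in principle survive non-differentiable (e.g., discrete) layers, which is the motivation the abstract gives.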
GSNs: Generative Stochastic Networks
Abstract
We introduce a novel training principle for generative probabilistic models that is an alternative to maximum likelihood. The proposed Generative Stochastic Networks (GSN) framework generalizes Denoising AutoEncoders (DAE) and is based on learning the transition operator of a Markov chain whose stationary distribution estimates the data distribution. The transition distribution is a conditional distribution that generally involves a small move, so it has fewer dominant modes and is unimodal in the limit of small moves. This simplifies the learning problem, making it less like density estimation and more akin to supervised function approximation, with gradients that can be obtained by backprop. The theorems provided here provide a probabilistic interpretation for denoising autoencoders and generalize them; seen in the context of this framework, autoencoders that learn with injected noise are a special case of GSNs and can be interpreted as generative models. The theorems also provide an interesting justification for dependency networks and generalized pseudolikelihood and define an appropriate joint distribution and sampling mechanism, even when the conditionals are not consistent. GSNs can be used with missing inputs and can be used to sample subsets of variables given the rest. Experiments validating these theoretical results are conducted on both synthetic datasets and image datasets. The experiments employ a particular architecture that mimics the Deep Boltzmann Machine Gibbs sampler but that allows training to proceed with backprop through a recurrent neural network with noise injected inside and without the need for layerwise pretraining.
Deep Exponential Families
Abstract
We describe deep exponential families (DEFs), a class of latent variable models that are inspired by the hidden structures used in deep neural networks. DEFs capture a hierarchy of dependencies between latent variables, and are easily generalized to many settings through exponential families. We perform inference using recent “black box” variational inference techniques. We then evaluate various DEFs on text and combine multiple DEFs into a model for pairwise recommendation data. In an extensive study, we show that going beyond one layer improves predictions for DEFs. We demonstrate that DEFs find interesting exploratory structure in large data sets, and give better predictive performance than state-of-the-art models.
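The generative (forward) direction of such a hierarchy can be sketched in a few lines. The two-layer structure below is hypothetical and every number in it is made up for illustration: gamma-distributed top-layer latents set, through an assumed weight matrix, the rates of Poisson-distributed observed counts, which is one common exponential-family pairing for count data.

```python
import math
import random

def poisson(lam):
    # Knuth's inversion sampler (the stdlib random module has no
    # built-in Poisson variate).
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        p *= random.random()
        k += 1
    return k - 1

def sample_def(seed=0):
    random.seed(seed)
    z_top = [random.gammavariate(1.0, 1.0) for _ in range(3)]  # layer-2 latents
    W = [[0.5, 0.2, 0.3],                                      # assumed layer-2 -> layer-1
         [0.1, 0.6, 0.3]]                                      # weights (illustrative)
    rates = [sum(w * z for w, z in zip(row, z_top)) for row in W]
    return [poisson(r) for r in rates]                         # observed counts

x = sample_def()
```

Inference in the paper runs the other way, recovering the latent hierarchy from observed counts with black-box variational methods rather than by inverting this sampler analytically.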