Results 1-10 of 50
Maxout networks
In ICML, 2013
Abstract

Cited by 68 (17 self)
We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. We define a simple new model called maxout (so named because its output is the max of a set of inputs, and because it is a natural companion to dropout) designed to both facilitate optimization by dropout and improve the accuracy of dropout’s fast approximate model averaging technique. We empirically verify that the model successfully accomplishes both of these tasks. We use maxout and dropout to demonstrate state-of-the-art classification performance on four benchmark datasets: MNIST, CIFAR-10, CIFAR-100, and SVHN.
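The abstract only says that a maxout unit outputs "the max of a set of inputs". A minimal sketch, assuming the usual formulation in which each output unit takes the maximum over k affine functions of the same input vector (the names `maxout`, `W`, and `b` below are illustrative, not taken from the paper's code):

```python
from typing import List


def maxout(x: List[float], W: List[List[List[float]]], b: List[List[float]]) -> List[float]:
    """Forward pass of one maxout layer (illustrative sketch).

    W[i][j] is the weight vector of affine piece j feeding output unit i,
    and b[i][j] is its bias. Each output is the max over its k pieces.
    """
    outputs = []
    for W_i, b_i in zip(W, b):
        # z_ij = x . W[i][j] + b[i][j] for every piece j of unit i
        pieces = [sum(w * xv for w, xv in zip(w_ij, x)) + b_ij
                  for w_ij, b_ij in zip(W_i, b_i)]
        outputs.append(max(pieces))
    return outputs


# One unit with k = 2 pieces (+x and -x) computes |x|.
print(maxout([-3.0], [[[1.0], [-1.0]]], [[0.0, 0.0]]))  # [3.0]
```

Because the max is taken over learned pieces, a single unit can represent familiar activations: pieces (w, b) and (0, 0) give a rectifier, max(w·x + b, 0).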
Regularization of Neural Networks using DropConnect
Abstract

Cited by 63 (3 self)
We introduce DropConnect, a generalization of Dropout (Hinton et al., 2012), for regularizing large fully connected layers within neural networks. When training with Dropout, a randomly selected subset of activations is set to zero within each layer. DropConnect instead sets a randomly selected subset of weights within the network to zero. Each unit thus receives input from a random subset of units in the previous layer. We derive a bound on the generalization performance of both Dropout and DropConnect. We then evaluate DropConnect on a range of datasets, comparing it to Dropout, and show state-of-the-art results on several image recognition benchmarks by aggregating multiple DropConnect-trained models.
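To make the difference from Dropout concrete, here is a rough sketch of a training-time DropConnect forward pass, assuming an independent Bernoulli mask per weight and a ReLU activation (both the activation and the helper names are assumptions for illustration). The test-time inference procedure from the paper is not sketched here.

```python
import random
from typing import List


def relu(z: float) -> float:
    return max(0.0, z)


def dropconnect_forward(v: List[float], W: List[List[float]],
                        keep_prob: float, rng: random.Random) -> List[float]:
    """Training-time forward pass of a fully connected layer with DropConnect.

    A fresh mask entry M[i][j] ~ Bernoulli(keep_prob) is drawn for every
    weight W[i][j], so each output unit i sees its own random subset of the
    inputs. (Dropout would instead draw one mask entry per input v[j],
    shared by all output units.)
    """
    out = []
    for row in W:
        z = 0.0
        for w_ij, v_j in zip(row, v):
            m_ij = 1.0 if rng.random() < keep_prob else 0.0  # keep or zero this weight
            z += m_ij * w_ij * v_j
        out.append(relu(z))
    return out


rng = random.Random(0)
print(dropconnect_forward([1.0, 2.0], [[1.0, 1.0], [-1.0, 0.5]], 0.5, rng))
```

With `keep_prob=1.0` every weight survives and the layer reduces to an ordinary fully connected ReLU layer.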
Dropout: A simple way to prevent neural networks from overfitting
Journal of Machine Learning Research 15, pp. 1929-1958, 2014
Abstract

Cited by 49 (3 self)
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different “thinned” networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
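The train/test asymmetry described above can be sketched for a single linear layer. This assumes each unit is retained with probability `p` and that "smaller weights" means scaling by `p` at test time; the function names are hypothetical. The Monte Carlo mean of many thinned forward passes should land close to the single scaled pass.

```python
import random
from typing import List


def dropout_train(h: List[float], W: List[List[float]], p: float,
                  rng: random.Random) -> List[float]:
    """One thinned network: each input unit is retained with probability p.
    A dropped unit is removed together with all of its outgoing connections."""
    mask = [1.0 if rng.random() < p else 0.0 for _ in h]
    return [sum(w * m * x for w, m, x in zip(row, mask, h)) for row in W]


def dropout_test(h: List[float], W: List[List[float]], p: float) -> List[float]:
    """Single unthinned network whose weights are scaled down by p."""
    return [sum(p * w * x for w, x in zip(row, h)) for row in W]


rng = random.Random(0)
h, W, p = [1.0, 2.0, 3.0], [[0.5, -1.0, 2.0]], 0.5
n = 20000
# Average the outputs of many randomly thinned networks.
mc = sum(dropout_train(h, W, p, rng)[0] for _ in range(n)) / n
print(mc, dropout_test(h, W, p)[0])  # the Monte Carlo mean should be close to 2.25
```

For a linear pre-activation the scaled weights give the exact expectation; after nonlinearities it is only an approximation. A common alternative ("inverted dropout") divides by `p` during training instead, so test-time weights are used unchanged.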
Stochastic backpropagation and approximate inference in deep generative models
2014
Abstract

Cited by 37 (4 self)
We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. We develop stochastic backpropagation (rules for gradient backpropagation through stochastic variables) and derive an algorithm that allows for joint optimisation of the parameters of both the generative and recognition models. We demonstrate on several real-world data sets that by using stochastic backpropagation and variational inference, we obtain models that are able to generate realistic samples of data, allow for accurate imputations of missing data, and provide a useful tool for high-dimensional data visualisation.
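As a rough illustration of backpropagating through a stochastic variable, the sketch below covers only the simplest Gaussian location-scale case (often called the reparameterization trick), not the paper's full recognition-model algorithm. The test function f(z) = z² and all names are assumptions chosen so the exact answer is known: E[f] = μ² + σ², with gradients 2μ and 2σ.

```python
import random
from typing import Callable, Tuple


def pathwise_gradients(mu: float, sigma: float,
                       df_dz: Callable[[float], float],
                       n_samples: int, rng: random.Random) -> Tuple[float, float]:
    """Monte Carlo estimate of d/dmu and d/dsigma of E[f(z)], z ~ N(mu, sigma^2).

    Writing z = mu + sigma * eps with eps ~ N(0, 1) moves the randomness into
    eps, so gradients flow through the sample via the chain rule:
    dz/dmu = 1 and dz/dsigma = eps.
    """
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps      # reparameterized sample
        d = df_dz(z)
        g_mu += d                 # df/dz * dz/dmu
        g_sigma += d * eps        # df/dz * dz/dsigma
    return g_mu / n_samples, g_sigma / n_samples


# f(z) = z**2, so df/dz = 2z and the exact gradients are (2*mu, 2*sigma).
rng = random.Random(0)
g_mu, g_sigma = pathwise_gradients(1.5, 0.5, lambda z: 2.0 * z, 50000, rng)
print(g_mu, g_sigma)  # close to 3.0 and 1.0
```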
Stochastic Pooling for Regularization of Deep Convolutional Neural Networks
arXiv e-prints, January 2013
Abstract

Cited by 35 (0 self)
We introduce a simple and effective method for regularizing large convolutional neural networks. We replace the conventional deterministic pooling operations with a stochastic procedure, randomly picking the activation within each pooling region according to a multinomial distribution, given by the activities within the pooling region. The approach is hyperparameter-free and can be combined with other regularization approaches, such as dropout and data augmentation. We achieve state-of-the-art performance on four image datasets, relative to other approaches that do not utilize data augmentation.
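A minimal sketch of stochastic pooling over one pooling region, assuming non-negative activations (e.g. after a ReLU) normalized into multinomial probabilities. The deterministic `probabilistic_weighting` helper is an assumed expected-value variant for test time, which we understand to match the paper's test-time scheme; both names are illustrative.

```python
import random
from typing import List


def stochastic_pool(region: List[float], rng: random.Random) -> float:
    """Training-time stochastic pooling over one pooling region.

    Assumes non-negative activations. Index i is picked with probability
    p_i = a_i / sum_k a_k, and the picked activation is output.
    """
    total = sum(region)
    if total == 0.0:
        return 0.0  # all-zero region: every choice gives 0
    probs = [a / total for a in region]
    return rng.choices(region, weights=probs, k=1)[0]


def probabilistic_weighting(region: List[float]) -> float:
    """Deterministic variant: the expected value sum_i p_i * a_i."""
    total = sum(region)
    if total == 0.0:
        return 0.0
    return sum(a * a / total for a in region)


rng = random.Random(0)
region = [0.0, 1.0, 3.0]
samples = [stochastic_pool(region, rng) for _ in range(10000)]
print(samples.count(3.0) / len(samples))  # roughly 0.75
print(probabilistic_weighting(region))     # 2.5
```

Unlike max pooling, the smaller activation (1.0) is still selected about a quarter of the time, and a zero activation is never selected; no extra hyperparameter is involved.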
Convolutional neural networks applied to house numbers digit classification
In ICPR, 2012
Abstract

Cited by 34 (4 self)
We classify digits of real-world house numbers using convolutional neural networks (ConvNets ...
Multi-task Bayesian optimization
In Proceedings of NIPS, 2013
Cited by 25 (5 self)
End-to-End Text Recognition with Convolutional Neural Networks
Abstract

Cited by 20 (0 self)
Full end-to-end text recognition in natural images is a challenging problem that has received much attention recently. Traditional systems in this area have relied on elaborate models incorporating carefully hand-engineered features or large amounts of prior knowledge. In this paper, we take a different route and combine the representational power of large, multilayer neural networks with recent developments in unsupervised feature learning, which allows us to use a common framework to train highly accurate text detector and character recognizer modules. Then, using only simple off-the-shelf methods, we integrate these two modules into a full end-to-end, lexicon-driven scene text recognition system that achieves state-of-the-art performance on standard benchmarks, namely Street View Text and ICDAR 2003.
Improving Neural Networks with Dropout
Abstract

Cited by 19 (1 self)
Deep neural nets with a huge number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from a neural network during training. This prevents the units from co-adapting too much. Dropping units creates thinned networks during training. The number of possible thinned networks is exponential in the number of units in the network. At test time all possible thinned networks are combined using an approximate model averaging procedure. Dropout training followed by this approximate model combination significantly reduces overfitting and gives major improvements over other regularization methods. In this work, we describe models that improve the performance of neural networks using dropout, often obtaining state-of-the-art results on benchmark datasets.
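To see why exact averaging is impractical yet the approximation is reasonable, the toy sketch below enumerates every thinned version of a single linear unit (2ⁿ masks for n inputs) and compares the exact probability-weighted average with one pass using weights scaled by the retention probability `p`. The linear unit and the helper names are assumptions for illustration only; with nonlinear units the two would generally differ.

```python
from itertools import product
from typing import List, Tuple


def average_over_thinned(h: List[float], w: List[float], p: float) -> Tuple[float, int]:
    """Exact model average of one linear unit over every thinned network.

    Each of the 2**n masks (n = number of input units) is weighted by its
    probability p**kept * (1 - p)**dropped. Returns (average, number of masks).
    """
    n = len(h)
    total = 0.0
    count = 0
    for mask in product((0, 1), repeat=n):
        kept = sum(mask)
        prob = p ** kept * (1 - p) ** (n - kept)
        total += prob * sum(m * wi * xi for m, wi, xi in zip(mask, w, h))
        count += 1
    return total, count


def weight_scaled(h: List[float], w: List[float], p: float) -> float:
    """The cheap approximation: one unthinned unit with weights scaled by p."""
    return sum(p * wi * xi for wi, xi in zip(w, h))


avg, n_nets = average_over_thinned([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], 0.5)
print(n_nets, avg, weight_scaled([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], 0.5))  # 8 2.25 2.25
```

The enumeration cost doubles with every added unit, while the scaled pass costs the same as a single forward pass.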