Results 1–10 of 15
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015
Cited by 40 (0 self)
Abstract
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on the learnable activation and advanced initialization, we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66% [33]). To our knowledge, our result is the first to surpass the reported human-level performance (5.1%, [26]) on this dataset.
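The abstract's two ingredients have simple closed forms. A minimal NumPy sketch, assuming the negative-half slope `a` is supplied (in the paper it is a learned per-channel parameter) and using the standard deviation sqrt(2 / fan_in) the paper derives for rectifier layers; the function names are illustrative, not from the authors' code:

```python
import numpy as np

def prelu(x, a):
    """Parametric ReLU: identity for positive inputs, slope `a` for negative
    inputs. With a = 0 this reduces to the ordinary ReLU."""
    return np.where(x > 0, x, a * x)

def he_init(fan_in, fan_out, rng=None):
    """Initialization derived for rectifier networks: zero-mean Gaussian
    weights with std sqrt(2 / fan_in), which keeps activation variance
    roughly constant across ReLU layers."""
    rng = rng or np.random.default_rng(0)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
```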
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
Cited by 38 (4 self)
Abstract
In this paper, we propose a novel neural network model called RNN Encoder–Decoder that consists of two recurrent neural networks (RNNs). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder–Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
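The two-network structure can be sketched as an untrained toy in NumPy: the encoder folds the input sequence into one fixed-length vector, and the decoder unrolls from it. This uses a plain tanh recurrence with made-up weight names, not the gated hidden unit the paper actually introduces:

```python
import numpy as np

def rnn_encode(inputs, W_x, W_h):
    """Fold a sequence of symbol embeddings into a fixed-length vector:
    h_t = tanh(W_x x_t + W_h h_{t-1}); the final hidden state is the summary."""
    h = np.zeros(W_h.shape[0])
    for x in inputs:
        h = np.tanh(W_x @ x + W_h @ h)
    return h

def rnn_decode(summary, W_h, W_o, steps):
    """Unroll a second RNN from the summary, emitting one output vector per
    step (a softmax over the target vocabulary in the real model)."""
    h, outputs = summary, []
    for _ in range(steps):
        h = np.tanh(W_h @ h)
        outputs.append(W_o @ h)
    return outputs
```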
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ArXiv e-prints, 2015
Cited by 33 (1 self)
Abstract
Training Deep Neural Networks is complicated by the fact that the distribution of each layer’s inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.
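The per-mini-batch normalization the abstract describes has a short training-mode forward pass, sketched here in NumPy (inference instead uses running averages of the batch statistics, omitted in this sketch):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch (axis 0), then apply the
    learned scale `gamma` and shift `beta`. Training-mode forward pass only."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

Making the normalization part of the model (rather than a preprocessing step) is what lets gradients flow through the batch statistics during training.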
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In NIPS, 2014
Existence of V (m, t) vectors
 J. Statist. Plann. Inference
Cited by 8 (3 self)
Abstract
Local resampling for patch-based texture synthesis in ...
Unitary evolution recurrent neural networks. arXiv preprint arXiv:1511.06464, 2015
Cited by 2 (1 self)
Abstract
Recurrent neural networks (RNNs) are notoriously difficult to train. When the eigenvalues of the hidden-to-hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well-studied issue of vanishing and exploding gradients, especially when trying to learn long-term dependencies. To circumvent this problem, we propose a new architecture that learns a unitary weight matrix, with eigenvalues of absolute value exactly 1. The challenge we address is that of parametrizing unitary matrices in a way that does not require expensive computations (such as eigendecomposition) after each weight update. We construct an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned. Optimization with this parameterization becomes feasible only when considering hidden states in the complex domain. We demonstrate the potential of this architecture by achieving state-of-the-art results in several hard tasks involving very long-term dependencies.
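The "building blocks" idea can be illustrated with a few structured unitary factors of the kind the abstract alludes to: diagonal phase matrices (the learnable parameters), the unitary DFT, and a permutation. Because each factor is unitary, the product is unitary by construction, with no eigendecomposition needed. This is only a subset of the paper's parameterization (which also includes Householder reflections), and the matrices here are random rather than learned:

```python
import numpy as np

def unitary_from_blocks(n, rng=None):
    """Compose cheap structured unitary factors into one unitary matrix:
    W = D2 · F⁻¹ · P · D1 · F, where D are diagonal phase matrices,
    F is the unitary DFT, and P is a permutation."""
    rng = rng or np.random.default_rng(0)
    F = np.fft.fft(np.eye(n)) / np.sqrt(n)            # unitary DFT matrix
    D1 = np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, n)))
    D2 = np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, n)))
    P = np.eye(n)[rng.permutation(n)]                 # permutation matrix
    return D2 @ F.conj().T @ P @ D1 @ F
```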
EXPLORING DATA AUGMENTATION FOR IMPROVED SINGING VOICE DETECTION WITH NEURAL NETWORKS
Cited by 1 (0 self)
Abstract
In computer vision, state-of-the-art object recognition systems rely on label-preserving image transformations such as scaling and rotation to augment the training datasets. The additional training examples help the system to learn invariances that are difficult to build into the model, and improve generalization to unseen data. To the best of our knowledge, this approach has not been systematically explored for music signals. Using the problem of singing voice detection with neural networks as an example, we apply a range of label-preserving audio transformations to assess their utility for music data augmentation. In line with recent research in speech recognition, we find pitch shifting to be the most helpful augmentation method. Combined with time stretching and random frequency filtering, we achieve a reduction in classification error between 10 and 30%, reaching the state of the art on two public datasets. We expect that audio data augmentation would yield significant gains for several other sequence labelling and event detection tasks in music information retrieval.
Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models
Abstract
Data-efficient reinforcement learning (RL) in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. We consider a particularly important instance of this challenge, the pixels-to-torques problem, where an RL agent learns a closed-loop control policy ("torques") from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model for learning a low-dimensional feature embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning is crucial for long-term predictions, which lie at the core of the adaptive nonlinear model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art RL methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces, is lightweight, and is an important step toward fully autonomous end-to-end learning from pixels to torques.
Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks
Venu Govindaraju
Abstract
While the authors of Batch Normalization (BN) identify and address an important problem involved in training deep networks (Internal Covariate Shift), the current solution has certain drawbacks. For instance, BN depends on batch statistics for layer-wise input normalization during training, which makes the estimates of the mean and standard deviation of the input (distribution) to hidden layers inaccurate due to shifting parameter values (especially during initial training epochs). Another fundamental problem with BN is that it cannot be used with a batch size of 1 during training. We address these drawbacks of BN by proposing a non-adaptive normalization technique for removing covariate shift, which we call Normalization Propagation. Our approach does not depend on batch statistics, but rather uses a data-independent parametric estimate of the mean and standard deviation in every layer, and is thus computationally faster compared with BN. We exploit the observation that the pre-activations before Rectified Linear Units follow a Gaussian distribution in deep networks, and that once the first- and second-order statistics of any given dataset are normalized, we can forward-propagate this normalization without the need to recalculate the approximate statistics for hidden layers.
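The key observation admits a closed form: if z ~ N(0, 1), then ReLU(z) has mean 1/sqrt(2π) and variance 1/2 − 1/(2π), so post-ReLU activations can be re-standardized with fixed constants instead of mini-batch statistics. A simplified sketch of that idea (the paper's actual formulas also account for weight norms and other layer details):

```python
import numpy as np

# Closed-form moments of ReLU(z) for z ~ N(0, 1): these data-independent
# constants replace the batch mean and standard deviation.
RELU_MEAN = 1.0 / np.sqrt(2.0 * np.pi)
RELU_STD = np.sqrt(0.5 - 1.0 / (2.0 * np.pi))

def norm_prop_relu(pre_activation):
    """Apply ReLU, then normalize with the fixed Gaussian statistics above
    instead of estimating them from the current mini-batch."""
    return (np.maximum(pre_activation, 0.0) - RELU_MEAN) / RELU_STD
```

Unlike batch normalization, this works with a batch size of 1, since no statistics are estimated from the data.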