Results 1 - 10 of 73
Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013
"... This paper shows how Long Short-term Memory recurrent neural net-works can be used to generate complex sequences with long-range struc-ture, simply by predicting one data point at a time. The approach is demonstrated for text (where the data are discrete) and online handwrit-ing (where the data are ..."
Abstract
-
Cited by 56 (2 self)
This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. The approach is demonstrated for text (where the data are discrete) and online handwriting (where the data are real-valued). It is then extended to handwriting synthesis by allowing the network to condition its predictions on a text sequence. The resulting system is able to generate highly realistic cursive handwriting in a wide variety of styles.
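The core idea in this abstract, modelling a sequence by predicting one symbol at a time with an LSTM, can be sketched in a few lines. The sketch below is illustrative only (PyTorch, discrete data such as characters, made-up class name and sizes); the paper's handwriting models additionally use stacked LSTMs with skip connections and a mixture density output layer, none of which is shown here.

import torch
import torch.nn as nn

# Minimal next-step prediction with an LSTM over discrete symbols.
class NextStepLSTM(nn.Module):
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.out(h), state            # logits over the next symbol

model = NextStepLSTM(vocab_size=128)
seq = torch.randint(0, 128, (1, 50))         # toy "character" sequence
logits, _ = model(seq[:, :-1])               # predict x[t+1] from x[1..t]
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 128), seq[:, 1:].reshape(-1))

Sampling then simply feeds the network's own prediction back in as the next input, carrying the recurrent state forward one step at a time.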
On the importance of initialization and momentum in deep learning
"... Deep and recurrent neural networks (DNNs and RNNs respectively) are powerful models that were considered to be almost impossible to train using stochastic gradient descent with momentum. In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initializa ..."
Abstract
-
Cited by 55 (4 self)
Deep and recurrent neural networks (DNNs and RNNs respectively) are powerful models that were considered to be almost impossible to train using stochastic gradient descent with momentum. In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs (on datasets with long-term dependencies) to levels of performance that were previously achievable only with Hessian-Free optimization. We find that both the initialization and the momentum are crucial since poorly initialized networks cannot be trained with momentum and well-initialized networks perform markedly worse when the momentum is absent or poorly tuned. Our success training these models suggests that previous attempts to train deep and recurrent neural networks from random initializations have likely failed due to poor initialization schemes. Furthermore, carefully tuned momentum methods suffice for dealing with the curvature issues in deep and recurrent network training objectives without the need for sophisticated second-order methods.
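As a rough illustration of the two ingredients the abstract names, the sketch below pairs a fan-in-scaled random initialization with a momentum parameter that is raised slowly over training. Both the initialization and the schedule shown here are assumptions for the sake of example, not the paper's exact recipe, and the network and step counts are placeholders.

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(784, 500), nn.Tanh(), nn.Linear(500, 10))

# A fan-in-scaled Gaussian initialization (an assumption; the paper studies
# specific scaled and sparse schemes in more detail).
for m in net.modules():
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, std=m.in_features ** -0.5)
        nn.init.zeros_(m.bias)

opt = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.5)

def momentum_at(step, mu_max=0.99):
    # Raise momentum slowly toward mu_max instead of fixing it from the start
    # (the exact form of this schedule is illustrative).
    return min(mu_max, 1.0 - 2.0 ** (-1.0 - step / 250.0))

for step in range(1000):
    for group in opt.param_groups:
        group["momentum"] = momentum_at(step)
    # ... forward pass on a minibatch, loss.backward(), opt.step(),
    # opt.zero_grad() would go here ...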
Statistical language models based on neural networks, 2012
"... Statistical language models are crucial part of many successful applications, such as automatic speech recognition and statistical machine translation (for example well-known ..."
Abstract
-
Cited by 49 (6 self)
Statistical language models are a crucial part of many successful applications, such as automatic speech recognition and statistical machine translation (for example, well-known ...
Deep visual-semantic alignments for generating image descriptions, 2014
"... We present a model that generates natural language de-scriptions of images and their regions. Our approach lever-ages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between lan-guage and visual data. Our alignment model is based on a novel combinati ..."
Abstract
-
Cited by 47 (0 self)
We present a model that generates natural language descriptions of images and their regions. Our approach leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between language and visual data. Our alignment model is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. We then describe a Multimodal Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions. We demonstrate that our alignment model produces state of the art results in retrieval experiments on Flickr8K, Flickr30K and MSCOCO datasets. We then show that the generated descriptions significantly outperform retrieval baselines on both full images and on a new dataset of region-level annotations.
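A heavily simplified sketch of the alignment idea described above: region and word representations are projected into a common embedding space, and an image-sentence pair is scored by letting every word attach to its best-matching region. The feature extractors (the region CNN and the bidirectional sentence RNN) are replaced by random tensors here, and all dimensions are made-up placeholders rather than the paper's settings.

import torch
import torch.nn as nn

dim = 300
region_proj = nn.Linear(4096, dim)   # CNN region descriptors -> shared space
word_proj = nn.Linear(600, dim)      # bidirectional RNN word states -> shared space

regions = torch.randn(19, 4096)      # stand-in for detected regions of one image
words = torch.randn(12, 600)         # stand-in for hidden states of a 12-word sentence

v = region_proj(regions)             # (19, dim)
s = word_proj(words)                 # (12, dim)

# Each word scores against every region; take its best region, then sum
# over words to get the image-sentence alignment score.
pairwise = s @ v.t()                 # (12, 19) word-region similarities
image_sentence_score = pairwise.max(dim=1).values.sum()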
Joint learning of words and meaning representations for open-text semantic parsing. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, 2012
"... Open-text (or open-domain) semantic parsers are designed to interpret any statement in natural language by inferring a corresponding meaning representation (MR). Unfortunately, large scale systems cannot be easily machine-learned due to lack of directly supervised data. We propose here a method that ..."
Abstract
-
Cited by 19 (5 self)
Open-text (or open-domain) semantic parsers are designed to interpret any statement in natural language by inferring a corresponding meaning representation (MR). Unfortunately, large scale systems cannot be easily machine-learned due to lack of directly supervised data. We propose here a method that learns to assign MRs to a wide range of text (using a dictionary of more than 70,000 words, which are mapped to more than 40,000 entities) thanks to a training scheme that combines learning from knowledge bases (WordNet and ConceptNet) with learning from raw text. The model jointly learns representations of words, entities and MRs via a multi-task training process operating on these diverse sources of data. Hence, the system ends up providing methods for knowledge acquisition and word-sense disambiguation within the context of semantic parsing in a single elegant framework. Experiments on these various tasks indicate the promise of the approach.
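For illustration only, the sketch below scores a (left-hand side, relation, right-hand side) meaning-representation triple with a single shared embedding table over words and entities. The translation-distance score used here is a stand-in (a TransE-style score), not the paper's energy function, and the vocabulary size is a placeholder echoing the 70,000-word / 40,000-entity figures in the abstract.

import torch
import torch.nn as nn

n_symbols = 110_000                      # placeholder: ~70k words + ~40k entities
dim = 50
emb = nn.Embedding(n_symbols, dim)       # words, entities and relations share one table

def triple_score(lhs, rel, rhs):
    # Lower is better: how far the relation-shifted lhs lands from the rhs.
    l, r, o = emb(lhs), emb(rel), emb(rhs)
    return torch.norm(l + r - o, dim=-1)

lhs = torch.tensor([101])                # indices are arbitrary examples
rel = torch.tensor([7])
rhs = torch.tensor([2054])
print(triple_score(lhs, rel, rhs))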
Sequence transduction with recurrent neural networks. In ICML 29, 2012
"... Many machine learning tasks can be ex-pressed as the transformation—or transduc-tion—of input sequences into output se-quences: speech recognition, machine trans-lation, protein secondary structure predic-tion and text-to-speech to name but a few. One of the key challenges in sequence trans-duction ..."
Abstract
-
Cited by 18 (3 self)
Many machine learning tasks can be expressed as the transformation, or transduction, of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech, to name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating. Recurrent neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such representations. However, RNNs traditionally require a pre-defined alignment between the input and output sequences to perform transduction. This is a severe limitation, since finding the alignment is the most difficult aspect of many sequence transduction problems. Indeed, even determining the length of the output sequence is often challenging. This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence. Experimental results for phoneme recognition are provided on the TIMIT speech corpus.
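A rough sketch of the factorization described in this abstract: one RNN encodes the input sequence, a second RNN models the output sequence produced so far, and their states are combined so that every (input position, output position) pair yields a distribution over the output alphabet plus a blank symbol, which is what removes the need for a pre-defined alignment. Combining the two networks with a small joint layer, as below, is a common later variant; the sizes are placeholders, and the original paper combines the two networks' output distributions somewhat differently.

import torch
import torch.nn as nn

V = 40                                    # output alphabet size (blank added below)
enc = nn.LSTM(input_size=13, hidden_size=128, batch_first=True)   # reads the input
pred = nn.LSTM(input_size=V, hidden_size=128, batch_first=True)   # models the output so far
joint = nn.Linear(256, V + 1)             # scores each symbol or blank at a lattice node

x = torch.randn(1, 100, 13)               # e.g. 100 acoustic feature frames
y = torch.randn(1, 20, V)                 # e.g. 20 already-emitted output symbols

f, _ = enc(x)                             # (1, 100, 128) input encodings
g, _ = pred(y)                            # (1, 20, 128) output encodings

# Every input position is paired with every output prefix, so the model never
# needs a fixed input-output alignment.
lattice = joint(torch.cat(
    [f.unsqueeze(2).expand(-1, -1, 20, -1),
     g.unsqueeze(1).expand(-1, 100, -1, -1)], dim=-1))   # (1, 100, 20, V + 1)

Training then marginalizes over all monotonic alignments through this lattice with a forward-backward recursion (not shown here).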
How to construct deep recurrent neural networks, 2014
"... In this paper, we explore different ways to extend a recurrent neural network (RNN) to a deep RNN. We start by arguing that the concept of depth in an RNN is not as clear as it is in feedforward neural networks. By carefully analyzing and understanding the architecture of an RNN, however, we find th ..."
Abstract
-
Cited by 17 (3 self)
In this paper, we explore different ways to extend a recurrent neural network (RNN) to a deep RNN. We start by arguing that the concept of depth in an RNN is not as clear as it is in feedforward neural networks. By carefully analyzing and understanding the architecture of an RNN, however, we find three points of an RNN which may be made deeper: (1) input-to-hidden function, (2) hidden-to-hidden transition and (3) hidden-to-output function. Based on this observation, we propose two novel architectures of a deep RNN which are orthogonal to an earlier attempt of stacking multiple recurrent layers to build a deep RNN (Schmidhuber, 1992; El Hihi and Bengio, 1996). We provide an alternative interpretation of these deep RNNs using a novel framework based on neural operators. The proposed deep RNNs are empirically evaluated on the tasks of polyphonic music prediction and language modeling. The experimental result supports our claim that the proposed deep RNNs benefit from the depth and outperform the conventional, shallow RNNs.
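The three points of depth listed in this abstract can be made concrete with a small recurrent cell that deepens each of them. The sketch below is an illustrative assumption about how such a cell might look (class name and layer sizes are invented), not necessarily the paper's exact parameterization.

import torch
import torch.nn as nn

class DeepEverywhereRNNCell(nn.Module):
    def __init__(self, in_dim=50, hid_dim=100, out_dim=50):
        super().__init__()
        # (1) deeper input-to-hidden function
        self.inp = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Tanh())
        # (2) deeper hidden-to-hidden transition
        self.trans = nn.Sequential(nn.Linear(2 * hid_dim, hid_dim), nn.Tanh(),
                                   nn.Linear(hid_dim, hid_dim), nn.Tanh())
        # (3) deeper hidden-to-output function
        self.out = nn.Sequential(nn.Linear(hid_dim, hid_dim), nn.Tanh(),
                                 nn.Linear(hid_dim, out_dim))

    def forward(self, x_t, h_prev):
        h_t = self.trans(torch.cat([self.inp(x_t), h_prev], dim=-1))
        return self.out(h_t), h_t

cell = DeepEverywhereRNNCell()
h = torch.zeros(1, 100)
for x_t in torch.randn(10, 1, 50):        # unroll over 10 time steps
    y_t, h = cell(x_t, h)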
Training Recurrent Neural Networks, 2013
"... Recurrent Neural Networks (RNNs) are powerful sequence models that were believed to be difficult to train, and as a result they were rarely used in machine learning applications. This thesis presents methods that overcome the difficulty of training RNNs, and applications of RNNs to challenging probl ..."
Abstract
-
Cited by 14 (0 self)
Recurrent Neural Networks (RNNs) are powerful sequence models that were believed to be difficult to train, and as a result they were rarely used in machine learning applications. This thesis presents methods that overcome the difficulty of training RNNs, and applications of RNNs to challenging problems. We first describe a new probabilistic sequence model that combines Restricted Boltzmann Machines and RNNs. The new model is more powerful than similar models while being less difficult to train. Next, we present a new variant of the Hessian-free (HF) optimizer and show that it can train RNNs on tasks that have extreme long-range temporal dependencies, which were previously considered to be impossibly hard. We then apply HF to character-level language modelling and get excellent results. We also apply HF to optimal control and obtain RNN control laws that can successfully operate under conditions of delayed feedback and unknown disturbances. Finally, we describe a random parameter initialization scheme that allows gradient descent with momentum to train RNNs on problems with long-term dependencies. This directly contradicts widespread beliefs about the inability of first-order methods to do so, and suggests that previous attempts at training RNNs failed partly due to flaws in the random initialization.