Results 1–10 of 18
Speech recognition with deep recurrent neural networks
, 2013
Abstract

Cited by 104 (8 self)
Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the Long Short-term Memory RNN architecture has proved particularly fruitful, delivering state-of-the-art results in cursive handwriting recognition. However, RNN performance in speech recognition has so far been disappointing, with better results returned by deep feedforward networks. This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long-range context that empowers RNNs. When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.
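The stacking this abstract describes, where each recurrent layer reads the hidden-state sequence of the layer below, can be sketched in a few lines of numpy. The sketch uses plain tanh units rather than the paper's LSTM cells, and all dimensions and weight names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_layer(xs, W_in, W_rec, b):
    """Run a simple tanh RNN over a sequence; return the hidden-state sequence."""
    h = np.zeros(W_rec.shape[0])
    hs = []
    for x in xs:
        h = np.tanh(W_in @ x + W_rec @ h + b)
        hs.append(h)
    return hs

def deep_rnn(xs, layers):
    """Stack RNN layers: each layer consumes the hidden sequence of the one below."""
    seq = xs
    for (W_in, W_rec, b) in layers:
        seq = rnn_layer(seq, W_in, W_rec, b)
    return seq

T, d_in, d_h = 5, 3, 4
xs = [rng.standard_normal(d_in) for _ in range(T)]
layers = [
    (0.1 * rng.standard_normal((d_h, d_in)), 0.1 * rng.standard_normal((d_h, d_h)), np.zeros(d_h)),
    (0.1 * rng.standard_normal((d_h, d_h)), 0.1 * rng.standard_normal((d_h, d_h)), np.zeros(d_h)),
]
hs = deep_rnn(xs, layers)  # top-level hidden sequence, same length as the input
```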
Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473
, 2014
Abstract

Cited by 59 (5 self)
Neural machine translation is a recently proposed approach to machine translation. Unlike traditional statistical machine translation, neural machine translation aims at building a single neural network that can be jointly tuned to maximize translation performance. The models recently proposed for neural machine translation often belong to a family of encoder–decoders that encode a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder–decoder architecture, and propose to extend it by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
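The (soft-)search over source positions can be sketched as additive attention: score every encoder annotation against the previous decoder state, normalize the scores with a softmax, and take the weighted sum of annotations as the context for predicting the next target word. A minimal numpy sketch; the weight names `W`, `U`, `v` and all dimensions are illustrative:

```python
import numpy as np

def additive_attention(s_prev, annotations, W, U, v):
    """Soft alignment: score each source annotation against the decoder
    state, softmax the scores, and return the weighted context vector."""
    scores = np.array([v @ np.tanh(W @ s_prev + U @ h) for h in annotations])
    weights = np.exp(scores - scores.max())   # stable softmax
    weights /= weights.sum()
    context = sum(w * h for w, h in zip(weights, annotations))
    return context, weights

rng = np.random.default_rng(1)
d = 4
annotations = [rng.standard_normal(d) for _ in range(6)]  # one per source position
W, U, v = rng.standard_normal((d, d)), rng.standard_normal((d, d)), rng.standard_normal(d)
ctx, w = additive_attention(rng.standard_normal(d), annotations, W, U, v)
```

The attention weights form a probability distribution over source positions, which is exactly what makes the learned (soft-)alignments inspectable.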
Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850
, 2013
Abstract

Cited by 56 (2 self)
This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. The approach is demonstrated for text (where the data are discrete) and online handwriting (where the data are real-valued). It is then extended to handwriting synthesis by allowing the network to condition its predictions on a text sequence. The resulting system is able to generate highly realistic cursive handwriting in a wide variety of styles.
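Generating "one data point at a time" is a plain autoregressive loop: sample a point from the model's predictive distribution, append it to the sequence, and feed it back. In the sketch below, `next_step_dist` is a hypothetical stand-in for the network's output distribution (a real model would condition on its recurrent state):

```python
import numpy as np

rng = np.random.default_rng(2)

def next_step_dist(history, vocab_size):
    """Placeholder for the network's distribution over the next symbol.
    A trained LSTM would compute this from its hidden state."""
    logits = rng.standard_normal(vocab_size)
    p = np.exp(logits - logits.max())
    return p / p.sum()

def generate(length, vocab_size):
    """Sample one symbol at a time, feeding each sample back as input."""
    seq = []
    for _ in range(length):
        p = next_step_dist(seq, vocab_size)
        seq.append(int(rng.choice(vocab_size, p=p)))
    return seq

seq = generate(10, vocab_size=5)
```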
On the importance of initialization and momentum in deep learning
Abstract

Cited by 55 (4 self)
Deep and recurrent neural networks (DNNs and RNNs respectively) are powerful models that were considered to be almost impossible to train using stochastic gradient descent with momentum. In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs (on datasets with long-term dependencies) to levels of performance that were previously achievable only with Hessian-Free optimization. We find that both the initialization and the momentum are crucial, since poorly initialized networks cannot be trained with momentum and well-initialized networks perform markedly worse when the momentum is absent or poorly tuned. Our success in training these models suggests that previous attempts to train deep and recurrent neural networks from random initializations have likely failed due to poor initialization schemes. Furthermore, carefully tuned momentum methods suffice for dealing with the curvature issues in deep and recurrent network training objectives without the need for sophisticated second-order methods.
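The classical momentum update with a slowly increasing coefficient can be sketched as follows. The staged ramp used here is only one plausible instance of such a schedule, and the ill-conditioned quadratic is a toy stand-in for a network's training loss:

```python
import numpy as np

def sgd_momentum(grad, x0, lr, steps, mu_max=0.99):
    """Classical momentum whose coefficient mu ramps up in stages toward
    mu_max, one plausible instance of a slowly increasing schedule."""
    x, v = x0.astype(float), np.zeros(len(x0))
    for t in range(1, steps + 1):
        mu = min(1.0 - 1.0 / (2 * (t // 250 + 1)), mu_max)  # 0.5, 0.75, 5/6, ...
        v = mu * v - lr * grad(x)   # velocity accumulates past gradients
        x = x + v
    return x

# Toy ill-conditioned quadratic f(x) = 0.5 * x' diag(1, 100) x as a stand-in loss.
A = np.diag([1.0, 100.0])
grad = lambda x: A @ x
x_final = sgd_momentum(grad, np.array([1.0, 1.0]), lr=0.005, steps=2000)
```

Keeping the momentum low early on and raising it gradually is the point: a large mu applied to a poorly conditioned start amplifies the initial transient rather than damping it.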
Highdimensional sequence transduction
 in ICASSP
, 2013
Abstract

Cited by 9 (3 self)
We investigate the problem of transforming an input sequence into a high-dimensional output sequence in order to transcribe polyphonic audio music into symbolic notation. We introduce a probabilistic model based on a recurrent neural network that is able to learn realistic output distributions given the input, and we devise an efficient algorithm to search for the global mode of that distribution. The resulting method produces musically plausible transcriptions even under high levels of noise and drastically outperforms previous state-of-the-art approaches on five datasets of synthesized sounds and real recordings, approximately halving the test error rate. Index Terms: Sequence transduction, restricted Boltzmann machine, recurrent neural network, polyphonic transcription
Normalizing tweets with edit scripts and recurrent neural embeddings
Abstract

Cited by 3 (0 self)
Tweets often contain a large proportion of abbreviations, alternative spellings, novel words and other non-canonical language. These features are problematic for standard language analysis tools, and it can be desirable to convert them to canonical form. We propose a novel text normalization model based on learning edit operations from labeled data while incorporating features induced from unlabeled data via character-level neural text embeddings. The text embeddings are generated using a Simple Recurrent Network. We find that enriching the feature set with text embeddings substantially lowers word error rates on an English tweet normalization dataset. Our model improves on the state of the art with little training data and without any lexical resources.
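Learning edit operations from labeled data presupposes extracting a gold edit script from each (noisy, canonical) pair, which the standard Levenshtein dynamic program provides. A minimal sketch (function names are illustrative, not the paper's code):

```python
def edit_script(src, tgt):
    """Recover a minimal sequence of edit operations (keep/sub/ins/del)
    turning src into tgt, via the standard Levenshtein DP table."""
    n, m = len(src), len(tgt)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (src[i - 1] != tgt[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back through the table to read off the operations.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (src[i - 1] != tgt[j - 1]):
            ops.append(("keep" if src[i - 1] == tgt[j - 1] else "sub", src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("del", src[i - 1], ""))
            i -= 1
        else:
            ops.append(("ins", "", tgt[j - 1]))
            j -= 1
    return ops[::-1]

ops = edit_script("2moro", "tomorrow")  # minimal script normalizing the tweet token
```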
Learning Input and Recurrent Weight Matrices in Echo State Networks
Abstract

Cited by 2 (2 self)
The traditional echo state network (ESN) is a special type of temporally deep model, the recurrent network (RNN), which carefully designs the recurrent matrix and fixes both the recurrent and input matrices in the RNN. The ESN also adopts linear output (or readout) units to simplify the learning of the output matrix, the only matrix learned in the RNN. In this paper, we devise a special technique that takes advantage of the linearity of the output units in the ESN to learn the input and recurrent matrices, something not carried out in earlier ESNs due to the well-known difficulty of their learning. Compared with the technique of BackProp Through Time (BPTT) for learning general RNNs, our proposed technique makes use of the linearity of the output units to provide constraints among the various matrices in the RNN, enabling the gradients used as the learning signal to be computed in analytical form rather than by recursion as in BPTT. Experimental results on phone state classification show that learning either or both of the input and recurrent matrices in the ESN is superior to the traditional ESN that fixes them, especially when longer time steps are used in analytically computing the gradients.
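For reference, the traditional ESN baseline this paper extends keeps the input and recurrent matrices fixed and trains only the linear readout, which reduces to a least-squares fit on the collected reservoir states. A minimal numpy sketch on a toy delay-memorization task; all dimensions and the spectral-radius value are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def make_reservoir(n, spectral_radius=0.9):
    """Fixed random recurrent matrix, rescaled so the echo-state property holds."""
    W = rng.standard_normal((n, n))
    return W * (spectral_radius / max(abs(np.linalg.eigvals(W))))

def run_reservoir(xs, W_in, W):
    """Drive the reservoir with the input sequence and collect its states."""
    h = np.zeros(W.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_in @ x + W @ h)
        states.append(h)
    return np.array(states)

# Teach the linear readout (the only trained matrix in a traditional ESN)
# to reproduce the previous value of a scalar input signal.
n, T = 50, 300
xs = rng.standard_normal((T, 1))
targets = np.roll(xs[:, 0], 1)                    # target at t is the input at t-1
W_in = 0.5 * rng.standard_normal((n, 1))
H = run_reservoir(xs, W_in, make_reservoir(n))
W_out, *_ = np.linalg.lstsq(H[1:], targets[1:], rcond=None)  # linear readout
pred = H[1:] @ W_out
```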
A primaldual method for training recurrent neural networks constrained by the echostate property
 In ICLR
Abstract

Cited by 1 (1 self)
We present an architecture of a recurrent neural network (RNN) with a fully connected deep neural network (DNN) as its feature extractor. The RNN is equipped with both causal temporal prediction and non-causal look-ahead, via autoregression (AR) and moving-average (MA) components, respectively. The focus of this paper is a primal-dual training method that formulates the learning of the RNN as a formal optimization problem with an inequality constraint that provides a sufficient condition for the stability of the network dynamics. Experimental results demonstrate the effectiveness of this new method, which achieves 18.86% phone recognition error on the core test set of the TIMIT benchmark. The result approaches the best result of 17.7%, which was obtained using an RNN with Long Short-term Memory (LSTM). The results also show that the proposed primal-dual training method produces lower recognition errors than popular earlier RNN methods based on a carefully tuned threshold parameter that heuristically prevents the gradient from exploding.
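The paper enforces the stability condition through its primal-dual formulation; as a simpler illustration of the constraint itself, one can project the recurrent matrix back whenever its spectral norm exceeds a bound, since a largest singular value below 1 is a standard sufficient condition for stable (echo-state) dynamics. This projection is a crude stand-in, not the paper's method:

```python
import numpy as np

def project_spectral_norm(W, bound=0.99):
    """Rescale W so its largest singular value does not exceed `bound`;
    sigma_max(W) < 1 is a classic sufficient condition for stability."""
    s_max = np.linalg.norm(W, 2)   # spectral norm = largest singular value
    return W if s_max <= bound else W * (bound / s_max)

rng = np.random.default_rng(4)
W = rng.standard_normal((20, 20))       # almost surely violates the bound
W_proj = project_spectral_norm(W)
```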
An Online Sequence-to-Sequence Model Using Partial Conditioning
Abstract
Sequence-to-sequence models have achieved impressive results on various tasks. However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives, or for tasks with long input and output sequences, because they generate an output sequence conditioned on the entire input sequence. In this paper, we present a Neural Transducer that can make incremental predictions as more input arrives, without redoing the entire computation. Unlike sequence-to-sequence models, the Neural Transducer computes the next-step distribution conditioned on the partially observed input sequence and the partially generated sequence. At each time step, the transducer can decide to emit zero or more output symbols. The data can be processed using an encoder and presented as input to the transducer. The discrete decision to emit a symbol at every time step makes it difficult to learn with conventional backpropagation. It is, however, possible to train the transducer by using a dynamic programming algorithm to generate target discrete decisions. Our experiments show that the Neural Transducer works well in settings where it is required to produce output predictions as data comes in. We also find that the Neural Transducer performs well for long sequences even when attention mechanisms are not used.