Results 1 - 10 of 17
Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014
"... Neural machine translation is a recently proposed approach to machine transla-tion. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed r ..."
Abstract
-
Cited by 59 (5 self)
- Add to MetaCart
(Show Context)
Neural machine translation is a recently proposed approach to machine translation. Unlike traditional statistical machine translation, neural machine translation aims at building a single neural network that can be jointly tuned to maximize translation performance. The models proposed recently for neural machine translation often belong to a family of encoder–decoders and encode a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder–decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
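The soft-search described in this abstract is an attention mechanism: at each target step the decoder scores every source annotation, normalizes the scores with a softmax, and takes the weighted average as its context. Below is a minimal NumPy sketch of that additive scoring; the shapes and weight names (W_a, U_a, v_a) are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of additive (soft) attention: score every encoder state,
# softmax the scores, and form a context vector as the weighted sum of states.
# All dimensions and parameter names here are assumptions for illustration.
import numpy as np

def soft_attention(decoder_state, encoder_states, W_a, U_a, v_a):
    """decoder_state: (d,), encoder_states: (T, h) annotations of the source."""
    # e_t = v_a^T tanh(W_a s + U_a h_t) -- alignment score for each source position
    scores = np.tanh(decoder_state @ W_a + encoder_states @ U_a) @ v_a  # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over source positions
    context = weights @ encoder_states        # (h,) expected annotation
    return context, weights

# Toy usage: 5 source positions, decoder dim 4, encoder dim 6, attention dim 8.
rng = np.random.default_rng(0)
s = rng.standard_normal(4)
H = rng.standard_normal((5, 6))
W_a, U_a, v_a = rng.standard_normal((4, 8)), rng.standard_normal((6, 8)), rng.standard_normal(8)
ctx, alpha = soft_attention(s, H, W_a, U_a, v_a)
print(alpha.round(3), ctx.shape)
```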
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
"... In this paper, we propose a novel neu-ral network model called RNN Encoder– Decoder that consists of two recurrent neural networks (RNN). One RNN en-codes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another se-quence of symbols. The ..."
Abstract
-
Cited by 38 (4 self)
- Add to MetaCart
In this paper, we propose a novel neural network model called RNN Encoder–Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder–Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
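The following compact PyTorch sketch shows the encoder-decoder idea the abstract describes: one RNN compresses the source sequence into a fixed-length vector and a second RNN generates the target sequence conditioned on it. The GRU cells, vocabulary sizes, and layer widths are placeholders, not the configuration from the paper.

```python
# Sketch of an RNN encoder-decoder: the encoder's final hidden state is the
# fixed-length summary vector that initializes the decoder. Sizes are assumed.
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, src_vocab=100, tgt_vocab=100, emb=32, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt_in):
        _, summary = self.encoder(self.src_emb(src))      # fixed-length summary vector
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), summary)
        return self.out(dec_out)                          # logits over target symbols

model = EncoderDecoder()
src = torch.randint(0, 100, (2, 7))     # batch of 2 source sequences, length 7
tgt_in = torch.randint(0, 100, (2, 5))  # teacher-forced target prefix, length 5
print(model(src, tgt_in).shape)         # torch.Size([2, 5, 100])
```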
Singing-voice separation from monaural recordings using robust principal component analysis. In ICASSP, 2012
"... Separating singing voices from music accompaniment is an important task in many applications, such as music information retrieval, lyric recognition and alignment. Music accompaniment can be assumed to be in a low-rank subspace, because of its repetition structure; on the other hand, singing voices ..."
Abstract
-
Cited by 25 (2 self)
- Add to MetaCart
(Show Context)
Separating singing voices from music accompaniment is an important task in many applications, such as music information retrieval, lyric recognition and alignment. Music accompaniment can be assumed to be in a low-rank subspace, because of its repetition structure; on the other hand, singing voices can be regarded as relatively sparse within songs. In this paper, based on this assumption, we propose using robust principal component analysis for singing-voice separation from music accompaniment. Moreover, we examine the separation result by using a binary time-frequency masking method. Evaluations on the MIR-1K dataset show that this method can achieve around 1–1.4 dB higher GNSDR compared with two state-of-the-art approaches without using prior training or requiring particular features.
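As a rough illustration of the decomposition the abstract relies on, the NumPy sketch below runs a textbook principal component pursuit (robust PCA via an inexact augmented Lagrangian) on a toy matrix standing in for a magnitude spectrogram, then forms a binary time-frequency mask where the sparse (vocal-like) part dominates the low-rank (accompaniment-like) part. The solver and the mask threshold are generic choices, not the authors' code.

```python
# Generic RPCA (principal component pursuit) sketch: split M into a low-rank
# part L (repetitive accompaniment) and a sparse part S (voice), then build a
# binary mask where the sparse part dominates. Toy data, illustrative only.
import numpy as np

def rpca(M, lam=None, tol=1e-7, max_iter=500):
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    norm_2 = np.linalg.norm(M, 2)
    Y = M / max(norm_2, np.abs(M).max() / lam)     # dual variable initialization
    S = np.zeros_like(M)
    mu, rho = 1.25 / norm_2, 1.5
    for _ in range(max_iter):
        # low-rank update: singular value thresholding
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0)) @ Vt
        # sparse update: elementwise soft thresholding
        R = M - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0)
        Y += mu * (M - L - S)
        mu *= rho
        if np.linalg.norm(M - L - S) <= tol * np.linalg.norm(M):
            break
    return L, S

# Toy "spectrogram": low-rank background plus a few sparse spikes.
rng = np.random.default_rng(0)
M = rng.random((64, 2)) @ rng.random((2, 200))
M[rng.random(M.shape) < 0.02] += 5.0
L, S = rpca(M)
voice_mask = np.abs(S) > np.abs(L)                 # binary time-frequency mask
print(np.linalg.matrix_rank(np.round(L, 6)), voice_mask.mean().round(3))
```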
On Fast Dropout and its Applicability to Recurrent Networks
"... Recurrent Neural Networks (RNNs) are rich models for the processing of sequen-tial data. Recent work on advancing the state of the art has been focused on the optimization or modelling of RNNs, mostly motivated by adressing the problems of the vanishing and exploding gradients. The control of overfi ..."
Abstract
-
Cited by 6 (4 self)
- Add to MetaCart
(Show Context)
Recurrent Neural Networks (RNNs) are rich models for the processing of sequential data. Recent work on advancing the state of the art has focused on the optimization or modelling of RNNs, mostly motivated by addressing the problems of vanishing and exploding gradients. The control of overfitting has seen considerably less attention. This paper contributes to that by analyzing fast dropout, a recent regularization method for generalized linear models and neural networks, from a back-propagation-inspired perspective. We show that fast dropout implements a quadratic form of an adaptive, per-parameter regularizer, which rewards large weights in the light of underfitting, penalizes them for overconfident predictions, and vanishes at minima of an unregularized training loss. The derivatives of that regularizer are exclusively based on the training error signal. One consequence of this is the absence of a global weight attractor, which is particularly appealing for RNNs, since the dynamics are not biased towards a certain regime. We positively test the hypothesis that this improves the performance of RNNs on four musical data sets.
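The analysis summarized above builds on the fast dropout forward pass, in which the dropped-out pre-activation of each unit is treated as a Gaussian whose mean and variance follow in closed form from the keep probability. The NumPy sketch below illustrates that Gaussian approximation for a sigmoid layer and checks it against sampled dropout masks; the pi/8 sigmoid-Gaussian approximation and the layer sizes are standard illustrative assumptions, not taken from the paper.

```python
# Fast dropout forward pass, sketched: approximate the pre-activation under
# Bernoulli(p) input dropout by a Gaussian and take the expected sigmoid.
import numpy as np

def fast_dropout_sigmoid_layer(x, W, b, p=0.5):
    """x: (d,) inputs, W: (d, k) weights, p: keep probability."""
    mean = p * (x @ W) + b                        # E[sum_i m_i w_i x_i], m_i ~ Bernoulli(p)
    var = p * (1 - p) * ((x ** 2) @ (W ** 2))     # Var of the dropped-out pre-activation
    # E[sigmoid(z)] for z ~ N(mean, var), via the usual pi/8 Gaussian approximation
    return 1.0 / (1.0 + np.exp(-mean / np.sqrt(1.0 + np.pi * var / 8.0)))

rng = np.random.default_rng(0)
x, W, b = rng.standard_normal(10), rng.standard_normal((10, 4)), np.zeros(4)
det = fast_dropout_sigmoid_layer(x, W, b)

# Monte Carlo check: averaging the sigmoid over sampled masks gives a similar value.
mc = np.mean([1 / (1 + np.exp(-((rng.random(10) < 0.5) * x) @ W - b))
              for _ in range(20000)], axis=0)
print(det.round(3), mc.round(3))
```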
Constructing long short-term memory based deep recurrent neural networks for large vocabulary speech recognition. arXiv preprint arXiv:1410.4281, 2014
"... ar ..."
(Show Context)
Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks
"... ABSTRACT Both Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) have shown improvements over Deep Neural Networks (DNNs) across a wide variety of speech recognition tasks. CNNs, LSTMs and DNNs are complementary in their modeling capabilities, as CNNs are good at reducing freque ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Both Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) have shown improvements over Deep Neural Networks (DNNs) across a wide variety of speech recognition tasks. CNNs, LSTMs and DNNs are complementary in their modeling capabilities, as CNNs are good at reducing frequency variations, LSTMs are good at temporal modeling, and DNNs are appropriate for mapping features to a more separable space. In this paper, we take advantage of the complementarity of CNNs, LSTMs and DNNs by combining them into one unified architecture. We explore the proposed architecture, which we call CLDNN, on a variety of large vocabulary tasks, varying from 200 to 2,000 hours. We find that the CLDNN provides a 4-6% relative improvement in WER over an LSTM, the strongest of the three individual models.
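A rough PyTorch sketch of the layer ordering the abstract describes (convolution to reduce frequency variation, LSTMs for temporal modeling, fully connected layers for the final mapping) is given below. Kernel shapes, layer widths, and the 40-dimensional log-mel input are placeholders rather than the configuration reported in the paper.

```python
# Sketch of a CLDNN-style stack: conv front end over frequency, LSTM layers
# over time, then fully connected layers to per-frame targets. Sizes assumed.
import torch
import torch.nn as nn

class CLDNN(nn.Module):
    def __init__(self, n_mels=40, n_targets=500):
        super().__init__()
        self.conv = nn.Sequential(                      # frequency-domain convolution + pooling
            nn.Conv2d(1, 32, kernel_size=(1, 9), padding=(0, 4)), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 3)),
        )
        self.proj = nn.Linear(32 * (n_mels // 3), 256)  # linear bottleneck before the LSTMs
        self.lstm = nn.LSTM(256, 512, num_layers=2, batch_first=True)
        self.dnn = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, n_targets))

    def forward(self, x):                               # x: (batch, time, n_mels)
        b, t, f = x.shape
        c = self.conv(x.unsqueeze(1))                   # (batch, 32, time, f // 3)
        c = c.permute(0, 2, 1, 3).reshape(b, t, -1)     # back to (batch, time, features)
        h, _ = self.lstm(self.proj(c))
        return self.dnn(h)                              # per-frame logits over targets

model = CLDNN()
print(model(torch.randn(2, 100, 40)).shape)             # torch.Size([2, 100, 500])
```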
Toward deep learning software repositories. In Proceedings of the 12th Working Conference on Mining Software Repositories, 2015
"... Abstract—Deep learning subsumes algorithms that automat-ically learn compositional representations. The ability of these models to generalize well has ushered in tremendous advances in many fields such as natural language processing (NLP). Recent research in the software engineering (SE) community h ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Deep learning subsumes algorithms that automatically learn compositional representations. The ability of these models to generalize well has ushered in tremendous advances in many fields such as natural language processing (NLP). Recent research in the software engineering (SE) community has demonstrated the usefulness of applying NLP techniques to software corpora. Hence, we motivate deep learning for software language modeling, highlighting fundamental differences between state-of-the-practice software language models and connectionist models. Our deep learning models are applicable to source code files (since they only require lexically analyzed source code written in any programming language) and other types of artifacts. We show how a particular deep learning model can remember its state to effectively model sequential data, e.g., streaming software tokens, and the state is shown to be much more expressive than discrete tokens in a prefix. Then we instantiate deep learning models and show that deep learning induces high-quality models compared to n-grams and cache-based n-grams on a corpus of Java projects. We experiment with two of the models' hyperparameters, which govern their capacity and the amount of context they use to inform predictions, before building several committees of software language models to aid generalization. Then we apply the deep learning models to code suggestion and demonstrate their effectiveness at a real SE task compared to state-of-the-practice models. Finally, we propose avenues for future work, where deep learning can be brought to bear to support model-based testing, improve software lexicons, and conceptualize software artifacts. Thus, our work serves as the first step toward deep learning software repositories. Keywords: software repositories, machine learning, deep learning, software language models, n-grams, neural networks
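To make the contrast with n-gram models concrete, the PyTorch sketch below shows the kind of recurrent software language model the abstract motivates: an LSTM reads a stream of lexed source tokens, carries its state forward, and predicts the next token. The tiny vocabulary and token stream are invented for illustration; the paper's corpus, tokenizer, and hyperparameters are not reproduced.

```python
# Sketch of a token-level LSTM language model over lexed source code tokens.
# Vocabulary and token stream are made up for the example.
import torch
import torch.nn as nn

vocab = {tok: i for i, tok in enumerate(
    ["public", "static", "void", "main", "(", ")", "{", "}", ";", "IDENT"])}

class TokenLSTM(nn.Module):
    def __init__(self, vocab_size, emb=32, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, state=None):
        h, state = self.lstm(self.emb(tokens), state)   # state carries context beyond a fixed prefix
        return self.out(h), state

model = TokenLSTM(len(vocab))
stream = torch.tensor([[vocab[t] for t in
                        ["public", "static", "void", "main", "(", ")", "{"]]])
logits, state = model(stream)
loss = nn.functional.cross_entropy(logits[:, :-1].reshape(-1, len(vocab)),
                                   stream[:, 1:].reshape(-1))  # next-token prediction
print(loss.item())
```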
Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation
"... Abstract—Monaural source separation is important for many real world applications. It is challenging because, with only a single channel of information available, without any constraints, an infinite number of solutions are possible. In this paper, we explore joint optimization of masking functions ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Monaural source separation is important for many real world applications. It is challenging because, with only a single channel of information available and without any constraints, an infinite number of solutions are possible. In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including monaural speech separation, monaural singing voice separation, and speech denoising. The joint optimization of the deep recurrent neural networks with an extra masking layer enforces a reconstruction constraint. Moreover, we explore a discriminative criterion for training neural networks to further enhance the separation performance. We evaluate the proposed system on the TSP, MIR-1K, and TIMIT datasets for speech separation, singing voice separation, and speech denoising tasks, respectively. Our approaches achieve 2.30–4.98 dB SDR gain compared to NMF models in the speech separation task, 2.30–2.48 dB GNSDR gain and 4.32–5.42 dB GSIR gain compared to existing models in the singing voice separation task, and outperform NMF and DNN baselines in the speech denoising task.
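The reconstruction constraint mentioned in the abstract comes from a masking layer that turns the network's two source estimates into a soft time-frequency mask summing to one. The NumPy sketch below shows only that layer; the toy arrays stand in for network outputs and a mixture spectrogram, and the deep recurrent network and discriminative training term are omitted.

```python
# Soft time-frequency masking layer: the two masked outputs always add back
# up to the mixture spectrogram, enforcing the reconstruction constraint.
import numpy as np

def masking_layer(y1_hat, y2_hat, mixture, eps=1e-8):
    """y1_hat, y2_hat: raw source estimates; mixture: magnitude spectrogram (same shape)."""
    m1 = np.abs(y1_hat) / (np.abs(y1_hat) + np.abs(y2_hat) + eps)  # soft mask in [0, 1]
    y1 = m1 * mixture
    y2 = (1.0 - m1) * mixture          # the two estimates reconstruct the mixture exactly
    return y1, y2

rng = np.random.default_rng(0)
y1_hat, y2_hat = rng.random((513, 100)), rng.random((513, 100))
mixture = rng.random((513, 100))
y1, y2 = masking_layer(y1_hat, y2_hat, mixture)
print(np.allclose(y1 + y2, mixture))   # True: the reconstruction constraint holds
```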
Learned-norm pooling for deep feedforward and recurrent neural networks, 2013
"... Abstract. In this paper we propose and investigate a novel nonlinear unit, called Lp unit, for deep neural networks. The proposed Lp unit receives signals from several projections of a subset of units in the layer below and computes a normalized Lp norm. We notice two interesting interpretations of ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
In this paper we propose and investigate a novel nonlinear unit, called the Lp unit, for deep neural networks. The proposed Lp unit receives signals from several projections of a subset of units in the layer below and computes a normalized Lp norm. We notice two interesting interpretations of the Lp unit. First, the proposed unit can be understood as a generalization of a number of conventional pooling operators, such as average, root-mean-square and max pooling, widely used in, for instance, convolutional neural networks (CNN), HMAX models and neocognitrons. Furthermore, the Lp unit is, to a certain degree, similar to the recently proposed maxout unit [13], which achieved state-of-the-art object recognition results on a number of benchmark datasets. Second, we provide a geometrical interpretation of the activation function, based on which we argue that the Lp unit is more efficient at representing complex, nonlinear separating boundaries. Each Lp unit defines a superelliptic boundary, with its exact shape defined by the order p. We claim that this makes it possible to model arbitrarily shaped, curved boundaries more efficiently by combining a few Lp units of different orders. This insight justifies the need for learning a different order for each unit in the model. We empirically evaluate the proposed Lp units on a number of datasets and show that multilayer perceptrons (MLP) consisting of Lp units achieve state-of-the-art results on a number of benchmark datasets. Furthermore, we evaluate the proposed Lp unit on the recently proposed deep recurrent neural networks (RNN).
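The following NumPy sketch illustrates the Lp unit itself: the normalized Lp norm of a unit's input projections, with the order p treated as a learnable per-unit parameter. The softplus reparameterization keeping p >= 1 is one common choice assumed here, not necessarily the paper's exact parameterization.

```python
# Sketch of an Lp pooling unit: output the normalized Lp norm of the input
# projections, with a per-unit order p. p = 2 gives root-mean-square pooling;
# large p approaches max pooling. The softplus form for p is an assumption.
import numpy as np

def lp_unit(z, rho):
    """z: (k,) projections feeding one unit; rho: unconstrained parameter for the order."""
    p = 1.0 + np.log1p(np.exp(rho))                    # softplus keeps the order p >= 1
    return (np.mean(np.abs(z) ** p)) ** (1.0 / p)      # normalized Lp norm of the projections

z = np.array([0.5, -2.0, 1.0, 0.25])
for rho in (-4.0, 0.0, 6.0):                           # small p: average-like, large p: max-like
    print(round(lp_unit(z, rho), 3))
print(np.abs(z).max())                                 # limiting max-pooling behaviour
```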