Results 1 - 10 of 43
Sequence to sequence learning with neural networks
- In Advances in Neural Information Processing Systems, 2014
"... Deep Neural Networks (DNNs) are powerful models that have achieved excel-lent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approac ..."
Abstract - Cited by 76 (7 self)
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM’s BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous state of the art. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM’s performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence which made the optimization problem easier.
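The following is a minimal NumPy sketch of the encoder-decoder data flow the abstract describes: one LSTM reads the source sequence (reversed, as the authors found helpful) into a fixed-dimensional state, and a second LSTM decodes target tokens from that state. The single-layer cells, toy dimensions, greedy decoding loop, and random untrained weights are illustrative assumptions, not the paper's deep multilayer setup.

```python
# Toy sequence-to-sequence sketch: encode reversed source into (h, c), decode greedily.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    def __init__(self, d_in, d_hid):
        # One weight matrix covering the four gates (input, forget, output, candidate).
        self.W = rng.normal(scale=0.1, size=(4 * d_hid, d_in + d_hid))
        self.b = np.zeros(4 * d_hid)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

V, D, H = 20, 8, 16                          # toy vocab, embedding, and hidden sizes
E_src = rng.normal(scale=0.1, size=(V, D))
E_tgt = rng.normal(scale=0.1, size=(V, D))
W_out = rng.normal(scale=0.1, size=(V, H))
enc, dec = LSTMCell(D, H), LSTMCell(D, H)

def translate(src_ids, bos=1, eos=2, max_len=10):
    h = c = np.zeros(H)
    for t in reversed(src_ids):              # reverse the source, as in the abstract
        h, c = enc.step(E_src[t], h, c)
    out, prev = [], bos                      # decode from the fixed-size state (h, c)
    for _ in range(max_len):
        h, c = dec.step(E_tgt[prev], h, c)
        prev = int(np.argmax(W_out @ h))
        if prev == eos:
            break
        out.append(prev)
    return out

print(translate([5, 7, 3, 9]))               # weights are untrained, so output is arbitrary
```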
Neural machine translation by jointly learning to align and translate
- arXiv preprint arXiv:1409.0473, 2014
"... Neural machine translation is a recently proposed approach to machine transla-tion. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed r ..."
Abstract - Cited by 59 (5 self)
Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder–decoders that encode a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder–decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
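A hedged NumPy sketch of the (soft-)search step described above: at each decoder step, every source annotation is scored against the decoder state, the scores are turned into alignment weights with a softmax, and their weighted sum is used as a context vector. The additive scoring form and all dimensions are assumptions for illustration.

```python
# One decoder step of soft alignment over source annotations.
import numpy as np

rng = np.random.default_rng(0)
H_enc, H_dec, A = 12, 10, 8                  # annotation, decoder-state, alignment dims
W_a = rng.normal(scale=0.1, size=(A, H_dec))
U_a = rng.normal(scale=0.1, size=(A, H_enc))
v_a = rng.normal(scale=0.1, size=A)

def soft_align(s_prev, annotations):
    """Return alignment weights and the context vector for one decoder step."""
    scores = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_j) for h_j in annotations])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over source positions
    context = weights @ annotations          # expected annotation under the alignment
    return weights, context

annotations = rng.normal(size=(6, H_enc))    # e.g. one bidirectional-RNN state per source word
s_prev = rng.normal(size=H_dec)
w, ctx = soft_align(s_prev, annotations)
print(w.round(3), ctx.shape)
```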
A Fast and Accurate Dependency Parser using Neural Networks
"... Almost all current dependency parsers classify based on millions of sparse indi-cator features. Not only do these features generalize poorly, but the cost of feature computation restricts parsing speed signif-icantly. In this work, we propose a novel way of learning a neural network classifier for u ..."
Abstract - Cited by 47 (3 self)
Almost all current dependency parsers classify based on millions of sparse indicator features. Not only do these features generalize poorly, but the cost of feature computation restricts parsing speed significantly. In this work, we propose a novel way of learning a neural network classifier for use in a greedy, transition-based dependency parser. Because this classifier learns and uses just a small number of dense features, it can work very fast, while achieving an improvement of about 2% in unlabeled and labeled attachment scores on both English and Chinese datasets. Concretely, our parser is able to parse more than 1000 sentences per second at 92.2% unlabeled attachment score on the English Penn Treebank.
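Here is a minimal sketch of the greedy, transition-based parsing loop the abstract refers to. The neural classifier is replaced by a stub that scores transitions from a few dense token embeddings; the arc-standard transition system and the feature choice are standard assumptions, not the paper's exact configuration.

```python
# Greedy arc-standard parsing with a stub dense-feature classifier.
import numpy as np

rng = np.random.default_rng(0)

TRANSITIONS = ["SHIFT", "LEFT-ARC", "RIGHT-ARC"]
D = 4                                          # toy embedding size per token
EMB = {w: rng.normal(size=D) for w in ["ROOT", "She", "ate", "fish", "<PAD>"]}
W = rng.normal(scale=0.1, size=(len(TRANSITIONS), 3 * D))   # untrained classifier weights

def features(stack, buffer):
    # Dense features: embeddings of the top two stack items and the first buffer item.
    picks = [stack[-1] if stack else "<PAD>",
             stack[-2] if len(stack) > 1 else "<PAD>",
             buffer[0] if buffer else "<PAD>"]
    return np.concatenate([EMB[w] for w in picks])

def legal(t, stack, buffer):
    if t == "SHIFT":
        return bool(buffer)
    return len(stack) >= 2                     # both arc actions need two stack items

def parse(words):
    stack, buffer, arcs = ["ROOT"], list(words), []
    while buffer or len(stack) > 1:
        scores = W @ features(stack, buffer)
        order = np.argsort(-scores)            # try transitions from best to worst
        t = next(TRANSITIONS[i] for i in order if legal(TRANSITIONS[i], stack, buffer))
        if t == "SHIFT":
            stack.append(buffer.pop(0))
        elif t == "LEFT-ARC":
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))      # (head, dependent)
        else:                                  # RIGHT-ARC
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

print(parse(["She", "ate", "fish"]))           # arcs are arbitrary with untrained weights
```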
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
"... In this paper, we propose a novel neu-ral network model called RNN Encoder– Decoder that consists of two recurrent neural networks (RNN). One RNN en-codes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another se-quence of symbols. The ..."
Abstract - Cited by 38 (4 self)
In this paper, we propose a novel neural network model called RNN Encoder–Decoder that consists of two recurrent neural networks (RNNs). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder–Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
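A hedged sketch of how the additional feature mentioned in the abstract enters a phrase-based log-linear model: the encoder-decoder's log p(target phrase | source phrase) is appended to each phrase pair's feature vector and weighted like any other feature. The feature names, weights, and the stubbed neural score are illustrative assumptions.

```python
# Log-linear scoring of a phrase pair with an extra neural feature.
import math

def rnn_encdec_logprob(src_phrase, tgt_phrase):
    # Stand-in for a trained encoder-decoder; returns a toy log p(tgt | src).
    return -0.9 * max(len(src_phrase.split()), len(tgt_phrase.split()))

def loglinear_score(features, weights):
    return sum(weights[name] * value for name, value in features.items())

weights = {"phrase_tm": 1.0, "lex_tm": 0.5, "lm": 1.2, "word_penalty": -0.3,
           "rnn_encdec": 0.8}                        # in practice tuned, e.g. by MERT

phrase_pair = ("la maison bleue", "the blue house")
features = {
    "phrase_tm": math.log(0.42),                     # conventional translation-model scores
    "lex_tm": math.log(0.31),
    "lm": -4.1,
    "word_penalty": 3,
    "rnn_encdec": rnn_encdec_logprob(*phrase_pair),  # the additional feature
}
print(round(loglinear_score(features, weights), 3))
```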
Unifying visual-semantic embeddings with multimodal neural language models
"... Inspired by recent advances in multimodal learning and machine translation, we introduce an encoder-decoder pipeline that learns (a): a multimodal joint embed-ding space with images and text and (b): a novel language model for decoding distributed representations from our space. Our pipeline effecti ..."
Abstract - Cited by 26 (4 self)
Inspired by recent advances in multimodal learning and machine translation, we introduce an encoder-decoder pipeline that learns (a) a multimodal joint embedding space with images and text and (b) a novel language model for decoding distributed representations from our space. Our pipeline effectively unifies joint image-text embedding models with multimodal neural language models. We introduce the structure-content neural language model that disentangles the structure of a sentence from its content, conditioned on representations produced by the encoder. The encoder allows one to rank images and sentences while the decoder can generate novel descriptions from scratch. Using LSTMs to encode sentences, we match the state-of-the-art performance on Flickr8K and Flickr30K without using object detections. We also set new best results when using the 19-layer Oxford convolutional network. Furthermore, we show that with linear encoders, the learned embedding space captures multimodal regularities in terms of vector space arithmetic, e.g. *image of a blue car* - "blue" + "red" is near images of red cars. Sample captions generated for 800 images are made available for comparison.
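A small NumPy sketch of the vector-space arithmetic the abstract mentions: in a joint image-text embedding space, take an image embedding, subtract the word embedding for "blue", add the one for "red", and retrieve the nearest image by cosine similarity. The toy embeddings below are constructed purely to make the lookup behave as expected; in the paper they would come from the trained encoder.

```python
# Multimodal vector arithmetic and nearest-neighbour retrieval in a toy joint space.
import numpy as np

rng = np.random.default_rng(0)
D = 64

def unit(v):
    return v / np.linalg.norm(v)

word = {"blue": unit(rng.normal(size=D)), "red": unit(rng.normal(size=D))}
color_free_car = rng.normal(size=D)                   # shared "car" direction for the toy images
images = {
    "blue_car.jpg": unit(color_free_car + 3 * word["blue"]),
    "red_car.jpg": unit(color_free_car + 3 * word["red"]),
    "green_boat.jpg": unit(rng.normal(size=D)),
}

query = unit(images["blue_car.jpg"] - word["blue"] + word["red"])
best = max(images, key=lambda name: float(query @ images[name]))
print(best)                                           # expected: red_car.jpg in this toy space
```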
Translation Modeling with Bidirectional Recurrent Neural Networks
"... This work presents two different trans-lation models using recurrent neural net-works. The first one is a word-based ap-proach using word alignments. Second, we present phrase-based translation mod-els that are more consistent with phrase-based decoding. Moreover, we introduce bidirectional recurren ..."
Abstract - Cited by 13 (2 self)
This work presents two different translation models using recurrent neural networks. The first one is a word-based approach using word alignments. Second, we present phrase-based translation models that are more consistent with phrase-based decoding. Moreover, we introduce bidirectional recurrent neural models to the problem of machine translation, allowing us to use the full source sentence in our models, which is also of theoretical interest. We demonstrate that our translation models are capable of improving strong baselines already including recurrent neural language models on three tasks: IWSLT 2013 German→English, BOLT Arabic→English and Chinese→English. We obtain gains up to 1.6% BLEU and 1.7% TER by rescoring 1000-best lists.
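A brief sketch of the n-best rescoring step mentioned at the end of the abstract: each hypothesis from the baseline decoder keeps its original model score, a neural translation-model score is added with some weight, and the list is re-sorted. The hypotheses, scores, and weight below are made up for illustration.

```python
# Rescore an n-best list by mixing in a neural model score.
def rescore(nbest, neural_score, weight=0.7):
    """nbest: list of (hypothesis, baseline_score); returns a best-first list."""
    rescored = [(hyp, base + weight * neural_score(hyp)) for hyp, base in nbest]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

def toy_neural_score(hypothesis):
    # Stand-in for a recurrent translation model's log-score.
    return -0.4 * len(hypothesis.split())

nbest = [("the house is blue", -5.1), ("house the blue is", -5.0), ("the blue house", -5.3)]
for hyp, score in rescore(nbest, toy_neural_score):
    print(f"{score:6.2f}  {hyp}")
```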
A neural network approach to context-sensitive generation of conversational responses.
, 2015
"... Abstract We present a novel response generation system that can be trained end to end on large quantities of unstructured Twitter conversations. A neural network architecture is used to address sparsity issues that arise when integrating contextual information into classic statistical models, allow ..."
Abstract - Cited by 10 (4 self)
We present a novel response generation system that can be trained end to end on large quantities of unstructured Twitter conversations. A neural network architecture is used to address sparsity issues that arise when integrating contextual information into classic statistical models, allowing the system to take into account previous dialog utterances. Our dynamic-context generative models show consistent gains over both context-sensitive and non-context-sensitive Machine Translation and Information Retrieval baselines.
Modelling, visualising and summarising documents with a single convolutional neural network
, 2014
"... Capturing the compositional process which maps the meaning of words to that of documents is a central challenge for researchers in Natural Language Process-ing and Information Retrieval. We introduce a model that is able to represent the meaning of documents by embedding them in a low dimensional ve ..."
Abstract - Cited by 5 (0 self)
Capturing the compositional process which maps the meaning of words to that of documents is a central challenge for researchers in Natural Language Processing and Information Retrieval. We introduce a model that is able to represent the meaning of documents by embedding them in a low dimensional vector space, while preserving distinctions of word and sentence order crucial for capturing nuanced semantics. Our model is based on an extended Dynamic Convolution Neural Network, which learns convolution filters at both the sentence and document level, hierarchically learning to capture and compose low-level lexical features into high-level semantic concepts. We demonstrate the effectiveness of this model on a range of document modelling tasks, achieving strong results with no feature engineering and with a more compact model. Inspired by recent advances in visualising deep convolution networks for computer vision, we present a novel visualisation technique for our document networks which not only provides insight into their learning process, but also can be interpreted to produce a compelling automatic summarisation system for texts.
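A hedged NumPy sketch of the hierarchical idea in the abstract: convolution filters slide over word embeddings within each sentence, the results are pooled into sentence vectors, and a second set of filters is then convolved over the sentence vectors to produce a document vector. Filter widths, max-pooling, and all dimensions are illustrative assumptions, not the paper's DCNN details.

```python
# Two-level (sentence then document) convolution with max-pooling.
import numpy as np

rng = np.random.default_rng(0)
D_word, D_sent, D_doc, width = 8, 6, 5, 2

W_sent = rng.normal(scale=0.1, size=(D_sent, width * D_word))   # sentence-level filters
W_doc = rng.normal(scale=0.1, size=(D_doc, width * D_sent))     # document-level filters

def conv_max(seq, W, width):
    """Slide filters of the given width over a sequence of vectors, then max-pool."""
    if len(seq) < width:                       # pad short sequences with zero vectors
        seq = np.vstack([seq, np.zeros((width - len(seq), seq.shape[1]))])
    windows = [np.concatenate(seq[i:i + width]) for i in range(len(seq) - width + 1)]
    feats = np.tanh(np.stack([W @ w for w in windows]))
    return feats.max(axis=0)

def embed_document(sentences):
    sent_vecs = np.stack([conv_max(s, W_sent, width) for s in sentences])
    return conv_max(sent_vecs, W_doc, width)

doc = [rng.normal(size=(5, D_word)), rng.normal(size=(3, D_word)), rng.normal(size=(7, D_word))]
print(embed_document(doc).shape)               # a single fixed-size document vector
```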
Combining word embeddings and feature embeddings for fine-grained relation extraction
- In NAACL, 2015
"... Abstract Compositional embedding models build a representation for a linguistic structure based on its component word embeddings. While recent work has combined these word embeddings with hand crafted features for improved performance, it was restricted to a small number of features due to model co ..."
Abstract - Cited by 2 (1 self)
Compositional embedding models build a representation for a linguistic structure based on its component word embeddings. While recent work has combined these word embeddings with hand-crafted features for improved performance, it was restricted to a small number of features due to model complexity, thus limiting its applicability. We propose a new model that conjoins features and word embeddings while maintaining a small number of parameters by learning feature embeddings jointly with the parameters of a compositional model. The result is a method that can scale to more features and more labels, while avoiding overfitting. We demonstrate that our model attains state-of-the-art results on ACE and ERE fine-grained relation extraction.
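A hedged NumPy sketch of the conjoining idea in the abstract: each word's embedding is combined with a binary vector of its hand-crafted features (here via an outer product, summed over the words of the structure), and relation labels are scored from that combined representation. The feature identities, dimensions, and the per-label scoring tensor are assumptions about the general approach, not the paper's exact model.

```python
# Combine hand-crafted feature indicators with word embeddings and score labels.
import numpy as np

rng = np.random.default_rng(0)
D_word, n_feats, n_labels = 10, 4, 3

word_emb = {w: rng.normal(size=D_word) for w in ["acme", "hired", "alice"]}
# Per-label tensor; its feature slices play the role of jointly learned feature embeddings.
W_label = rng.normal(scale=0.1, size=(n_labels, n_feats, D_word))

def represent(words, feature_ids):
    """Sum, over words, the outer product of a binary feature vector and the word embedding."""
    rep = np.zeros((n_feats, D_word))
    for w, feats in zip(words, feature_ids):
        f = np.zeros(n_feats)
        f[feats] = 1.0
        rep += np.outer(f, word_emb[w])
    return rep

def label_scores(rep):
    return np.einsum("lfd,fd->l", W_label, rep)

rep = represent(["acme", "hired", "alice"], [[0], [1, 2], [3]])
print(label_scores(rep).round(3))              # untrained, so the scores are arbitrary
```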