Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
"... Neural machine translation is a recently proposed approach to machine transla-tion. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed r ..."
Abstract
-
Cited by 59 (5 self)
- Add to MetaCart
(Show Context)
Neural machine translation is a recently proposed approach to machine translation. Unlike traditional statistical machine translation, neural machine translation aims at building a single neural network that can be jointly tuned to maximize translation performance. The models proposed recently for neural machine translation often belong to a family of encoder–decoders that encode a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder–decoder architecture, and propose to extend it by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
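The soft-search mechanism this abstract describes can be made concrete in a few lines. Below is a minimal numpy sketch of additive attention: score each source annotation against the previous decoder state, normalize with a softmax, and take the weighted sum as the context vector. The names `W_a`, `U_a`, and `v_a` echo the paper's notation, but the shapes, random toy weights, and plain-function packaging are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_attention(s_prev, H, W_a, U_a, v_a):
    """Additive (soft-search) attention over source annotations.

    s_prev : (n,)   previous decoder state
    H      : (T, m) one annotation vector per source position
    Returns the context vector: an alignment-weighted sum of H.
    """
    # e_j = v_a^T tanh(W_a s_prev + U_a h_j), one score per source word
    scores = np.tanh(s_prev @ W_a + H @ U_a) @ v_a   # (T,)
    alpha = softmax(scores)                          # alignment weights
    return alpha @ H                                 # (m,) context vector

# toy shapes: 5 source positions, annotation dim 8, state dim 6, score dim 4
rng = np.random.default_rng(0)
T, m, n, d = 5, 8, 6, 4
ctx = soft_attention(rng.normal(size=n), rng.normal(size=(T, m)),
                     rng.normal(size=(n, d)), rng.normal(size=(m, d)),
                     rng.normal(size=d))
print(ctx.shape)  # (8,)
```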
Show and tell: A neural image caption generator, 2014.
"... Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep re-current architecture that combines recent advances in computer vision an ..."
Abstract
-
Cited by 32 (2 self)
- Add to MetaCart
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU score (the higher the better) on the Pascal dataset is 25, our approach yields 59, to be compared to human performance around 69. We also show BLEU score improvements on Flickr30k, from 55 to 66, and on SBU, from 19 to 27.
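As a rough illustration of the decoding side of such a caption generator, the toy below conditions a recurrent state on an image feature vector once, then greedily emits the most likely word at each step. The plain tanh cell, the toy shapes, and the greedy (rather than beam) search are simplifying assumptions; the paper's model is an LSTM trained on real data.

```python
import numpy as np

rng = np.random.default_rng(1)
V, d = 10, 16                      # toy vocab size and hidden size
W_out = rng.normal(size=(d, V))    # hidden state -> word logits
E = rng.normal(size=(V, d))        # word embeddings
BOS, EOS = 0, 1

def rnn_step(h, x):
    # stand-in for the paper's LSTM cell; a plain tanh RNN here
    return np.tanh(h + x)

def greedy_caption(img_feat, max_len=10):
    """Greedy decoding: feed the CNN image feature once, then emit the
    most likely word at each step until EOS (noise with random weights)."""
    h = rnn_step(np.zeros(d), img_feat)    # the image conditions the state
    word, caption = BOS, []
    for _ in range(max_len):
        h = rnn_step(h, E[word])
        word = int(np.argmax(h @ W_out))   # argmax of p(s_t | I, s_<t)
        if word == EOS:
            break
        caption.append(word)
    return caption

print(greedy_caption(rng.normal(size=d)))
```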
Addressing the rare word problem in neural machine translation. In ACL, 2015.
"... Neural Machine Translation (NMT) is a new approach to machine translation that has shown promising results that are com-parable to traditional approaches. A sig-nificant weakness in conventional NMT systems is their inability to correctly trans-late very rare words: end-to-end NMTs tend to have rela ..."
Abstract
-
Cited by 14 (5 self)
- Add to MetaCart
(Show Context)
Neural Machine Translation (NMT) is a new approach to machine translation that has shown promising results that are comparable to traditional approaches. A significant weakness in conventional NMT systems is their inability to correctly translate very rare words: end-to-end NMTs tend to have relatively small vocabularies with a single unk symbol that represents every possible out-of-vocabulary (OOV) word. In this paper, we propose and implement an effective technique to address this problem. We train an NMT system on data that is augmented by the output of a word alignment algorithm, allowing the NMT system to emit, for each OOV word in the target sentence, the position of its corresponding word in the source sentence. This information is later utilized in a post-processing step that translates every OOV word using a dictionary. Our experiments on the WMT’14 English to French translation task show that this method provides a substantial improvement of up to 2.8 BLEU points over an equivalent NMT system that does not use this technique. With 37.5 BLEU points, our NMT system is the first to surpass the best result achieved on a WMT’14 contest task.
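The post-processing step the abstract describes is easy to sketch. The toy below assumes a hypothetical annotation format 'UNK@j', where j indexes the aligned source word (the paper's actual positional unk tokens differ); each annotated OOV word is translated with a dictionary, or copied verbatim when no entry exists.

```python
def replace_unks(target, source, dictionary):
    """Dictionary-based unk replacement, sketched from the abstract.

    `target` uses the toy annotation 'UNK@j' (an assumption): j is the
    index of the aligned source word. Translate it via `dictionary`,
    falling back to a verbatim copy of the source word.
    """
    out = []
    for tok in target:
        if tok.startswith("UNK@"):
            src_word = source[int(tok.split("@")[1])]
            out.append(dictionary.get(src_word, src_word))
        else:
            out.append(tok)
    return out

source = ["le", "portique", "est", "beau"]
target = ["the", "UNK@1", "is", "beautiful"]
print(replace_unks(target, source, {"portique": "portico"}))
# ['the', 'portico', 'is', 'beautiful']
```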
A neural network approach to context-sensitive generation of conversational responses, 2015.
"... Abstract We present a novel response generation system that can be trained end to end on large quantities of unstructured Twitter conversations. A neural network architecture is used to address sparsity issues that arise when integrating contextual information into classic statistical models, allow ..."
Abstract
-
Cited by 10 (4 self)
- Add to MetaCart
(Show Context)
We present a novel response generation system that can be trained end to end on large quantities of unstructured Twitter conversations. A neural network architecture is used to address sparsity issues that arise when integrating contextual information into classic statistical models, allowing the system to take into account previous dialog utterances. Our dynamic-context generative models show consistent gains over both context-sensitive and non-context-sensitive Machine Translation and Information Retrieval baselines.
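One way to picture the sparsity argument: instead of counting (context, message, response) n-gram combinations, continuous word embeddings yield a fixed-size conditioning vector. The sketch below averages and concatenates embeddings of the context and the message; the averaging-and-concatenation scheme is an illustrative assumption, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(2)
V, d = 50, 8
E = rng.normal(size=(V, d))   # word embeddings shared by context and message

def dialog_state(context_ids, message_ids):
    """Fixed-size conditioning vector: continuous embeddings sidestep the
    sparsity of discrete context/message n-gram pairs. This vector would
    be fed to the response generator."""
    c = E[context_ids].mean(axis=0)   # bag-of-embeddings for the context
    m = E[message_ids].mean(axis=0)   # bag-of-embeddings for the message
    return np.concatenate([c, m])

print(dialog_state([3, 7, 7], [9, 4]).shape)  # (16,)
```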
On the Properties of Neural Machine Translation: Encoder–Decoder Approaches
"... Neural machine translation is a relatively new approach to statistical machine trans-lation based purely on neural networks. The neural machine translation models of-ten consist of an encoder and a decoder. The encoder extracts a fixed-length repre-sentation from a variable-length input sen-tence, a ..."
Abstract
-
Cited by 9 (1 self)
- Add to MetaCart
(Show Context)
Neural machine translation is a relatively new approach to statistical machine translation based purely on neural networks. Neural machine translation models often consist of an encoder and a decoder. The encoder extracts a fixed-length representation from a variable-length input sentence, and the decoder generates a correct translation from this representation. In this paper, we focus on analyzing the properties of neural machine translation using two models: the RNN Encoder–Decoder and a newly proposed gated recursive convolutional neural network. We show that neural machine translation performs relatively well on short sentences without unknown words, but its performance degrades rapidly as the length of the sentence and the number of unknown words increase. Furthermore, we find that the proposed gated recursive convolutional network learns a grammatical structure of a sentence automatically.
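The fixed-length representation under analysis can be shown in a few lines. In the toy below, an RNN encoder squeezes any input, however long, into one d-dimensional state; that constant-size output is exactly the property the paper links to degradation on long sentences. The weights and shapes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
V, d = 20, 8
E = rng.normal(size=(V, d))
W = rng.normal(size=(d, d)) * 0.5

def encode(token_ids):
    """RNN encoder in the family the paper analyzes: everything is
    compressed into one d-dimensional final state, regardless of how
    long the input sentence is."""
    h = np.zeros(d)
    for t in token_ids:
        h = np.tanh(W @ h + E[t])
    return h  # fixed-length representation, independent of input length

# a 3-word and a 30-word sentence produce the same-sized vector
print(encode([1, 2, 3]).shape, encode(list(range(15)) * 2).shape)  # (8,) (8,)
```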
A neural conversational model. In Proc. of ICML Deep Learning Workshop, 2015.
"... Abstract Conversational modeling is an important task in natural language understanding and machine intelligence. Although previous approaches exist, they are often restricted to specific domains (e.g., booking an airline ticket) and require handcrafted rules. In this paper, we present a simple app ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
(Show Context)
Conversational modeling is an important task in natural language understanding and machine intelligence. Although previous approaches exist, they are often restricted to specific domains (e.g., booking an airline ticket) and require handcrafted rules. In this paper, we present a simple approach for this task which uses the recently proposed sequence-to-sequence framework. Our model converses by predicting the next sentence given the previous sentence or sentences in a conversation. The strength of our model is that it can be trained end-to-end and thus requires far fewer hand-crafted rules. We find that this straightforward model can generate simple conversations given a large conversational training dataset. Our preliminary results suggest that, despite optimizing the wrong objective function, the model is able to extract knowledge both from a domain-specific dataset and from a large, noisy, and general domain dataset of movie subtitles. On a domain-specific IT helpdesk dataset, the model can find a solution to a technical problem via conversations. On a noisy open-domain movie transcript dataset, the model can perform simple forms of common sense reasoning. As expected, we also find that the lack of consistency is a common failure mode of our model.
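A minimal sketch of the conversing loop the abstract describes, under simplifying assumptions: a plain tanh RNN stands in for the paper's sequence-to-sequence LSTM, EOS doubles as the start symbol, and the untrained random weights mean the output is noise. The point is only the read-then-predict structure.

```python
import numpy as np

rng = np.random.default_rng(4)
V, d = 30, 8                       # toy vocab and state size
E = rng.normal(size=(V, d))
W = rng.normal(size=(d, d)) * 0.5
W_out = rng.normal(size=(d, V))
EOS = 0

def respond(prev_sentence_ids, max_len=8):
    """Predict the next sentence given the previous one: encode the
    previous utterance into a state, then greedily decode a reply."""
    h = np.zeros(d)
    for t in prev_sentence_ids:        # read the previous sentence
        h = np.tanh(W @ h + E[t])
    w, reply = EOS, []                 # EOS doubles as start token (toy)
    for _ in range(max_len):           # emit the predicted next sentence
        h = np.tanh(W @ h + E[w])
        w = int(np.argmax(h @ W_out))
        if w == EOS:
            break
        reply.append(w)
    return reply

print(respond([5, 9, 2]))
```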
Neural responding machine for short-text conversation. In Proceedings of ACL, 2015.
"... Abstract We propose Neural Responding Machine (NRM), a neural network-based response generator for Short-Text Conversation. NRM takes the general encoderdecoder framework: it formalizes the generation of response as a decoding process based on the latent representation of the input text, while both ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
(Show Context)
We propose Neural Responding Machine (NRM), a neural-network-based response generator for Short-Text Conversation. NRM adopts the general encoder-decoder framework: it formalizes the generation of a response as a decoding process based on the latent representation of the input text, while both encoding and decoding are realized with recurrent neural networks (RNNs). The NRM is trained with a large amount of one-round conversation data collected from a microblogging service. An empirical study shows that NRM can generate grammatically correct and content-wise appropriate responses to over 75% of the input text, outperforming state-of-the-art methods in the same setting, including retrieval-based and SMT-based models.
Character-Aware Neural Language Models. Yoon Kim, School of Engineering and Applied Sciences, Harvard University.
"... We describe a simple neural language model that re-lies only on character-level inputs. Predictions are still made at the word-level. Our model employs a con-volutional neural network (CNN) and a highway net-work over characters, whose output is given to a long short-term memory (LSTM) recurrent neu ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word level. Our model employs a convolutional neural network (CNN) and a highway network over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on par with the existing state of the art despite having 60% fewer parameters. On languages with rich morphology (Arabic, Czech, French, German, Spanish, Russian), the model outperforms word-level/morpheme-level LSTM baselines, again with fewer parameters. The results suggest that on many languages, character inputs are sufficient for language modeling. Analysis of word representations obtained from the character composition part of the model reveals that the model is able to encode, from characters only, both semantic and orthographic information.
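The highway component mentioned here is compact enough to show in full. Below is one highway layer in numpy: a transform gate t mixes, per dimension, a nonlinear transform of the CNN output with the untouched input. The ReLU nonlinearity and the negative gate-bias initialization (which starts the layer close to an identity map) are common choices but assumptions here, not details taken from this abstract.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway(y, W_H, b_H, W_T, b_T):
    """One highway layer: z = t * relu(W_H y + b_H) + (1 - t) * y,
    with transform gate t = sigmoid(W_T y + b_T). The gate decides how
    much of the transform vs. the raw input to pass on to the LSTM."""
    t = sigmoid(W_T @ y + b_T)
    return t * np.maximum(0.0, W_H @ y + b_H) + (1.0 - t) * y

rng = np.random.default_rng(5)
d = 6
y = rng.normal(size=d)                         # stand-in for CNN output
out = highway(y, rng.normal(size=(d, d)), np.zeros(d),
              rng.normal(size=(d, d)), np.full(d, -2.0))  # bias toward carry
print(out.shape)  # (6,)
```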
Compositional vector space models for knowledge base completion. In ACL, 2015.
"... Traditional approaches for knowledge base completion are based on symbolic representations of knowledge. Low-dimensional vector embedding models proposed recently for this task are at-tractive since they generalize to possi-bly unlimited sets of relations. A sig-nificant drawback of previous embed-d ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
Traditional approaches for knowledge base completion are based on symbolic representations of knowledge. Low-dimensional vector embedding models proposed recently for this task are attractive since they generalize to possibly unlimited sets of relations. A significant drawback of previous embedding models for KB completion is that they merely support reasoning on individual relations (e.g., bornIn(X,Y) ⇒ nationality(X,Y)). In this work, we develop an embedding model for KB completion that supports chains of reasoning on paths of any length using compositional vector space models. Unlike most previous methods, our approach can generalize to paths that are unseen in training and, additionally, in a zero-shot setting, predict target relations without explicitly training for the target relation types. On a challenging large-scale dataset, our method outperforms a simple classifier method and a method that uses pre-trained vectors by 11% and 7% respectively, and performs competitively with a modified stronger baseline. We also show that the zero-shot model, without using any direct supervision, achieves impressive results by performing significantly better than a random baseline.
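The path composition idea can be sketched directly from the abstract: fold the vectors of the relations along a path into a single vector with a recurrent update, then score it against a candidate target relation. The tanh RNN composition, the dot-product scorer, and the toy relation vocabulary below are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(6)
d = 8
# toy relation embeddings; real models learn these from the KB
R = {r: rng.normal(size=d)
     for r in ["bornIn", "cityIn", "countryOf", "nationality"]}
W = rng.normal(size=(d, d)) * 0.5

def compose_path(relations):
    """Fold relation vectors along a path into one vector. Because only
    relation vectors are consumed, a path unseen in training composes
    the same way, which is what enables the zero-shot setting."""
    h = np.zeros(d)
    for r in relations:
        h = np.tanh(W @ h + R[r])
    return h

# does bornIn -> cityIn -> countryOf support nationality(X, Y)?
path = compose_path(["bornIn", "cityIn", "countryOf"])
print(path @ R["nationality"])   # illustrative dot-product score
```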