Modeling Interestingness with Deep Neural Networks
"... This paper presents a deep semantic simi-larity model (DSSM), a special type of deep neural networks designed for text analysis, for recommending target docu-ments to be of interest to a user based on a source document that she is reading. We observe, identify, and detect naturally oc-curring signal ..."
Abstract
-
Cited by 12 (2 self)
- Add to MetaCart
(Show Context)
This paper presents a deep semantic similarity model (DSSM), a special type of deep neural network designed for text analysis, for recommending target documents to be of interest to a user based on a source document that she is reading. We observe, identify, and detect naturally occurring signals of interestingness in click transitions on the Web between source and target documents, which we collect from commercial Web browser logs. The DSSM is trained on millions of Web transitions, and maps source-target document pairs to feature vectors in a latent space in such a way that the distance between source documents and their corresponding interesting targets in that space is minimized. The effectiveness of the DSSM is demonstrated using two interestingness tasks: automatic highlighting and contextual entity search. The results on large-scale, real-world datasets show that the semantics of documents are important for modeling interestingness and that the DSSM leads to significant quality improvement on both tasks, outperforming not only the classic document models that do not use semantics but also state-of-the-art topic models.
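A minimal sketch of the core idea, assuming bag-of-words-style input vectors (the paper uses letter-trigram word hashing) and randomly initialized weights standing in for a trained model: both documents pass through a shared nonlinear tower, and interestingness is read off as cosine similarity in the latent space, which training would maximize for clicked source-target pairs.

import numpy as np

def embed(x, weights):
    # shared nonlinear tower: hashed bag-of-words vector -> latent vector
    h = x
    for W in weights:
        h = np.tanh(W @ h)
    return h

def interestingness(src, tgt, weights):
    # cosine similarity between source and target documents in latent space
    s, t = embed(src, weights), embed(tgt, weights)
    return float(s @ t) / float(np.linalg.norm(s) * np.linalg.norm(t) + 1e-9)

rng = np.random.default_rng(0)
dims = [5000, 300, 128]   # hashed input -> hidden -> latent (illustrative sizes)
weights = [rng.normal(0.0, 0.1, (dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
src, tgt = rng.random(5000), rng.random(5000)
print(interestingness(src, tgt, weights))   # trained weights would push this up for clicked pairs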
Improving statistical machine translation with monolingual collocation
In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 2010.
"... We investigate how to improve bilingual embedding which has been successfully used as a feature in phrase-based sta-tistical machine translation (SMT). De-spite bilingual embedding’s success, the contextual information, which is of criti-cal importance to translation quality, was ignored in previous ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
(Show Context)
We investigate how to improve bilingual embedding, which has been successfully used as a feature in phrase-based statistical machine translation (SMT). Despite bilingual embedding's success, the contextual information, which is of critical importance to translation quality, was ignored in previous work. To employ the contextual information, we propose a simple and memory-efficient model for learning bilingual embeddings, taking both the source phrase and the context around the phrase into account. Bilingual translation scores generated from our proposed bilingual embedding model are used as features in our SMT system. Experimental results show that the proposed method achieves significant improvements on a large-scale Chinese-English translation task.
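A hedged sketch of how a context-aware bilingual score of this kind could be computed; the layer sizes, the averaging of word embeddings into phrase vectors, and the score_pair name are illustrative assumptions, not the paper's exact architecture. The source phrase, its surrounding context words, and the target phrase are embedded, concatenated, and passed through a small feed-forward scorer whose output would serve as an SMT feature.

import numpy as np

rng = np.random.default_rng(1)
E_src = rng.normal(0, 0.1, (1000, 50))   # toy source-language word embeddings
E_tgt = rng.normal(0, 0.1, (1000, 50))   # toy target-language word embeddings
W1 = rng.normal(0, 0.1, (100, 150))      # scorer: three 50-d vectors -> hidden layer
W2 = rng.normal(0, 0.1, (1, 100))

def phrase_vec(E, ids):
    # average word embeddings as a fixed-size phrase representation
    return E[ids].mean(axis=0)

def score_pair(src_ids, ctx_ids, tgt_ids):
    # concat(source phrase, source context, target phrase) -> scalar translation score
    x = np.concatenate([phrase_vec(E_src, src_ids),
                        phrase_vec(E_src, ctx_ids),
                        phrase_vec(E_tgt, tgt_ids)])
    return (W2 @ np.tanh(W1 @ x)).item()

print(score_pair([3, 7], [1, 2, 8, 9], [42, 77]))  # used as a feature in the SMT log-linear model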
A neural network approach to context-sensitive generation of conversational responses, 2015.
"... Abstract We present a novel response generation system that can be trained end to end on large quantities of unstructured Twitter conversations. A neural network architecture is used to address sparsity issues that arise when integrating contextual information into classic statistical models, allow ..."
Abstract
-
Cited by 10 (4 self)
- Add to MetaCart
(Show Context)
We present a novel response generation system that can be trained end to end on large quantities of unstructured Twitter conversations. A neural network architecture is used to address sparsity issues that arise when integrating contextual information into classic statistical models, allowing the system to take into account previous dialog utterances. Our dynamic-context generative models show consistent gains over both context-sensitive and non-context-sensitive Machine Translation and Information Retrieval baselines.
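One way to picture the context-sensitive part, as a hedged sketch: the previous dialog utterance and the current message are encoded into a single fixed-size vector that conditions every step of the response generator. The bag-of-embeddings encoding below is a deliberate simplification of the paper's architecture; vocabulary and dimension sizes are made up.

import numpy as np

rng = np.random.default_rng(2)
V, D = 500, 32
E = rng.normal(0, 0.1, (V, D))            # toy word embeddings
W_out = rng.normal(0, 0.1, (V, 2 * D))    # maps context vector to next-word logits

def encode(context_ids, message_ids):
    # concatenate bag-of-embeddings summaries of the dialog context and the message
    c = E[context_ids].mean(axis=0)
    m = E[message_ids].mean(axis=0)
    return np.concatenate([c, m])          # this vector conditions the decoder

def next_word_logits(context_ids, message_ids):
    # decoder step: the same context vector is injected at every generation step
    return W_out @ encode(context_ids, message_ids)

print(int(np.argmax(next_word_logits([5, 9, 13], [22, 41]))))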
A Diversity-Promoting Objective Function for Neural Conversation Models
"... Abstract Sequence-to-sequence neural network models for generation of conversational responses tend to generate safe, commonplace responses (e.g., I don't know) regardless of the input. We suggest that the traditional objective function, i.e., the likelihood of output (response) given input (m ..."
Abstract
-
Cited by 3 (3 self)
- Add to MetaCart
(Show Context)
Sequence-to-sequence neural network models for generation of conversational responses tend to generate safe, commonplace responses (e.g., I don't know) regardless of the input. We suggest that the traditional objective function, i.e., the likelihood of output (response) given input (message), is unsuited to response generation tasks. Instead we propose using Maximum Mutual Information (MMI) as the objective function in neural models. Experimental results demonstrate that the proposed MMI models produce more diverse, interesting, and appropriate responses, yielding substantive gains in BLEU scores on two conversational datasets and in human evaluations.
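In its anti-language-model form, the MMI objective reduces to a simple reranking rule: score each candidate response T for message S by log p(T|S) - lambda * log p(T), so that generic high-frequency responses are penalized. A toy sketch with made-up log-probabilities (the two scores would come from a trained seq2seq model and a language model):

import math  # not strictly needed; log-probs are given directly below

def mmi_rescore(candidates, lam=0.5):
    # rerank candidates by log p(T|S) - lam * log p(T)  (MMI-antiLM)
    # each candidate is (response, logp_t_given_s, logp_t)
    return sorted(candidates, key=lambda c: c[1] - lam * c[2], reverse=True)

nbest = [("i don't know", -2.0, -1.0),              # likely but generic
         ("the game starts at eight", -3.0, -6.0)]  # less likely, more specific
print(mmi_rescore(nbest)[0][0])  # the specific response wins after reranking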
Large-scale expected BLEU training of phrase-based reordering models
In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
"... Abstract Recent work by ..."
A Persona-Based Neural Conversation Model
"... Abstract We present persona-based models for handling the issue of speaker consistency in neural response generation. A speaker model encodes personas in distributed embeddings that capture individual characteristics such as background information and speaking style. A dyadic speakeraddressee model ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
(Show Context)
We present persona-based models for handling the issue of speaker consistency in neural response generation. A speaker model encodes personas in distributed embeddings that capture individual characteristics such as background information and speaking style. A dyadic speaker-addressee model captures properties of interactions between two interlocutors. Our models yield qualitative performance improvements in both perplexity and BLEU scores over baseline sequence-to-sequence models, with similar gains in speaker consistency as measured by human judges.
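A hedged sketch of the speaker-model idea: each speaker gets a learned embedding that is fed into the decoder at every timestep alongside the word embedding, so all of that speaker's responses are generated under the same persona vector. The simple tanh recurrent cell and all sizes below are illustrative stand-ins for the paper's LSTM decoder.

import numpy as np

rng = np.random.default_rng(3)
V, D, H, S = 500, 32, 64, 10             # vocab, embed dim, hidden dim, num speakers
E_word = rng.normal(0, 0.1, (V, D))      # word embeddings
E_spk = rng.normal(0, 0.1, (S, D))       # one persona embedding per speaker
W = rng.normal(0, 0.1, (H, H + 2 * D))   # toy recurrent weights
W_out = rng.normal(0, 0.1, (V, H))

def decode_step(h, word_id, speaker_id):
    # the persona vector is concatenated into the decoder input at every step
    x = np.concatenate([h, E_word[word_id], E_spk[speaker_id]])
    h_new = np.tanh(W @ x)
    return h_new, W_out @ h_new          # new state, next-word logits

h = np.zeros(H)
h, logits = decode_step(h, word_id=7, speaker_id=3)
print(int(np.argmax(logits)))            # same input, different speaker -> different response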
Bilingual continuous-space language model growing for statistical machine translation
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015.
"... Abstract—Larger-gram language models (LMs) perform better in statistical machine translation (SMT). However, the existing approaches have two main drawbacks for constructing larger LMs: 1) it is not convenient to obtain larger corpora in the same domain as the bilingual parallel corpora in SMT; 2) m ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
Larger n-gram language models (LMs) perform better in statistical machine translation (SMT). However, the existing approaches to constructing larger LMs have two main drawbacks: 1) it is not convenient to obtain larger corpora in the same domain as the bilingual parallel corpora used in SMT; 2) most previous studies focus on monolingual information from the target corpora only, and redundant n-grams have not been fully utilized in SMT. Continuous-space language models (CSLMs), especially neural network language models (NNLMs), have shown great improvement in the accuracy of estimating the probabilities of target words. However, most of these CSLM and NNLM approaches still consider monolingual information only or require additional corpora. In this paper, we propose a novel neural-network-based bilingual LM growing method. Compared to the existing approaches, the proposed method enables us to use the bilingual parallel corpus for LM growing in SMT. The results show that our new method significantly outperforms the existing approaches in both SMT performance and computational efficiency. Index Terms: Continuous-space language model, language model growing (LMG), neural network language model, statistical machine translation.
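As a hedged sketch of what growing an LM from parallel data could look like (the paper's actual method is more involved): candidate target n-grams extracted from the bilingual corpus are scored by a neural LM that also sees the aligned source words, and sufficiently probable n-grams are added to the grown LM's table. The scorer here is a toy stand-in for a trained bilingual NNLM, and the threshold is arbitrary.

def grow_lm(candidates, bilingual_nnlm_logprob, threshold=-8.0):
    # add a candidate n-gram to the LM if the bilingual NNLM finds it probable;
    # bilingual_nnlm_logprob(tgt_ngram, src_words) stands in for a trained model
    # conditioned on both the target history and the aligned source words
    grown = {}
    for tgt_ngram, src_words in candidates:
        lp = bilingual_nnlm_logprob(tgt_ngram, src_words)
        if lp > threshold:
            grown[tgt_ngram] = lp
    return grown

# toy stand-in scorer: prefers short n-grams (a real model would be trained)
fake_scorer = lambda ngram, src: -2.0 * len(ngram)
cands = [(("the", "house"), ("das", "Haus")),
         (("house", "the", "of", "in"), ("im",))]
print(grow_lm(cands, fake_scorer))   # only the plausible n-gram survives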
How to Avoid Unwanted Pregnancies: Domain Adaptation using Neural Network Models
"... We present novel models for domain adap-tation based on the neural network joint model (NNJM). Our models maximize the cross entropy by regularizing the loss function with respect to in-domain model. Domain adaptation is carried out by as-signing higher weight to out-domain se-quences that are simil ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
We present novel models for domain adaptation based on the neural network joint model (NNJM). Our models maximize the cross entropy by regularizing the loss function with respect to the in-domain model. Domain adaptation is carried out by assigning higher weight to out-of-domain sequences that are similar to the in-domain data. In our alternative model we take a more restrictive approach by additionally penalizing sequences similar to the out-of-domain data. Our models achieve better perplexities than the baseline NNJM models and give improvements of up to 0.5 and 0.6 BLEU points in Arabic-to-English and English-to-German language pairs, on a standard task of translating TED talks.
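A minimal sketch of the instance-weighting idea, using the common cross-entropy-difference heuristic as the similarity signal; this is an illustration under assumed inputs, not the paper's exact formulation. Each training sequence's loss contribution is scaled by how much more probable it is under an in-domain reference model than under an out-of-domain one.

import numpy as np

def weighted_nll(logp_model, logp_in, logp_out, alpha=1.0):
    # weight each training sequence's loss by how in-domain it looks;
    # logp_in / logp_out come from in-domain and out-of-domain reference LMs
    w = np.exp(alpha * (logp_in - logp_out))   # larger when the in-domain model prefers it
    w = w / w.sum()
    return float(-(w * logp_model).sum())

logp_model = np.array([-3.1, -2.4, -5.0])      # model log-probs of three sequences
logp_in = np.array([-4.0, -2.0, -9.0])         # in-domain reference LM scores
logp_out = np.array([-3.5, -4.0, -3.0])        # out-of-domain reference LM scores
print(weighted_nll(logp_model, logp_in, logp_out))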
Word Translation Prediction for Morphologically Rich Languages with Bilingual Neural Networks
"... Translating into morphologically rich lan-guages is a particularly difficult problem in machine translation due to the high de-gree of inflectional ambiguity in the tar-get language, often only poorly captured by existing word translation models. We present a general approach that exploits source-si ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Translating into morphologically rich languages is a particularly difficult problem in machine translation due to the high degree of inflectional ambiguity in the target language, which is often only poorly captured by existing word translation models. We present a general approach that exploits source-side contexts of foreign words to improve translation prediction accuracy. Our approach is based on a probabilistic neural network that requires neither linguistic annotation nor manual feature engineering. We report significant improvements in word translation prediction accuracy for three morphologically rich target languages. In addition, preliminary results for integrating our approach into a large-scale English-Russian statistical machine translation system show small but statistically significant improvements in translation quality.
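A hedged sketch of the prediction setup: the source word and a window of its neighbors are embedded, concatenated, and mapped through a hidden layer to a softmax over target-language words, giving P(target word | source word in context). All vocabulary sizes, dimensions, and names are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(4)
Vs, Vt, D, H, WIN = 800, 1200, 32, 64, 5     # vocab sizes, embed/hidden dims, window
E = rng.normal(0, 0.1, (Vs, D))              # source-side word embeddings
W1 = rng.normal(0, 0.1, (H, WIN * D))
W2 = rng.normal(0, 0.1, (Vt, H))

def p_target(window_ids):
    # embed the context window, concatenate, hidden layer, softmax over target vocab
    x = E[window_ids].reshape(-1)
    z = W2 @ np.tanh(W1 @ x)
    z -= z.max()                             # numerically stable softmax
    p = np.exp(z)
    return p / p.sum()

probs = p_target([10, 42, 300, 7, 99])       # center word with two neighbors on each side
print(int(np.argmax(probs)), float(probs.max()))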
Context-Dependent Translation Selection Using Convolutional Neural Network
"... Abstract We propose a novel method for translation selection in statistical machine translation, in which a convolutional neural network is employed to judge the similarity between a phrase pair in two languages. The specifically designed convolutional architecture encodes not only the semantic sim ..."
Abstract
- Add to MetaCart
(Show Context)
We propose a novel method for translation selection in statistical machine translation, in which a convolutional neural network is employed to judge the similarity between a phrase pair in two languages. The specifically designed convolutional architecture encodes not only the semantic similarity of the translation pair, but also the context containing the phrase in the source language. Therefore, our approach is able to capture context-dependent semantic similarities of translation pairs. We adopt a curriculum learning strategy to train the model: we classify the training examples into easy, medium, and difficult categories, and gradually build the ability of representing phrases and sentence-level contexts by using training examples from easy to difficult. Experimental results show that our approach significantly outperforms the baseline system by up to 1.4 BLEU points.
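The curriculum strategy itself is easy to picture as a hedged sketch: bucket the training pairs by a difficulty measure, then train in stages that progressively unlock harder buckets. The length-based difficulty proxy below is an assumption for illustration; the paper defines its own easy/medium/difficult categories.

def curriculum_batches(examples, difficulty, n_stages=3, epochs_per_stage=1):
    # yield training examples easiest-first, unlocking a harder bucket each stage
    ranked = sorted(examples, key=difficulty)
    size = -(-len(ranked) // n_stages)               # ceiling division into buckets
    for stage in range(1, n_stages + 1):
        available = ranked[: stage * size]           # easy, then + medium, then + hard
        for _ in range(epochs_per_stage):
            yield from available

pairs = [("short src", "short tgt"),
         ("a much longer source phrase", "target"),
         ("mid src", "mid tgt")]
for src, tgt in curriculum_batches(pairs, difficulty=lambda p: len(p[0])):
    pass  # train the CNN matcher on (src, tgt) here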