Results 1 - 10 of 47
Improved semantic representations from tree-structured long short-term memory networks
In Proc. ACL, 2015. Cited by 16 (1 self).
Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks. The only underlying LSTM structure that has been explored so far is a linear chain. However, natural language exhibits syntactic properties that would naturally combine words to phrases. We introduce the Tree-LSTM, a generalization of LSTMs to tree-structured network topologies. Tree-LSTMs outperform all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences (SemEval 2014, Task 1) and sentiment classification (Stanford Sentiment Treebank).
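To make the tree-structured generalization concrete, here is a minimal NumPy sketch of a Child-Sum-style Tree-LSTM cell forward pass. The parameter names, the initialization, and the two-leaf usage example are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ChildSumTreeLSTMCell:
    """Illustrative forward pass of a Child-Sum-style Tree-LSTM cell."""

    def __init__(self, in_dim, mem_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One input-to-hidden and one hidden-to-hidden matrix per gate:
        # input (i), forget (f), output (o), and candidate update (u).
        self.W = {g: 0.1 * rng.standard_normal((mem_dim, in_dim)) for g in "ifou"}
        self.U = {g: 0.1 * rng.standard_normal((mem_dim, mem_dim)) for g in "ifou"}
        self.b = {g: np.zeros(mem_dim) for g in "ifou"}

    def forward(self, x, child_states):
        """x: input vector at this node; child_states: list of (h, c) pairs from the children."""
        h_sum = sum((h for h, _ in child_states), np.zeros_like(self.b["i"]))
        i = sigmoid(self.W["i"] @ x + self.U["i"] @ h_sum + self.b["i"])
        o = sigmoid(self.W["o"] @ x + self.U["o"] @ h_sum + self.b["o"])
        u = np.tanh(self.W["u"] @ x + self.U["u"] @ h_sum + self.b["u"])
        c = i * u
        # One forget gate per child, conditioned on that child's hidden state,
        # so the cell can selectively keep memory from different subtrees.
        for h_k, c_k in child_states:
            f_k = sigmoid(self.W["f"] @ x + self.U["f"] @ h_k + self.b["f"])
            c = c + f_k * c_k
        h = o * np.tanh(c)
        return h, c

# Tiny usage example: compose two leaf nodes into a parent node.
cell = ChildSumTreeLSTMCell(in_dim=4, mem_dim=3)
leaf1 = cell.forward(np.ones(4), [])
leaf2 = cell.forward(np.zeros(4), [])
parent = cell.forward(0.5 * np.ones(4), [leaf1, leaf2])
```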
Structured training for neural network transition-based parsing
In Proceedings of ACL-IJCNLP, 2015. Cited by 7 (1 self).
We present structured perceptron training for neural network transition-based dependency parsing. We learn the neural network representation using a gold corpus augmented by a large number of automatically parsed sentences. Given this fixed network representation, we learn a final layer using the structured perceptron with beam-search decoding. On the Penn Treebank, our parser reaches 94.26% unlabeled and 92.41% labeled attachment accuracy, which to our knowledge is the best accuracy on Stanford Dependencies to date. We also provide in-depth ablative analysis to determine which aspects of our model provide the largest gains in accuracy.
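As a rough sketch of this training regime, the toy code below runs beam search over action sequences scored by a final linear layer on top of a fixed representation, then applies a structured perceptron update to that layer. The transition system is abstracted away and `feats` is a hypothetical stand-in for the frozen network features; none of this is the paper's code.

```python
import numpy as np

def feats(history, dim=8):
    """Hypothetical fixed representation of the state reached by `history`
    (stands in for frozen neural-network features; here just a hash-seeded vector)."""
    rng = np.random.default_rng(abs(hash(tuple(history))) % (2 ** 32))
    return rng.standard_normal(dim)

def beam_search(weights, length, beam_size):
    """Beam search over action sequences of fixed length, scored by the final layer."""
    num_actions = weights.shape[0]
    beam = [([], 0.0)]  # (action history, cumulative score)
    for _ in range(length):
        candidates = []
        for history, score in beam:
            phi = feats(history)
            for a in range(num_actions):
                candidates.append((history + [a], score + float(weights[a] @ phi)))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beam = candidates[:beam_size]
    return beam[0][0]

def perceptron_update(weights, gold, pred, lr=1.0):
    """Structured perceptron update: w += Phi(gold sequence) - Phi(predicted sequence)."""
    for t in range(len(gold)):
        weights[gold[t]] += lr * feats(gold[:t])
        weights[pred[t]] -= lr * feats(pred[:t])
    return weights

# Toy usage: one update step on a single "sentence" with 3 possible actions.
W = np.zeros((3, 8))
gold = [0, 2, 1, 1]
pred = beam_search(W, length=len(gold), beam_size=4)
if pred != gold:
    W = perceptron_update(W, gold, pred)
```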
Model-based word embeddings from decompositions of count matrices
2015. Cited by 3 (0 self).
This work develops a new statistical understanding of word embeddings induced from transformed count data. Using the class of hidden Markov models (HMMs) underlying Brown clustering as a generative model, we demonstrate how canonical correlation analysis (CCA) and certain count transformations permit efficient and effective recovery of model parameters with lexical semantics. We further show in experiments that these techniques empirically outperform existing spectral methods on word similarity and analogy tasks, and are also competitive with other popular methods such as WORD2VEC and GLOVE.
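The general recipe (transform the counts, apply CCA-style scaling, then take a truncated SVD) can be sketched as follows; the square-root transform, the marginal scaling, and the normalization are illustrative choices, not the paper's exact estimator.

```python
import numpy as np

def cca_style_embeddings(counts, dim, alpha=0.5, eps=1e-8):
    """Word embeddings from a word-by-context count matrix (illustrative).

    counts: (V, C) co-occurrence counts; alpha: power transform (0.5 = square root).
    """
    X = counts ** alpha                          # count transformation
    row = X.sum(axis=1, keepdims=True) + eps     # word marginals
    col = X.sum(axis=0, keepdims=True) + eps     # context marginals
    # CCA-style scaling: divide by the square roots of the marginals.
    Omega = X / np.sqrt(row) / np.sqrt(col)
    U, S, Vt = np.linalg.svd(Omega, full_matrices=False)
    E = U[:, :dim]
    # Unit-normalize rows so cosine similarity can be used for word similarity.
    return E / (np.linalg.norm(E, axis=1, keepdims=True) + eps)

# Toy usage on random counts: 50 word types, 30 context types, 10-dimensional vectors.
counts = np.random.default_rng(0).poisson(2.0, size=(50, 30)).astype(float)
emb = cca_style_embeddings(counts, dim=10)
```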
Don’t just listen, use your imagination: Leveraging visual common sense for non-visual tasks. arXiv preprint arXiv:1502.06108
2015. Cited by 2 (1 self).
Artificial agents today can answer factual questions. But they fall short on questions that require common sense reasoning. Perhaps this is because most existing common sense databases rely on text to learn and represent knowledge. But much of common sense knowledge is unwritten – partly because it tends not to be interesting enough to talk about, and partly because some common sense is unnatural to articulate in text. While unwritten, it is not unseen. In this paper we leverage semantic common sense knowledge learned from images – i.e. visual common sense – in two textual tasks: fill-in-the-blank and visual paraphrasing. We propose to "imagine" the scene behind the text, and leverage visual cues from the "imagined" scenes in addition to textual cues while answering these questions. We imagine the scenes as a visual abstraction. Our approach outperforms a strong text-only baseline on these tasks. Our proposed tasks can serve as benchmarks to quantitatively evaluate progress in solving tasks that go "beyond recognition". Our code and datasets are publicly available.
Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval
"... Methods of deep neural networks (DNNs) have recently demonstrated superior performance on a number of natural language processing tasks. However, in most previous work, the models are learned based on either unsupervised objectives, which does not directly optimize the desired task, or single-task s ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Methods of deep neural networks (DNNs) have recently demonstrated superior performance on a number of natural language processing tasks. However, in most previous work, the models are learned based on either unsupervised objectives, which do not directly optimize the desired task, or single-task supervised objectives, which often suffer from insufficient training data. We develop a multi-task DNN for learning representations across multiple tasks, not only leveraging large amounts of cross-task data, but also benefiting from a regularization effect that leads to more general representations that help tasks in new domains. Our multi-task DNN approach combines tasks of multiple-domain classification (for query classification) and information retrieval (ranking for web search), and demonstrates significant gains over strong baselines in a comprehensive set of domain adaptation experiments.
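The architecture amounts to shared lower layers feeding task-specific output layers. The NumPy sketch below shows that shared-representation pattern with one classification head and one ranking head; the layer sizes, activations, and cosine relevance score are assumptions for illustration, not the paper's exact model.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class MultiTaskDNN:
    """Shared hidden layers with task-specific heads (illustrative forward pass only)."""

    def __init__(self, in_dim, hid_dim, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = 0.1 * rng.standard_normal((hid_dim, in_dim))     # shared layer 1
        self.W2 = 0.1 * rng.standard_normal((hid_dim, hid_dim))    # shared layer 2
        self.Wc = 0.1 * rng.standard_normal((n_classes, hid_dim))  # query-classification head
        self.Wr = 0.1 * rng.standard_normal((hid_dim, hid_dim))    # web-search ranking head

    def shared(self, x):
        """Task-shared representation reused by both heads."""
        return relu(self.W2 @ relu(self.W1 @ x))

    def classify(self, query_vec):
        """Class posteriors for a query (softmax over the classification head)."""
        z = self.Wc @ self.shared(query_vec)
        z = z - z.max()
        p = np.exp(z)
        return p / p.sum()

    def rank_score(self, query_vec, doc_vec):
        """Query-document relevance: cosine similarity of projected shared representations."""
        q = self.Wr @ self.shared(query_vec)
        d = self.Wr @ self.shared(doc_vec)
        return float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d) + 1e-8))

# Toy usage.
net = MultiTaskDNN(in_dim=16, hid_dim=8, n_classes=4)
q, d = np.random.default_rng(1).standard_normal((2, 16))
print(net.classify(q), net.rank_score(q, d))
```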
Combining word embeddings and feature embeddings for fine-grained relation extraction
In NAACL, 2015. Cited by 2 (1 self).
Compositional embedding models build a representation for a linguistic structure based on its component word embeddings. While recent work has combined these word embeddings with hand-crafted features for improved performance, it was restricted to a small number of features due to model complexity, thus limiting its applicability. We propose a new model that conjoins features and word embeddings while maintaining a small number of parameters by learning feature embeddings jointly with the parameters of a compositional model. The result is a method that can scale to more features and more labels, while avoiding overfitting. We demonstrate that our model attains state-of-the-art results on ACE and ERE fine-grained relation extraction.
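As one way to picture "conjoining features with word embeddings via learned feature embeddings", the toy scorer below contracts each word's embedding with the sum of its active features' embeddings through a label-specific matrix. The scoring form, shapes, and names are illustrative assumptions rather than the paper's model.

```python
import numpy as np

def substructure_score(word_vecs, word_feats, feat_emb, label_mat):
    """Score one relation label for a substructure (illustrative).

    word_vecs:  list of word embedding vectors, each of shape (d,)
    word_feats: list of active feature-index lists, one list per word
    feat_emb:   (n_feats, d) learned feature embeddings
    label_mat:  (d, d) label-specific parameters
    """
    score = 0.0
    for vec, active in zip(word_vecs, word_feats):
        # Sum the embeddings of the word's active hand-crafted features,
        # then conjoin them with the word embedding through the label matrix.
        f = feat_emb[active].sum(axis=0) if active else np.zeros_like(vec)
        score += float(f @ label_mat @ vec)
    return score

# Toy usage: 3 words, 5 possible features, 4-dimensional embeddings.
rng = np.random.default_rng(0)
words = [rng.standard_normal(4) for _ in range(3)]
feats = [[0, 2], [1], []]
print(substructure_score(words, feats, rng.standard_normal((5, 4)), rng.standard_normal((4, 4))))
```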
One Vector is Not Enough: Entity-Augmented Distributional Semantics for Discourse Relations
"... Discourse relations bind smaller linguis-tic units into coherent texts. However, automatically identifying discourse rela-tions is difficult, because it requires un-derstanding the semantics of the linked ar-guments. A more subtle challenge is that it is not enough to represent the meaning of each a ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
Discourse relations bind smaller linguistic units into coherent texts. However, automatically identifying discourse relations is difficult, because it requires understanding the semantics of the linked arguments. A more subtle challenge is that it is not enough to represent the meaning of each argument of a discourse relation, because the relation may depend on links between lower-level components, such as entity mentions. Our solution computes distributional meaning representations by composition up the syntactic parse tree. A key difference from previous work on compositional distributional semantics is that we also compute representations for entity mentions, using a novel downward compositional pass. Discourse relations are predicted from the distributional representations of the arguments, and also of their coreferent entity mentions. The resulting system obtains substantial improvements over the previous state-of-the-art in predicting implicit discourse relations in the Penn Discourse Treebank.
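A minimal sketch of the two-pass idea, an upward (bottom-up) composition over a binarized parse tree followed by a downward pass that gives every node, including entity mentions, a vector informed by its parent, is shown below. The nested-dict tree encoding, the tanh composition, and the weight shapes are assumptions for illustration, not the paper's model.

```python
import numpy as np

def compose(W, *vecs):
    """tanh composition of concatenated input vectors (illustrative)."""
    return np.tanh(W @ np.concatenate(vecs))

def upward_pass(node, word_emb, W_up, up):
    """Bottom-up: each node's vector is composed from its children's vectors.
    Assumes a binarized tree: every internal node has exactly two children."""
    if not node["children"]:
        up[node["id"]] = word_emb[node["word"]]
    else:
        for child in node["children"]:
            upward_pass(child, word_emb, W_up, up)
        left, right = (up[c["id"]] for c in node["children"])
        up[node["id"]] = compose(W_up, left, right)
    return up

def downward_pass(node, W_down, up, down, parent_vec=None):
    """Top-down: each node (including entity mentions) also gets a vector that
    combines its parent's downward vector with its own upward vector."""
    if parent_vec is None:
        down[node["id"]] = up[node["id"]]          # root: no parent information
    else:
        down[node["id"]] = compose(W_down, parent_vec, up[node["id"]])
    for child in node["children"]:
        downward_pass(child, W_down, up, down, down[node["id"]])
    return down

# Toy usage: ((the cat) slept), 3-dimensional vectors.
d = 3
rng = np.random.default_rng(0)
word_emb = {w: rng.standard_normal(d) for w in ["the", "cat", "slept"]}
tree = {"id": 0, "children": [
    {"id": 1, "children": [
        {"id": 2, "children": [], "word": "the"},
        {"id": 3, "children": [], "word": "cat"}]},
    {"id": 4, "children": [], "word": "slept"}]}
W_up, W_down = rng.standard_normal((d, 2 * d)), rng.standard_normal((d, 2 * d))
up = upward_pass(tree, word_emb, W_up, {})
down = downward_pass(tree, W_down, up, {})
```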
Edinburgh’s Syntax-Based Systems at WMT 2014
In Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014. Cited by 2 (0 self).
This paper describes the syntax-based systems built at the University of Edinburgh for the WMT 2015 shared translation task. We developed systems for all language pairs except French-English. This year we focused on: translation out of English using tree-to-string models; continuing to improve our English-German system; and source-side morphological segmentation of Finnish using Morfessor.
Polynomial networks and factorization machines: New insights and efficient training algorithms.
In Proceedings of the International Conference on Machine Learning (ICML), 2016. Cited by 1 (1 self).
Polynomial networks and factorization machines are two recently-proposed models that can efficiently use feature interactions in classification and regression tasks. In this paper, we revisit both models from a unified perspective. Based on this new view, we study the properties of both models and propose new efficient training algorithms. Key to our approach is to cast parameter learning as a low-rank symmetric tensor estimation problem, which we solve by multi-convex optimization. We demonstrate our approach on regression and recommender system tasks.
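For readers unfamiliar with the second model family, the standard degree-2 factorization machine predictor (the model class discussed here, not the paper's new training algorithm) can be computed with the usual O(dk) rearrangement of the pairwise term, as in this sketch:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 factorization machine prediction.

    x: (d,) feature vector; w0: bias; w: (d,) linear weights; V: (d, k) factor matrix.
    The pairwise term sum_{i<j} <V_i, V_j> x_i x_j is computed in O(d k) as
    0.5 * sum_f [ (sum_i V_{if} x_i)^2 - sum_i V_{if}^2 x_i^2 ].
    """
    s = V.T @ x                     # (k,) factor-wise weighted sums
    s2 = (V ** 2).T @ (x ** 2)      # (k,) factor-wise sums of squares
    pairwise = 0.5 * float(np.sum(s ** 2 - s2))
    return float(w0 + w @ x + pairwise)

# Toy usage: 6 features with rank-3 factors.
rng = np.random.default_rng(0)
d, k = 6, 3
print(fm_predict(rng.random(d), 0.1, rng.random(d), rng.random((d, k))))
```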
ℓ1-regularized Neural Networks are Improperly Learnable in Polynomial Time
"... Abstract We study the improper learning of multi-layer neural networks. Suppose that the neural network to be learned has k hidden layers and that the 1 -norm of the incoming weights of any neuron is bounded by L. We present a kernel-based method, such that with probability at least 1 − δ, it learn ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
We study the improper learning of multi-layer neural networks. Suppose that the neural network to be learned has k hidden layers and that the ℓ1-norm of the incoming weights of any neuron is bounded by L. We present a kernel-based method such that, with probability at least 1 − δ, it learns a predictor whose generalization error is at most ε worse than that of the neural network. The sample complexity and the time complexity of the presented method are polynomial in the input dimension and in 1/ε, log(1/δ), and a factor that depends on (k, L) and on the activation function, but independent of the number of neurons. The algorithm applies to both sigmoid-like activation functions and ReLU-like activation functions. It implies that any sufficiently sparse neural network is learnable in polynomial time.
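Stated compactly, and writing the unspecified complexity factor as F(k, L) (a name introduced here for exposition), the guarantee reads roughly:

```latex
\[
\Pr\big[\,\mathrm{err}(\hat f)\;\le\;\mathrm{err}(f^{*})+\epsilon\,\big]\;\ge\;1-\delta,
\qquad
\text{sample and time complexity} \;=\; \mathrm{poly}\big(d,\;1/\epsilon,\;\log(1/\delta),\;F(k,L)\big),
\]
where $f^{*}$ is the target network with $k$ hidden layers and per-neuron incoming $\ell_1$-norm at most $L$, $d$ is the input dimension, and $F(k,L)$ depends on $(k,L)$ and the activation function but not on the number of neurons.
```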