Results 1 - 10 of 248
Representation learning: A review and new perspectives.
- IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
, 2013
"... Abstract-The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can b ..."
Abstract
-
Cited by 173 (4 self)
- Add to MetaCart
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
Distributed Representations of Sentences and Documents
- In ICML
, 2014
"... Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the order-ing of the words ..."
Abstract
-
Cited by 93 (1 self)
- Add to MetaCart
Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore semantics of the words. For example, “powerful”, “strong” and “Paris” are equally distant. In this paper, we propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents. Our algorithm represents each document by a dense vector which is trained to predict words in the document. Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models. Empirical results show that Paragraph Vectors outperform bag-of-words models as well as other techniques for text representations. Finally, we achieve new state-of-the-art results on several text classification and sentiment analysis tasks.
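As a rough illustration of the Paragraph Vector idea summarized above, the sketch below jointly trains one dense vector per document and a set of word vectors so that each document vector helps predict the words it contains (a PV-DM-style average followed by a softmax). The toy corpus, dimensions, learning rate, and epoch count are made-up values, not settings from the paper; a practical implementation would use an existing library rather than this plain NumPy loop.

```python
# Toy PV-DM-style sketch (assumed values throughout): each document gets a
# dense vector trained, together with word vectors, to predict the words
# that occur in that document.
import numpy as np

docs = [["the", "cat", "sat", "on", "the", "mat"],
        ["dogs", "sat", "on", "the", "rug"]]
vocab = sorted({w for d in docs for w in d})
w2i = {w: i for i, w in enumerate(vocab)}

dim, window, lr, epochs = 16, 2, 0.05, 200
rng = np.random.default_rng(0)
D = rng.normal(scale=0.1, size=(len(docs), dim))   # one vector per document
W = rng.normal(scale=0.1, size=(len(vocab), dim))  # input word vectors
U = rng.normal(scale=0.1, size=(len(vocab), dim))  # output (softmax) weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(epochs):
    for d_idx, doc in enumerate(docs):
        for pos, target in enumerate(doc):
            ctx = [w2i[doc[j]]
                   for j in range(max(0, pos - window), min(len(doc), pos + window + 1))
                   if j != pos]
            # PV-DM: average the paragraph vector with the context word vectors
            h = (D[d_idx] + W[ctx].sum(axis=0)) / (1 + len(ctx))
            p = softmax(U @ h)
            err = p.copy()
            err[w2i[target]] -= 1.0                 # cross-entropy gradient w.r.t. logits
            grad_h = U.T @ err
            U -= lr * np.outer(err, h)
            D[d_idx] -= lr * grad_h / (1 + len(ctx))
            np.add.at(W, ctx, -lr * grad_h / (1 + len(ctx)))

# D now holds fixed-length representations of the variable-length documents.
print(D.shape)
```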
Learning and transferring mid-level image representations using convolutional neural networks
- In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
, 2014
"... Convolutional neural networks (CNN) have recently shown outstanding image classification performance in the large-scale visual recognition challenge (ILSVRC2012). The suc-cess of CNNs is attributed to their ability to learn rich mid-level image representations as opposed to hand-designed low-level f ..."
Abstract
-
Cited by 71 (3 self)
- Add to MetaCart
(Show Context)
Convolutional neural networks (CNN) have recently shown outstanding image classification performance in the large-scale visual recognition challenge (ILSVRC2012). The success of CNNs is attributed to their ability to learn rich mid-level image representations as opposed to hand-designed low-level features used in other image classification methods. Learning CNNs, however, amounts to estimating millions of parameters and requires a very large number of annotated image samples. This property currently prevents application of CNNs to problems with limited training data. In this work we show how image representations learned with CNNs on large-scale annotated datasets can be efficiently transferred to other visual recognition tasks with limited amount of training data. We design a method to reuse layers trained on the ImageNet dataset to compute mid-level image representation for images in the PASCAL VOC dataset. We show that despite differences in image statistics and tasks in the two datasets, the transferred representation leads to significantly improved results for object and action classification, outperforming the current state of the art on Pascal VOC 2007 and 2012 datasets. We also show promising results for object and action localization.
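As a hedged sketch of the transfer recipe described above, the snippet below loads an ImageNet-pretrained network, freezes its convolutional layers so the learned mid-level representation stays fixed, and retrains only a new classification head for a smaller 20-class target task such as PASCAL VOC. A torchvision ResNet is used purely for convenience; the paper's own architecture and adaptation layers differ in detail, and the `weights=` argument assumes a recent torchvision API.

```python
# Freeze an ImageNet-pretrained backbone and train only a new head for a
# 20-class target task (e.g. PASCAL VOC). Architecture choice and class count
# are illustrative; the paper's network and adaptation layers differ in detail.
import torch
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet18(weights="IMAGENET1K_V1")   # assumes torchvision >= 0.13
for p in backbone.parameters():
    p.requires_grad = False                           # keep mid-level representation fixed

num_target_classes = 20                               # PASCAL VOC object classes
backbone.fc = nn.Linear(backbone.fc.in_features, num_target_classes)

optimizer = torch.optim.SGD(backbone.fc.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Training would iterate over the (small) target dataset, e.g.:
# for images, labels in target_loader:
#     optimizer.zero_grad()
#     loss = criterion(backbone(images), labels)
#     loss.backward()
#     optimizer.step()
```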
Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors.
- In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers),
, 2014
"... Abstract Context-predicting models (more commonly known as embeddings or neural language models) are the new kids on the distributional semantics block. Despite the buzz surrounding these models, the literature is still lacking a systematic comparison of the predictive models with classic, count-ve ..."
Abstract
-
Cited by 42 (1 self)
- Add to MetaCart
(Show Context)
Context-predicting models (more commonly known as embeddings or neural language models) are the new kids on the distributional semantics block. Despite the buzz surrounding these models, the literature is still lacking a systematic comparison of the predictive models with classic, count-vector-based distributional semantic approaches. In this paper, we perform such an extensive evaluation, on a wide range of lexical semantics tasks and across many parameter settings. The results, to our own surprise, show that the buzz is fully justified, as the context-predicting models obtain a thorough and resounding victory against their count-based counterparts.
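To make the count-versus-predict contrast concrete, the toy sketch below builds the "count" side: a word-by-context co-occurrence matrix, reweighted with positive PMI and reduced with SVD. The "predict" side would be a word2vec-style model trained on the same corpus. The corpus, window size, and dimensionality here are arbitrary illustrative choices, not the paper's settings.

```python
# Count-based semantic vectors: co-occurrence counts -> positive PMI -> SVD.
# Toy corpus and window; a "predict" baseline would be word2vec on the same data.
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
w2i = {w: i for i, w in enumerate(vocab)}
window = 2

counts = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            counts[w2i[w], w2i[corpus[j]]] += 1

total = counts.sum()
p_w = counts.sum(axis=1, keepdims=True) / total
p_c = counts.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((counts / total) / (p_w * p_c))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)   # positive PMI

U, S, _ = np.linalg.svd(ppmi)
k = min(10, len(vocab))
count_vectors = U[:, :k] * S[:k]          # low-dimensional count-based vectors
print(count_vectors.shape)
```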
Learning Deep Structured Semantic Models for Web Search using Clickthrough Data
"... Latent semantic models, such as LSA, intend to map a query to its relevant documents at the semantic level where keyword-based matching often fails. In this study we strive to develop a series of new latent semantic models with a deep structure that project queries and documents into a common low-di ..."
Abstract
-
Cited by 38 (15 self)
- Add to MetaCart
Latent semantic models, such as LSA, intend to map a query to its relevant documents at the semantic level where keyword-based matching often fails. In this study we strive to develop a series of new latent semantic models with a deep structure that project queries and documents into a common low-dimensional space where the relevance of a document given a query is readily computed as the distance between them. The proposed deep structured semantic models are discriminatively trained by maximizing the conditional likelihood of the clicked documents given a query using the clickthrough data. To make our models applicable to large-scale Web search applications, we also use a technique called word hashing, which is shown to effectively scale up our semantic models to handle large vocabularies which are common in such tasks. The new models are evaluated on a Web document ranking task using a real-world data set. Results show that our best model significantly outperforms other latent semantic models that were considered state-of-the-art in performance prior to the work presented in this paper.
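The word hashing step mentioned above can be illustrated with a small sketch: each word is wrapped in boundary markers and decomposed into letter trigrams, so an arbitrarily large word vocabulary maps onto a much smaller trigram vocabulary before being fed to the deep model. The `letter_trigrams` and `hash_text` helpers and the on-the-fly trigram index below are illustrative assumptions, not the paper's code.

```python
# Letter-trigram word hashing: map words onto a small trigram vocabulary.
def letter_trigrams(word):
    padded = f"#{word.lower()}#"                      # boundary markers
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def hash_text(text, trigram_index):
    """Bag-of-letter-trigrams vector for a query or document title."""
    vec = [0.0] * len(trigram_index)
    for word in text.split():
        for tri in letter_trigrams(word):
            if tri in trigram_index:
                vec[trigram_index[tri]] += 1.0
    return vec

# Build a tiny trigram index from some documents, then hash a query against it.
docs = ["deep structured semantic model", "latent semantic analysis"]
trigram_index = {}
for d in docs:
    for w in d.split():
        for tri in letter_trigrams(w):
            trigram_index.setdefault(tri, len(trigram_index))

print(letter_trigrams("good"))                        # ['#go', 'goo', 'ood', 'od#']
print(len(trigram_index), sum(hash_text("semantic search", trigram_index)))
```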
Better Word Representations with Recursive Neural Networks for Morphology
"... Vector-space word representations have been very successful in recent years at improving performance across a variety of NLP tasks. However, common to most existing work, words are regarded as independent entities without any explicit relationship among morphologically related words being modeled. A ..."
Abstract
-
Cited by 36 (4 self)
- Add to MetaCart
(Show Context)
Vector-space word representations have been very successful in recent years at improving performance across a variety of NLP tasks. However, common to most existing work, words are regarded as independent entities without any explicit relationship among morphologically related words being modeled. As a result, rare and complex words are often poorly estimated, and all unknown words are represented in a rather crude way using only one or a few vectors. This paper addresses this shortcoming by proposing a novel model that is capable of building representations for morphologically complex words from their morphemes. We combine recursive neural networks (RNNs), where each morpheme is a basic unit, with neural language models (NLMs) to consider contextual information in learning morphologically-aware word representations. Our learned models outperform existing word representations by a good margin on word similarity tasks across many datasets, including a new dataset we introduce focused on rare words to complement existing ones in an interesting way.
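As a rough sketch of the recursive composition described above, the snippet below builds a vector for a morphologically complex word from its morpheme vectors with a shared weight matrix and a tanh nonlinearity, applied left to right over the segmentation. The morpheme segmentations, dimensionality, and random parameters are illustrative assumptions; the actual model also ties this composition to a neural language model over context.

```python
# Compose a word vector from morpheme vectors with a shared matrix + tanh.
# Segmentations and parameters are illustrative assumptions.
import numpy as np

dim = 8
rng = np.random.default_rng(0)
morpheme_vecs = {m: rng.normal(scale=0.1, size=dim)
                 for m in ["un", "fortunate", "ly", "distinct", "ness"]}
W = rng.normal(scale=0.1, size=(dim, 2 * dim))   # shared composition matrix
b = np.zeros(dim)

def compose(morphemes):
    """vec(parent) = tanh(W [vec(so_far); vec(next_morpheme)] + b), left to right."""
    vec = morpheme_vecs[morphemes[0]]
    for m in morphemes[1:]:
        vec = np.tanh(W @ np.concatenate([vec, morpheme_vecs[m]]) + b)
    return vec

unfortunately = compose(["un", "fortunate", "ly"])   # a rare/complex word from its parts
distinctness = compose(["distinct", "ness"])
print(unfortunately.shape, distinctness.shape)
```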
A latent factor model for highly multi-relational data
"... Many data such as social networks, movie preferences or knowledge bases are multi-relational, in that they describe multiple relations between entities. While there is a large body of work focused on modeling these data, modeling these multiple types of relations jointly remains challenging. Further ..."
Abstract
-
Cited by 31 (4 self)
- Add to MetaCart
(Show Context)
Many data such as social networks, movie preferences or knowledge bases are multi-relational, in that they describe multiple relations between entities. While there is a large body of work focused on modeling these data, modeling these multiple types of relations jointly remains challenging. Further, existing approaches tend to break down when the number of these types grows. In this paper, we propose a method for modeling large multi-relational datasets, with possibly thousands of relations. Our model is based on a bilinear structure, which captures various orders of interaction of the data, and also shares sparse latent factors across different relations. We illustrate the performance of our approach on standard tensor-factorization datasets where we attain, or outperform, state-of-the-art results. Finally, an NLP application demonstrates our scalability and the ability of our model to learn efficient and semantically meaningful verb representations.
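As an illustrative sketch of the bilinear structure with shared latent factors described above, the snippet below scores a (subject, relation, object) triple as e_s^T R_r e_o, where each relation matrix R_r is a weighted combination of a small set of shared factor matrices. The sizes, names, and exact scoring form are assumptions for illustration rather than the paper's precise parameterization.

```python
# Bilinear triple scoring with relation matrices built from shared factors.
# Entity/relation counts, dimensions, and names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_relations, dim, n_factors = 50, 1000, 16, 20

E = rng.normal(scale=0.1, size=(n_entities, dim))            # entity embeddings
Theta = rng.normal(scale=0.1, size=(n_factors, dim, dim))    # shared latent factors
alpha = rng.normal(scale=0.1, size=(n_relations, n_factors)) # per-relation (ideally sparse) weights

def score(s, r, o):
    """Score of triple (s, r, o): e_s^T R_r e_o, with R_r a mix of shared factors."""
    R_r = np.tensordot(alpha[r], Theta, axes=1)              # (dim, dim)
    return E[s] @ R_r @ E[o]

print(score(3, 42, 7))
```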
Practical recommendations for gradient-based training of deep architectures
- Neural Networks: Tricks of the Trade
, 2013
"... ar ..."
(Show Context)
Deep Learning for Efficient Discriminative Parsing
"... We propose a new fast purely discriminative algorithm for natural language parsing, based on a “deep ” recurrent convolutional graph transformer network (GTN). Assuming a decomposition of a parse tree into a stack of “levels”, the network predicts a level of the tree taking into account predictions ..."
Abstract
-
Cited by 27 (1 self)
- Add to MetaCart
(Show Context)
We propose a new, fast, purely discriminative algorithm for natural language parsing, based on a “deep” recurrent convolutional graph transformer network (GTN). Assuming a decomposition of a parse tree into a stack of “levels”, the network predicts a level of the tree taking into account predictions of previous levels. Using only a few basic text features, we show similar performance (in F1 score) to existing purely discriminative parsers and existing “benchmark” parsers (such as the Collins parser, based on probabilistic context-free grammars), with a huge speed advantage.
Learning sentiment-specific word embedding for Twitter sentiment classification.
- In ACL,
, 2014
"... Abstract We present a method that learns word embedding for Twitter sentiment classification in this paper. Most existing algorithms for learning continuous word representations typically only model the syntactic context of words but ignore the sentiment of text. This is problematic for sentiment a ..."
Abstract
-
Cited by 25 (1 self)
- Add to MetaCart
(Show Context)
We present a method that learns word embeddings for Twitter sentiment classification in this paper. Most existing algorithms for learning continuous word representations typically only model the syntactic context of words but ignore the sentiment of text. This is problematic for sentiment analysis as they usually map words with similar syntactic context but opposite sentiment polarity, such as good and bad, to neighboring word vectors. We address this issue by learning sentiment-specific word embeddings (SSWE), which encode sentiment information in the continuous representation of words. Specifically, we develop three neural networks to effectively incorporate the supervision from sentiment polarity of text (e.g. sentences or tweets) in their loss functions. To obtain large-scale training corpora, we learn the sentiment-specific word embeddings from massive distant-supervised tweets collected by positive and negative emoticons. Experiments on applying SSWE to a benchmark Twitter sentiment classification dataset in SemEval 2013 show that (1) the SSWE feature performs comparably with hand-crafted features in the top-performing system; (2) the performance is further improved by concatenating SSWE with existing feature sets.
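As a hedged sketch of the kind of loss described above, the snippet below combines two hinge terms: one keeps the score of an observed n-gram above that of a corrupted n-gram (syntactic context), and the other does the same for sentiment polarity obtained from distant emoticon labels. The `sswe_loss` helper, the 0.5 weighting, and the scalar network outputs are simplified assumptions, not the paper's exact formulation.

```python
# Combined hinge loss: a context (syntactic) term plus a sentiment term,
# weighted by alpha. Scores are placeholder scalars standing in for the
# network outputs on a true n-gram and a corrupted n-gram.
def hinge(pos_score, neg_score, margin=1.0):
    return max(0.0, margin - pos_score + neg_score)

def sswe_loss(f_ctx_true, f_ctx_corrupt, f_sent_true, f_sent_corrupt, alpha=0.5):
    """alpha trades off the syntactic-context term against the sentiment term."""
    loss_context = hinge(f_ctx_true, f_ctx_corrupt)
    loss_sentiment = hinge(f_sent_true, f_sent_corrupt)
    return alpha * loss_context + (1.0 - alpha) * loss_sentiment

# Made-up network outputs for an observed tweet n-gram and its corruption:
print(sswe_loss(f_ctx_true=0.8, f_ctx_corrupt=0.2,
                f_sent_true=0.6, f_sent_corrupt=0.9))
```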