Efficient estimation of word representations in vector space (2013)

by T Mikolov, K Chen, G Corrado, J Dean

Results 1 - 10 of 372

Distributed representations of words and phrases and their compositionality

by Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean - In Advances in Neural Information Processing Systems, 2013
Cited by 371 (3 self)
Abstract not found

Distributed Representations of Sentences and Documents

by Quoc Le, Tomas Mikolov - In NAACL HLT
Cited by 93 (1 self)
Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore semantics of the words. For example, “powerful,” “strong” and “Paris” are equally distant. In this paper, we propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents. Our algorithm represents each document by a dense vector which is trained to predict words in the document. Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models. Empirical results show that Paragraph Vectors outperform bag-of-words models as well as other techniques for text representations. Finally, we achieve new state-of-the-art results on several text classification and sentiment analysis tasks.

DeViSE: A deep visual-semantic embedding model

by Andrea Frome, Greg S. Corrado, Jonathon Shlens, Samy Bengio, Jeffrey Dean, Marc'Aurelio Ranzato, Tomas Mikolov - In NIPS, 2013
Cited by 67 (3 self)
Modern visual recognition systems are often limited in their ability to scale to large numbers of object categories. This limitation is in part due to the increasing difficulty of acquiring sufficient training data in the form of labeled images as the number of object categories grows. One remedy is to leverage data from other sources, such as text data, both to train visual models and to constrain their predictions. In this paper we present a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data as well as semantic information gleaned from unannotated text. We demonstrate that this model matches state-of-the-art performance on the 1000-class ImageNet object recognition challenge while making more semantically reasonable errors, and also show that the semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training. Semantic knowledge improves such zero-shot predictions, achieving hit rates of up to 18% across thousands of novel labels never seen by the visual model.

Citation Context

...f these rely on a curated source of semantic information for the labels: the WordNet hierarchy is used in [12] and [17], and [16] uses a knowledge base containing descriptive properties for each class. By contrast, our approach learns its semantic representation directly from unannotated data. 3 Proposed Approach Our objective is to leverage semantic knowledge learned in the text domain, and transfer it to a model trained for visual object recognition. We begin by pre-training a simple neural language model well-suited for learning semantically-meaningful, dense vector representations of words [13]. In parallel, we pre-train a state-of-the-art deep neural network for visual object recognition [11], complete with a traditional softmax output layer. We then construct a deep visual-semantic model by taking the lower layers of the pre-trained visual object recognition network and re-training them to predict the vector representation of the image label text as learned by the language model. These three training phases are detailed below. 3.1 Language Model Pre-training The skip-gram text modeling architecture introduced by Mikolov et al. [13, 14] has been shown to efficiently learn semantical...

Knowledge Vault: A Web-scale approach to probabilistic knowledge fusion

by Xin Luna Dong, Kevin Murphy, Thomas Strohmann, Shaohua Sun, Wei Zhang - In submission, 2014
Cited by 49 (6 self)
Recent years have witnessed a proliferation of large-scale knowledge bases, including Wikipedia, Freebase, YAGO, Microsoft’s Satori, and Google’s Knowledge Graph. To increase the scale even further, we need to explore automatic methods for constructing knowledge bases. Previous approaches have primarily focused on text-based extraction, which can be very noisy. Here we introduce Knowledge Vault, a Web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories. We employ supervised machine learning methods for fusing these distinct information sources. The Knowledge Vault is substantially bigger than any previously published structured knowledge repository, and features a probabilistic inference system that computes calibrated probabilities of fact correctness. We report the results of multiple studies that explore the relative utility of the different information sources and extraction methods. Keywords: knowledge bases; information extraction; probabilistic models; machine learning.

Citation Context

...learns a meaningful “semantic” representation of the entities and predicates, we can compute the nearest neighbors of various items in the K-dimensional space. It is known from previous work (e.g., [27]) that related entities cluster together in the space, so here we focus on predicates. The results are shown in Table 4. We see that the model learns to put semantically related (but not necessarily s...

Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors

by Marco Baroni, Georgiana Dinu, Germán Kruszewski - In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2014
Cited by 42 (1 self)
Context-predicting models (more commonly known as embeddings or neural language models) are the new kids on the distributional semantics block. Despite the buzz surrounding these models, the literature is still lacking a systematic comparison of the predictive models with classic, count-vector-based distributional semantic approaches. In this paper, we perform such an extensive evaluation, on a wide range of lexical semantics tasks and across many parameter settings. The results, to our own surprise, show that the buzz is fully justified, as the context-predicting models obtain a thorough and resounding victory against their count-based counterparts.

Citation Context

...f preserved variance, etc.). Occasionally, some kind of indirect supervision is used: Several parameter settings are tried, and the best setting is chosen based on performance on a semantic task that has been selected for tuning. The last few years have seen the development of a new generation of DSMs that frame the vector estimation problem directly as a supervised task, where the weights in a word vector are set to maximize the probability of the contexts in which the word is observed in the corpus (Bengio et al., 2003; Collobert and Weston, 2008; Collobert et al., 2011; Huang et al., 2012; Mikolov et al., 2013a; Turian et al., 2010). The traditional construction of context vectors is turned on its head: Instead of first collecting context vectors and then reweighting these vectors based on various criteria, the vector weights are directly set to optimally predict the contexts in which the corresponding words tend to appear. Since similar words occur in similar contexts, the system naturally learns to assign similar vectors to similar words. This new way to train DSMs is attractive because it replaces the essentially heuristic stacking of vector transforms in earlier models with a single, well-defin...

Show and tell: A neural image caption generator

by Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, 2014
Cited by 32 (2 self)
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU score (the higher the better) on the Pascal dataset is 25, our approach yields 59, to be compared to human performance around 69. We also show BLEU score improvements on Flickr30k, from 55 to 66, and on SBU, from 19 to 27.

Multimodal Neural Language Models

by Ryan Kiros, Ruslan Salakhutdinov, Richard Zemel
Cited by 29 (4 self)
We introduce two multimodal neural language models: models of natural language that can be conditioned on other modalities. An image-text multimodal neural language model can be used to retrieve images given complex sentence queries, retrieve phrase descriptions given image queries, as well as generate text conditioned on images. We show that in the case of image-text modelling we can jointly learn word representations and image features by training our models together with a convolutional network. Unlike many of the existing methods, our approach can generate sentence descriptions for images without the use of templates, structured prediction, and/or syntactic trees. While we focus on image-text modelling, our algorithms can be easily applied to other modalities such as audio.

Citation Context

...ese vectors is high for semantically similar words. Several models have been proposed based on feedforward networks (Bengio et al., 2003), log-bilinear models (Mnih & Hinton, 2007), skip-gram models (Mikolov et al., 2013) and recurrent neural networks (Mikolov et al., 2010; 2011). Training can be sped up through the use of hierarchical softmax (Morin & Bengio, 2005) or noise contrastive estimation (Mnih & Teh, 2012)....

Unifying visual-semantic embeddings with multimodal neural language models

by Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel
Cited by 26 (4 self)
Inspired by recent advances in multimodal learning and machine translation, we introduce an encoder-decoder pipeline that learns (a): a multimodal joint embedding space with images and text and (b): a novel language model for decoding distributed representations from our space. Our pipeline effectively unifies joint image-text embedding models with multimodal neural language models. We introduce the structure-content neural language model that disentangles the structure of a sentence to its content, conditioned on representations produced by the encoder. The encoder allows one to rank images and sentences while the decoder can generate novel descriptions from scratch. Using LSTM to encode sentences, we match the state-of-the-art performance on Flickr8K and Flickr30K without using object detections. We also set new best results when using the 19-layer Oxford convolutional network. Furthermore we show that with linear encoders, the learned embedding space captures multimodal regularities in terms of vector space arithmetic, e.g. image of a blue car - "blue" + "red" is near images of red cars. Sample captions generated for 800 images are made available for comparison.

Citation Context

...mage embedding x, and vice-versa with xk. For all of our experiments, we initialize the word embeddings WT to be pre-computed K = 300 dimensional vectors learned using a continuous bag-of-words model [37]. The contrastive terms are chosen randomly from the training set and resampled every epoch. 1For additional details on LSTM: http://people.idsia.ch/~juergen/rnn.html. 2As a slight abuse of notation, ...

Intriguing properties of neural networks

by Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus, 2013
Cited by 21 (1 self)
Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties. First, we find that there is no distinction between individual high level units and random linear combinations of high level units, according to various methods of unit analysis. It suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks. Second, we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent. We can cause the network to misclassify an image by applying a certain hardly perceptible perturbation, which is found by maximizing the network’s prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.

Citation Context

...s the entire space of activations, rather than the individual units, that contains the bulk of the semantic information. A similar, but even stronger conclusion was reached recently by Mikolov et al. [12] for word representations, where the various directions in the vector space representing the words are shown to give rise to a surprisingly rich semantic encoding of relations and analogies. At the sa...

Exploiting similarities among languages for machine translation

by Tomas Mikolov, Quoc V. Le, Ilya Sutskever, 2013
Cited by 21 (1 self)
Dictionaries and phrase tables are the basis of modern statistical machine translation systems. This paper develops a method that can automate the process of generating and extending dictionaries and phrase tables. Our method can translate missing word and phrase entries by learning language structures based on large monolingual data and mapping between languages from small bilingual data. It uses distributed representation of words and learns a linear mapping between vector spaces of languages. Despite its simplicity, our method is surprisingly effective: we can achieve almost 90% ...