Results 1 - 2 of 2
Word Embeddings Go to Italy: a Comparison of Models and Training Datasets
"... Abstract. In this paper we present some preliminary results on the generation of word embeddings for the Italian language. We compare two popular word representation models, word2vec and GloVe, and train them on two datasets with different stylistic properties. We test the generated word embeddings ..."
Abstract
In this paper we present some preliminary results on the generation of word embeddings for the Italian language. We compare two popular word representation models, word2vec and GloVe, and train them on two datasets with different stylistic properties. We test the generated word embeddings on a word analogy test derived from the one originally proposed for word2vec, adapted to capture some of the linguistic aspects that are specific to Italian. Results show that the tested models are able to create syntactically and semantically meaningful word embeddings despite the higher morphological complexity of Italian with respect to English. Moreover, we have found that the stylistic properties of the training dataset play a relevant role in the type of information captured by the produced vectors.
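The word analogy test mentioned in the abstract follows the vector-offset scheme introduced with word2vec. A minimal sketch of that evaluation is shown below, assuming gensim is available; the vectors file it_vectors.bin and the uomo/re/donna example are hypothetical placeholders, not items from the paper's Italian test set.

# Sketch of word2vec-style analogy evaluation with gensim.
# The embeddings file and example analogy are illustrative assumptions.
from gensim.models import KeyedVectors

# Hypothetical path to embeddings trained on an Italian corpus.
vectors = KeyedVectors.load_word2vec_format("it_vectors.bin", binary=True)

def solve_analogy(a, b, c, topn=1):
    """Return candidates d such that a : b = c : d, via the vector
    offset d ~ b - a + c, ranked by cosine similarity."""
    return vectors.most_similar(positive=[b, c], negative=[a], topn=topn)

# Example: "uomo" (man) : "re" (king) = "donna" (woman) : ? -> "regina" (queen)
print(solve_analogy("uomo", "re", "donna"))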
Word Representations via Gaussian Embedding (published as a conference paper at ICLR 2015)
"... Current work in lexical distributed representations maps each word to a point vector in low-dimensional space. Mapping instead to a density provides many interesting advantages, including better capturing uncertainty about a representa-tion and its relationships, expressing asymmetries more naturall ..."
Abstract
Current work in lexical distributed representations maps each word to a point vector in low-dimensional space. Mapping instead to a density provides many interesting advantages, including better capturing uncertainty about a representation and its relationships, expressing asymmetries more naturally than dot product or cosine similarity, and enabling more expressive parameterization of decision boundaries. This paper advocates for density-based distributed embeddings and presents a method for learning representations in the space of Gaussian distributions. We compare performance on various word embedding benchmarks, investigate the ability of these embeddings to model entailment and other asymmetric relationships, and explore novel properties of the representation.
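The asymmetry point in the abstract can be made concrete with the closed-form KL divergence between diagonal Gaussians, which, unlike cosine similarity between point vectors, is directional. The sketch below is an illustration of that property under invented toy means and variances, not the paper's learned embeddings or training objective.

# Sketch: KL divergence between diagonal Gaussians is asymmetric,
# unlike cosine similarity between point vectors. Toy values only.
import numpy as np

def kl_diag_gaussians(mu0, var0, mu1, var1):
    """Closed-form KL(N0 || N1) for Gaussians with diagonal covariance."""
    return 0.5 * np.sum(var0 / var1 + (mu1 - mu0) ** 2 / var1
                        - 1.0 + np.log(var1 / var0))

# A broad distribution (e.g. a hypernym) vs. a narrow one (a hyponym):
mu_animal, var_animal = np.zeros(2), np.full(2, 4.0)
mu_dog, var_dog = np.array([0.5, 0.0]), np.full(2, 0.5)

print(kl_diag_gaussians(mu_dog, var_dog, mu_animal, var_animal))  # small
print(kl_diag_gaussians(mu_animal, var_animal, mu_dog, var_dog))  # large

The two calls return clearly different values, so a directional score of this kind can encode which word plausibly entails the other, which a symmetric similarity cannot.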