Reserating the awesometastic: An automatic extension of the WordNet taxonomy for novel terms
"... This paper presents CROWN, an automatically con-structed extension of WordNet that augments its taxonomy with novel lemmas from Wiktionary. CROWN fills the important gap in WordNet’s lexi-con for slang, technical, and rare lemmas, and more than doubles its current size. In two evaluations, we demons ..."
This paper presents CROWN, an automatically constructed extension of WordNet that augments its taxonomy with novel lemmas from Wiktionary. CROWN fills the important gap in WordNet's lexicon for slang, technical, and rare lemmas, and more than doubles its current size. In two evaluations, we demonstrate that the construction procedure is accurate and has a significant impact on a WordNet-based algorithm encountering novel lemmas.
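The construction idea can be illustrated with a small sketch: given a novel lemma and a dictionary-style gloss, pick the WordNet synset whose gloss overlaps it most and attach the lemma beneath it. This is only an illustration of the attachment intuition, not CROWN's actual procedure; the candidate generation, the overlap heuristic, and the example word and gloss are assumptions.

```python
# Illustrative gloss-overlap attachment for a novel lemma (not the actual
# CROWN algorithm): the candidate generation, overlap scoring, and the
# example word and gloss below are assumptions for demonstration only.
from nltk.corpus import wordnet as wn
from nltk.corpus import stopwords

STOP = set(stopwords.words("english"))

def content_words(text):
    """Lower-cased tokens with stopwords and punctuation stripped."""
    return {w.strip(".,;!?").lower() for w in text.split()} - STOP - {""}

def attach(novel_lemma, gloss, pos=wn.NOUN):
    """Return the synset whose gloss best overlaps the novel lemma's gloss."""
    gloss_words = content_words(gloss)
    # Candidates: every synset of every content word in the gloss.
    candidates = {s for w in gloss_words for s in wn.synsets(w, pos=pos)}
    return max(candidates,
               key=lambda s: len(content_words(s.definition()) & gloss_words),
               default=None)

# Hypothetical Wiktionary-style gloss for an out-of-vocabulary noun.
print(attach("phablet", "a mobile device combining a smartphone and a tablet"))
```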
Word Similarity Perception: an Explorative Analysis
"... Abstract Natural language is a medium for expressing things belonging to conceptual and cognitive levels, made of words and grammar rules used to carry semantics. However, its natural ambiguity is the main critical issue that computational systems are generally asked to solve. In this paper, we pro ..."
Natural language is a medium for expressing things belonging to conceptual and cognitive levels, made of words and grammar rules used to carry semantics. However, its natural ambiguity is the main critical issue that computational systems are generally asked to solve. In this paper, we propose to go beyond the current conceptualization of word similarity, i.e., the building block of disambiguation at the computational level. First, we analyze the origin of the perceived similarity, studying how conceptual, functional, and syntactic aspects influence its strength. We report the results of a two-stage experiment showing clear similarity perception patterns. Then, based on the insights gained in the cognitive tests, we developed a computational system that automatically predicts word similarity, reaching high levels of accuracy.
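The predictive system itself is not described in this snippet; for orientation, a common baseline predicts word similarity as the cosine between word vectors. The three toy vectors below are made-up values, not output of the paper's system.

```python
# Toy baseline for predicting word similarity: cosine between word vectors.
# The vectors here are made-up values; the paper's actual system is more involved.
import numpy as np

vectors = {
    "car":   np.array([0.8, 0.1, 0.3]),
    "auto":  np.array([0.7, 0.2, 0.4]),
    "river": np.array([0.1, 0.9, 0.2]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(vectors["car"], vectors["auto"]))   # high: similar words
print(cosine(vectors["car"], vectors["river"]))  # low: dissimilar words
```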
Specializing Word Embeddings for Similarity or Relatedness
"... Abstract We demonstrate the advantage of specializing semantic word embeddings for either similarity or relatedness. We compare two variants of retrofitting and a joint-learning approach, and find that all three yield specialized semantic spaces that capture human intuitions regarding similarity an ..."
We demonstrate the advantage of specializing semantic word embeddings for either similarity or relatedness. We compare two variants of retrofitting and a joint-learning approach, and find that all three yield specialized semantic spaces that capture human intuitions regarding similarity and relatedness better than unspecialized spaces. We also show that using specialized spaces in NLP tasks and applications leads to clear improvements for document classification and synonym selection, tasks that rely on either similarity or relatedness but not both.
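One of the compared techniques, retrofitting, has a compact closed-form update: each vector is repeatedly pulled toward its neighbours in a semantic lexicon while staying close to its original position. The toy embeddings, the synonym lexicon, and the alpha/beta weights in this sketch are assumptions.

```python
# Sketch of the retrofitting update (after Faruqui et al., 2015): each vector
# is pulled toward its lexicon neighbours while staying close to its original
# embedding. The lexicon and weights here are toy values.
import numpy as np

def retrofit(embeddings, lexicon, iters=10, alpha=1.0, beta=1.0):
    """embeddings: {word: np.array}; lexicon: {word: [neighbour words]}."""
    new = {w: v.copy() for w, v in embeddings.items()}
    for _ in range(iters):
        for w, neighbours in lexicon.items():
            nbrs = [n for n in neighbours if n in new]
            if not nbrs:
                continue
            # Closed-form coordinate update of the retrofitting objective.
            new[w] = (alpha * embeddings[w] + beta * sum(new[n] for n in nbrs)) \
                     / (alpha + beta * len(nbrs))
    return new

emb = {"happy": np.array([1.0, 0.0]), "glad": np.array([0.0, 1.0]),
       "sad": np.array([-1.0, 0.0])}
synonyms = {"happy": ["glad"], "glad": ["happy"]}
print(retrofit(emb, synonyms)["happy"])  # moves toward "glad"
```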
Word Representations via Gaussian Embedding (ICLR 2015)
"... Current work in lexical distributed representations maps each word to a point vector in low-dimensional space. Mapping instead to a density provides many interesting advantages, including better capturing uncertainty about a representa-tion and its relationships, expressing asymmetries more naturall ..."
Current work in lexical distributed representations maps each word to a point vector in low-dimensional space. Mapping instead to a density provides many interesting advantages, including better capturing uncertainty about a representation and its relationships, expressing asymmetries more naturally than dot product or cosine similarity, and enabling more expressive parameterization of decision boundaries. This paper advocates for density-based distributed embeddings and presents a method for learning representations in the space of Gaussian distributions. We compare performance on various word embedding benchmarks, investigate the ability of these embeddings to model entailment and other asymmetric relationships, and explore novel properties of the representation.
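The asymmetry mentioned here can be made concrete with the KL divergence between two diagonal Gaussians, one natural asymmetric score in a density-based space. The toy means and variances below are assumptions, chosen so that a specific term nests inside a general one.

```python
# Sketch of an asymmetric similarity between Gaussian word representations:
# KL divergence between two diagonal Gaussians. Means and variances are toy values.
import numpy as np

def kl_diag(mu0, var0, mu1, var1):
    """KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) )."""
    return 0.5 * np.sum(var0 / var1 + (mu1 - mu0) ** 2 / var1
                        - 1.0 + np.log(var1 / var0))

# A broad distribution for a general term, a narrow one for a specific term.
mu_animal, var_animal = np.array([0.0, 0.0]), np.array([4.0, 4.0])
mu_dog,    var_dog    = np.array([0.5, 0.2]), np.array([0.5, 0.5])

print(kl_diag(mu_dog, var_dog, mu_animal, var_animal))  # smaller: "dog" fits inside "animal"
print(kl_diag(mu_animal, var_animal, mu_dog, var_dog))  # larger: the reverse does not
```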
Evaluating Learning Language Representations
"... Abstract. Machine learning offers significant benefits for systems that process and understand natural language: a) lower maintenance and up-keep costs than when using manually-constructed resources, b) easier portability to new domains, tasks, or languages, and c) robust and timely adaptation to si ..."
Machine learning offers significant benefits for systems that process and understand natural language: a) lower maintenance and upkeep costs than when using manually-constructed resources, b) easier portability to new domains, tasks, or languages, and c) robust and timely adaptation to situation-specific settings. However, the behaviour of an adaptive system is less predictable than when using an edited, stable resource, which makes quality control a continuous issue. This paper proposes an evaluation benchmark for measuring the quality, coverage, and stability of a natural language system as it learns word meaning. Inspired by existing tests for human vocabulary learning, we outline measures for the quality of semantic word representations, such as when learning word embeddings or other distributed representations. These measures highlight differences between the types of underlying learning processes as systems ingest progressively more data.
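One measure of this kind can be sketched as the Spearman correlation between a model's similarity scores and human ratings, reported together with coverage and tracked across training checkpoints. The gold pairs, checkpoint vectors, and the specific coverage definition below are assumptions, not the benchmark's actual content.

```python
# Sketch of one quality/coverage measure: Spearman correlation between human
# similarity ratings and model scores at a training checkpoint. The checkpoint
# embeddings and gold ratings are toy values.
import numpy as np
from scipy.stats import spearmanr

gold = [("cup", "mug", 9.0), ("cup", "stove", 3.0), ("cup", "idea", 0.5)]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate(embeddings):
    """Score only the pairs the model covers; report coverage alongside quality."""
    covered = [(a, b, r) for a, b, r in gold if a in embeddings and b in embeddings]
    model = [cosine(embeddings[a], embeddings[b]) for a, b, _ in covered]
    human = [r for _, _, r in covered]
    rho = spearmanr(model, human).correlation if len(covered) > 1 else float("nan")
    return rho, len(covered) / len(gold)

checkpoint = {"cup": np.array([1.0, 0.2]), "mug": np.array([0.9, 0.3]),
              "stove": np.array([0.2, 1.0])}
print(evaluate(checkpoint))  # (rho over covered pairs, coverage fraction)
```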
Improving Word Representations via Global Visual Context
"... Visually grounded semantics is a very important aspect in word representation, largely due to its potential to improve many NLP tasks such as information re-trieval, text classification and analysis. We present a new distributed word learn-ing framework which 1) learns word embeddings that better ca ..."
Visually grounded semantics is a very important aspect of word representation, largely due to its potential to improve many NLP tasks such as information retrieval, text classification and analysis. We present a new distributed word learning framework which 1) learns word embeddings that better capture visually grounded semantics by unifying local document context and global visual context, 2) jointly learns word representations, image representations and language models, and 3) focuses on word similarity rather than relatedness. We train on a data set containing 1 million image-sentence pairs, and the evaluation on word similarity demonstrates that our model outperforms a linguistic model without global visual context.
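The central idea, letting a global image feature inform the word context, can be sketched by fusing an averaged local context vector with an image vector before scoring candidate words. The weighted-average fusion and all vectors below are toy assumptions, not the paper's actual architecture.

```python
# Sketch of the core idea: score a candidate word against a fused context
# vector combining local text context with a global image feature. The fusion
# by weighted averaging and all vectors are toy assumptions.
import numpy as np

def fused_context(context_vectors, image_vector, weight=0.5):
    """Average the local word-context vectors, then mix in the image feature."""
    local = np.mean(context_vectors, axis=0)
    return (1 - weight) * local + weight * image_vector

def score(candidate_vector, context_vectors, image_vector):
    return float(candidate_vector @ fused_context(context_vectors, image_vector))

# Toy vectors: the sentence context is ambiguous, the image disambiguates it.
context = [np.array([0.2, 0.2]), np.array([0.3, 0.1])]
image = np.array([0.0, 1.0])                         # e.g. a picture of water
print(score(np.array([0.1, 0.9]), context, image))   # "river"-like word scores high
print(score(np.array([0.9, 0.1]), context, image))   # "bank (finance)"-like word scores low
```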
Cross level semantic similarity: an evaluation framework for universal measures of similarity
"... Abstract Semantic similarity has typically been measured across items of approximately similar sizes. As a result, similarity measures have largely ignored the fact that different types of linguistic item can potentially have similar or even identical meanings, and therefore are designed to compare ..."
Semantic similarity has typically been measured across items of approximately similar sizes. As a result, similarity measures have largely ignored the fact that different types of linguistic item can potentially have similar or even identical meanings, and therefore are designed to compare only one type of linguistic item. Furthermore, nearly all current similarity benchmarks within NLP contain pairs of approximately the same size, such as word or sentence pairs, preventing the evaluation of methods that are capable of comparing different-sized items. To address this, we introduce a new semantic evaluation called cross-level semantic similarity (CLSS), which measures the degree to which the meaning of a larger linguistic item, such as a paragraph, is captured by a smaller item, such as a sentence. Our pilot CLSS task was presented as part of SemEval-2014, which attracted 19 teams who submitted 38 systems. The CLSS data contains a rich mixture of pairs, spanning from paragraphs to word senses, to fully evaluate similarity measures that are capable of comparing items of any type. Furthermore, data sources were drawn from diverse corpora beyond just newswire, including domain-specific texts and social media. We describe the annotation process and its challenges, including a comparison with crowdsourcing, and identify the factors that make the dataset a rigorous assessment of a method's quality. Furthermore, we examine in detail the systems participating in the SemEval task to identify the common factors associated […]
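A simple size-agnostic baseline for this setting compares an averaged word-vector representation of each item, whatever its length, using cosine similarity. The sketch below uses toy vectors and is neither the task's official scoring method nor any participant's system.

```python
# Sketch of a size-agnostic baseline for cross-level comparison: average the
# word vectors of each item (paragraph, sentence, phrase, or word) and take
# the cosine of the two averages. Vectors here are toy values.
import numpy as np

vectors = {"storm": np.array([1.0, 0.1]), "damaged": np.array([0.8, 0.3]),
           "roofs": np.array([0.7, 0.2]), "hurricane": np.array([0.9, 0.2]),
           "recipe": np.array([0.1, 1.0])}

def item_vector(text):
    """Average the vectors of the known words in an item of any length."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    return np.mean(known, axis=0)

def cross_level_similarity(larger, smaller):
    a, b = item_vector(larger), item_vector(smaller)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

paragraph = "the storm damaged roofs"
print(cross_level_similarity(paragraph, "hurricane"))  # high
print(cross_level_similarity(paragraph, "recipe"))     # low
```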
Not All Neural Embeddings are Born Equal
"... Neural language models learn word representations that capture rich linguistic and conceptual information. Here we investigate the embeddings learned by neural machine translation models. We show that translation-based embeddings outper-form those learned by cutting-edge monolingual models at single ..."
Neural language models learn word representations that capture rich linguistic and conceptual information. Here we investigate the embeddings learned by neural machine translation models. We show that translation-based embeddings outperform those learned by cutting-edge monolingual models at single-language tasks requiring knowledge of conceptual similarity and/or syntactic role. The findings suggest that, while monolingual models learn information about how concepts are related, neural-translation models better capture their true ontological status.

It is well known that word representations can be learned from the distributional patterns in corpora. Originally, such representations were constructed by counting word co-occurrences, so that the features in one word's representation corresponded to other words [11, 17]. Neural language models, an alternative means to learn word representations, use language data to optimise (latent) features with respect to a language modelling objective. The objective can be to predict either the next word given the initial words of a sentence [4, 14, 8], or simply a nearby word given a single cue word [13, 15]. The representations learned by neural models (sometimes called embeddings) generally outperform those acquired by co-occurrence counting models when applied to NLP tasks [3]. Despite these clear results, it is not well understood how the architecture of neural models affects the information encoded in their embeddings. Here, we explore this question by considering the embeddings learned by architectures with a very different objective function to monolingual language models: neural machine translation models. We show that translation-based embeddings outperform monolingual embeddings on two types of task: those that require knowledge of conceptual similarity (rather than simply association or relatedness), and those that require knowledge of syntactic role. We discuss what the findings indicate about the information content of different embeddings, and suggest how this content might emerge as a consequence of the translation objective.
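The "nearby word given a single cue word" objective referenced here is the skip-gram model; a minimal negative-sampling update is sketched below. The toy corpus, dimensionality, and hyperparameters are assumptions, and real systems train on far larger data.

```python
# Minimal sketch of the "predict a nearby word from a cue word" objective
# (skip-gram with negative sampling). The corpus, dimensionality, and
# hyperparameters are toy assumptions; real models train on large corpora.
import numpy as np

rng = np.random.default_rng(0)
corpus = "translation models learn useful word representations".split()
vocab = {w: i for i, w in enumerate(sorted(set(corpus)))}
dim, lr = 8, 0.05
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # cue-word vectors
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # context-word vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_update(cue, ctx, negatives):
    """One SGD step: push the true context word up, sampled negatives down."""
    v = W_in[cue].copy()
    grad_v = np.zeros(dim)
    for idx, label in [(ctx, 1.0)] + [(n, 0.0) for n in negatives]:
        g = sigmoid(v @ W_out[idx]) - label   # logistic-loss gradient
        grad_v += g * W_out[idx]
        W_out[idx] -= lr * g * v
    W_in[cue] -= lr * grad_v

for _ in range(200):
    for i, w in enumerate(corpus[:-1]):
        cue, ctx = vocab[w], vocab[corpus[i + 1]]
        sgns_update(cue, ctx, rng.integers(0, len(vocab), size=2))

# Probability the trained model assigns to an observed cue/context pair.
print(sigmoid(W_in[vocab["word"]] @ W_out[vocab["representations"]]))
```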