Results 1 - 10
of
18
A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches
"... This paper presents and compares WordNetbased and distributional similarity approaches. The strengths and weaknesses of each approach regarding similarity and relatedness tasks are discussed, and a combination is presented. Each of our methods independently provide the best results in their class on ..."
Abstract
-
Cited by 33 (3 self)
- Add to MetaCart
This paper presents and compares WordNetbased and distributional similarity approaches. The strengths and weaknesses of each approach regarding similarity and relatedness tasks are discussed, and a combination is presented. Each of our methods independently provide the best results in their class on the RG and WordSim353 datasets, and a supervised combination of them yields the best published results on all datasets. Finally, we pioneer cross-lingual similarity, showing that our methods are easily adapted for a cross-lingual task with minor losses. 1
Wikipedia-based semantic interpretation for natural language processing
- J. Artif. Int. Res
"... Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such a ..."
Abstract
-
Cited by 13 (3 self)
- Add to MetaCart
Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users. 1.
Learning Graph Walk Based Similarity Measures for Parsed Text
"... We consider a parsed text corpus as an instance of a labelled directed graph, where nodes represent words and weighted directed edges represent the syntactic relations between them. We show that graph walks, combined with existing techniques of supervised learning, can be used to derive a task-speci ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
We consider a parsed text corpus as an instance of a labelled directed graph, where nodes represent words and weighted directed edges represent the syntactic relations between them. We show that graph walks, combined with existing techniques of supervised learning, can be used to derive a task-specific word similarity measure in this graph. We also propose a new path-constrained graph walk method, in which the graph walk process is guided by high-level knowledge about meaningful edge sequences (paths). Empirical evaluation on the task of named entity coordinate term extraction shows that this framework is preferable to vector-based models for smallsized corpora. It is also shown that the pathconstrained graph walk algorithm yields both performance and scalability gains. 1
WikiWalk: Random walks on Wikipedia for Semantic Relatedness
"... Computing semantic relatedness of natural language texts is a key component of tasks such as information retrieval and summarization, and often depends on knowledge of a broad range of real-world concepts and relationships. We address this knowledge integration issue by computing semantic relatednes ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
Computing semantic relatedness of natural language texts is a key component of tasks such as information retrieval and summarization, and often depends on knowledge of a broad range of real-world concepts and relationships. We address this knowledge integration issue by computing semantic relatedness using personalized PageRank (random walks) on a graph derived from Wikipedia. This paper evaluates methods for building the graph, including link selection strategies, and two methods for representing input texts as distributions over the graph nodes: one based on a dictionary lookup, the other based on Explicit Semantic Analysis. We evaluate our techniques on standard word relatedness and text similarity datasets, finding that they capture similarity information complementary to existing Wikipedia-based relatedness measures, resulting in small improvements on a stateof-the-art measure. 1
Knowledge-Based WSD on Specific Domains: Performing Better than Generic Supervised WSD
"... This paper explores the application of knowledgebased Word Sense Disambiguation systems to specific domains, based on our state-of-the-art graphbased WSD system that uses the information in WordNet. Evaluation was performed over a publicly available domain-specific dataset of 41 words related to Spo ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
This paper explores the application of knowledgebased Word Sense Disambiguation systems to specific domains, based on our state-of-the-art graphbased WSD system that uses the information in WordNet. Evaluation was performed over a publicly available domain-specific dataset of 41 words related to Sports and Finance, comprising examples drawn from three corpora: one balanced corpus (BNC), and two domain-specific corpora (news related to Sports and Finance). The results show that in all three corpora our knowledge-based WSD algorithm improves over previous results, and also over two state-of-the-art supervised WSD systems trained on SemCor, the largest publicly available annotated corpus. We also show that using related words as context, instead of the actual occurrence contexts, yields better results on the domain datasets, but not on the general one. Interestingly, the results are higher for domain-specific corpus than for the general corpus, raising prospects for improving current WSD systems when applied to specific domains. 1
Text relatedness based on a word thesaurus
- Artificial Intelligence Research
, 2010
"... The computation of relatedness between two fragments of text in an automated manner requires taking into account a wide range of factors pertaining to the meaning the two fragments convey, and the pairwise relations between their words. Without doubt, a measure of relatedness between text segments m ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
The computation of relatedness between two fragments of text in an automated manner requires taking into account a wide range of factors pertaining to the meaning the two fragments convey, and the pairwise relations between their words. Without doubt, a measure of relatedness between text segments must take into account both the lexical and the semantic relatedness between words. Such a measure that captures well both aspects of text relatedness may help in many tasks, such as text retrieval, classification and clustering. In this paper we present a new approach for measuring the semantic relatedness between words based on their implicit semantic links. The approach exploits only a word thesaurus in order to devise implicit semantic links between words. Based on this approach, we introduce Omiotis, a new measure of semantic relatedness between texts which capitalizes on the word-to-word semantic relatedness measure (SR) and extends it to measure the relatedness between texts. We gradually validate our method: we first evaluate the performance of the semantic relatedness measure between individual words, covering word-to-word similarity and relatedness, synonym identification and word analogy; then, we proceed with evaluating the performance of our method in measuring text-to-text semantic relatedness in two tasks, namely sentence-to-sentence similarity and paraphrase recognition. Experimental evaluation shows that the proposed method outperforms every lexicon-based method of semantic relatedness in the selected tasks and the used data sets, and competes well against corpus-based and hybrid approaches. 1.
Grouping Product Features Using Semi-Supervised Learning with Soft-Constraints
"... In opinion mining of product reviews, one often wants to produce a summary of opinions based on product features/attributes. However, for the same feature, people can express it with different words and phrases. To produce a meaningful summary, these words and phrases, which are domain synonyms, nee ..."
Abstract
-
Cited by 4 (4 self)
- Add to MetaCart
In opinion mining of product reviews, one often wants to produce a summary of opinions based on product features/attributes. However, for the same feature, people can express it with different words and phrases. To produce a meaningful summary, these words and phrases, which are domain synonyms, need to be grouped under the same feature group. This paper proposes a constrained semisupervised learning method to solve the problem. Experimental results using reviews from five different domains show that the proposed method is competent for the task. It outperforms the original EM and the state-of-the-art
A Graph-Theoretic Framework for Semantic Distance
"... Many NLP applications entail that texts are classified based on their semantic distance (how similar or different the texts are). For example, comparing the text of a new document to those of documents of known topics can help identify the topic of the new text. Typically, a distributional distance ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Many NLP applications entail that texts are classified based on their semantic distance (how similar or different the texts are). For example, comparing the text of a new document to those of documents of known topics can help identify the topic of the new text. Typically, a distributional distance is used to capture the implicit semantic distance between two pieces of text. However, such approaches do not take into account the semantic relations between words. In this paper, we introduce an alternative method of measuring the semantic distance between texts that integrates distributional information and ontological knowledge within a network-flow formalism. We first represent each text as a collection of frequency-weighted concepts within an ontology. We then make use of a network-flow method which provides an efficient way of explicitly measuring the frequency-weighted ontological distance between the concepts across two texts. We evaluate our method in a variety of NLP tasks, and find that it performs well on two of three tasks. We develop a new measure of semantic coherence that enables us to account for the performance difference across the three datasets, shedding light on the properties of a dataset that lend itself well to our method. 1.
Adaptive Graph Walk Based Similarity Measures in Entity-Relation Graphs
, 2008
"... Relational or semi-structured data is naturally represented by a graph schema, where nodes denote entities and directed typed edges represent the relations between them. Such graphs are heterogeneous in the sense that they describe different types of objects and multiple types of links. For example, ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Relational or semi-structured data is naturally represented by a graph schema, where nodes denote entities and directed typed edges represent the relations between them. Such graphs are heterogeneous in the sense that they describe different types of objects and multiple types of links. For example, email data can be described in a graph that includes messages, persons, dates and other objects; in this graph, a message may be associated with a person with different relations, such as ”sent-to”, ”sent-from ” and so on. In the past, researchers have suggested to apply random graph walks in order to elicit a measure of similarity between entities that are not directly connected in a graph. In this thesis, we suggest a general framework, in which different arbitrary queries (for instance, ”what persons are most related to this email message?”) are addressed using random walks. Naturally, there are many types of queries possible that correspond to various flavors of inter-entity similarity; several learning techniques are therefore suggested and evaluated that adapt the graph-walk
Semantic Relatedness from Automatically Generated Semantic Networks
"... We introduce a novel approach to measuring semantic relatedness of terms based on an automatically generated, large-scale semantic network. We present promising first results that indicate potential competitiveness with approaches based on manually created resources. 1 ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
We introduce a novel approach to measuring semantic relatedness of terms based on an automatically generated, large-scale semantic network. We present promising first results that indicate potential competitiveness with approaches based on manually created resources. 1

