Results 1 - 10 of 320
An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links
- In Proceedings of AAAI, 2008
"... This paper describes a new technique for obtaining measures of semantic relatedness. Like other recent approaches, it uses Wikipedia to provide structured world knowledge about the terms of interest. Our approach is unique in that it does so using the hyperlink structure of Wikipedia rather than its ..."
Abstract
-
Cited by 167 (8 self)
- Add to MetaCart
(Show Context)
This paper describes a new technique for obtaining measures of semantic relatedness. Like other recent approaches, it uses Wikipedia to provide structured world knowledge about the terms of interest. Our approach is unique in that it does so using the hyperlink structure of Wikipedia rather than its category hierarchy or textual content. Evaluation with manually defined measures of semantic relatedness reveals this to be an effective compromise between the ease of computation of the former approach and the accuracy of the latter.
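A note on the technique: the measure compares the sets of articles that link to each concept's page, adapting a Normalized Google Distance style formula to Wikipedia in-links. A minimal sketch of that idea, assuming each term is already mapped to its article and represented by its in-link set (function and variable names here are illustrative, not the paper's code):

```python
import math

def link_relatedness(inlinks_a, inlinks_b, total_articles):
    """Relatedness of two Wikipedia articles from their incoming links,
    in the spirit of a Normalized Google Distance over in-link sets."""
    a, b = set(inlinks_a), set(inlinks_b)
    common = a & b
    if not common:
        return 0.0
    distance = ((math.log(max(len(a), len(b))) - math.log(len(common)))
                / (math.log(total_articles) - math.log(min(len(a), len(b)))))
    return max(0.0, 1.0 - distance)
```

Computing this needs only the link graph, which is why the approach is cheap relative to methods that process Wikipedia's full text.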
Measuring semantic similarity between words using web search engines
- In Proceedings of WWW, 2007
"... Measuring the semantic similarity between words is an important component in various semantic web-related applications such as community mining, relation extraction and au-tomatic meta data extraction. Despite the usefulness of semantic similarity measures in these applications, accurately measuring ..."
Abstract
-
Cited by 138 (7 self)
- Add to MetaCart
(Show Context)
Measuring the semantic similarity between words is an important component in various semantic web-related applications such as community mining, relation extraction and automatic metadata extraction. Despite the usefulness of semantic similarity measures in these applications, accurately measuring semantic similarity between two words (or entities) remains a challenging task. We propose a semantic similarity measure that uses the information available on the Web to measure similarity between words or entities. The proposed method exploits page counts and text snippets returned by a Web search engine. We define various similarity scores for two given words P and Q, using the page counts for the queries P, Q and P AND Q. Moreover, we propose a novel approach to compute semantic similarity using automatically extracted lexical-syntactic patterns from text snippets. These different similarity scores are integrated using support vector machines to produce a robust semantic similarity measure. Experimental results on the Miller-Charles benchmark dataset show that the proposed measure outperforms all the existing web-based semantic similarity measures by a wide margin, achieving a correlation coefficient of 0.867. Moreover, the proposed semantic similarity measure significantly improves the accuracy (F-measure of 0.78) in a community mining task and in an entity disambiguation task, thereby verifying its capability to capture semantic similarity using web content. Keywords: semantic similarity measures, web mining, community extraction
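The page-count side of the method is easy to make concrete: given hit counts for P, Q and the conjunctive query, the standard co-occurrence scores (WebJaccard, WebOverlap, WebDice, WebPMI) are simple ratios. A sketch, where n is an assumed index size and the cutoff guards against unreliable low counts (both values are illustrative assumptions):

```python
import math

def page_count_scores(h_p, h_q, h_pq, n=1e10, cutoff=5):
    """Co-occurrence scores from page counts: h_p and h_q are hits for
    P and Q alone, h_pq for the query "P AND Q"."""
    if h_pq <= cutoff:  # too few co-occurrences to trust
        return {"jaccard": 0.0, "overlap": 0.0, "dice": 0.0, "pmi": 0.0}
    return {
        "jaccard": h_pq / (h_p + h_q - h_pq),
        "overlap": h_pq / min(h_p, h_q),
        "dice": 2 * h_pq / (h_p + h_q),
        "pmi": math.log2((h_pq / n) / ((h_p / n) * (h_q / n))),
    }
```

In the paper these scores are combined with snippet-derived pattern features by a support vector machine, rather than used directly.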
Mining meaning from Wikipedia
- 2009
"... Wikipedia is a goldmine of information; not just for its many readers, but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts an ..."
Abstract
-
Cited by 76 (2 self)
- Add to MetaCart
Wikipedia is a goldmine of information; not just for its many readers, but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts and relations that is being applied to a host of tasks. This article provides a comprehensive description of this work. It focuses on research that extracts and makes use of the concepts, relations, facts and descriptions found in Wikipedia, and organizes the work into four broad categories: applying Wikipedia to natural language processing; using it to facilitate information retrieval; using it for information extraction; and treating it as a resource for ontology building. The article addresses how Wikipedia is being used as is, how it is being improved and adapted, and how it is being combined with other structures to create entirely new resources. We identify the research groups and individuals involved, and how their work has developed over the last few years. We provide a comprehensive list of the open-source software they have produced.
An open-source toolkit for mining Wikipedia
- In Proc. New Zealand Computer Science Research Student Conf.
"... The online encyclopedia Wikipedia is a vast repository of information. For developers and researchers it represents a giant multilingual database of concepts and semantic relations; a promising resource for natural language processing and many other research areas. In this paper we introduce the Wik ..."
Abstract
-
Cited by 49 (0 self)
- Add to MetaCart
The online encyclopedia Wikipedia is a vast repository of information. For developers and researchers it represents a giant multilingual database of concepts and semantic relations; a promising resource for natural language processing and many other research areas. In this paper we introduce the Wikipedia Miner toolkit: an open-source collection of code that allows researchers and developers to easily integrate Wikipedia's rich semantics into their own applications. The toolkit is already mature: we describe how it provides simplified, object-oriented access to Wikipedia's structure and content, how it allows terms and concepts to be compared semantically, and how it can detect Wikipedia topics when they are mentioned in documents. We also describe how it has already been applied to several different research problems. However, the toolkit is not intended to be a complete, polished product; it is instead an entirely open-source project that we hope will continue to evolve.
Clustering to find exemplar terms for keyphrase extraction
- In Proceedings of EMNLP, 2009
"... Keyphrases are widely used as a brief summary of documents. Since manual assignment is time-consuming, various unsupervised ranking methods based on importance scores are proposed for keyphrase extraction. In practice, the keyphrases of a document should not only be statistically important in the do ..."
Abstract
-
Cited by 40 (5 self)
- Add to MetaCart
(Show Context)
Keyphrases are widely used as a brief summary of documents. Since manual assignment is time-consuming, various unsupervised ranking methods based on importance scores have been proposed for keyphrase extraction. In practice, the keyphrases of a document should not only be statistically important in the document, but also provide good coverage of it. Based on this observation, we propose an unsupervised method for keyphrase extraction. First, the method finds exemplar terms by leveraging clustering techniques, which ensures that the exemplar terms semantically cover the document. Then the keyphrases are extracted from the document using the exemplar terms. Our method outperforms state-of-the-art graph-based ranking methods (TextRank) by 9.5% in F1-measure.
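The exemplar step reduces to clustering candidate terms in some vector space and keeping the term nearest each centroid. A sketch under stated assumptions: KMeans stands in for whichever clustering algorithm is used, and the term vectors (e.g. co-occurrence or embedding vectors) are assumed given:

```python
import numpy as np
from sklearn.cluster import KMeans

def exemplar_terms(terms, vectors, n_clusters=10):
    """Cluster candidate terms and return one exemplar per cluster:
    the term whose vector is nearest its cluster centroid."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(vectors)
    exemplars = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(vectors[members] - km.cluster_centers_[c], axis=1)
        exemplars.append(terms[members[dists.argmin()]])
    return exemplars
```

Candidate phrases containing an exemplar term would then be promoted to keyphrases, per the extraction step the abstract describes.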
Using Google distance to weight approximate ontology matches
- In Proceedings of WWW, 2007
"... Discovering mappings between concept hierarchies is widely regarded as one of the hardest and most urgent ..."
Abstract
-
Cited by 37 (1 self)
- Add to MetaCart
(Show Context)
Discovering mappings between concept hierarchies is widely regarded as one of the hardest and most urgent ...
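The weighting named in the title is the Normalized Google Distance, computed purely from search-engine hit counts. A minimal sketch (n, the total number of indexed pages, is an estimate the caller must supply):

```python
import math

def ngd(f_x, f_y, f_xy, n):
    """Normalized Google Distance: f_x and f_y are hit counts for each
    term alone, f_xy for the joint query. Smaller means more related."""
    if f_xy == 0:
        return float("inf")  # the terms never co-occur
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))
```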
Semantically enriching folksonomies with FLOR
- In Proc. of the 5th ESWC workshop: Collective Intelligence & the Semantic Web, 2008
"... Abstract. While the increasing popularity of folksonomies has lead to a vast quantity of tagged data, resource retrieval in folksonomies is limited by being agnostic to the meaning (i.e., semantics) of tags. Our goal is to automatically enrich folksonomy tags (and implicitly the related resources) w ..."
Abstract
-
Cited by 36 (8 self)
- Add to MetaCart
(Show Context)
While the increasing popularity of folksonomies has led to a vast quantity of tagged data, resource retrieval in folksonomies is limited by being agnostic to the meaning (i.e., semantics) of tags. Our goal is to automatically enrich folksonomy tags (and implicitly the related resources) with formal semantics by associating them with relevant concepts defined in online ontologies. We introduce FLOR, a method that performs automatic folksonomy enrichment by combining knowledge from WordNet and online available ontologies. Experimentally testing FLOR, we found that it correctly enriched 72% of 250 Flickr photos.
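A first anchoring step in any such pipeline is mapping a raw tag to candidate word senses before consulting online ontologies. A sketch of that step using NLTK's WordNet interface; this illustrates the general idea only and is not FLOR's actual code (FLOR additionally normalizes tags and matches senses against concepts found in online ontologies):

```python
from nltk.corpus import wordnet as wn  # requires nltk plus the wordnet corpus

def candidate_senses(tag):
    """Map a raw folksonomy tag to WordNet senses as a first,
    illustrative enrichment step."""
    lemma = tag.strip().lower().replace(" ", "_")
    return [(s.name(), s.definition()) for s in wn.synsets(lemma)]
```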
Roundup: a multigenome repository of orthologs and evolutionary distances
- 2006
"... SUMMARY: We have created a tool for ortholog and phylogenetic profile retrieval called Roundup. Roundup is backed by a massive repository of orthologs and associated evolutionary distances that was built using the reciprocal smallest distance algorithm, an approach that hasbeenshown to improveuponal ..."
Abstract
-
Cited by 33 (5 self)
- Add to MetaCart
(Show Context)
SUMMARY: We have created a tool for ortholog and phylogenetic profile retrieval called Roundup. Roundup is backed by a massive repository of orthologs and associated evolutionary distances that was built using the reciprocal smallest distance algorithm, an approach that has been shown to improve upon alternative approaches to ortholog detection, such as reciprocal BLAST. Presently, the Roundup repository contains all possible pairwise comparisons for over 250 genomes, including 32 eukaryotes, more than doubling the coverage of any similar resource. The orthologs are accessible through an intuitive web interface that allows searches by genome or gene identifier, presenting results as phylogenetic profiles together with gene and molecular function annotations. Results may be downloaded as phylogenetic matrices for subsequent analysis, including the construction of whole-genome phylogenies based on gene-content data.
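The reciprocal smallest distance idea reduces to a symmetry check once pairwise distances are available: two genes are called orthologs when each is the other's nearest gene across the genome pair. A sketch that assumes the evolutionary distances are already computed; in the real algorithm they are estimated from alignments by maximum likelihood:

```python
def reciprocal_smallest_distance(dist):
    """Ortholog calls from a complete pairwise distance table
    dist[(gene_a, gene_b)] between genomes A and B."""
    genes_a = {a for a, _ in dist}
    genes_b = {b for _, b in dist}
    best_a = {a: min(genes_b, key=lambda b: dist[a, b]) for a in genes_a}
    best_b = {b: min(genes_a, key=lambda a: dist[a, b]) for b in genes_b}
    return [(a, h) for a, h in best_a.items() if best_b[h] == a]
```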
Image Tag Refinement Towards Low-Rank, Content-Tag Prior and Error Sparsity
"... The vast user-provided image tags on the popular photo sharing websites may greatly facilitate image retrieval and management. However, these tags are often imprecise and/or incomplete, resulting in unsatisfactory performances in tag related applications. In this work, the tag refinement problem is ..."
Abstract
-
Cited by 33 (3 self)
- Add to MetaCart
(Show Context)
The vast number of user-provided image tags on popular photo-sharing websites can greatly facilitate image retrieval and management. However, these tags are often imprecise and/or incomplete, resulting in unsatisfactory performance in tag-related applications. In this work, the tag refinement problem is formulated as a decomposition of the user-provided tag matrix D into a low-rank refined matrix A and a sparse error matrix E, namely D = A + E, targeting optimality measured by four aspects: 1) low-rank: A is of low rank owing to the semantic correlations among the tags; 2) content consistency: if two images are visually similar, their tag vectors (i.e., column vectors of A) should also be similar; 3) tag correlation: if two tags co-occur with high frequency in general images, their co-occurrence frequency (described by two row vectors of A) should also be high; and 4) error sparsity: the matrix E is sparse, since the tag matrix D is sparse and humans provide reasonably accurate tags. These components together constitute a constrained yet convex optimization problem, and an efficient iterative procedure with provable convergence, based on the accelerated proximal gradient method, is proposed for the optimization. Extensive experiments on two benchmark Flickr datasets, with 25K and 270K images respectively, demonstrate the effectiveness of the proposed tag refinement approach.
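Stripped of the content-consistency and tag-correlation terms, the core of the formulation is the familiar low-rank-plus-sparse split, which alternates singular value thresholding (for A) with soft thresholding (for E). A simplified, unaccelerated proximal-gradient sketch of that core only; parameter values are arbitrary, and the paper's full objective and accelerated solver differ:

```python
import numpy as np

def refine_tags(D, lam=0.1, step=0.5, iters=200):
    """Proximal gradient for ||A||_* + lam*||E||_1 + 0.5*||D - A - E||_F^2,
    a simplified stand-in for the paper's tag-refinement objective."""
    A = np.zeros_like(D, dtype=float)
    E = np.zeros_like(D, dtype=float)
    for _ in range(iters):
        R = D - A - E  # the smooth term's gradient w.r.t. A and E is -R
        # singular value thresholding keeps A low-rank
        U, s, Vt = np.linalg.svd(A + step * R, full_matrices=False)
        A = (U * np.maximum(s - step, 0.0)) @ Vt
        # soft thresholding keeps E sparse
        G = E + step * R
        E = np.sign(G) * np.maximum(np.abs(G) - lam * step, 0.0)
    return A, E
```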