Using Encyclopedic Knowledge for Named Entity Disambiguation (2006)

by Razvan Bunescu
Venue: In EACL

Results 1 - 10 of 65

WikiRelate! Computing semantic relatedness using Wikipedia

by Michael Strube, Simone Paolo Ponzetto - In Proceedings of the 21st national conference on Artificial intelligence, 2006
Cited by 87 (2 self)
Wikipedia provides a knowledge base for computing word relatedness in a more structured fashion than a search engine and with more coverage than WordNet. In this work we present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet when applied to the largest available dataset designed for that purpose. The best results on this dataset are obtained by integrating Google, WordNet and Wikipedia based measures. We also show that including Wikipedia improves the performance of an NLP application processing naturally occurring texts.

Yago: A Large Ontology from Wikipedia and WordNet

by Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum, 2007
Cited by 43 (11 self)
This article presents YAGO, a large ontology with high coverage and precision. YAGO has been automatically derived from Wikipedia and WordNet. It comprises entities and relations, and currently contains more than 1.7 million entities and 15 million facts. These include the taxonomic Is-A hierarchy as well as semantic relations between entities. The facts for YAGO have been extracted from the category system and the infoboxes of Wikipedia and have been combined with taxonomic relations from WordNet. Type checking techniques help us keep YAGO’s precision at 95% – as proven by an extensive evaluation study. YAGO is based on a clean logical model with a decidable consistency. Furthermore, it allows representing n-ary relations in a natural way while maintaining compatibility with RDFS. A powerful query model facilitates access to YAGO’s data.

An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links

by David Milne, Ian H. Witten - In Proceedings of AAAI 2008, 2008
Cited by 42 (6 self)
This paper describes a new technique for obtaining measures of semantic relatedness. Like other recent approaches, it uses Wikipedia to provide structured world knowledge about the terms of interest. Our approach is unique in that it does so using the hyperlink structure of Wikipedia rather than its category hierarchy or textual content. Evaluation with manually defined measures of semantic relatedness reveals this to be an effective compromise between the ease of computation of the former approach and the accuracy of the latter.

Mining domain-specific thesauri from Wikipedia: A case study

by David Milne, Olena Medelyan, Ian H. Witten - In: Proc. of ACM International Conference on Web Intelligence, 2006
Cited by 23 (6 self)
Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia, a vast, open encyclopedia. In a comparison with a professional thesaurus for agriculture (Agrovoc) we find that Wikipedia contains a substantial proportion of its domain-specific concepts and semantic relations; furthermore it has impressive coverage of a collection of contemporary documents in the domain. Thesauri derived using these techniques are attractive because they capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly-constructed manual counterparts.

Using Wikipedia for Automatic Word Sense Disambiguation

by Rada Mihalcea
Cited by 21 (1 self)
This paper describes a method for generating sense-tagged data using Wikipedia as a source of sense annotations. Through word sense disambiguation experiments, we show that the Wikipedia-based sense annotations are reliable and can be used to construct accurate sense classifiers.

Collective Annotation of Wikipedia Entities in Web Text

by Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, Soumen Chakrabarti
Cited by 20 (3 self)
To take the first step beyond keyword-based search toward entity-based search, suitable token spans (“spots”) on documents must be identified as references to real-world entities from an entity catalog. Several systems have been proposed to link spots on Web pages to entities in Wikipedia. They are largely based on local compatibility between the text around the spot and textual metadata associated with the entity. Two recent systems exploit inter-label dependencies, but in limited ways. We propose a general collective disambiguation approach. Our premise is that coherent documents refer to entities from one or a few related topics or domains. We give formulations for the trade-off between local spot-to-entity compatibility and measures of global coherence between entities. Optimizing the overall entity assignment is NP-hard. We investigate practical solutions based on local hill-climbing, rounding integer linear programs, and pre-clustering entities followed by local optimization within clusters. In experiments involving over a hundred manually annotated Web pages and tens of thousands of spots, our approaches significantly outperform recently-proposed algorithms.

Exploiting Wikipedia as External Knowledge for Named Entity Recognition

by Kentaro Torisawa
Cited by 18 (0 self)
We explore the use of Wikipedia as external knowledge to improve named entity recognition (NER). Our method retrieves the corresponding Wikipedia entry for each candidate word sequence and extracts a category label from the first sentence of the entry, which can be thought of as a definition

Media Meets Semantic Web - How the BBC Uses DBpedia and Linked Data to Make Connections

by Georgi Kobilarov, Tom Scott, Silver Oliver, Chris Sizemore, Michael Smethurst, Christian Bizer - In European Semantic Web Conference, Semantic Web in Use Track, 2009
Cited by 17 (3 self)
In this paper, we describe how the BBC is working to integrate data and linking documents across BBC domains by using Semantic Web technology, in particular Linked Data, MusicBrainz and DBpedia. We cover the work of BBC Programmes and BBC Music building Linked Data sites for all music and programmes related brands, and we describe existing projects, ongoing development, and further research we are doing in a joint collaboration between the BBC, Freie Universität Berlin and Rattle Research in order to use DBpedia as the controlled vocabulary and semantic backbone for the whole BBC.

Large-Scale Taxonomy Mapping for Restructuring and Integrating Wikipedia

by Simone Paolo Ponzetto, Roberto Navigli
Cited by 16 (1 self)
We present a knowledge-rich methodology for disambiguating Wikipedia categories with WordNet synsets and using this semantic information to restructure a taxonomy automatically generated from the Wikipedia system of categories. We evaluate against a manual gold standard and show that both category disambiguation and taxonomy restructuring perform with high accuracy. Besides, we assess these methods on automatically generated datasets and show that we are able to effectively enrich WordNet with a large number of instances from Wikipedia. Our approach produces an integrated resource, thus bringing together the fine-grained classification of instances in Wikipedia and a well-structured top-level taxonomy from WordNet.

Knowledge derived from Wikipedia for computing semantic relatedness

by Simone Paolo Ponzetto, Michael Strube - Journal of Artificial Intelligence Research, 2007
Cited by 16 (1 self)
Wikipedia provides a semantic network for computing semantic relatedness in a more structured fashion than a search engine and with more coverage than WordNet. We present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet on some datasets. We also address the question whether and how Wikipedia can be integrated into NLP applications as a knowledge base. Including Wikipedia improves the performance of a machine learning based coreference resolution system, indicating that it represents a valuable resource for NLP applications. Finally, we show that our method can be easily used for languages other than English by computing semantic relatedness for a German dataset.