Using encyclopedic knowledge for named entity disambiguation (2006)

by R. C. Bunescu, M. Pasca
Venue: In EACL

What to be? - Electronic Career Guidance Based on Semantic Relatedness

by Iryna Gurevych, Christof Müller, Torsten Zesch - In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007), 2007
Cited by 15 (8 self)
We present a study aimed at investigating the use of semantic information in a novel NLP application, Electronic Career Guidance (ECG), in German. ECG is formulated as an information retrieval (IR) task, whereby textual descriptions of professions (documents) are ranked for their relevance to natural language descriptions of a person’s professional interests (the topic). We compare the performance of two semantic IR models: (IR-1) utilizing semantic relatedness (SR) measures based on either WordNet or Wikipedia and a set of heuristics, and (IR-2) measuring the similarity between

Entity Disambiguation for Knowledge Base Population

by Mark Dredze, Paul McNamee, Delip Rao, Adam Gerber, Tim Finin
Cited by 14 (1 self)
The integration of facts derived from information extraction systems into existing knowledge bases requires a system to disambiguate entity mentions in the text. This is challenging due to issues such as non-uniform variations in entity names, mention ambiguity, and entities absent from a knowledge base. We present a state-of-the-art system for entity disambiguation that not only addresses these challenges but also scales to knowledge bases with several million entries using very little resources. Further, our approach achieves performance of up to 95% on entities mentioned from newswire and 80% on a public test set that was designed to include challenging queries.

BabelNet: Building a very large multilingual semantic network

by Roberto Navigli, Simone Paolo Ponzetto - In Proc. of ACL-10, 2010
Cited by 13 (7 self)
In this paper we present BabelNet – a very large, wide-coverage multilingual semantic network. The resource is automatically constructed by means of a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition, Machine Translation is applied to enrich the resource with lexical information for all languages. We conduct experiments on new and existing gold-standard datasets to show the high quality and coverage of the resource.

PowerMap: Mapping the Real Semantic Web on the Fly

by Vanessa Lopez, Marta Sabou, Enrico Motta - In: Proc. ISWC-06. Volume 4273 of LNCS, 2006
Cited by 11 (6 self)
Ontology mapping plays an important role in bridging the semantic gap between distributed and heterogeneous data sources. As the Semantic Web slowly becomes real and the amount of online semantic data increases, a new generation of tools is being developed that automatically find and integrate this data. Unlike earlier tools, where mapping was performed at the design time of the tool, these new tools require mapping techniques that can be performed at run time. The contribution of this paper is twofold. First, we investigate the general requirements for run-time mapping techniques. Second, we describe our PowerMap mapping algorithm, which was designed to be used at run time by an ontology-based question answering tool.

Mining Wiki Resources for Multilingual Named Entity Recognition

by Alexander E. Richman, Patrick Schone - In ACL’08, 2008
Cited by 9 (0 self)
In this paper, we describe a system by which the multilingual characteristics of Wikipedia can be utilized to annotate a large corpus of text with Named Entity Recognition (NER) tags requiring minimal human intervention and no linguistic expertise. This process, though of value in languages for which resources exist, is particularly useful for less commonly taught languages. We show how the Wikipedia format can be used to identify possible named entities and discuss in detail the process by which we use the Category structure inherent to Wikipedia to determine the named entity type of a proposed entity. We further describe the methods by which English language data can be used to bootstrap the NER process in other languages. We demonstrate the system by using the generated corpus as training sets for a variant of BBN's IdentiFinder in French, Ukrainian, Spanish, Polish, Russian, and Portuguese, achieving overall F-scores as high as 84.7% on independent, human-annotated corpora, comparable to a system trained on up to 40,000 words of human-annotated newswire.

Supervised Semantic Indexing

by Bing Bai, Jason Weston, Ronan Collobert, David Grangier
Cited by 9 (5 self)
We present a class of models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score. Like Latent Semantic Indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However, unlike LSI our models are trained with a supervised signal directly on the task of interest, which we argue is the reason for our superior results. We provide an empirical study on Wikipedia documents, using the links to define document-document or query-document pairs, where we obtain state-of-the-art performance using our method. Key words: supervised, semantic indexing, document ranking

Semantic Relatedness Metric for Wikipedia Concepts Based on Link Analysis and its Application to Word Sense Disambiguation

by Denis Turdakov, Pavel Velikhov
Cited by 6 (0 self)
Wikipedia has grown into a high-quality, up-to-date knowledge base and can enable many knowledge-based applications that rely on semantic information. One of the most general and quite powerful semantic tools is a measure of semantic relatedness between concepts. Moreover, the ability to efficiently produce a list of ranked similar concepts for a given concept is very important for a wide range of applications. We propose to use a simple measure of similarity between Wikipedia concepts, based on Dice’s measure, and provide very efficient heuristic methods to compute top k ranking results. Furthermore, since our heuristics are based on statistical properties of scale-free networks, we show that these heuristics are applicable to other complex ontologies. Finally, in order to evaluate the measure, we have used it to solve the problem of word-sense disambiguation. Our approach to word sense disambiguation is based solely on the similarity measure and produces results with high accuracy.
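
As a rough illustration of the kind of measure this abstract mentions, the sketch below computes Dice's coefficient over the link sets of two concepts and ranks the top k neighbours by brute force. The `links` dictionary, the concept names, and both function names are hypothetical, and the brute-force ranking stands in for the authors' efficient heuristics, which the abstract does not describe.

```python
def dice_similarity(neighbors_a, neighbors_b):
    """Dice's coefficient between two sets of linked concepts."""
    a, b = set(neighbors_a), set(neighbors_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty link sets
    return 2 * len(a & b) / (len(a) + len(b))


def top_k_similar(concept, links, k=3):
    """Brute-force top-k ranking (not the paper's heuristic method)."""
    scores = [
        (other, dice_similarity(links[concept], links[other]))
        for other in links
        if other != concept
    ]
    # Highest score first; ties broken alphabetically for determinism.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


# Toy link graph; the concepts and links are made up for illustration.
links = {
    "Jaguar_(animal)": {"Cat", "Predator", "Americas"},
    "Leopard": {"Cat", "Predator", "Africa"},
    "Jaguar_Cars": {"Car", "England"},
}
best = top_k_similar("Jaguar_(animal)", links, k=1)
```

In this toy graph the animal sense shares two of its three links with "Leopard" and none with "Jaguar_Cars", so the former ranks first.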

WikiWalk: Random walks on Wikipedia for Semantic Relatedness

by Eric Yeh, Daniel Ramage, Christopher D. Manning, Eneko Agirre, Aitor Soroa
Cited by 6 (0 self)
Computing semantic relatedness of natural language texts is a key component of tasks such as information retrieval and summarization, and often depends on knowledge of a broad range of real-world concepts and relationships. We address this knowledge integration issue by computing semantic relatedness using personalized PageRank (random walks) on a graph derived from Wikipedia. This paper evaluates methods for building the graph, including link selection strategies, and two methods for representing input texts as distributions over the graph nodes: one based on a dictionary lookup, the other based on Explicit Semantic Analysis. We evaluate our techniques on standard word relatedness and text similarity datasets, finding that they capture similarity information complementary to existing Wikipedia-based relatedness measures, resulting in small improvements on a state-of-the-art measure.
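
The general idea of personalized PageRank named in this abstract can be sketched with plain power iteration. Everything below is an assumption made for illustration: the toy graph, the damping factor of 0.85, the fixed iteration count, the handling of dangling nodes, and the use of cosine similarity to compare the two resulting distributions. The abstract does not specify any of these choices.

```python
import math


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration with teleportation restricted to the seed nodes.

    graph: dict mapping each node to a list of out-linked nodes
           (every link target must also be a key of the dict).
    seeds: dict mapping seed nodes to non-negative weights.
    """
    nodes = list(graph)
    total = sum(seeds.values())
    teleport = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: send its mass back to the seed distribution.
                for m in nodes:
                    nxt[m] += damping * rank[n] * teleport[m]
        rank = nxt
    return rank


def cosine(u, v):
    """Cosine similarity between two distributions over the same nodes."""
    dot = sum(u[n] * v.get(n, 0.0) for n in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy graph standing in for a Wikipedia link graph.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
rank_a = personalized_pagerank(graph, {"A": 1.0})
rank_c = personalized_pagerank(graph, {"C": 1.0})
relatedness = cosine(rank_a, rank_c)
```

Each input text would be mapped to a seed distribution (here a single node), and the relatedness of two texts is then read off by comparing their stationary distributions.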

Knowledge-rich Word Sense Disambiguation Rivaling Supervised Systems

by Simone Paolo Ponzetto, Roberto Navigli
Cited by 6 (4 self)
One of the main obstacles to high-performance Word Sense Disambiguation (WSD) is the knowledge acquisition bottleneck. In this paper, we present a methodology to automatically extend WordNet with large amounts of semantic relations from an encyclopedic resource, namely Wikipedia. We show that, when provided with a vast amount of high-quality semantic relations, simple knowledge-lean disambiguation algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets.

Scaling Wikipedia-based named entity disambiguation to arbitrary web text

by Anthony Fader, Stephen Soderland, Oren Etzioni - In Proc. of WikiAI, 2009
Cited by 6 (1 self)
This paper investigates the “named-entity disambiguation” task on the Web—identifying the referent of a string, found on an arbitrary Web page. The GROUNDER system, introduced in this paper, addresses two challenges not considered by previous work: how to utilize a priori information (e.g., Bill Clinton is more prominent on the Web than Clinton County) to improve disambiguation, and how to compose this prior information with contextual evidence. GROUNDER addresses both challenges by leveraging the user-contributed knowledge in Wikipedia and providing a novel formulation of the task. On a sample of strings drawn from the Web, GROUNDER achieves precision of 1.0 at recall 0.34, and precision 0.90 at recall 0.60.