Large-scale named entity disambiguation based on Wikipedia data
- In Proc. 2007 Joint Conference on EMNLP and CoNLL, 2007
Cited by 60 (2 self)
This paper presents a large-scale system for the recognition and semantic disambiguation of named entities based on information extracted from a large encyclopedic collection and Web search results. It describes in detail the disambiguation paradigm employed and the information extraction process from Wikipedia. Through a process of maximizing the agreement between the contextual information extracted from Wikipedia and the context of a document, as well as the agreement among the category tags associated with the candidate entities, the implemented system shows high disambiguation accuracy on both news stories and Wikipedia articles.
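The agreement-maximization idea can be illustrated with a deliberately simplified sketch. The knowledge base, entity names, and the additive scoring rule below are hypothetical toy assumptions, not the paper's actual vectors or formulation: each candidate scores one point per context term it shares with the document, plus one point per category tag it shares with the candidates of the other mentions.

```python
from collections import Counter

# Hypothetical toy knowledge base: context terms (as if taken from each
# candidate's Wikipedia page) and category tags per candidate entity.
KB = {
    "Columbia (river)": {"context": {"river", "oregon", "dam", "basin"},
                         "categories": {"Oregon", "Rivers"}},
    "Columbia University": {"context": {"university", "new", "york", "ivy"},
                            "categories": {"New York", "Universities"}},
    "Portland, Oregon": {"context": {"city", "oregon", "river"},
                         "categories": {"Oregon", "Cities"}},
    "Portland, Maine": {"context": {"city", "maine", "harbor"},
                        "categories": {"Maine", "Cities"}},
}

def disambiguate(doc_tokens, mentions, kb):
    """Pick, for each mention, the candidate that best agrees with the
    document context and with the category tags of the other mentions."""
    doc_terms = set(doc_tokens)
    result = {}
    for mention, candidates in mentions.items():
        # Category tags contributed by all candidates of the *other* mentions.
        other_tags = Counter()
        for other, other_cands in mentions.items():
            if other != mention:
                for cand in other_cands:
                    other_tags.update(kb[cand]["categories"])

        def score(cand):
            context_agreement = len(kb[cand]["context"] & doc_terms)
            category_agreement = sum(other_tags[t] for t in kb[cand]["categories"])
            return context_agreement + category_agreement

        result[mention] = max(candidates, key=score)
    return result

mentions = {"Columbia": ["Columbia (river)", "Columbia University"],
            "Portland": ["Portland, Oregon", "Portland, Maine"]}
print(disambiguate("the dam on the columbia near portland".split(), mentions, KB))
```

Here "Portland" has no context overlap with either candidate, so the shared "Oregon" tag with the river candidate for "Columbia" is what tips the decision.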
Augmenting Wikipedia with Named Entity Tags
Cited by 5 (0 self)
Wikipedia is the largest organized knowledge repository on the Web, increasingly employed by natural language processing and search tools. In this paper, we investigate the task of labeling Wikipedia pages with standard named entity tags, which can be used further by a range of information extraction and language processing tools. To train the classifiers, we manually annotated a small set of Wikipedia pages and then used the Wikipedia category information to extrapolate the annotations to a much larger training set. We employed several distinct features for each page: bag-of-words, page structure, abstract, titles, and entity mentions. We report high accuracies for several of the classifiers built. As a result of this work, a Web service that classifies any Wikipedia page has been made available to the academic community.
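One plausible way to extrapolate a small annotated seed set through category information is a majority vote over shared categories. This is a sketch under that assumption only; the abstract does not state the exact propagation rule, and the function name and page data below are hypothetical.

```python
from collections import Counter

def extrapolate_labels(seed_labels, page_categories):
    """Propagate NE tags from manually labeled seed pages to unlabeled pages
    that share at least one Wikipedia category with them (majority vote)."""
    # Count which tags the seed pages carry within each category.
    category_votes = {}
    for page, tag in seed_labels.items():
        for cat in page_categories.get(page, ()):
            category_votes.setdefault(cat, Counter())[tag] += 1

    expanded = dict(seed_labels)
    for page, cats in page_categories.items():
        if page in expanded:
            continue
        votes = Counter()
        for cat in cats:
            votes.update(category_votes.get(cat, Counter()))
        if votes:  # pages sharing no category with the seed set stay unlabeled
            expanded[page] = votes.most_common(1)[0][0]
    return expanded

seed = {"Paris": "LOC", "Marie Curie": "PER"}
categories = {"Paris": {"Capitals in Europe"}, "Marie Curie": {"Physicists"},
              "Berlin": {"Capitals in Europe"}, "Albert Einstein": {"Physicists"},
              "Photosynthesis": {"Biology"}}
print(extrapolate_labels(seed, categories))
```

The enlarged labeled set would then serve as training data for feature-based classifiers (bag-of-words, page structure, and so on).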
Forostar: A system for GIR
- In Lecture Notes from the Cross Language Evaluation Forum, 2006
Cited by 3 (3 self)
We detail our methods for generating and applying co-occurrence models for the purpose of placename disambiguation. We explain in detail our use of co-occurrence models for placename disambiguation using a model generated from Wikipedia. The presented system is split into two stages: a batch text & geographic indexer and a real-time query engine. Four alternative query constructions and six methods of generating a geographic index are compared. The paper concludes with a full description of future work and ways in which the system could be optimised.
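A co-occurrence model for placename disambiguation can be sketched as pairwise counts of locations linked from the same Wikipedia article. The location identifiers and "articles" below are hypothetical toy data, and the scoring rule (sum of co-occurrence counts with the document's other locations) is an illustrative assumption rather than the system's exact method.

```python
from collections import Counter
from itertools import permutations

def build_cooccurrence(articles):
    """Count how often two location ids are linked from the same article.
    Each article is given as an iterable of location ids."""
    counts = Counter()
    for locations in articles:
        for a, b in permutations(set(locations), 2):
            counts[(a, b)] += 1
    return counts

def disambiguate_place(candidates, context_locations, counts):
    """Choose the candidate that co-occurs most often with the other
    (already resolved) locations mentioned in the document."""
    return max(candidates,
               key=lambda c: sum(counts[(c, ctx)] for ctx in context_locations))

articles = [{"Cambridge_UK", "London", "Oxford"},
            {"Cambridge_UK", "Oxford"},
            {"Cambridge_MA", "Boston"},
            {"Cambridge_MA", "Boston", "London"}]
model = build_cooccurrence(articles)
print(disambiguate_place(["Cambridge_UK", "Cambridge_MA"], ["Oxford"], model))
```

A `Counter` returns 0 for unseen pairs without inserting them, so unknown context locations simply contribute nothing to a candidate's score.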
Place disambiguation with co-occurrence models
- CLEF 2006 Workshop, Working Notes, 2006
Cited by 2 (0 self)
In this paper we describe the geographic information retrieval system developed by the Multimedia & Information Systems team for GeoCLEF 2006 and the results it achieved. We detail our methods for generating and applying co-occurrence models for the purpose of place name disambiguation, as well as our use of named entity recognition tools and text indexing applications. The presented system is split into two stages: a batch text and geographic indexer and a real-time query engine. The query engine takes manually crafted queries in which the text component is separated from the geographic component. Two monolingual runs were submitted for the GeoCLEF evaluation: the first was constructed from the title and description, and the second also included the narrative. We explain in detail our use of co-occurrence models for place name disambiguation using a model generated from Wikipedia. Our results place us between the first quartile and the mean for mean average precision, as expected for a naïve, un-optimised approach. The paper concludes with a full description of future work and ways in which the system could be optimised.
Location Identification for the Geographic Information Retrieval
Cited by 1 (1 self)
In this paper we identify location names that appear in queries written in Indonesian using a geographic gazetteer. We built the gazetteer by collecting geographic information from a number of geographic resources. We translated an Indonesian query set into English using a machine translation technique. We also attempted to improve retrieval effectiveness using a query expansion technique. The results show that identifying locations in the queries and applying query expansion can help improve retrieval effectiveness for certain queries.
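Gazetteer-based location identification followed by query expansion might look like the sketch below. The gazetteer entries, the greedy longest-match lookup, and the expansion terms are all hypothetical assumptions for illustration; the abstract does not specify how matching or expansion was done, and the translation step is omitted.

```python
# Hypothetical gazetteer: location name -> related terms used for expansion.
GAZETTEER = {
    "jawa barat": ["bandung", "bogor"],
    "bali": ["denpasar"],
}

def find_locations(tokens, gazetteer, max_len=3):
    """Greedy longest-match lookup of multi-word location names."""
    found, i = [], 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in gazetteer:
                found.append(phrase)
                i += n
                break
        else:  # no gazetteer entry starts here
            i += 1
    return found

def expand_query(tokens, gazetteer):
    """Append the gazetteer's related terms for every location found."""
    expanded = list(tokens)
    for loc in find_locations(tokens, gazetteer):
        expanded.extend(gazetteer[loc])
    return expanded

query = "banjir di jawa barat".split()  # "floods in West Java"
print(find_locations(query, GAZETTEER))
print(expand_query(query, GAZETTEER))
```

Longest-match lookup keeps a multi-word name such as "jawa barat" from being split into its parts.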
analysis, language models
In this paper we compare two methods for the automatic identification of geographical articles in encyclopedic resources such as Wikipedia. The methods are a WordNet-based method that uses a set of keywords related to geographical places, and a multinomial Naïve Bayes classifier trained on a randomly selected subset of the English Wikipedia. This task can be seen as part of the broader task of named entity classification, a well-known problem in the field of natural language processing. The experiments considered both the full text of the articles and only the definition of the entity described in the article. The results show that the information contained in the page templates and the category labels is more useful than the text of the articles.
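A multinomial Naïve Bayes classifier with Laplace (add-alpha) smoothing can be written from scratch in a few lines. The training examples and the "geo"/"other" labels below are hypothetical toy data; in the setting described above, the tokens could equally be drawn from article text, definitions, templates, or category labels.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, alpha=1.0):
    """docs: list of (tokens, label). Returns log priors, a smoothed
    log-likelihood function, and the training vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}
    priors = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    totals = {c: sum(word_counts[c].values()) for c in class_counts}

    def log_likelihood(c, w):
        # Add-alpha smoothing so unseen (class, word) pairs keep nonzero probability.
        return math.log((word_counts[c][w] + alpha) / (totals[c] + alpha * len(vocab)))

    return priors, log_likelihood, vocab

def classify(tokens, model):
    """Return the class with the highest posterior; out-of-vocabulary tokens are ignored."""
    priors, log_likelihood, vocab = model
    return max(priors, key=lambda c: priors[c] +
               sum(log_likelihood(c, w) for w in tokens if w in vocab))

train = [("city located river population capital".split(), "geo"),
         ("mountain located region border".split(), "geo"),
         ("album released band singer".split(), "other"),
         ("novel author published".split(), "other")]
model = train_multinomial_nb(train)
print(classify("town located near river border".split(), model))
```

Scoring in log space avoids numerical underflow when articles contain many tokens.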

