Results 1 - 10 of 152

Named Entity Recognition in Wikipedia

by Dominic Balasuriya, Nicky Ringland, Joel Nothman, Tara Murphy, James R. Curran
"... Named entity recognition (NER) is used in many domains beyond the newswire text that comprises current gold-standard corpora. Recent work has used Wikipedia’s link structure to automatically generate near gold-standard annotations. Until now, these resources have only been evaluated on newswire corp ..."
Cited by 10 (0 self)

Augmenting Wikipedia with Named Entity Tags

by Wisam Dakka
"... Wikipedia is the largest organized knowledge repository on the Web, increasingly employed by natural language processing and search tools. In this paper, we investigate the task of labeling Wikipedia pages with standard named entity tags, which can be used further by a range of information extractio ..."
Cited by 19 (0 self)

Transforming Wikipedia into named entity training data

by Joel Nothman, James R. Curran, Tara Murphy - In Proceedings of the Australasian Language Technology Association Workshop 2008, 2008
"... Statistical named entity recognisers require costly hand-labelled training data and, as a result, most existing corpora are small. We exploit Wikipedia to create a massive corpus of named entity annotated text. We transform Wikipedia’s links into named entity annotations by classifying the target ar ..."
Cited by 13 (2 self)

Entity ranking in Wikipedia

by Anne-Marie Vercoustre, James A. Thom, Jovan Pehcevski - In Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC08), 2008
"... The traditional entity extraction problem lies in the ability of extracting named entities from plain text using natural language processing techniques and intensive training from large document collections. Examples of named entities include organisations, people, locations, or dates. There are man ..."
Cited by 19 (3 self)

Recall-Oriented Learning of Named Entities in Arabic Wikipedia

by Behrang Mohit, Nathan Schneider, Rishav Bhowmick, Kemal Oflazer, Noah A. Smith
"... We consider the problem of NER in Arabic Wikipedia, a semisupervised domain adaptation setting for which we have no labeled training data in the target domain. To facilitate evaluation, we obtain annotations for articles in four topical groups, allowing annotators to identify domain-specific entity ..."
Cited by 7 (2 self)

Improved Text Categorisation for Wikipedia Named Entities

by Sam Tardif, James R. Curran, Tara Murphy
"... The accuracy of named entity recognition systems relies heavily upon the volume and quality of available training data. Improving the process of automatically producing such training data is an important task, as manual acquisition is both time consuming and expensive. We explore the use of a variet ..."
Cited by 1 (0 self)
variety of machine learning algorithms for categorising Wikipedia articles, an initial step in producing the named entity training data. We were able to achieve a categorisation accuracy of 95% F-score over six coarse categories, an improvement of up to 5% F-score over previous methods.

Using Wikipedia for Hierarchical Finer Categorization of Named Entities

by Aasish Pappu
"... Wikipedia is one of the largest growing structured resources on the Web and can be used as a training corpus in natural language processing applications. In this work, we present a method to categorize named entities under the hierarchical fine-grained categories provided by the Wikipedia ..."

Extracting geospatial entities from wikipedia

by Jeremy Witmer, Jugal Kalita - IEEE International Conference on Semantic Computing, 2009
"... This paper addresses the challenge of extracting geospatial data from the article text of the English Wikipedia. In the first phase of our work, we create a training corpus and select a set of word-based features to train a Support Vector Machine (SVM) for the task of geospatial named entity recog ..."
Cited by 1 (1 self)

A Named Entity Labeler for German: exploiting Wikipedia and distributional clusters

by Grzegorz Chrupała, Dietrich Klakow
"... Named Entity Recognition is a relatively well-understood NLP task, with many publicly available training resources and software for English. Other languages tend to be underserved in this area. For German, CoNLL-2003 provides training data, but there are no publicly available, ready-to-use tools. We ..."
Cited by 3 (3 self)

Named Entity Corpus Construction using Wikipedia and DBpedia Ontology

by Younggyun Hahm, Jungyeul Park, Kyungtae Lim, Youngsik Kim, Dosam Hwang, Key-Sun Choi
"... In this paper, we propose a novel method to automatically build a named entity corpus based on the DBpedia ontology. Most named entity recognition systems require time- and effort-consuming annotation to produce training data, so work on NER has thus far been limited to certain language ..."
languages like English that are resource-abundant in general. As an alternative, we suggest that the NE corpus generated by our proposed method can be used as training data. Our approach uses Wikipedia as raw text and the DBpedia data set for named entity disambiguation. Our method is language
