A Statistical Profile of the Named Entity Task
- PROC. ACL CONFERENCE FOR APPLIED NATURAL LANGUAGE PROCESSING, 1997
Cited by 26 (0 self)
In this paper we present a statistical profile of the Named Entity task, a specific information extraction task for which corpora in several languages are available. Using the results of the statistical analysis, we propose an algorithm for lower bound estimation for Named Entity corpora and discuss the significance of the cross-lingual comparisons provided by the analysis.
Named Entity Scoring for Speech Input
, 1998
Cited by 8 (0 self)
This paper describes a new scoring algorithm that supports comparison of linguistically annotated data from noisy sources. The new algorithm generalizes the Message Understanding Conference (MUC) Named Entity scoring algorithm, using a comparison based on explicit alignment of the underlying texts, followed by a scoring phase. The scoring procedure maps corresponding tagged regions and compares these according to tag type and tag extent, allowing us to reproduce the MUC Named Entity scoring for identical underlying texts. In addition, the new algorithm scores for content (transcription correctness) of the tagged region, a useful distinction when dealing with noisy data that may differ from a reference transcription (e.g., speech recognizer output). To illustrate the algorithm, we have prepared a small test data set consisting of a careful transcription of speech data and manual insertion of SGML named entity annotation. We report results for this small test corpus on a variety of experiments involving automatic speech recognition and named entity tagging.
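The align-then-score idea in this abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the standard-library `difflib.SequenceMatcher` stands in for whatever alignment the authors used, entities are assumed to be `(type, start, end)` token spans with an exclusive end, and the names `align` and `score` are hypothetical. Spurious hypothesis entities and precision/recall are left out to keep the example short.

```python
from difflib import SequenceMatcher


def align(ref_tokens, hyp_tokens):
    """Map reference token indices to hypothesis token indices.

    Assumption: difflib's matcher is only a stand-in for a real
    alignment procedure. Deleted reference tokens get no mapping.
    """
    mapping = {}
    matcher = SequenceMatcher(a=ref_tokens, b=hyp_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("equal", "replace"):
            # Pair tokens positionally inside matched or substituted runs.
            for k in range(min(i2 - i1, j2 - j1)):
                mapping[i1 + k] = j1 + k
    return mapping


def score(ref_tokens, ref_ents, hyp_tokens, hyp_ents):
    """Count reference entities matched on type, extent, and content."""
    mapping = align(ref_tokens, hyp_tokens)
    counts = {"total": len(ref_ents), "missing": 0,
              "type": 0, "extent": 0, "content": 0}
    for r_type, r_start, r_end in ref_ents:
        # Project the reference span onto hypothesis token positions.
        projected = [mapping[i] for i in range(r_start, r_end) if i in mapping]
        best, best_overlap = None, 0
        for ent in hyp_ents:
            overlap = sum(1 for j in projected if ent[1] <= j < ent[2])
            if overlap > best_overlap:
                best, best_overlap = ent, overlap
        if best is None:
            counts["missing"] += 1
            continue
        h_type, h_start, h_end = best
        # Type: same label. Extent: same boundaries after projection.
        # Content: the tagged words themselves were transcribed correctly.
        counts["type"] += int(r_type == h_type)
        counts["extent"] += int(h_start == projected[0]
                                and h_end == projected[-1] + 1)
        counts["content"] += int(ref_tokens[r_start:r_end]
                                 == hyp_tokens[h_start:h_end])
    return counts
```

When the reference and hypothesis texts are identical, the content check always succeeds and only the type and extent comparisons remain, which mirrors the abstract's remark about reproducing MUC scoring for identical underlying texts.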
A golden resource for named entity recognition in Portuguese
Cited by 1 (1 self)
This paper presents a collection of texts manually annotated with named entities in context, which was used for HAREM, the first evaluation contest for named entity recognizers for Portuguese. We discuss the options taken and the originality of our approach compared with previous evaluation initiatives in the area. We document the choice of categories, their quantitative weight in the overall collection and how we deal with vagueness and underspecification.
Methods in Biomedical Text Mining
, 2008
Methods to improve text mining of molecular biology interactions are needed to capture a richer information space and qualify the quality of extraction. Simple interaction models fail to describe contextual and confidence information that would help with more fine-grained analyses. Herein a method is presented to streamline curation of text-mined data and a way to improve text mining of biomedical terms that can be adapted to other domains using different machine learning techniques. These advances can be integrated into more powerful text-mining systems to meet user demand and to further promote the adoption of text-mining tools. Additionally, three studies on the nature of biomedical publications are presented: their novelty hinges on the fact that each asks questions that had not been posed before. They cover the phenomena of retraction, ways to improve the impact of research, and the writing style used in biomedical literature. Retraction is a hot topic in recent times

