Language Independent Named Entity Recognition Combining Morphological and Contextual Evidence
, 1999
Cited by 81 (4 self)
Identifying and classifying personal, geographic, institutional or other names in a text is an important task for numerous applications. This paper describes and evaluates a language-independent bootstrapping algorithm based on iterative learning and re-estimation of contextual and morphological patterns captured in hierarchically smoothed trie models. The algorithm learns from unannotated text and achieves competitive performance when trained on a very short labelled name list with no other required language-specific information, tokenizers or tools.
Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory
- CURRENT DIRECTIONS IN DISCOURSE AND DIALOGUE
, 2001
Cited by 71 (2 self)
We describe our experience in developing a discourse-annotated corpus for community-wide use. Working in
An Empirically-Based System for Processing Definite Descriptions
, 2000
Cited by 49 (11 self)
In this paper, we present an implemented system for processing definite ...
Recognizing Referential Links: An Information Extraction Perspective
, 1997
Cited by 35 (0 self)
We present an efficient and robust reference resolution algorithm in an end-to-end state-of-the-art information extraction system, which must work with a considerably impoverished syntactic analysis of the input sentences. Considering this disadvantage, the basic setup to collect, filter, then order by salience does remarkably well with third-person pronouns, but needs more semantic and discourse information to improve the treatments of other expression types.
A Statistical Profile of the Named Entity Task
- PROC. ACL CONFERENCE FOR APPLIED NATURAL LANGUAGE PROCESSING
, 1997
Cited by 26 (0 self)
In this paper we present a statistical profile of the Named Entity task, a specific information extraction task for which corpora in several languages are available. Using the results of the statistical analysis, we propose an algorithm for lower bound estimation for Named Entity corpora and discuss the significance of the cross-lingual comparisons provided by the analysis.
Applying Machine Learning for High Performance Named-Entity Extraction
, 1999
Cited by 25 (0 self)
This paper describes a machine learning approach to build an efficient, accurate and fast name spotting system. Finding names in free text is an important task in addressing real-world text-based applications. Most previous approaches have been based on carefully hand-crafted modules encoding linguistic knowledge specific to the language and document genre. Such approaches have two drawbacks: they require large amounts of time and linguistic expertise to develop, and they are not easily portable to new languages and genres. This paper describes an extensible system which automatically combines weak evidence for name extraction. This evidence is gathered from easily available sources: part-of-speech tagging, dictionary lookups, and textual information such as capitalization and punctuation. Individually, each piece of evidence is insufficient for robust name detection. However, the combination of evidence, through standard machine learning techniques, yields a system that achieves performance equivalent to the best existing hand-crafted approaches.
A light-weight approach to coreference resolution for named entities in text
- Unpublished M.Sc
, 2002
Cited by 24 (3 self)
This paper presents a lightweight approach to pronoun resolution in the case when the antecedent is a named entity. It falls under the category of the so-called 'knowledge poor' approaches that do not rely extensively on linguistic or domain knowledge. We provide a practical implementation of this approach as a component of the General Architecture for Text Engineering (GATE). The results of the evaluation show that even such shallow and inexpensive approaches provide acceptable performance for resolving the pronoun anaphora of named entities in texts.
Analyzing the complexity of a domain with respect to an information extraction task
- in MUC-7
, 1998
Cited by 16 (3 self)
In this paper we describe a method of classifying facts (information) into categories or levels, where each level signifies a different degree of syntactic complexity related to a fact. Based on this classification mechanism, we also propose a method of evaluating a domain by assigning to it a "domain number" based on the levels of a set of standard facts present in the articles of that domain.
Improving the scalability of semi-markov conditional random fields for named entity recognition
- In Proceedings of ACL 2006
, 2006
Cited by 11 (4 self)
This paper presents techniques to apply semi-CRFs to Named Entity Recognition tasks with a tractable computational cost. Our framework can handle an NER task that has long named entities and many labels, which increase the computational cost. To reduce the computational cost, we propose two techniques: the first is the use of feature forests, which enables us to pack feature-equivalent states, and the second is the introduction of a filtering process which significantly reduces the number of candidate states. This framework allows us to use a rich set of features extracted from the chunk-based representation that can capture informative characteristics of entities. We also introduce a simple trick to transfer information about distant entities by embedding label information into non-entity labels. Experimental results show that our model achieves an F-score of 71.48% on the JNLPBA 2004 shared task without using any external resources or post-processing techniques.

