Named Entity Recognition without Gazetteers
1999
Cited by 101 (5 self)
It is often claimed that Named Entity recognition systems need extensive gazetteers--lists of names of people, organisations, locations, and other named entities. Indeed, the compilation of such gazetteers is sometimes mentioned as a bottleneck in the design of Named Entity recognition systems. We report on a Named Entity recognition system which combines rule-based grammars with statistical (maximum entropy) models. We report on the system's performance with gazetteers of different types and different sizes, using test material from the MUC-7 competition. We show that, for the text type and task of this competition, it is sufficient to use relatively small gazetteers of well-known names, rather than large gazetteers of low-frequency names. We conclude with observations about the domain independence of the competition and of our experiments.
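To make the role of gazetteers concrete, here is a minimal, hypothetical sketch of one common way name lists are exposed to a statistical tagger: each token receives binary features recording whether it begins or falls inside a longest-match gazetteer entry. This is not the system the paper describes; the function name `gazetteer_features`, the feature strings, and the toy lists are illustrative assumptions.

```python
def gazetteer_features(tokens, gazetteers, max_len=4):
    """Hypothetical helper: for each token, return a set of gazetteer features
    ('gaz_begin=LABEL' on the first token of a match, 'in_gaz=LABEL' on every
    token of it), using greedy longest-match lookup from left to right."""
    feats = [set() for _ in tokens]
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest span first so "New York City" wins over "New York".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            for label, names in gazetteers.items():
                if phrase in names:
                    match = (n, label)
                    break
            if match:
                break
        if match:
            n, label = match
            feats[i].add(f"gaz_begin={label}")
            for j in range(i, i + n):
                feats[j].add(f"in_gaz={label}")
            i += n  # skip past the matched entry
        else:
            i += 1

    return feats


# Toy gazetteers of well-known names (assumed for illustration only).
gazetteers = {
    "LOC": {"Washington", "New York", "New York City"},
    "ORG": {"Reuters"},
}
tokens = "Reuters reported from New York City today".split()
feats = gazetteer_features(tokens, gazetteers)
for tok, f in zip(tokens, feats):
    print(tok, sorted(f))
```

In a maximum entropy-style tagger these strings would simply be added to each token's contextual features. A name absent from the lists fires no gazetteer feature and must be recognised from context alone, which is why the size and content of the lists matter to the experiments the abstract reports.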
Can we make Information Extraction more adaptive?
Proceedings of the SCIE99 Workshop, 1999
Cited by 7 (2 self)
It seems widely agreed that IE (Information Extraction) is now a tested language technology that has reached precision+recall values that put it in about the same position as Information Retrieval and Machine Translation, both of which are widely used commercially. There is also a clear range of practical applications that would be eased by the sort of template-style data that IE provides. The problem for wider deployment of the technology is adaptability: the ability to customize IE rapidly to new domains. In this paper we discuss some methods that have been tried to ease this problem, and to create something more rapid than the benchmark one-month figure, which was roughly what ARPA teams in IE needed to adapt an existing system by hand to a new domain of corpora and templates. An important distinction in discussing the issue is the degree to which a user can be assumed to know what is wanted, to have preexisting templates ready to hand, as opposed to a user who has a ...
Using Gazetteers in Discriminative Information Extraction
In CoNLL-X, Tenth Conference on Computational Natural Language Learning, 2006
Cited by 4 (2 self)
Much work on information extraction has successfully used gazetteers to recognise uncommon entities that cannot be reliably identified from local context alone. Approaches to such tasks often involve the use of maximum entropy-style models, where gazetteers usually appear as highly informative features in the model. Although such features can improve model accuracy, they can also introduce hidden negative effects. In this paper we describe and analyse these effects and suggest ways in which they may be overcome. In particular, we show that by quarantining gazetteer features and training them in a separate model, then decoding using a logarithmic opinion pool (Smith et al., 2005), we may achieve much higher accuracy. Finally, we suggest ways in which other features with gazetteer feature-like behaviour may be identified.
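A logarithmic opinion pool combines several models by taking a weighted geometric mean of their distributions and renormalising, p(y|x) ∝ ∏_i p_i(y|x)^{w_i}. The Python sketch below applies this to per-token label distributions from two hypothetical models, one trained without gazetteer features and one trained on them alone. Pooling per token is a simplification assumed for illustration; the paper's own decoding may pool over whole label sequences. The function name, labels, probabilities, and weights are all made up.

```python
import math


def log_opinion_pool(distributions, weights):
    """Hypothetical helper: combine label distributions (dicts sharing the
    same keys) as p(y) proportional to prod_i p_i(y) ** w_i."""
    labels = list(distributions[0].keys())
    # Weighted sum of log-probabilities for each label.
    scores = {
        y: sum(w * math.log(d[y]) for d, w in zip(distributions, weights))
        for y in labels
    }
    # Normalise with log-sum-exp for numerical stability.
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {y: math.exp(s - log_z) for y, s in scores.items()}


# Assumed per-token outputs of two separately trained models.
standard = {"PER": 0.6, "LOC": 0.3, "O": 0.1}
gazetteer = {"PER": 0.2, "LOC": 0.7, "O": 0.1}
pooled = log_opinion_pool([standard, gazetteer], [0.5, 0.5])
print(max(pooled, key=pooled.get))  # LOC
```

Working in log space and normalising with log-sum-exp keeps the computation stable when many models or very small probabilities are pooled. Setting a model's weight to zero removes its influence entirely, so the weights control how much the quarantined gazetteer model can override the standard one.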
IR and AI: traditions of representation and anti-representation in information processing
Cited by 3 (0 self)
The paper is concerned with the role of conceptual representations in access to information, as for example, from the World Wide Web. It contrasts two quite different traditions for doing this: Information Retrieval (IR) and more recently Information Extraction (IE), a development of the natural language processing tradition within Artificial Intelligence (AI). The former has been statistical in nature and largely representation-free (though we discuss exceptions), while the latter has been based on representations making use of ontologies and lexicons in semantics and grammars in syntax. However, this distinction has been eroded by the growth in recent years of machine learning methods in IE, which have attempted to match IE performance but with methods less committed to representations: some have no representations, and some seek to learn them automatically from cases of their assignment. We discuss

