From Distributional to Semantic Similarity
, 2003
Cited by 59 (11 self)
Lexical-semantic resources, including thesauri and WORDNET, have been successfully incorporated into a wide range of applications in Natural Language Processing. However, they are very difficult and expensive to create and maintain, and their usefulness has been severely hampered by their limited coverage, bias and inconsistency. Automated and semi-automated methods for developing such resources are therefore crucial for further resource development and improved application performance.
Chunking with WPDV Models
- In Proceedings of CoNLL-2000 and LLL-2000
, 2000
Cited by 14 (1 self)
In this paper I describe the application of the WPDV algorithm to the CoNLL-2000 shared task, the identification of base chunks in English text (Tjong Kim Sang and Buchholz, 2000). For this task, I use a three-stage architecture: I first run five different base chunkers, then combine them and finally try to correct some recurring errors. Except for one base chunker, which uses the memory-based machine learning system TiMBL,
Transforming a Chunker to a Parser
- LINGUISTICS IN THE
, 2000
Cited by 5 (0 self)
Ever since the landmark paper Ramshaw and Marcus (1995), machine learning systems have been used successfully for identifying base phrases (chunks), the bottom constituents of a parse tree. We expand a state-of-the-art chunking algorithm to a bottom-up parser by recursively applying the chunker to its own output. After testing different training configurations we obtain a reasonable parser which is tested against a standard data set. Its performance falls behind that of current state-of-the-art parsers. We give some suggestions for modifications of the parser which may lead to future performance improvements.
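The idea of turning a chunker into a bottom-up parser by feeding it its own output can be sketched as follows. This is only an illustration: the abstract describes a machine-learned chunker, whereas the toy rule table `RULES`, the function names `chunk_once` and `parse`, and the example sentence below are all hypothetical stand-ins, not the authors' system.

```python
# Hypothetical stand-in for a learned chunker: a tiny rule table that
# merges two adjacent nodes into a phrase. Not taken from the paper.
RULES = {
    ("DT", "NN"): "NP",
    ("VB", "NP"): "VP",
    ("NP", "VP"): "S",
}


def chunk_once(nodes):
    """One left-to-right pass that merges adjacent pairs matching a rule.

    Each node is a (label, payload) tuple. For a leaf the payload is the
    word; for a phrase it is the list of child nodes.
    """
    out = []
    i = 0
    while i < len(nodes):
        if i + 1 < len(nodes):
            label = RULES.get((nodes[i][0], nodes[i + 1][0]))
            if label is not None:
                out.append((label, [nodes[i], nodes[i + 1]]))
                i += 2
                continue
        out.append(nodes[i])
        i += 1
    return out


def parse(nodes):
    """Apply the chunker to its own output until nothing more is merged."""
    while True:
        merged = chunk_once(nodes)
        if len(merged) == len(nodes):  # no pair was merged in this pass
            return merged
        nodes = merged


sentence = [("DT", "the"), ("NN", "dog"), ("VB", "chased"),
            ("DT", "a"), ("NN", "cat")]
tree = parse(sentence)
print(tree[0][0])
```

Each pass builds one more layer of the tree, so the number of passes bounds the depth of the resulting parse.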
Combining Outputs of Multiple Japanese Named Entity Chunkers
- In IPSJ SIG notes
, 2002
Cited by 5 (0 self)
In this paper, we propose a method for learning a classifier which combines the outputs of more than one Japanese named entity extractor. The proposed combination method belongs to the family of stacked generalizers, which is in principle a technique of combining the outputs of several first-stage classifiers by learning a second-stage classifier over those outputs. Individual models to be combined are based on maximum entropy models, one of which always considers surrounding contexts of a fixed length, while the other considers those of variable lengths according to the number of constituent morphemes of named entities. As an algorithm for learning the second-stage classifier, we employ a decision list learning method. Experimental evaluation shows that the proposed method achieves improvement over the best known results with Japanese named entity extractors based on maximum entropy models.
Impact of imperfect OCR on part-of-speech tagging
- Proc. of International Conference on Document Analysis and Recognition 2003
, 2002
Cited by 3 (1 self)
Part-of-speech (POS) tagging is the foundation of natural language processing (NLP) systems, and thus has been an active area of research for many years. However, one question remains unanswered: How will a POS tagger behave when the input text is not error-free? This issue can be of great importance when the text comes from imperfect sources like Optical Character Recognition (OCR). This paper analyzes the performance of both individual POS taggers and combination systems on imperfect text. Experimental results show that a POS tagger's accuracy will decrease linearly with the character error rate and the slope indicates a tagger's sensitivity to input text errors.
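The linear relationship described in this abstract can be illustrated with an ordinary least-squares fit of tagging accuracy against character error rate, where the fitted slope plays the role of the tagger's sensitivity. The sketch below is an assumption-laden illustration: the helper `fit_line` is hypothetical, and the sample numbers are invented for demonstration and are not results from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented illustration data (NOT from the paper):
# character error rate of the input text vs. observed tagging accuracy.
char_error_rate = [0.00, 0.05, 0.10, 0.20]
tag_accuracy = [0.96, 0.91, 0.86, 0.76]

slope, intercept = fit_line(char_error_rate, tag_accuracy)
print(f"sensitivity (slope): {slope:.3f}, clean-text accuracy: {intercept:.3f}")
```

Under this reading, a more negative slope means the tagger loses accuracy faster as the OCR output degrades.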
Shallow Parsing using Probabilistic Grammatical Inference
, 2002
Cited by 3 (1 self)
This paper presents a machine learning approach to shallow parsing using techniques of grammatical inference. We first learn a deterministic probabilistic automaton that models the joint distribution of chunk and part-of-speech tags, and then use this automaton as a transducer to find the most likely chunk tag sequence using a dynamic programming algorithm. The resulting transducers can also be combined with statistical POS taggers. We also discuss an efficient means of incorporating lexical information together with an application of bagging that improve our results.
Learning Computational Grammars
This report presents a general overview of the network-related activities at this site and specific reports for the postdoc, the PhD student, the local coordinator and others. An overview of the training activities concludes this section.
IMC AG,
We have developed a new OSGi-based platform for Named Entity Recognition (NER) which uses a voting strategy to combine the results produced by several existing NER systems (currently OpenNLP, LingPipe and Stanford). The different NER systems have been systematically decomposed and modularized into the same pipeline of preprocessing components in order to support a flexible selection and ordering of the NER processing flow. This highly modular, component-based design makes it possible to set up different constellations of chained processing steps, including alternative voting strategies for combining the results of components running in parallel.
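One simple voting strategy of the kind mentioned here is a per-token majority vote over the label sequences produced by each system. The sketch below is a minimal illustration under stated assumptions: the function `majority_vote`, the tie-breaking rule (prefer the earliest-listed system), the BIO-style labels and the sample outputs are all hypothetical and are not taken from the platform described in the abstract.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token labels from several taggers by majority vote.

    predictions: list of label sequences, one per system, all of equal
    length. Ties go to the label proposed by the earliest-listed system
    (an assumed tie-breaking rule, chosen only for this sketch).
    """
    if not predictions:
        raise ValueError("need at least one system output")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all systems must label the same tokens")

    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        # First label, in system order, that reaches the top count.
        combined.append(next(lab for lab in labels if counts[lab] == best))
    return combined


# Made-up outputs for a four-token sentence (illustration only).
opennlp = ["B-PER", "I-PER", "O", "B-LOC"]
lingpipe = ["B-PER", "O", "O", "B-ORG"]
stanford = ["B-PER", "I-PER", "O", "B-LOC"]

print(majority_vote([opennlp, lingpipe, stanford]))
```

Voting on individual tokens can yield inconsistent spans (for example an `I-` tag with no preceding `B-` tag), so a real system might vote on whole entity spans instead.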

