Distinguishing Systems and Distinguishing Senses: New Evaluation Methods for Word Sense Disambiguation
, 1998
Cited by 88 (8 self)
Resnik and Yarowsky (1997) made a set of observations about the state of the art in automatic word sense disambiguation and, motivated by those observations, offered several specific proposals regarding improved evaluation criteria, common training and testing resources, and the definition of sense inventories. Subsequent discussion of those proposals resulted in Senseval, the first evaluation exercise for word sense disambiguation (Kilgarriff and Palmer forthcoming). This article is a revised and extended version of our 1997 workshop paper, reviewing its observations and proposals and discussing them in light of the Senseval exercise. It also includes a new in-depth empirical study of translingually-based sense inventories and distance measures, using statistics collected from native-speaker annotations of 222 polysemous contexts across 12 languages. These data show that monolingual sense distinctions at most levels of granularity can be effectively captured by translations into some ...
Comparing a Linguistic and a Stochastic Tagger
- Proceedings of the Thirty-Fifth Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
, 1997
Cited by 27 (1 self)
Concerning different approaches to automatic PoS tagging: EngCG-2, a constraint-based morphological tagger, is compared in a double-blind test with a state-of-the-art statistical tagger on a common disambiguation task using a common tag set. The experiments show that for the same amount of remaining ambiguity, the error rate of the statistical tagger is one order of magnitude greater than that of the rule-based one. The two related issues of priming effects compromising the results and disagreement between human annotators are also addressed.
Implementing an Efficient Part-of-Speech Tagger
- Software–Practice and Experience
, 1999
Cited by 25 (5 self)
An efficient implementation of a part-of-speech tagger for Swedish is described. The stochastic tagger uses a well-established Markov model of the language. The tagger tags 92% of unknown words correctly and up to 97% of all words. Several implementation and optimization considerations are discussed. The main contribution of this paper is the thorough description of the tagging algorithm and the addition of a number of improvements. The paper contains enough detail for the reader to construct a tagger for his own language.
Keywords: part-of-speech tagging, word tagging, optimization, hidden Markov models.
Introduction: In part-of-speech (POS) tagging of a text, each word and punctuation mark in the text is assigned its morphosyntactic tag. Different tagging systems use different sets of tags, but typically a tag describes a word class and some word class specific features, such as number and gender. The number of different tags varies between a dozen and several hundred. Constructing ...
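As a rough illustration of what tagging with a Markov model involves, the sketch below decodes the most probable tag sequence with the Viterbi algorithm over a bigram hidden Markov model. It is not the tagger described in the paper: the tag set, the probability tables (`TAGS`, `START_P`, `TRANS_P`, `EMIT_P`), the `floor` smoothing constant, and the function name `viterbi` are all invented for this example, and the listing does not say which model order or smoothing the authors use.

```python
import math

# Toy model. All values are made up for illustration only.
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS_P = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
}
EMIT_P = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"barks": 0.6, "dog": 0.05},
}


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    Probabilities that are missing or zero are replaced by `floor`, a crude
    stand-in for real smoothing and unknown-word handling.
    """
    if not words:
        return []

    def lp(p):
        return math.log(max(p, floor))

    # best[i][t] = (log-probability of the best path ending in t, previous tag)
    best = [{t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)), None)
             for t in tags}]
    for word in words[1:]:
        prev = best[-1]
        column = {}
        for t in tags:
            score, back = max((prev[s][0] + lp(trans_p[s].get(t, 0)), s)
                              for s in tags)
            column[t] = (score + lp(emit_p[t].get(word, 0)), back)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi(["the", "dog", "barks"], TAGS, START_P, TRANS_P, EMIT_P))
```

Working in log space avoids numeric underflow on long sentences; a practical tagger would also need proper smoothing and a dedicated unknown-word model, which this sketch replaces with a constant floor.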
Serial Combination of Rules and Statistics: A Case Study in Czech Tagging
Cited by 24 (0 self)
A hybrid system is described which combines the strengths of manual rule writing and statistical learning, obtaining results superior to those of either method applied separately. The combination of a rule-based system and a statistical one is not parallel but serial: the rule-based system, which performs partial disambiguation with recall close to 100%, is applied first, and a trigram HMM tagger runs on its results. An experiment in Czech tagging has been performed with encouraging results.
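The serial arrangement described in this abstract can be sketched as a two-stage pipeline: a rule stage that only removes candidate tags it is confident are wrong, followed by a statistical stage that chooses among whatever remains. The code below is a toy under stated assumptions, not the authors' system: the single rule, the lexicon, the probability tables, and all function names are hypothetical, and a bigram model stands in for the trigram HMM named in the abstract to keep the example short.

```python
import math

# Toy resources. All values and names are invented for illustration.
LEXICON = {"the": {"DET"}, "can": {"NOUN", "VERB"}, "rusts": {"NOUN", "VERB"}}
TRANS_P = {
    "DET": {"NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"NOUN": 0.3, "VERB": 0.6},
    "VERB": {"NOUN": 0.3, "VERB": 0.1},
}
EMIT_P = {
    "DET": {"the": 0.9},
    "NOUN": {"can": 0.2, "rusts": 0.05},
    "VERB": {"can": 0.3, "rusts": 0.3},
}


def rule_filter(candidates):
    """Stage 1: partial disambiguation by a hand-written rule.

    Hypothetical rule: directly after an unambiguous determiner, discard the
    VERB reading if another reading is left. Readings are only removed, never
    added, and a token is never left without a reading.
    """
    out = [set(c) for c in candidates]
    for i in range(1, len(out)):
        if out[i - 1] == {"DET"} and len(out[i]) > 1:
            out[i].discard("VERB")
    return out


def hmm_choose(words, candidates, trans_p, emit_p, floor=1e-6):
    """Stage 2: Viterbi decoding restricted to the surviving readings."""
    if not words:
        return []

    def lp(p):
        return math.log(max(p, floor))

    best = [{t: (lp(emit_p.get(t, {}).get(words[0], 0)), None)
             for t in candidates[0]}]
    for i in range(1, len(words)):
        prev = best[-1]
        column = {}
        for t in candidates[i]:
            score, back = max((prev[s][0] + lp(trans_p.get(s, {}).get(t, 0)), s)
                              for s in prev)
            column[t] = (score + lp(emit_p.get(t, {}).get(words[i], 0)), back)
        best.append(column)

    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


def serial_tag(words):
    """Run the rule stage first, then the statistical stage on its output."""
    candidates = [LEXICON[w] for w in words]
    return hmm_choose(words, rule_filter(candidates), TRANS_P, EMIT_P)


print(serial_tag(["the", "can", "rusts"]))
```

In this toy, the rule settles "can" as a noun before the statistical stage runs, so the model only has to resolve "rusts". The design relies on the rule stage almost never discarding the correct reading, which is what the abstract's "recall close to 100%" refers to.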
Developing a hybrid NP parser
- In Proceedings of ANLP-97
, 1997
Cited by 11 (3 self)
We describe the use of energy function optimization in very shallow syntactic parsing. The approach can use linguistic rules and corpus-based statistics, so the strengths of both linguistic and statistical approaches to NLP can be combined in a single framework. The rules are contextual constraints for resolving syntactic ambiguities expressed as alternative tags, and the statistical language model consists of corpus-based n-grams of syntactic tags. The success of the hybrid syntactic disambiguator is evaluated against a held-out benchmark corpus. Also the contributions of the linguistic and statistical language models to the hybrid model are estimated.
Annotating topological fields and chunks - and revising POS tags at the same time
, 2002
Cited by 9 (1 self)
Annotating a corpus of German with chunks, topological fields and clause boundaries is both a goal in itself and a step towards further syntactic annotation. Partial annotation can serve as data to test linguistic hypotheses and it can be used as a pre-structuring for further linguistic annotation steps. If, however, the underlying part-of-speech (POS) annotation is imperfect, these errors will be passed on to the subsequent levels of annotation and increase annotation errors on those levels. Errors are especially damaging for subsequent annotation when they affect the POS tags that provide the framework of the German sentence by demarcating the topological fields and the clause boundaries (e.g. subordinators and verbs). This paper presents a method to automatically annotate a corpus of German with chunks, topological fields and clause boundaries, and improve tagging accuracy at the same time in order to increase the overall annotation accuracy. Tag improvement primarily relies on the linguistic knowledge encoded in the grammar for annotating the topological fields.
Parsing in Two Frameworks: Finite-State and Functional Dependency Grammar
, 1999
Cited by 9 (0 self)
the novel non-deterministic tokenisation method which was first presented in Tapanainen (1995) and Chanod and Tapanainen (1996a), the formalism for presenting multiword units which was first presented in Tapanainen (1995) and Segond and Tapanainen (1995), the combination of the tokenisation, multiword unit recognition, lexical analysis and syntactic analysis, and the syntactic disambiguation engine which is similar to that in Tapanainen (1997)
Towards Learning a Constraint Grammar from Annotated Corpora Using Decision Trees
, 1995
Cited by 9 (2 self)
Within the framework of robust parsers for the syntactic analysis of unrestricted text, the aim of this work is the construction of a system capable of automatically learning Constraint Grammar rules from a POS-annotated corpus. The system presented is currently able to acquire constraint rules for POS tagging, and we plan to extend it to cover syntactic rules. The learning process uses a supervised learning algorithm based on building a discrimination forest, with a decision tree attached to each case of POS ambiguity. The system has been applied to four representative cases of ambiguity on a Spanish corpus. The results obtained in these experiments and some discussion about the appropriateness of the proposed learning technique are presented in this paper. This research has been partially funded by the Spanish Research Department (CICYT) and inscribed as TIC92-0671.
Introduction: The task of developing automatic procedures for parsing unrestricted natural langua...
Hybrid POS tagging with generalized unknown-word handling
, 1997
Cited by 8 (6 self)
This paper presents POSTAG as a statistical/rule-based hybrid part-of-speech (POS) tagging system with generalized unknown-word handling. POSTAG integrates morphological analysis with statistical POS disambiguation and rule-based post-error-correction. The error-correction rules are automatically learned from a tagged corpus and selectively correct standard HMM tagging errors. The morphological analysis is tightly coupled with the generalized unknown-word handling, which uses a morpheme-pattern dictionary that encodes general lexical patterns of Korean morphemes. In this way, we can guess the POS of unknown words regardless of their number and position in an eojeol. Experiments demonstrate the effectiveness of our POS tagging methods with generalized unknown-word handling, and POSTAG will be especially suited to web indexing, where most of the indexing terms are unknown words.
Keywords: POS tagging, morphological analysis, unknown-word handling, HMM tagging, rule-based ta...
Syllable pattern-based unknown morpheme estimation for hybrid part-of-speech tagging of Korean
- Computational Linguistics
, 1999
Cited by 7 (4 self)
This paper presents a syllable pattern directed generalized unknown morpheme handling method with POSTAG (POStech TAGger),

