Improving data driven wordclass tagging by system combination
, 1998
Cited by 58 (8 self)
In this paper we examine how the differences in modelling between different data driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic wordclass tagging. Four well-known tagger generators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximum Entropy)
Improving Accuracy in Wordclass Tagging through Combination of Machine Learning Systems
- Computational Linguistics
, 2000
Cited by 38 (3 self)
In this paper, we combine different systems employing known representations. The observation that suggests this approach is that systems that are designed differently, either because they use a different formalism or because they contain different knowledge, will typically produce different errors. We hope to make use of this fact and reduce the number of errors with very little additional effort by exploiting the disagreement between different language models. Although the approach is applicable to any type of language model, we focus on the case of statistical disambiguators that are trained on annotated corpora. The examples of the task that are present in the corpus and its annotation are fed into a learning algorithm, which induces a model of the desired input-output mapping in the form of a classifier. We use a number of different learning algorithms simultaneously on the same training corpus. Each type of learning method brings its own 'inductive bias' to the task and will produce a classifier with slightly different characteristics, so that different methods will tend to produce different errors.
Author addresses: P.O. Box 9103, 6500 HD Nijmegen, The Netherlands, hvh@let.ktm.nl; Universiteitsplein 1, 2610 Wilrijk, Belgium, {zavrel, daelem}@uia.ua.ac.be. © 2000 Association for Computational Linguistics.
Improving POS Tagging Using Machine-Learning Techniques
- In Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
, 1999
Cited by 12 (5 self)
In this paper we show how machine learning techniques for constructing and combining several classifiers can be applied to improve the accuracy of an existing English POS tagger (Màrquez and Rodríguez, 1997). Additionally, the problem of data sparseness is addressed by applying a technique of generating convex pseudo-data (Breiman, 1998). Experimental results and a comparison to other state-of-the-art taggers are reported.
Reranking an N-Gram Supertagger
- In Proceedings of the TAG+ Workshop
, 2002
Cited by 4 (0 self)
In this paper, we investigate an approach to such a choice based on reranking a set of candidate supertags and their confidence scores. RankBoost (Freund et al., 1998) is the boosting algorithm that we use in order to learn to rerank outputs. It has also been used with good effect in reranking outputs of a statistical parser (Collins, 2000) and ranking sentence plans (Walker, Rambow and Rogati, 2001). RankBoost may learn to correct biases that are inherent in n-gram modeling and that lead to systematic errors in supertagging (cf. van Halteren, 1996). RankBoost can also use a variety of local and long-distance features more easily than n-gram-based approaches (cf. Chen, Bangalore and Vijay-Shanker, 1999) because it makes sparse data less of an issue
Proceedings ACL-COLING 1998, Montreal, Canada, 491-497, 1998
, 1998
In this paper we examine how the differences in modelling between different data driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic wordclass tagging. Four well-known tagger generators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximum Entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second stage classifiers. All combination taggers outperform their best component, with the best combination showing a 19.1% lower error rate than the best individual tagger.
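The simplest of the voting strategies mentioned in the abstract above can be illustrated with a minimal sketch of plurality voting over per-token tags. This is not the authors' implementation: the function name `majority_vote`, the tie-breaking rule (prefer the tagger listed first), and the example tags are all assumptions made for illustration only.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(tag_sequences: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token tags from several taggers by plurality vote.

    Each inner sequence holds one tagger's tags for the same tokens.
    Ties go to the tagger listed first (an assumption of this sketch).
    """
    if not tag_sequences:
        return []
    length = len(tag_sequences[0])
    if any(len(seq) != length for seq in tag_sequences):
        raise ValueError("all taggers must tag the same number of tokens")

    combined = []
    for token_tags in zip(*tag_sequences):
        counts = Counter(token_tags)
        top = max(counts.values())
        # Walk the tags in tagger order and keep the first one
        # that reaches the highest vote count.
        combined.append(next(tag for tag in token_tags if counts[tag] == top))
    return combined


# Hypothetical outputs of four taggers for a three-token sentence.
outputs = [
    ["DT", "MD", "VBZ"],
    ["DT", "NN", "VBZ"],
    ["DT", "NN", "NNS"],
    ["DT", "NN", "VBZ"],
]
combined_tags = majority_vote(outputs)
```

With these made-up inputs, the second and third tokens each have one dissenting tagger, and the vote overrules it. The abstract also mentions second stage classifiers, which learn from the component outputs instead of counting them; that is not shown here.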

