Results 1 - 4 of 4
Improving Accuracy in Wordclass Tagging through Combination of Machine Learning Systems
- Computational Linguistics, 2000
Cited by 38 (3 self)
In this paper, we combine different systems employing known representations. The observation that suggests this approach is that systems that are designed differently, either because they use a different formalism or because they contain different knowledge, will typically produce different errors. We hope to make use of this fact and reduce the number of errors with very little additional effort by exploiting the disagreement between different language models. Although the approach is applicable to any type of language model, we focus on the case of statistical disambiguators that are trained on annotated corpora. The examples of the task that are present in the corpus and its annotation are fed into a learning algorithm, which induces a model of the desired input-output mapping in the form of a classifier. We use a number of different learning algorithms simultaneously on the same training corpus. Each type of learning method brings its own 'inductive bias' to the task and will produce a classifier with slightly different characteristics, so that different methods will tend to produce different errors.
Author addresses: P.O. Box 9103, 6500 HD Nijmegen, The Netherlands, hvh@let.ktm.nl; Universiteitsplein 1, 2610 Wilrijk, Belgium, {zavrel, daelem}@uia.ua.ac.be. © 2000 Association for Computational Linguistics.
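The idea that differently built taggers make different errors can be illustrated with a minimal sketch of plurality voting over per-token tags. This is one simple way to combine taggers, not necessarily the scheme used in the paper; the function name `vote`, the three tagger outputs, and the tie-breaking rule are all illustrative assumptions.

```python
from collections import Counter


def vote(tag_sequences):
    """Combine per-token tags from several taggers by plurality vote.

    tag_sequences: list of tag lists, one per tagger, all over the same tokens.
    """
    if not tag_sequences:
        return []
    length = len(tag_sequences[0])
    if any(len(seq) != length for seq in tag_sequences):
        raise ValueError("all taggers must tag the same number of tokens")

    combined = []
    for position_tags in zip(*tag_sequences):
        counts = Counter(position_tags)
        best = max(counts.values())
        # Tie-break (assumption): among tied tags, keep the one proposed
        # by the earliest-listed tagger.
        winner = next(tag for tag in position_tags if counts[tag] == best)
        combined.append(winner)
    return combined


# Hypothetical outputs: each tagger makes one error, at a different token.
tagger_a = ["DET", "NOUN", "VERB", "ADJ"]
tagger_b = ["DET", "VERB", "VERB", "ADV"]
tagger_c = ["DET", "NOUN", "NOUN", "ADV"]

combined = vote([tagger_a, tagger_b, tagger_c])
print(combined)
```

Because the three hypothetical taggers disagree at different positions, the vote outvotes each individual error. When taggers share the same mistake, voting cannot repair it.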
Corpus-based Acquisition of Collocational Prepositional Phrases
- CLIN, Selected Papers from the Twelfth CLIN Meeting, 2002
Cited by 5 (0 self)
Collocational prepositional phrases like ten koste van (at the expense of), met het oog op (with an eye on), and onder het mom van (under the pretext of) are patterns of the form P-NP-P, which have a non-compositional semantics and which are syntactically rigid or idiosyncratic. We present a number of linguistic tests which set such items apart from regularly built prepositional phrases. To find candidate strings which should be included in a computational lexicon as collocational prepositional phrases, we extract all instances of the relevant pattern from a corpus annotated with POS tags. Next, we introduce a number of statistical tests (mutual information, log-likelihood, and χ²) to find those instances which behave like strong collocations. The strongest collocations according to the statistical tests are compared with lists of such items presented elsewhere and evaluated by human judges.
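Two of the association measures named in the abstract, mutual information and log-likelihood, can be sketched for a 2x2 contingency table as follows. How the paper maps a P-NP-P pattern onto such a table is not stated here, so treating the pattern as a pair of items is an assumption, as are the function name and the example counts.

```python
import math


def association_scores(o11, o12, o21, o22):
    """Return (pointwise mutual information, log-likelihood G^2).

    o11: count of both items together
    o12: count of the first item without the second
    o21: count of the second item without the first
    o22: count of neither item
    """
    n = o11 + o12 + o21 + o22
    row1, row2 = o11 + o12, o21 + o22
    col1, col2 = o11 + o21, o12 + o22
    if min(row1, row2, col1, col2) == 0 or o11 == 0:
        raise ValueError("scores are undefined for empty margins or o11 == 0")

    observed = [o11, o12, o21, o22]
    # Expected counts under independence of the two items.
    expected = [
        row1 * col1 / n,
        row1 * col2 / n,
        row2 * col1 / n,
        row2 * col2 / n,
    ]

    pmi = math.log2(o11 / expected[0])
    # Cells with an observed count of zero contribute nothing to G^2.
    g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return pmi, g2


# Hypothetical counts: the two items co-occur far more often than chance.
pmi_strong, g2_strong = association_scores(30, 10, 10, 950)
# Hypothetical counts: the two items are exactly independent.
pmi_none, g2_none = association_scores(10, 10, 10, 10)
print(pmi_strong, g2_strong, pmi_none, g2_none)
```

Candidates can then be ranked by either score. Mutual information tends to favour rare pairs, which is one reason to compare it against log-likelihood rather than rely on a single measure.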
Bootstrapping a Tagged Corpus through Combination of Existing Heterogeneous Taggers
- In Proceedings of the Second International Conference on Language Resources and Evaluation (LREC-2000), 2000
Cited by 3 (1 self)
This paper describes a new method, COMBI-BOOTSTRAP, to exploit existing taggers and lexical resources for the annotation of corpora with new tagsets. COMBI-BOOTSTRAP uses existing resources as features for a second-level machine learning module that is trained to make the mapping to the new tagset on a very small sample of annotated corpus material. Experiments show that COMBI-BOOTSTRAP: i) can integrate a wide variety of existing resources, and ii) achieves much higher accuracy (up to 44.7% error reduction) than both the best single tagger and an ensemble tagger constructed out of the same small training sample.
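The second-level idea, using the outputs of existing taggers as features for a learner that predicts tags in a new tagset, can be sketched with a simple lookup-table learner. This is a stand-in chosen for brevity, not the learning algorithm used in the paper; the function names, tag labels, and training sample are hypothetical.

```python
from collections import Counter, defaultdict


def train_second_level(feature_rows, gold_tags):
    """Learn a mapping from existing-tagger outputs to new-tagset tags.

    feature_rows: one tuple per token holding the tags assigned by the
                  existing taggers (in their own tagsets).
    gold_tags:    the hand-annotated tag in the new tagset for each token.
    """
    if not gold_tags or len(feature_rows) != len(gold_tags):
        raise ValueError("need a non-empty, aligned training sample")

    table = defaultdict(Counter)
    for features, gold in zip(feature_rows, gold_tags):
        table[tuple(features)][gold] += 1

    # For each seen feature combination, keep its most frequent new tag.
    mapping = {key: counts.most_common(1)[0][0] for key, counts in table.items()}
    # Back-off (assumption): unseen combinations get the overall majority tag.
    fallback = Counter(gold_tags).most_common(1)[0][0]
    return mapping, fallback


def predict(model, features):
    """Predict a new-tagset tag from the existing taggers' outputs."""
    mapping, fallback = model
    return mapping.get(tuple(features), fallback)


# Hypothetical small sample: (tag from tagger A, tag from tagger B) -> new tag.
rows = [("N", "noun"), ("N", "noun"), ("N", "noun"),
        ("V", "verb"), ("N", "verb"), ("A", "adj")]
gold = ["NOUN", "NOUN", "NOUN", "VERB", "VERB", "ADJ"]

model = train_second_level(rows, gold)
seen = predict(model, ("N", "verb"))
unseen = predict(model, ("V", "noun"))
print(seen, unseen)
```

Note that the second-level learner can pick up systematic disagreements: in the sample, tagger A's "N" combined with tagger B's "verb" maps to VERB, which a plain vote between the two could not decide.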
ANNO: a Multi-functional Flemish Text Corpus
In this paper the ANNO Project ("Een Geannoteerde Publieke Gegevensbank voor het Geschreven Nederlands/An Annotated Database for Written Dutch") is reported on. The project aims at laying the foundations for the compilation and linguistic annotation of a large multi-functional Flemish text corpus. The corpus available now consists of language written to be spoken, together with transcribed interviews. In this paper we present the levels of annotation ANNO comes with at the moment. In general, we will show what can be achieved using taggers, parsers etc. that are currently available for Dutch. A separate issue is whether the tools are as useful for Flemish as they are for Dutch.
Introduction
The ANNO Project is sponsored by the Flemish Research Initiative in Speech and Language Technology. It is a pilot project, aiming at laying the foundations for the compilation and linguistic annotation of a large, multi-functional, standard Flemish text corpus. Although great efforts have been...

