MBT: A Memory-Based Part of Speech Tagger-Generator (1996)
| Venue: | PROC. OF FOURTH WORKSHOP ON VERY LARGE CORPORA |
| Citations: | 168 - 47 self |
BibTeX
@INPROCEEDINGS{Daelemans96mbt:a,
author = {Walter Daelemans and Jakub Zavrel and Peter Berck and Steven Gillis},
title = {MBT: A Memory-Based Part of Speech Tagger-Generator},
booktitle = {PROC. OF FOURTH WORKSHOP ON VERY LARGE CORPORA},
year = {1996},
pages = {14--27},
publisher = {ACL SIGDAT}
}
Abstract
We introduce a memory-based approach to part of speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning. The part of speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger. Based on such a corpus, the tagger-generator automatically builds a tagger which is able to tag new text the same way, diminishing development time for the construction of a tagger considerably. Memory-based tagging shares this advantage with other statistical or machine learning approaches. Additional advantages specific to a memory-based approach include (i) the relatively small tagged corpus size sufficient for training, (ii) incremental learning, (iii) explanation capabilities, (iv) flexible integration of information in case representations, (v) its non-parametric nature, (vi) reasonably good results on unknown words without morphological analysis, and (vii) fast learning and tagging. In this paper we show that a large-scale application of the memory-based approach is feasible: we obtain a tagging accuracy that is on a par with that of known statistical approaches, and with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases. The use of IGTree has the additional advantage that the optimal context size for disambiguation is dynamically computed.
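
The core idea in the abstract, extrapolating a tag from the most similar stored cases, can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a hypothetical three-feature case layout (previous tag, focus word, next word), an unweighted overlap metric, and a linear scan over memory, where the paper uses IGTree for indexing and search. All function names are illustrative.

```python
from collections import Counter

START, END = "<s>", "</s>"  # hypothetical padding symbols for sentence edges


def make_cases(tagged_sentence):
    """Turn one tagged sentence into (features, tag) cases.

    Assumed feature layout: (previous tag, focus word, next word).
    """
    words = [w for w, _ in tagged_sentence]
    tags = [t for _, t in tagged_sentence]
    cases = []
    for i, (word, tag) in enumerate(tagged_sentence):
        prev_tag = tags[i - 1] if i > 0 else START
        next_word = words[i + 1] if i + 1 < len(words) else END
        cases.append(((prev_tag, word, next_word), tag))
    return cases


def overlap(a, b):
    """Count feature positions on which two cases agree (unweighted)."""
    return sum(x == y for x, y in zip(a, b))


def classify(memory, features):
    """Return the majority tag among the most similar stored cases."""
    best = max(overlap(features, stored) for stored, _ in memory)
    votes = Counter(
        tag for stored, tag in memory if overlap(features, stored) == best
    )
    return votes.most_common(1)[0][0]


def tag_sentence(memory, words):
    """Tag left to right, feeding each predicted tag into the next case."""
    tags = []
    prev_tag = START
    for i, word in enumerate(words):
        next_word = words[i + 1] if i + 1 < len(words) else END
        prev_tag = classify(memory, (prev_tag, word, next_word))
        tags.append(prev_tag)
    return tags


# Toy training corpus (made up for illustration only).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
memory = [case for sentence in corpus for case in make_cases(sentence)]
```

Calling `tag_sentence(memory, ["a", "bird", "sleeps"])` tags the unseen word "bird" purely from its context features, which is the kind of behaviour advantage (vi) refers to. Adding new tagged sentences only requires appending their cases to `memory`, which corresponds to the incremental learning in advantage (ii).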







