Abstract:
Trigrams'n'Tags (TnT) is an efficient statistical part-of-speech tagger. Contrary to claims found elsewhere in the literature, we argue that a tagger based on Markov models performs at least as well as other current approaches, including the Maximum Entropy framework. A recent comparison has even shown that TnT performs significantly better on the tested corpora. We describe the basic model of TnT and the techniques used for smoothing and for handling unknown words. Furthermore, we present evaluations on two corpora.