Abstract:
We describe new algorithms for training tagging models, as an alternative to maximum-entropy models or conditional random fields (CRFs). The algorithms rely on Viterbi decoding of training examples, combined with simple additive updates. We describe theory justifying the algorithms through a modification of the proof of convergence of the perceptron algorithm for classification problems. We give experimental results on part-of-speech tagging and base noun phrase chunking, in both cases showing improvements over results for a maximum-entropy tagger.
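The training loop summarized above (Viterbi-decode each training sentence, then apply an additive update when the decoded tags differ from the gold tags) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature set (word/tag emission and tag/tag transition indicators), the `START` symbol, the function names, and the toy data are all assumptions made for the example, and the sketch assumes non-empty sentences.

```python
from collections import defaultdict

START = "<s>"  # hypothetical start-of-sentence marker (assumption)


def features(words, tags):
    """Count emission and transition indicator features for a tagged sentence."""
    counts = defaultdict(int)
    prev = START
    for word, tag in zip(words, tags):
        counts[("emit", tag, word)] += 1
        counts[("trans", prev, tag)] += 1
        prev = tag
    return counts


def viterbi(words, tagset, w):
    """Return the highest-scoring tag sequence for a non-empty sentence."""
    # best[t] = (score, path) of the best partial sequence ending in tag t
    best = {
        t: (w.get(("emit", t, words[0]), 0.0) + w.get(("trans", START, t), 0.0), [t])
        for t in tagset
    }
    for word in words[1:]:
        new_best = {}
        for t in tagset:
            # Pick the best previous tag for current tag t
            score, path = max(
                ((s + w.get(("trans", p, t), 0.0), path_p)
                 for p, (s, path_p) in best.items()),
                key=lambda x: x[0],
            )
            new_best[t] = (score + w.get(("emit", t, word), 0.0), path + [t])
        best = new_best
    return max(best.values(), key=lambda x: x[0])[1]


def train(data, tagset, epochs=5):
    """Perceptron-style training: decode, then add gold and subtract predicted features."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tagset, w)
            if pred != list(gold):
                for f, c in features(words, gold).items():
                    w[f] += c
                for f, c in features(words, pred).items():
                    w[f] -= c
    return w


# Toy usage (made-up data, for illustration only)
data = [
    (["the", "dog", "barks"], ["D", "N", "V"]),
    (["a", "cat", "sleeps"], ["D", "N", "V"]),
]
weights = train(data, ["D", "N", "V"])
print(viterbi(["a", "dog", "sleeps"], ["D", "N", "V"], weights))
```

The update touches only features on which the gold and predicted sequences disagree, since counts shared by both cancel out; this is what makes the update "simple" and "additive" in the sense of the abstract.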
Citations
|
1196
|
Building a large annotated corpus of English: the Penn Treebank
– Marcus, Marcinkiewicz, et al.
- 1993
|
|
848
|
Conditional random fields: Probabilistic models for segmenting and labeling sequence data
– Lafferty, McCallum, et al.
- 2001
|
|
566
|
Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging
– Brill
- 1995
|
|
267
|
Text chunking using transformation-based learning
– Ramshaw, Marcus
- 1995
|
|
259
|
Maximum entropy Markov models for information extraction and segmentation
– McCallum, Freitag, et al.
- 2000
|
|
239
|
A maximum entropy part-of-speech tagger
– Ratnaparkhi
- 1996
|
|
204
|
Large margin classification using the perceptron algorithm
– Freund, Schapire
- 1999
|
|
131
|
Convolution kernels for natural language
– Collins, Duffy
- 2001
|
|
108
|
New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron
– Collins, Duffy
- 2002
|
|
44
|
On weak learning
– Helmbold, Warmuth
- 1995
|
|
14
|
Conditional random fields: Probabilistic models for segmenting and labeling sequence data
– Lafferty, McCallum, et al.
- 2001
|
|
9
|
Ranking Algorithms for Named Entity Extraction: Boosting and the Voted Perceptron
– Collins
|
|
2
|
Large margin classification using the Perceptron algorithm
– Freund, Schapire
- 1999
|