Abstract:
We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models (HMMs) and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states (the label bias problem). We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
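The bias towards states with few successors can be seen on a toy automaton. The sketch below is illustrative, not the paper's training or inference code. It assumes a hypothetical "rib"/"rob" automaton, hypothetical weights in `WEIGHTS`, and hypothetical helper names (`w`, `memm_path_prob`, `path_score`, `crf_path_prob`). The MEMM-style function normalizes scores separately at each source state. The CRF-style function uses the same weights but applies one global normalizer. Here that normalizer runs over only the two complete paths the toy automaton allows, which is a simplification.

```python
import math

# Hypothetical toy automaton:
#   0 -r-> 1 -i-> 2 -b-> 3   and   0 -r-> 4 -o-> 5 -b-> 3
SUCCESSORS = {0: [1, 4], 1: [2], 4: [5], 2: [3], 5: [3], 3: []}
PATHS = {"rib": [0, 1, 2, 3], "rob": [0, 4, 5, 3]}

# Hypothetical weights keyed by (destination state, observed character).
# (1, "r") is slightly favored, as if "rib" were more frequent in training;
# (2, "i") and (5, "o") strongly reward the character each branch expects.
WEIGHTS = {(1, "r"): 0.1, (2, "i"): 5.0, (5, "o"): 5.0}


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def w(state, ch):
    """Weight for entering `state` while observing `ch` (0.0 if unset)."""
    return WEIGHTS.get((state, ch), 0.0)


def memm_path_prob(path, chars):
    """MEMM-style: product of transition probabilities, each normalized
    only over the successors of its source state (local normalization)."""
    prob = 1.0
    for prev, nxt, ch in zip(path, path[1:], chars):
        local_scores = [w(s, ch) for s in SUCCESSORS[prev]]
        prob *= math.exp(w(nxt, ch) - logsumexp(local_scores))
    return prob


def path_score(path, chars):
    """Unnormalized log-score of a complete path: sum of the same weights."""
    return sum(w(nxt, ch) for nxt, ch in zip(path[1:], chars))


def crf_path_prob(path, chars):
    """CRF-style: a single global normalizer over every complete path."""
    log_z = logsumexp([path_score(p, chars) for p in PATHS.values()])
    return math.exp(path_score(path, chars) - log_z)


if __name__ == "__main__":
    for name, path in PATHS.items():
        print(name, round(memm_path_prob(path, "rob"), 3),
              round(crf_path_prob(path, "rob"), 3))
```

On the input "rob", states 1 and 4 each have a single successor, so the MEMM-style model assigns probability 1 to the next step whatever character it sees. The strong evidence from "o" is therefore discarded, and the slight first-step preference for state 1 makes the "rib" path win (about 0.52). With global normalization, the "o" weight enters the path score directly, and nearly all probability mass goes to the "rob" path.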