Maximum Entropy Markov Models for Information Extraction and Segmentation (2000) [263 citations — 15 self]
Abstract:
Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling sequential data, and have been applied with success to many text-related tasks, such as part-of-speech tagging, text segmentation, and information extraction. In these cases, the observations are usually modeled as multinomial distributions over a discrete vocabulary, and the HMM parameters are set to maximize the likelihood of the observations. This paper presents a new Markovian sequence model, closely related to HMMs, that allows observations to be represented as arbitrary overlapping features (such as word, capitalization, formatting, and part-of-speech), and defines the conditional probability of state sequences given observation sequences. It does this by using the maximum entropy framework to fit a set of exponential models that represent the probability of a state given an observation and the previous state. We present positive experimental results on the segmentation of FAQs.
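
As an illustration of the model the abstract describes, here is a minimal sketch, not the authors' implementation. It keeps one exponential model per previous state: each model turns overlapping binary features of the current observation into a normalized distribution over next states, and a Viterbi pass recovers the most probable state sequence. The state names, the feature functions, and every weight in `WEIGHTS` are hypothetical and hand-picked for illustration. In the paper the weights are fitted with the maximum entropy framework, which this sketch does not attempt.

```python
import math

# Hypothetical label set for a FAQ-style segmentation task.
STATES = ["question", "answer"]


def features(obs):
    """Overlapping binary features of one line of text (illustrative only)."""
    return {
        "ends_with_question_mark": obs.rstrip().endswith("?"),
        "starts_with_capital": obs[:1].isupper(),
        "is_indented": obs.startswith(" "),
    }


# Hand-picked weights, NOT trained: WEIGHTS[prev_state][(feature, next_state)].
# Missing entries count as weight 0.
WEIGHTS = {
    "start": {("ends_with_question_mark", "question"): 2.0},
    "question": {
        ("ends_with_question_mark", "question"): 1.0,
        ("is_indented", "answer"): 2.0,
    },
    "answer": {
        ("ends_with_question_mark", "question"): 2.0,
        ("is_indented", "answer"): 2.0,
    },
}


def transition_probs(prev_state, obs):
    """P(state | obs, prev_state) from the exponential model of prev_state."""
    active = [name for name, on in features(obs).items() if on]
    scores = {
        s: sum(WEIGHTS[prev_state].get((a, s), 0.0) for a in active)
        for s in STATES
    }
    z = sum(math.exp(v) for v in scores.values())  # per-(obs, prev) normalizer
    return {s: math.exp(v) / z for s, v in scores.items()}


def viterbi(observations, start_state="start"):
    """Most probable state sequence given the observation sequence."""
    if not observations:
        return []
    # delta[s]: probability of the best path that ends in state s.
    delta = transition_probs(start_state, observations[0])
    backpointers = []
    for obs in observations[1:]:
        trans = {p: transition_probs(p, obs) for p in STATES}
        new_delta, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: delta[p] * trans[p][s])
            new_delta[s] = delta[best_prev] * trans[best_prev][s]
            pointers[s] = best_prev
        delta = new_delta
        backpointers.append(pointers)
    # Trace the best final state back to the first observation.
    state = max(STATES, key=lambda s: delta[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return list(reversed(path))


if __name__ == "__main__":
    lines = [
        "What is an MEMM?",
        "  It is a conditional sequence model.",
        "  It uses exponential models.",
    ]
    print(viterbi(lines))
```

Unlike an HMM, nothing here models the probability of the observation itself. The observation enters only through its features, which is what lets those features overlap freely.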

