Results 1 - 10 of 864,330
Maximum entropy Markov models for information extraction and segmentation
, 2000
"... Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling sequential data, and have been applied with success to many text-related tasks, such as part-of-speech tagging, text segmentation and information extraction. In these cases, the observations are usually modeled as multinomial ..."
Cited by 554 (18 self)
... capitalization, formatting, part-of-speech), and defines the conditional probability of state sequences given observation sequences. It does this by using the maximum entropy framework to fit a set of exponential models that represent the probability of a state given an observation and the previous state. We ...
A Maximum Entropy approach to Natural Language Processing
 COMPUTATIONAL LINGUISTICS
, 1996
"... The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the wide-scale application of this concept to real world problems in statistical estimation and pattern recognition. In this paper we des ..."
Cited by 1341 (5 self)
... describe a method for statistical modeling based on maximum entropy. We present a maximum-likelihood approach for automatically constructing maximum entropy models and describe how to implement this approach efficiently, using as examples several problems in natural language processing.
A Maximum-Entropy-Inspired Parser
, 1999
"... We present a new parser for parsing down to Penn treebank style parse trees that achieves 90.1% average precision/recall for sentences of length 40 and less, and 89.5% for sentences of length 100 and less when trained and tested on the previously established [5,9,10,15,17] "standard" se ..."
Cited by 963 (19 self)
... sections of the Wall Street Journal tree bank. This represents a 13% decrease in error rate over the best single-parser results on this corpus [9]. The major technical innovation is the use of a "maximum-entropy-inspired" model for conditioning and smoothing that let us successfully to test ...
Integrating classification and association rule mining
 In Proc of KDD
, 1998
"... Classification rule mining aims to discover a small set of rules in the database that forms an accurate classifier. Association rule mining finds all the rules existing in the database that satisfy some minimum support and minimum confidence constraints. For association rule mining, the target of di ..."
Cited by 561 (21 self)
A Maximum Entropy Model for Part-Of-Speech Tagging
, 1996
"... This paper presents a statistical model which trains from a corpus annotated with Part-Of-Speech tags and assigns them to previously unseen text with state-of-the-art accuracy (96.6%). The model can be classified as a Maximum Entropy model and simultaneously uses many contextual "features" t ..."
Cited by 577 (1 self)
Discriminative Training and Maximum Entropy Models for Statistical Machine Translation
, 2002
"... We present a framework for statistical machine translation of natural languages based on direct maximum entropy models, which contains the widely used source-channel approach as a special case. All knowledge sources are treated as feature functions, which depend on the source language senten ..."
Cited by 497 (30 self)
Very simple classification rules perform well on most commonly used datasets
 Machine Learning
, 1993
"... The classification rules induced by machine learning systems are judged by two criteria: their classification accuracy on an independent test set (henceforth "accuracy"), and their complexity. The relationship between these two criteria is, of course, of keen interest to the machin ..."
Cited by 542 (5 self)
Using Maximum Entropy for Text Classification
, 1999
"... This paper proposes the use of maximum entropy techniques for text classification. Maximum entropy is a probability distribution estimation technique widely used for a variety of natural language tasks, such as language modeling, part-of-speech tagging, and text segmentation. The underlying principl ..."
Cited by 320 (6 self)
A comparison of event models for Naive Bayes text classification
, 1998
"... Recent work in text classification has used two different first-order probabilistic models for classification, both of which make the naive Bayes assumption. Some use a multivariate Bernoulli model, that is, a Bayesian Network with no dependencies between words and binary word features (e.g. Larkey ..."
Cited by 1002 (27 self)
Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms
, 2002
"... We describe new algorithms for training tagging models, as an alternative to maximum-entropy models or conditional random fields (CRFs). The algorithms rely on Viterbi decoding of training examples, combined with simple additive updates. We describe theory justifying the algorithms through a modific ..."
Cited by 641 (16 self)