Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms
, 2002
Cited by 517 (14 self)
Abstract:
We describe new algorithms for training tagging models, as an alternative to maximum-entropy models or conditional random fields (CRFs). The algorithms rely on Viterbi decoding of training examples, combined with simple additive updates. We describe theory justifying the algorithms through a modification of the proof of convergence of the perceptron algorithm for classification problems. We give experimental results on part-of-speech tagging and base noun phrase chunking, in both cases showing improvements over results for a maximum-entropy tagger.
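The decode-and-update loop this abstract describes is compact enough to sketch. The following is a toy illustration, not Collins' implementation: the emission/bigram feature templates, the tag set, and any training data are invented for the example.

```python
from collections import defaultdict

def features(words, tags):
    """Emission and tag-bigram indicator features for a full tag sequence."""
    feats = defaultdict(int)
    prev = "<s>"
    for w, t in zip(words, tags):
        feats[("emit", w, t)] += 1
        feats[("trans", prev, t)] += 1
        prev = t
    return feats

def viterbi(words, tagset, w):
    """Best-scoring tag sequence under current weights (standard Viterbi)."""
    # pi[i][t] = best score of any tag sequence for words[:i+1] ending in t
    pi = [{t: w[("emit", words[0], t)] + w[("trans", "<s>", t)] for t in tagset}]
    bp = [{}]
    for i in range(1, len(words)):
        pi.append({}); bp.append({})
        for t in tagset:
            best = max(tagset, key=lambda p: pi[i - 1][p] + w[("trans", p, t)])
            pi[i][t] = pi[i - 1][best] + w[("trans", best, t)] + w[("emit", words[i], t)]
            bp[i][t] = best
    last = max(tagset, key=lambda t: pi[-1][t])
    tags = [last]
    for i in range(len(words) - 1, 0, -1):
        last = bp[i][last]
        tags.append(last)
    return tags[::-1]

def train_perceptron(data, tagset, epochs=5):
    """Structured perceptron: Viterbi-decode, then additively update on mistakes."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tagset, w)
            if pred != gold:
                for k, v in features(words, gold).items():
                    w[k] += v  # promote features of the gold sequence
                for k, v in features(words, pred).items():
                    w[k] -= v  # demote features of the wrong prediction
    return w
```

The update is exactly the abstract's "Viterbi decoding of training examples, combined with simple additive updates": weights move toward the gold sequence's features and away from the predicted sequence's whenever they differ.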
Shallow Parsing with Conditional Random Fields
, 2003
Cited by 473 (8 self)
Abstract:
Conditional random fields for sequence labeling offer advantages over both generative models like HMMs and classifiers applied at each sequence position. Among sequence labeling tasks in language processing, shallow parsing has received much attention, with the development of standard evaluation datasets and extensive comparison among methods. We show here how to train a conditional random field to achieve performance as good as any reported base noun-phrase chunking method on the CoNLL task, and better than any reported single model. Improved training methods based on modern optimization algorithms were critical in achieving these results. We present extensive comparisons between models and training methods that confirm and strengthen previous results on shallow parsing and training methods for maximum-entropy models.
Large margin methods for structured and interdependent output variables
 Journal of Machine Learning Research
, 2005
Cited by 399 (11 self)
Abstract:
Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary issue of designing classification algorithms that can deal with more complex outputs, such as trees, sequences, or sets. More generally, we consider problems involving multiple dependent output variables, structured output spaces, and classification problems with class attributes. In order to accomplish this, we propose to appropriately generalize the well-known notion of a separation margin and derive a corresponding maximum-margin formulation. While this leads to a quadratic program with a potentially prohibitive, i.e., exponential, number of constraints, we present a cutting plane algorithm that solves the optimization problem in polynomial time for a large class of problems. The proposed method has important applications in areas such as computational biology, natural language processing, information retrieval/extraction, and optical character recognition. Experiments from various domains involving different types of output spaces emphasize the breadth and generality of our approach.
Kernel Methods for Relation Extraction
, 2002
Cited by 158 (0 self)
Abstract:
We present an application of kernel methods to extracting relations from unstructured natural language sources.
Minimum Bayes-risk decoding for statistical machine translation
 In Proceedings of HLT-NAACL
, 2004
Cited by 139 (15 self)
Abstract:
We present Minimum Bayes-Risk (MBR) decoding for statistical machine translation. This statistical approach aims to minimize expected loss of translation errors under loss functions that measure translation performance. We describe a hierarchy of loss functions that incorporate different levels of linguistic information from word strings, word-to-word alignments from an MT system, and syntactic structure from parse trees of source and target language sentences. We report the performance of the MBR decoders on a Chinese-to-English translation task. Our results show that MBR decoding can be used to tune statistical MT performance for specific loss functions.
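The selection rule behind MBR decoding is compact enough to sketch. The snippet below is an illustrative toy under stated assumptions: it approximates the evidence space by the n-best list itself, and a crude unigram-overlap loss stands in for the BLEU-, alignment-, and syntax-based losses the paper studies.

```python
def unigram_loss(hyp, ref):
    """Toy loss: 1 minus unigram set overlap (a stand-in for 1 - BLEU)."""
    h, r = set(hyp.split()), set(ref.split())
    if not h or not r:
        return 1.0
    return 1.0 - len(h & r) / max(len(h), len(r))

def mbr_decode(nbest, loss=unigram_loss):
    """Pick the hypothesis minimizing expected loss under the model posterior.

    nbest: list of (hypothesis, posterior_probability) pairs; the same list
    approximates the distribution over "true" translations.
    """
    best, best_risk = None, float("inf")
    for hyp, _ in nbest:
        # Expected loss of outputting `hyp` when the reference is distributed
        # according to the posteriors over the n-best list.
        risk = sum(p * loss(hyp, ref) for ref, p in nbest)
        if risk < best_risk:
            best, best_risk = hyp, risk
    return best
```

Swapping in a different `loss` is all it takes to tune the decoder toward a different evaluation metric, which is the point the abstract makes.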
Intricacies of Collins’ parsing model
, 2003
Cited by 117 (1 self)
Abstract:
This article documents a large set of heretofore unpublished details Collins used in his parser, such that, along with Collins’ (1999) thesis, this article contains all information necessary to duplicate Collins’ benchmark results. Indeed, these as-yet-unpublished details account for an 11% relative increase in error from an implementation including all details to a clean-room implementation of Collins’ model. We also show a cleaner and equally well-performing method for the handling of punctuation and conjunction and reveal certain other probabilistic oddities about Collins’ parser. We not only analyze the effect of the unpublished details, but also reanalyze the effect of certain well-known details, revealing that bilexical dependencies are barely used by the model and that head choice is not nearly as important to overall parsing performance as once thought. Finally, we perform experiments that show that the true discriminative power of lexicalization appears to lie in the fact that unlexicalized syntactic structures are generated conditioning on the headword and its part of speech.
MaltParser: A language-independent system for data-driven dependency parsing
 In Proc. of the Fourth Workshop on Treebanks and Linguistic Theories
, 2005
Log-Linear Models for Label Ranking
, 2003
Cited by 86 (5 self)
Abstract:
Label ranking is the task of inferring a total order over a predefined set of labels for each given instance. We present a general framework for batch learning of label ranking functions from supervised data. We assume that each instance in the training data is associated with a list of preferences over the label set; however, we do not assume that this list is either complete or consistent. This enables us to accommodate a variety of ranking problems. In contrast to the general form of the supervision, our goal is to learn a ranking function that induces a total order over the entire set of labels. Special cases of our setting are multi-label categorization and hierarchical classification. We present a general boosting-based learning algorithm for the label ranking problem and prove a lower bound on the progress of each boosting iteration. The applicability of our approach is demonstrated with a set of experiments on a large-scale text corpus.
Ranking Algorithms for Named-Entity Extraction: Boosting and the Voted Perceptron
, 2002
Cited by 75 (2 self)
Abstract:
We describe algorithms that rerank the top N hypotheses from a maximum-entropy tagger, the application being named-entity recognition in a corpus of web data. The first approach uses a boosting algorithm for ranking problems. The second approach uses the voted perceptron algorithm. Both algorithms give comparable, significant improvements over the maximum-entropy baseline. The voted perceptron algorithm can be considerably more efficient to train, at some cost in computation on test examples.
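A minimal sketch of perceptron-based N-best reranking, under stated assumptions: the feature dictionaries and training data are invented, and averaged weights stand in for the voted perceptron (averaging behaves similarly in practice and is simpler to code).

```python
def dot(w, feats):
    """Sparse dot product between a weight map and a feature map."""
    return sum(w.get(k, 0.0) * v for k, v in feats.items())

def rerank_train(examples, epochs=10):
    """Perceptron reranker over N-best lists.

    examples: list of N-best lists; each N-best list is a list of
    (feature_dict, is_gold) pairs with exactly one gold hypothesis.
    Returns averaged weights (a stand-in for the voted perceptron).
    """
    w, total, steps = {}, {}, 0
    for _ in range(epochs):
        for nbest in examples:
            gold = next(f for f, g in nbest if g)
            # Current best hypothesis under the model (ties go to the first).
            pred = max((f for f, _ in nbest), key=lambda f: dot(w, f))
            if pred != gold:
                for k, v in gold.items():
                    w[k] = w.get(k, 0.0) + v
                for k, v in pred.items():
                    w[k] = w.get(k, 0.0) - v
            steps += 1
            # Accumulate for averaging, which damps late oscillations.
            for k, v in w.items():
                total[k] = total.get(k, 0.0) + v
    return {k: v / steps for k, v in total.items()}
```

At test time, the reranker simply returns the hypothesis with the highest `dot(w, feats)` score in each N-best list.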
A study on convolution kernels for shallow semantic parsing
 in Proceedings of ACL’04
, 2004
Cited by 73 (9 self)
Abstract:
In this paper we have designed and experimented with novel convolution kernels for the automatic classification of predicate arguments. Their main property is the ability to process structured representations. Support Vector Machines (SVMs), using a combination of such kernels and the flat feature kernel, classify PropBank predicate arguments with accuracy higher than the current argument classification state-of-the-art. Additionally, experiments on FrameNet data have shown that SVMs are appealing for the classification of semantic roles even if the proposed kernels do not produce any improvement.