Results 1-10 of 89
Structured learning with approximate inference
 Advances in Neural Information Processing Systems
Abstract

Cited by 75 (2 self)
In many structured prediction problems, the highest-scoring labeling is hard to compute exactly, leading to the use of approximate inference methods. However, when inference is used in a learning algorithm, a good approximation of the score may not be sufficient. We show in particular that learning can fail even with an approximate inference method with rigorous approximation guarantees. There are two reasons for this. First, approximate methods can effectively reduce the expressivity of an underlying model by making it impossible to choose parameters that reliably give good predictions. Second, approximations can respond to parameter changes in such a way that standard learning algorithms are misled. In contrast, we give two positive results in the form of learning bounds for the use of LP-relaxed inference in structured perceptron and empirical risk minimization settings. We argue that without understanding such appropriately compatible combinations of inference and learning, learning performance under approximate inference cannot be guaranteed.
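The structured perceptron setting analyzed above fits in a few lines. The toy sequence labeler below is our own illustration (feature names and the brute-force argmax are assumptions, not the paper's); it uses exact enumeration where a real system would substitute Viterbi or an approximate search, which is exactly the substitution the paper studies:

```python
import itertools

def features(x, y):
    # Emission and transition indicator features for a tagged sentence.
    for xi, yi in zip(x, y):
        yield ("emit", xi, yi)
    for a, b in zip(y, y[1:]):
        yield ("trans", a, b)

def score(w, x, y):
    return sum(w.get(f, 0.0) for f in features(x, y))

def argmax(w, x, tags):
    # Exact inference by enumeration; real systems use Viterbi or an
    # approximate method, which is where the paper's analysis applies.
    return max(itertools.product(tags, repeat=len(x)),
               key=lambda y: score(w, x, y))

def perceptron_train(data, tags, epochs=5):
    # Standard structured perceptron: update toward the gold structure and
    # away from the (possibly approximate) argmax whenever they differ.
    w = {}
    for _ in range(epochs):
        for x, gold in data:
            pred = argmax(w, x, tags)
            if pred != tuple(gold):
                for f in features(x, gold):
                    w[f] = w.get(f, 0.0) + 1.0
                for f in features(x, pred):
                    w[f] = w.get(f, 0.0) - 1.0
    return w
```

Replacing `argmax` with an approximate search is the point where the paper's negative results can bite: the learner may be misled even when the search has approximation guarantees.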
An Introduction to Conditional Random Fields
 Foundations and Trends in Machine Learning
, 2012
Discriminatively Trained Particle Filters for Complex Multi-Object Tracking
Abstract

Cited by 33 (3 self)
This work presents a discriminative training method for particle filters in the context of multi-object tracking. We are motivated by the difficulty of hand-tuning the many model parameters for such applications and also by results in many application domains indicating that discriminative training is often superior to generative training methods. Our learning approach is tightly integrated into the actual inference process of the filter and attempts to directly optimize the filter parameters in response to observed errors. We present experimental results in the challenging domain of American football, where our filter is trained to track all 22 players throughout football plays. The training method is shown to significantly improve performance of the tracker and to significantly outperform two recent particle-based multi-object tracking methods.
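The bootstrap particle filter whose parameters the paper learns discriminatively is compact enough to sketch. Below is a generic 1-D version (our own illustration, not the paper's tracker; the noise and observation parameters are assumptions): propagate each particle, weight by the observation likelihood, then resample:

```python
import math
import random

def pf_step(particles, obs, rng, trans_noise=0.5, obs_std=1.0):
    # One bootstrap particle-filter step for a 1-D state: propagate each
    # particle with Gaussian transition noise, weight it by a Gaussian
    # observation likelihood, then resample in proportion to the weights.
    moved = [p + rng.gauss(0.0, trans_noise) for p in particles]
    weights = [math.exp(-0.5 * ((p - obs) / obs_std) ** 2) for p in moved]
    total = sum(weights)
    weights = [w / total for w in weights]
    return rng.choices(moved, weights=weights, k=len(moved))
```

The quantities a discriminative method would tune here are exactly `trans_noise` and `obs_std`; the paper's contribution is adjusting such parameters from observed tracking errors rather than by hand.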
Piecewise pseudolikelihood for efficient CRF training
 In International Conference on Machine Learning (ICML)
, 2007
Abstract

Cited by 33 (2 self)
Discriminative training of graphical models can be expensive if the variables have large cardinality, even if the graphical structure is tractable. In such cases, pseudolikelihood is an attractive alternative, because its running time is linear in the variable cardinality, but on some data its accuracy can be poor. Piecewise training (Sutton & McCallum, 2005) can have better accuracy but does not scale as well in the variable cardinality. In this paper, we introduce piecewise pseudolikelihood, which retains the computational efficiency of pseudolikelihood but can have much better accuracy. On several benchmark NLP data sets, piecewise pseudolikelihood has better accuracy than standard pseudolikelihood, and in many cases is nearly equivalent to maximum likelihood, with five to ten times less training time than batch CRF training.
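The contrast between exact likelihood and pseudolikelihood is easy to see on a toy chain. The sketch below is our own illustration (an Ising-style agreement potential, not any model from the paper): the exact objective sums over every joint configuration, while pseudolikelihood only normalizes each variable's conditional given its neighbors, which is what makes its cost linear in the cardinality:

```python
import math
from itertools import product

def pair_score(theta, a, b):
    # Toy Ising-style potential: reward theta when neighbors agree.
    return theta if a == b else 0.0

def log_likelihood(theta, y):
    # Exact log-likelihood: the partition function enumerates all 2^n
    # configurations of the binary chain.
    def total(cfg):
        return sum(pair_score(theta, a, b) for a, b in zip(cfg, cfg[1:]))
    logZ = math.log(sum(math.exp(total(cfg))
                        for cfg in product((0, 1), repeat=len(y))))
    return total(y) - logZ

def log_pseudolikelihood(theta, y):
    # Sum over positions of log P(y_i | y's neighbors): each local
    # normalizer sums over one variable only.
    obj = 0.0
    for i in range(len(y)):
        def cond(v):
            s = 0.0
            if i > 0:
                s += pair_score(theta, y[i - 1], v)
            if i < len(y) - 1:
                s += pair_score(theta, v, y[i + 1])
            return s
        logZ_i = math.log(sum(math.exp(cond(v)) for v in (0, 1)))
        obj += cond(y[i]) - logZ_i
    return obj
```

With no coupling (theta = 0) the two objectives coincide; with coupling they diverge, which is where pseudolikelihood's accuracy can suffer.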
Joint Word Segmentation and POS Tagging using a Single Perceptron
Abstract

Cited by 27 (4 self)
For Chinese POS tagging, word segmentation is a preliminary step. To avoid error propagation and improve segmentation by utilizing POS information, segmentation and tagging can be performed simultaneously. A challenge for this joint approach is the large combined search space, which makes efficient decoding very hard. Recent research has explored the integration of segmentation and POS tagging, by decoding under restricted versions of the full combined search space. In this paper, we propose a joint segmentation and POS tagging model that does not impose any hard constraints on the interaction between word and POS information. Fast decoding is achieved by using a novel multiple-beam search algorithm. The system uses a discriminative statistical model, trained using the generalized perceptron algorithm. The joint model gives an error reduction in segmentation accuracy of 14.6% and an error reduction in tagging accuracy of 12.2%, compared to the traditional pipeline approach.
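The multiple-beam decoder itself is beyond a short sketch, but the plain beam search it generalizes fits in a dozen lines. This is a generic illustration (the `score_step` signature is our assumption, not the paper's interface): hypotheses grow one token at a time and only the highest-scoring partial analyses survive each step:

```python
def beam_search(tokens, tags, score_step, beam_size=4):
    # Generic beam decoding: extend each hypothesis with every tag, score
    # the extension with score_step(token, prev_tag, tag), and keep only
    # the beam_size best partial hypotheses at each step.
    beam = [((), 0.0)]
    for token in tokens:
        candidates = []
        for seq, s in beam:
            prev = seq[-1] if seq else None
            for tag in tags:
                candidates.append((seq + (tag,),
                                   s + score_step(token, prev, tag)))
        candidates.sort(key=lambda h: h[1], reverse=True)
        beam = candidates[:beam_size]
    return beam[0][0]
```

In a joint segmentation-and-tagging system the "tags" become segmentation-plus-POS actions, and the paper's contribution is maintaining multiple beams so that hypotheses of different word lengths compete fairly.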
Structured Perceptron with Inexact Search
 In submission; revision of January 7, 2012.
Abstract

Cited by 26 (7 self)
Structured learning with inexact inference is a fundamental problem. We propose variants of the structured perceptron algorithm under a general “violation-fixing” framework that guarantees convergence. This framework subsumes previous remedies including “early update” as special cases, and also explains why the standard perceptron may fail with inexact search. We also propose new update methods within this framework which learn better models with dramatically reduced training times on state-of-the-art part-of-speech tagging and incremental parsing systems.
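The “early update” special case can be sketched directly: run beam search on a training example and, the moment the gold prefix falls out of the beam, return the gold prefix and the best competing prefix as a guaranteed violation to update on. The code below is our own minimal illustration with made-up emission/transition features, not the authors' implementation:

```python
def prefix_score(w, x, seq):
    # Score a tag-sequence prefix with simple emission/transition features.
    s = 0.0
    for i, t in enumerate(seq):
        s += w.get(("emit", x[i], t), 0.0)
        if i > 0:
            s += w.get(("trans", seq[i - 1], t), 0.0)
    return s

def early_update_search(w, x, gold, tags, beam_size=2):
    # Beam search that stops at the first step where the gold prefix leaves
    # the beam, returning (gold_prefix, best_prefix) -- a violation, since
    # every surviving prefix scores at least as high as the gold prefix.
    beam = [()]
    for i in range(len(x)):
        expanded = [seq + (t,) for seq in beam for t in tags]
        expanded.sort(key=lambda s: prefix_score(w, x, s), reverse=True)
        beam = expanded[:beam_size]
        if gold[:i + 1] not in beam:
            return gold[:i + 1], beam[0]   # early update on the prefix pair
    return gold, beam[0]                   # full update after a clean pass
```

Updating on the returned prefix pair (plus gold features, minus predicted features) is what restores the perceptron's convergence argument under inexact search.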
A discriminative model for tree-to-tree translation
 In Proceedings of EMNLP
, 2006
Abstract

Cited by 24 (2 self)
This paper proposes a statistical, tree-to-tree model for producing translations. Two main contributions are as follows: (1) a method for the extraction of syntactic structures with alignment information from a parallel corpus of translations, and (2) use of a discriminative, feature-based model for prediction of these target-language syntactic structures, which we call aligned extended projections, or AEPs. An evaluation of the method on translation from German to English shows similar performance to the phrase-based model of Koehn et al. (2003).
Discriminative Learning and Spanning Tree Algorithms for Dependency Parsing
, 2006
Abstract

Cited by 23 (1 self)
In this thesis we develop a discriminative learning method for dependency parsing using online large-margin training combined with spanning tree inference algorithms. We will show that this method provides state-of-the-art accuracy, is extensible through the feature set and can be implemented efficiently. Furthermore, we display the language-independent nature of the method by evaluating it on over a dozen diverse languages as well as show its practical applicability through integration into a sentence compression system.

We start by presenting an online large-margin learning framework that is a generalization of the work of Crammer and Singer [34, 37] to structured outputs, such as sequences and parse trees. This will lead to the heart of this thesis: discriminative dependency parsing. Here we will formulate dependency parsing in a spanning tree framework, yielding efficient parsing algorithms for both projective and non-projective tree structures. We will then extend the parsing algorithm to incorporate features over larger substructures without an increase in computational complexity for the projective case. Unfortunately, the non-projective problem then becomes NP-hard, so we provide structurally motivated approximate algorithms. Having defined a set of parsing algorithms, we will also define a rich feature set and train various parsers using the online large-margin learning framework. We then compare our trained dependency parsers to other state-of-the-art parsers on 14 diverse languages: Arabic, Bulgarian, Chinese, Czech, Danish, Dutch, English, German, Japanese, Portuguese, Slovene, Spanish, Swedish and Turkish.

Having built an efficient and accurate discriminative dependency parser, this thesis will then turn to improving and applying the parser. First we will show how additional resources can provide useful features to increase parsing accuracy and to adapt parsers to new domains. We will also argue that the robustness of discriminative inference-based learning algorithms lends itself well to dependency parsing when feature representations or structural constraints do not allow for tractable parsing algorithms. Finally, we integrate our parsing models into a state-of-the-art sentence compression system to show its applicability to a real-world problem.
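The online large-margin framework the thesis builds on (MIRA, after Crammer and Singer) reduces to a closed-form update in the single-best case. The sketch below is our own minimal rendering, not the thesis code; feature vectors are plain dicts and `C` is the usual aggressiveness cap:

```python
def mira_update(w, feats_gold, feats_pred, loss, C=1.0):
    # Single-best MIRA step: make the smallest change to w (step size
    # capped at C) so the gold structure outscores the prediction by a
    # margin of at least `loss`.
    diff = dict(feats_gold)
    for f, v in feats_pred.items():
        diff[f] = diff.get(f, 0.0) - v
    margin = sum(w.get(f, 0.0) * v for f, v in diff.items())
    sq_norm = sum(v * v for v in diff.values())
    if sq_norm == 0.0:
        return dict(w)          # gold and prediction share features exactly
    tau = min(C, max(0.0, loss - margin) / sq_norm)
    new_w = dict(w)
    for f, v in diff.items():
        new_w[f] = new_w.get(f, 0.0) + tau * v
    return new_w
```

In the thesis setting `feats_pred` comes from a maximum spanning tree decoder and `loss` is typically the number of incorrect head attachments.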
Piecewise Training for Structured Prediction
 Machine Learning
Abstract

Cited by 21 (1 self)
A drawback of structured prediction methods is that parameter estimation requires repeated inference, which is intractable for general structures. In this paper, we present an approximate training algorithm called piecewise training that divides the factors into tractable subgraphs, which we call pieces, that are trained independently. Piecewise training can be interpreted as approximating the exact likelihood using belief propagation, and different ways of making this interpretation yield different insights into the method. We also present an extension to piecewise training, called piecewise pseudolikelihood (PWPL), designed for when variables have large cardinality. On several real-world NLP data sets, piecewise training outperforms Besag's pseudolikelihood and is sometimes comparable to exact maximum likelihood. In addition, PWPL performs similarly to piecewise training and better than standard pseudolikelihood, but is five to ten times more computationally efficient than batch maximum likelihood training.
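Piecewise training's key move, normalizing each factor locally instead of globally, can be shown on a toy chain. The code below is our own illustration with an Ising-style edge potential (not a model from the paper); the locally normalized objective lower-bounds the true log-likelihood because the product of per-piece partition functions upper-bounds the global one:

```python
import math
from itertools import product

def edge_score(theta, a, b):
    # Ising-style agreement potential on one chain edge.
    return theta if a == b else 0.0

def exact_loglik(theta, y, k=2):
    # True log-likelihood: the global log partition function sums over
    # every joint configuration, exponential in the chain length.
    def total(cfg):
        return sum(edge_score(theta, a, b) for a, b in zip(cfg, cfg[1:]))
    logZ = math.log(sum(math.exp(total(cfg))
                        for cfg in product(range(k), repeat=len(y))))
    return total(y) - logZ

def piecewise_objective(theta, y, k=2):
    # Piecewise objective: each edge factor is a "piece", normalized over
    # just its own pair of variables, so no global inference is needed.
    obj = 0.0
    for a, b in zip(y, y[1:]):
        logZ_edge = math.log(sum(math.exp(edge_score(theta, u, v))
                                 for u, v in product(range(k), repeat=2)))
        obj += edge_score(theta, a, b) - logZ_edge
    return obj
```

Each piece here can be trained in isolation; the cost is the looseness of the bound, which the paper's belief-propagation interpretation helps characterize.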
Polyhedral Outer Approximations with Application to Natural Language Parsing
Abstract

Cited by 20 (5 self)
Recent approaches to learning structured predictors often require approximate inference for tractability; yet its effects on the learned model are unclear. Meanwhile, most learning algorithms act as if computational cost were constant within the model class. This paper sheds some light on the first issue by establishing risk bounds for max-margin learning with LP-relaxed inference, and addresses the second issue by proposing a new paradigm that attempts to penalize “time-consuming” hypotheses. Our analysis relies on a geometric characterization of the outer polyhedra associated with the LP relaxation. We then apply these techniques to the problem of dependency parsing, for which a concise LP formulation is provided that handles non-local output features. A significant improvement is shown over arc-factored models.