Results 1–10 of 34
Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks
2008
"... Loglinear and maximummargin models are two commonlyused methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large dat ..."
Cited by 64 (1 self)
Abstract:
Log-linear and maximum-margin models are two commonly used methods in supervised machine learning, and are frequently applied to structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O(1/ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log(1/ε)) updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be…
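For intuition, here is a minimal sketch of the exponentiated-gradient step the abstract describes, applied to a toy convex objective over the probability simplex; the function names and the toy objective are illustrative, not taken from the paper.

    import numpy as np

    def eg_update(u, grad, eta):
        # One EG step on the probability simplex: multiplicative update
        # followed by renormalization, so u stays nonnegative and sums to 1.
        w = u * np.exp(-eta * grad)
        return w / w.sum()

    # Toy usage: minimize f(u) = 0.5 * ||u - c||^2 over the simplex.
    c = np.array([0.7, 0.2, 0.1])
    u = np.ones(3) / 3
    for _ in range(200):
        u = eg_update(u, u - c, eta=0.5)   # gradient of f at u is (u - c)
    print(u)  # converges toward c, the minimizer on the simplex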
Dependency parsing by belief propagation
In Proceedings of EMNLP, 2008
"... We formulate dependency parsing as a graphical model with the novel ingredient of global constraints. We show how to apply loopy belief propagation (BP), a simple and effective tool for approximate learning and inference. As a parsing algorithm, BP is both asymptotically and empirically efficient. E ..."
Cited by 64 (7 self)
Abstract:
We formulate dependency parsing as a graphical model with the novel ingredient of global constraints. We show how to apply loopy belief propagation (BP), a simple and effective tool for approximate learning and inference. As a parsing algorithm, BP is both asymptotically and empirically efficient. Even with second-order features or latent variables, which would make exact parsing considerably slower or NP-hard, BP needs only O(n^3) time with a small constant factor. Furthermore, such features significantly improve parse accuracy over exact first-order methods. Incorporating additional features would increase the runtime additively rather than multiplicatively.
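As a rough illustration of the message arithmetic loopy BP involves, here is a generic sum-product factor-to-variable update; this is not the paper's specialized global tree factor, which is what makes the O(n^3) bound possible.

    import numpy as np

    def factor_to_var(psi, msgs_in, i):
        # Sum-product message from a factor with table psi (one axis per
        # neighboring variable) to its i-th neighbor: multiply in every
        # other incoming message, then sum out all axes except axis i.
        t = psi.copy()
        for j, m in enumerate(msgs_in):
            if j != i:
                shape = [1] * t.ndim
                shape[j] = len(m)
                t = t * m.reshape(shape)
        return t.sum(axis=tuple(j for j in range(t.ndim) if j != i))

    # A pairwise factor over two binary variables.
    psi = np.array([[1.0, 0.5], [0.5, 1.0]])
    msgs = [np.array([0.6, 0.4]), np.array([1.0, 1.0])]
    print(factor_to_var(psi, msgs, 1))  # -> [0.8, 0.7]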
Dual decomposition for parsing with non-projective head automata
In Proc. of EMNLP, 2010
"... This paper introduces algorithms for nonprojective parsing based on dual decomposition. We focus on parsing algorithms for nonprojective head automata, a generalization of headautomata models to nonprojective structures. The dual decomposition algorithms are simple and efficient, relying on standa ..."
Cited by 61 (8 self)
Abstract:
This paper introduces algorithms for non-projective parsing based on dual decomposition. We focus on parsing algorithms for non-projective head automata, a generalization of head-automata models to non-projective structures. The dual decomposition algorithms are simple and efficient, relying on standard dynamic programming and minimum spanning tree algorithms. They provably solve an LP relaxation of the non-projective parsing problem. Empirically, the LP relaxation is very often tight: for many languages, exact solutions are achieved on over 98% of test sentences. The accuracy of our models is higher than that of previous work on a broad range of datasets.
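Schematically, the dual-decomposition loop looks as follows. The two argmax oracles stand in for the paper's head-automaton dynamic program and maximum-spanning-tree solver; here they are placeholders over 0/1 indicator vectors.

    import numpy as np

    def dual_decompose(argmax_f, argmax_g, n, iters=100, rate=1.0):
        # Subgradient method for max f(x) + g(y) s.t. x == y, where
        # argmax_f(u) solves max_x f(x) + u.x and argmax_g(v) solves
        # max_y g(y) + v.y; x and y are 0/1 vectors of length n.
        u = np.zeros(n)
        for t in range(iters):
            x = argmax_f(u)
            y = argmax_g(-u)
            if np.array_equal(x, y):       # agreement certifies optimality
                return x
            u -= rate / (t + 1) * (x - y)  # subgradient step on the dual
        return x                           # no agreement: last solution

    # Toy oracles: both objectives are linear, so each argmax is a sign test.
    a, b = np.array([1.0, -1.0, 2.0]), np.array([0.5, 0.5, 1.0])
    x = dual_decompose(lambda u: (a + u > 0).astype(float),
                       lambda v: (b + v > 0).astype(float), n=3)
    print(x)  # -> [1. 0. 1.], the argmax of (a + b) . z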
An Empirical Study of Semi-supervised Structured Conditional Models for Dependency Parsing
"... This paper describes an empirical study of highperformance dependency parsers based on a semisupervised learning approach. We describe an extension of semisupervised structured conditional models (SSSCMs) to the dependency parsing problem, whose framework is originally proposed in (Suzuki and Iso ..."
Cited by 30 (0 self)
Abstract:
This paper describes an empirical study of high-performance dependency parsers based on a semi-supervised learning approach. We describe an extension of semi-supervised structured conditional models (SS-SCMs) to the dependency parsing problem; the framework was originally proposed by Suzuki and Isozaki (2008). Moreover, we introduce two extensions related to dependency parsing: the first is to combine SS-SCMs with another semi-supervised approach, described by Koo et al. (2008); the second is to apply the approach to second-order parsing models, such as those described by Carreras (2007), using a two-stage semi-supervised learning approach. We demonstrate the effectiveness of our proposed methods in dependency parsing experiments on two widely used test collections: the Penn Treebank for English and the Prague Dependency Treebank for Czech.
Probabilistic models of non-projective dependency trees
In Proc. EMNLP-CoNLL, 2007
"... A notable gap in research on statistical dependency parsing is a proper conditional probability distribution over nonprojective dependency trees for a given sentence. We exploit the Matrix Tree Theorem (Tutte, 1984) to derive an algorithm that efficiently sums the scores of all nonprojective trees i ..."
Cited by 26 (9 self)
Abstract:
A notable gap in research on statistical dependency parsing is a proper conditional probability distribution over non-projective dependency trees for a given sentence. We exploit the Matrix Tree Theorem (Tutte, 1984) to derive an algorithm that efficiently sums the scores of all non-projective trees in a sentence, permitting the definition of a conditional log-linear model over trees. While discriminative methods, such as those presented in McDonald et al. (2005b), obtain very high accuracy on standard dependency parsing tasks and can be trained and applied without marginalization, “summing trees” permits some alternative techniques of interest. Using the summing algorithm, we present competitive experimental results on four non-projective languages, for maximum conditional likelihood estimation, minimum Bayes-risk parsing, and hidden-variable training.
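The core of the summing algorithm is the directed Matrix Tree Theorem: the weighted count of all arborescences of a digraph is a determinant. A minimal sketch follows; single-root handling and the paper's exact parameterization are omitted.

    import numpy as np

    def tree_partition(W, root=0):
        # Sum over all arborescences rooted at `root` of the product of
        # their edge weights, where W[h, m] is the weight of the arc from
        # head h to modifier m. With W[h, m] = exp(score(h, m)), this is
        # the partition function of a log-linear model over trees.
        A = W.copy()
        np.fill_diagonal(A, 0.0)
        L = np.diag(A.sum(axis=0)) - A          # directed graph Laplacian
        keep = [i for i in range(W.shape[0]) if i != root]
        return np.linalg.det(L[np.ix_(keep, keep)])

    # Sanity check: the complete digraph on 3 nodes with unit weights
    # has exactly 3 spanning arborescences rooted at node 0.
    print(tree_partition(np.ones((3, 3))))  # -> 3.0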
On the complexity of non-projective data-driven dependency parsing
In Proc. IWPT, 2007
"... In this paper we investigate several nonprojective parsing algorithms for dependency parsing, providing novel polynomial time solutions under the assumption that each dependency decision is independent of all the others, called here the edgefactored model. We also investigate algorithms for nonpro ..."
Cited by 26 (0 self)
Abstract:
In this paper we investigate several non-projective parsing algorithms for dependency parsing, providing novel polynomial-time solutions under the assumption that each dependency decision is independent of all the others, called here the edge-factored model. We also investigate algorithms for non-projective parsing that account for non-local information, and present several hardness results. These results suggest that it is unlikely that exact non-projective dependency parsing is tractable for any model richer than the edge-factored model.
Hedging structured concepts
In COLT, 2010
"... We develop an online algorithm called Component Hedge for learning structured concept classes when the loss of a structured concept sums over its components. Example classes include paths through a graph (composed of edges) and partial permutations (composed of assignments). The algorithm maintains ..."
Cited by 16 (4 self)
Abstract:
We develop an online algorithm called Component Hedge for learning structured concept classes when the loss of a structured concept sums over its components. Example classes include paths through a graph (composed of edges) and partial permutations (composed of assignments). The algorithm maintains a parameter vector with one non-negative weight per component, which always lies in the convex hull of the structured concept class. The algorithm predicts by decomposing the current parameter vector into a convex combination of concepts and choosing one of those concepts at random. The parameters are updated by first performing a multiplicative update and then projecting back into the convex hull. We show that Component Hedge has optimal regret bounds for a large variety of structured concept classes.
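For the simplest concept class, k-element subsets (whose convex hull is the capped simplex), one Component Hedge update might look like the sketch below. The cap-and-rescale projection is specific to this class; paths and permutations need their own projections, and the decomposition step used for prediction is omitted.

    import numpy as np

    def project_capped_simplex(w, k, iters=100):
        # Relative-entropy projection of w > 0 onto
        # {x : 0 <= x <= 1, sum(x) = k}: repeatedly cap weights at 1 and
        # rescale the uncapped ones to carry the remaining mass.
        for _ in range(iters):
            capped = w >= 1.0
            w = np.minimum(w, 1.0)
            remaining = k - capped.sum()
            if remaining <= 0 or not (~capped).any():
                break
            w[~capped] *= remaining / w[~capped].sum()
        return w

    def component_hedge_step(w, loss, eta, k):
        # Multiplicative update in each component, then project back
        # into the convex hull of the concept class.
        return project_capped_simplex(w * np.exp(-eta * loss), k)

    w = np.full(4, 0.5)   # a hull point for k = 2 out of 4 components
    print(component_hedge_step(w, np.array([1.0, 0.0, 0.5, 0.2]), 1.0, 2))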
An Augmented Lagrangian Approach to Constrained MAP Inference
"... We propose a new algorithm for approximate MAP inference on factor graphs, by combining augmented Lagrangian optimization with the dual decomposition method. Each slave subproblem is given a quadratic penalty, which pushes toward faster consensus than in previous subgradient approaches. Our algorith ..."
Cited by 14 (1 self)
Abstract:
We propose a new algorithm for approximate MAP inference on factor graphs, by combining augmented Lagrangian optimization with the dual decomposition method. Each slave subproblem is given a quadratic penalty, which pushes toward faster consensus than in previous subgradient approaches. Our algorithm is provably convergent, parallelizable, and suitable for fine decompositions of the graph. We show how it can efficiently handle problems with (possibly global) structural constraints via simple sort operations. Experiments on synthetic and real-world data show that our approach compares favorably with the state of the art.
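The quadratic-penalty-plus-multiplier pattern is easiest to see in generic two-block consensus ADMM. This is a sketch of that pattern only, not the paper's method, which solves local quadratic subproblems attached to the factors of a graph.

    import numpy as np

    def admm_consensus(prox_f, prox_g, n, rho=1.0, iters=100):
        # Augmented-Lagrangian (ADMM) iterations for
        #     min f(x) + g(z)  subject to  x == z,
        # where prox_f(v, rho) = argmin_x f(x) + (rho/2)||x - v||^2.
        x = z = u = np.zeros(n)           # u is the scaled multiplier
        for _ in range(iters):
            x = prox_f(z - u, rho)        # slave 1, quadratic penalty
            z = prox_g(x + u, rho)        # slave 2, quadratic penalty
            u = u + (x - z)               # multiplier (dual) update
        return x

    # Toy usage: f(x) = 0.5||x - a||^2 and g = lam * ||.||_1, so the
    # consensus solution is a soft-thresholded at lam.
    a, lam = np.array([2.0, 0.3, -1.5]), 1.0
    prox_f = lambda v, rho: (a + rho * v) / (1 + rho)
    prox_g = lambda v, rho: np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0)
    print(admm_consensus(prox_f, prox_g, 3))  # -> close to [1.0, 0.0, -0.5]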
Bootstrapping Feature-Rich Dependency Parsers with Entropic Priors
"... One may need to build a statistical parser for a new language, using only a very small labeled treebank together with raw text. We argue that bootstrapping a parser is most promising when the model uses a rich set of redundant features, as in recent models for scoring dependency parses (McDonald et ..."
Cited by 8 (1 self)
Abstract:
One may need to build a statistical parser for a new language, using only a very small labeled treebank together with raw text. We argue that bootstrapping a parser is most promising when the model uses a rich set of redundant features, as in recent models for scoring dependency parses (McDonald et al., 2005). Drawing on Abney’s (2004) analysis of the Yarowsky algorithm, we perform bootstrapping by entropy regularization: we maximize a linear combination of conditional likelihood on labeled data and confidence (negative Rényi entropy) on unlabeled data. In initial experiments, this surpassed EM for training a simple feature-poor generative model, and also improved the performance of a feature-rich, conditionally estimated model where EM could not easily have been applied. For our models and training sets, more peaked measures of confidence, measured by Rényi entropy, outperformed smoother ones. We discuss how our feature set could be extended with cross-lingual or cross-domain features, to incorporate knowledge from parallel or comparable corpora during bootstrapping.
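Written out, the criterion the abstract sketches is roughly the following (in our notation; the trade-off weight γ and the exact form of the combination may differ from the paper's):

    \max_{\theta}\ \sum_{(x_i, y_i) \in \mathcal{L}} \log p_{\theta}(y_i \mid x_i)
        \;-\; \gamma \sum_{x_j \in \mathcal{U}} H_{\alpha}\!\left(p_{\theta}(\cdot \mid x_j)\right),
    \qquad
    H_{\alpha}(p) \;=\; \frac{1}{1-\alpha} \log \sum_{y} p(y)^{\alpha},

where L is the labeled treebank, U the raw text, and α controls how peaked a distribution must be to count as confident (α → 1 recovers Shannon entropy, connecting to the peaked-versus-smooth comparison in the abstract).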
Analyzing and Integrating Dependency Parsers
"... There has been a rapid increase in the volume of research on datadriven dependency parsers in the past five years. This increase has been driven by the availability of treebanks in a wide variety of languages—due in large part to the CoNLL shared tasks—as well as the straightforward mechanisms by w ..."
Cited by 5 (0 self)
Abstract:
There has been a rapid increase in the volume of research on data-driven dependency parsers in the past five years. This increase has been driven by the availability of treebanks in a wide variety of languages, due in large part to the CoNLL shared tasks, as well as the straightforward mechanisms by which dependency theories of syntax can encode complex phenomena in free word order languages. In this article, our aim is to take a step back and analyze the progress that has been made through an analysis of the two predominant paradigms for data-driven dependency parsing, which are often called graph-based and transition-based dependency parsing. Our analysis covers both theoretical and empirical aspects and sheds light on the kinds of errors each type of parser makes and how they relate to theoretical expectations. Using these observations, we present an integrated system based on a stacking learning framework and show that such a system can learn to overcome the shortcomings of each non-integrated system.
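The stacking idea can be made concrete in a few lines: the level-1 parser's feature map is augmented with indicators derived from the level-0 parser's output. The names below are hypothetical, and real stacked feature sets are considerably richer.

    def stack_features(feats, base_heads):
        # Wrap a graph-based parser's arc feature map so that, for each
        # candidate arc (head, mod), it also sees whether the level-0
        # (e.g. transition-based) parser predicted the same head.
        def wrapped(head, mod):
            f = dict(feats(head, mod))
            f["level0:same_arc"] = 1.0 if base_heads.get(mod) == head else 0.0
            return f
        return wrapped

    # Toy usage: the base parser chose head 0 for token 1, head 1 for token 2.
    base = {1: 0, 2: 1}
    plain = lambda h, m: {"dist": float(abs(h - m))}
    print(stack_features(plain, base)(0, 1))
    # -> {'dist': 1.0, 'level0:same_arc': 1.0}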