Results 1 - 10 of 70
Dual decomposition for parsing with nonprojective head automata
In Proc. of EMNLP, 2010
Cited by 53 (7 self)
This paper introduces algorithms for nonprojective parsing based on dual decomposition. We focus on parsing algorithms for nonprojective head automata, a generalization of head-automata models to nonprojective structures. The dual decomposition algorithms are simple and efficient, relying on standard dynamic programming and minimum spanning tree algorithms. They provably solve an LP relaxation of the nonprojective parsing problem. Empirically the LP relaxation is very often tight: for many languages, exact solutions are achieved on over 98% of test sentences. The accuracy of our models is higher than previous work on a broad range of datasets.
Minimizing Sparse Higher Order Energy Functions of Discrete Variables
Cited by 49 (9 self)
Higher order energy functions have the ability to encode high level structural dependencies between pixels, which have been shown to be extremely powerful for image labeling problems. Their use, however, is severely hampered in practice by the intractable complexity of representing and minimizing such functions. We observed that higher order functions encountered in computer vision are very often “sparse”, i.e. many labelings of a higher order clique are equally unlikely and hence have the same high cost. In this paper, we address the problem of minimizing such sparse higher order energy functions. Our method works by transforming the problem into an equivalent quadratic function minimization problem. The resulting quadratic function can be minimized using popular message passing or graph cut based algorithms for MAP inference. Although this is primarily a theoretical paper, it also shows how higher order functions can be used to obtain impressive results for the binary texture restoration problem.
On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing
In Proc. EMNLP, 2010
Cited by 49 (2 self)
This paper introduces dual decomposition as a framework for deriving inference algorithms for NLP problems. The approach relies on standard dynamic-programming algorithms as oracle solvers for subproblems, together with a simple method for forcing agreement between the different oracles. The approach provably solves a linear programming (LP) relaxation of the global inference problem. It leads to algorithms that are simple, in that they use existing decoding algorithms; efficient, in that they avoid exact algorithms for the full model; and often exact, in that empirically they often recover the correct solution in spite of using an LP relaxation. We give experimental results on two problems: 1) the combination of two lexicalized parsing models; and 2) the combination of a lexicalized parsing model and a trigram part-of-speech tagger.
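The idea of oracle solvers that are forced to agree can be illustrated with a toy sketch. This is not the paper's parsing or tagging setup: it assumes two hypothetical subproblems that each pick one index from a score vector (`a` and `b` are made-up inputs), and it uses a plain subgradient update on Lagrange multipliers `u` with an assumed step size of 1/t. When the two oracles return the same index, that index is an exact maximizer of `a[i] + b[i]`.

```python
def argmax(scores):
    # Oracle for a toy subproblem: index of the highest score (first one on ties).
    return max(range(len(scores)), key=lambda i: scores[i])


def dual_decompose(a, b, max_iters=100):
    """Maximize a[i] + b[i] using two separate oracles plus agreement multipliers.

    Returns (index, certified). certified is True when both oracles agreed,
    which certifies that the returned index is optimal.
    """
    n = len(a)
    u = [0.0] * n  # Lagrange multipliers, one per index
    y = 0
    for t in range(1, max_iters + 1):
        y = argmax([a[i] + u[i] for i in range(n)])  # oracle 1 sees a + u
        z = argmax([b[i] - u[i] for i in range(n)])  # oracle 2 sees b - u
        if y == z:
            return y, True
        step = 1.0 / t  # assumed decaying step size
        u[y] -= step  # penalize oracle 1's choice
        u[z] += step  # reward oracle 2's choice for oracle 1
    return y, False  # no agreement within the iteration budget


if __name__ == "__main__":
    print(dual_decompose([3.0, 1.0, 0.0], [0.0, 1.0, 2.5]))
```

In this toy case the combined problem is trivial to solve directly; the sketch only shows the mechanics of the multiplier update and the agreement check.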
MRF energy minimization and beyond via dual decomposition
In: IEEE PAMI (2011)
Cited by 39 (3 self)
This paper introduces a new rigorous theoretical framework to address discrete MRF-based optimization in computer vision. Such a framework exploits the powerful technique of Dual Decomposition. It is based on a projected subgradient scheme that attempts to solve an MRF optimization problem by first decomposing it into a set of appropriately chosen subproblems and then combining their solutions in a principled way. In order to determine the limits of this method, we analyze the conditions that these subproblems have to satisfy and we demonstrate the extreme generality and flexibility of such an approach. We thus show that, by appropriately choosing what subproblems to use, one can design novel and very powerful MRF optimization algorithms. For instance, in this manner we are able to derive algorithms that: 1) generalize and extend state-of-the-art message-passing methods, 2) optimize very tight LP-relaxations to MRF optimization, and 3) take full advantage of the special structure that may exist in particular MRFs, allowing the use of efficient inference techniques such as, e.g., graph-cut based methods. Theoretical analysis of the bounds related to the different algorithms derived from our framework, together with experimental results and comparisons using synthetic and real data for a variety of tasks in computer vision, demonstrates the extreme potential of our approach.
Learning Bayesian Network Structure using LP Relaxations
Cited by 21 (2 self)
We propose to solve the combinatorial problem of finding the highest scoring Bayesian network structure from data. This structure learning problem can be viewed as an inference problem where the variables specify the choice of parents for each node in the graph. The key combinatorial difficulty arises from the global constraint that the graph structure has to be acyclic. We cast the structure learning problem as a linear program over the polytope defined by valid acyclic structures. In relaxing this problem, we maintain an outer bound approximation to the polytope and iteratively tighten it by searching over a new class of valid constraints. If an integral solution is found, it is guaranteed to be the optimal Bayesian network. When the relaxation is not tight, the fast dual algorithms we develop remain useful in combination with a branch and bound method. Empirical results suggest that the method is competitive with or faster than alternative exact methods based on dynamic programming.
Exact Decoding of Phrase-based Translation Models through Lagrangian Relaxation
To appear in Proc. of EMNLP, 2011
Cited by 16 (1 self)
This paper describes an algorithm for exact decoding of phrase-based translation models, based on Lagrangian relaxation. The method recovers exact solutions, with certificates of optimality, on over 99% of test examples. The method is much more efficient than approaches based on linear programming (LP) or integer linear programming (ILP) solvers: these methods are not feasible for anything other than short sentences. We compare our method to MOSES (Koehn et al., 2007), and give precise estimates of the number and magnitude of search errors that MOSES makes.
Energy Minimization for Linear Envelope MRFs
Cited by 16 (5 self)
Markov random fields with higher order potentials have emerged as a powerful model for several problems in computer vision. In order to facilitate their use, we propose a new representation for higher order potentials as upper and lower envelopes of linear functions. Our representation concisely models several commonly used higher order potentials, thereby providing a unified framework for minimizing the corresponding Gibbs energy functions. We exploit this framework by converting lower envelope potentials to standard pairwise functions with the addition of a small number of auxiliary variables. This allows us to minimize energy functions with lower envelope potentials using conventional algorithms such as BP, TRW and α-expansion. Furthermore, we show how the minimization of energy functions with upper envelope potentials leads to a difficult min-max problem. We address this difficulty by proposing a new message passing algorithm that solves a linear programming relaxation of the problem. Although this is primarily a theoretical paper, we demonstrate the efficacy of our approach on the binary (fg/bg) segmentation problem.
Learning Efficiently with Approximate Inference via Dual Losses
Cited by 15 (5 self)
Many structured prediction tasks involve complex models where inference is computationally intractable, but where it can be well approximated using a linear programming relaxation. Previous approaches for learning for structured prediction (e.g., cutting-plane, subgradient methods, perceptron) repeatedly make predictions for some of the data points. These approaches are computationally demanding because each prediction involves solving a linear program to optimality. We present a scalable algorithm for learning for structured prediction. The main idea is to instead solve the dual of the structured prediction loss. We formulate the learning task as a convex minimization over both the weights and the dual variables corresponding to each data point. As a result, we can begin to optimize the weights even before completely solving any of the individual prediction problems. We show how the dual variables can be efficiently optimized using coordinate descent. Our algorithm is competitive with state-of-the-art methods such as stochastic subgradient and cutting-plane.