## Introduction to Dual Decomposition for Inference

Citations: 26 (5 self-citations)

### BibTeX

@MISC{Sontag_introductionto,

author = {David Sontag and Amir Globerson and Tommi Jaakkola},

title = {Introduction to Dual Decomposition for Inference},

year = {}

}

### Citations

744 | Nonlinear Programming. Athena Scientific - Bertsekas - 1995

Citation Context: ... issues further in Section 1.7. In the incremental subgradient method, at each iteration one computes the subgradient using only some of the subproblems, $F' \subset F$, rather than using all factors in $F$ (Bertsekas, 1995). This can significantly decrease the overall running time, and is also more similar to the block coordinate descent methods that we describe next, which make updates with respect to only one factor ...
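The subgradient method mentioned in this excerpt can be sketched for a small pairwise model. This is a minimal sketch, not the chapter's implementation: it assumes the dual objective has the form $L(\delta) = \sum_{i} \max_{x_i}\big[\theta_i(x_i) + \sum_{f \ni i} \delta_{fi}(x_i)\big] + \sum_{f} \max_{x_f}\big[\theta_f(x_f) - \sum_{i \in f} \delta_{fi}(x_i)\big]$, stores factors in dicts keyed by `(i, j)`, and uses a diminishing step size `step0 / (t + 1)`. The function name and parameters are hypothetical. It uses every factor at each iteration (the non-incremental variant); an incremental version would update only a subset $F' \subset F$ per iteration.

```python
import numpy as np


def subgradient_dual_decomposition(unary, pairwise, n_iters=200, step0=1.0):
    """Hypothetical sketch: minimize the dual L(delta) by subgradient descent.

    unary:    dict i -> 1-D array theta_i(x_i)
    pairwise: dict (i, j) -> 2-D array theta_f(x_i, x_j)
    Returns the best dual value seen (an upper bound on the MAP value),
    the unary maximizers of the last iteration, and whether all
    subproblems agreed (which certifies that assignment as MAP-optimal).
    """
    delta = {(f, i): np.zeros(len(unary[i])) for f in pairwise for i in f}
    best, x_hat, agreed = np.inf, {}, False
    for t in range(n_iters):
        value, x_fac = 0.0, {}
        # Unary subproblems: maximize theta_i + sum of delta_fi over factors f containing i.
        for i, th in unary.items():
            rep = th + sum((delta[(f, i)] for f in pairwise if i in f), np.zeros(len(th)))
            x_hat[i] = int(np.argmax(rep))
            value += rep[x_hat[i]]
        # Factor subproblems: maximize theta_f - delta_fi - delta_fj.
        for f, th in pairwise.items():
            i, j = f
            rep = th - delta[(f, i)][:, None] - delta[(f, j)][None, :]
            a, b = np.unravel_index(np.argmax(rep), rep.shape)
            x_fac[f] = (int(a), int(b))
            value += rep[a, b]
        best = min(best, value)
        # If every factor copy agrees with the unary maximizers, the dual is tight.
        agreed = all(x_fac[f][k] == x_hat[i] for f in pairwise for k, i in enumerate(f))
        if agreed:
            break
        step = step0 / (t + 1)
        for f in pairwise:
            for k, i in enumerate(f):
                # Subgradient w.r.t. delta_fi: indicator of x_hat_i minus indicator of x^f_i.
                g = np.zeros(len(unary[i]))
                g[x_hat[i]] += 1.0
                g[x_fac[f][k]] -= 1.0
                delta[(f, i)] -= step * g
    return float(best), dict(x_hat), agreed
```

Every dual value computed along the way is an upper bound on the MAP value, so tracking the best one is safe even though subgradient steps do not decrease the dual monotonically.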

373 | Probabilistic Graphical Models: Principles and Techniques - Koller, Friedman - 2009

Citation Context: ...anning, fault diagnosis, or searching for molecular conformations. In addition, a wealth of combinatorial problems arise directly from probabilistic modeling (graphical models). Graphical models (see Koller and Friedman, 2009, for a textbook introduction) have been widely adopted in areas such as computational biology, machine vision, and natural language processing, and are increasingly being used as a framework for expr...

273 | Minimization Methods for Nondifferentiable Functions - Shor - 1985

271 | Non-projective dependency parsing using spanning tree algorithms - McDonald, Pereira, et al. - 2005

Citation Context: ... functions. These modifications are provided by the Lagrange multipliers associated with agreement constraints. Our second example is dependency parsing, a key problem in natural language processing (McDonald et al., 2005). Given a sentence, we wish to predict the dependency tree that relates the words in the sentence. A dependency tree is a directed tree over the words in the sentence where an arc is drawn from the h...

225 | The Lagrangean relaxation method for solving integer programming problems. Management Science 27 - Fisher - 1981

Citation Context: ...ations for each factor, each of which we assume can be done efficiently. To remove these “complicating” constraints, we use the technique of Lagrangian relaxation (Geoffrion, 1974; Schlesinger, 1976; Fisher, 1981; Lemaréchal, 2001; Guignard, 2003). First, introduce Lagrange multipliers $\delta = \{\delta_{fi}(x_i) : f \in F, i \in f, x_i\}$, and define the Lagrangian: $L(\delta, x, x^F) = \sum_{i \in V} \theta_i(x_i) + \sum_{f \in F} \theta_f(x^f_f) + \sum_{f \in F} \sum_{i \in f} \sum_{\hat{x}_i} \delta_{fi}(\hat{x}_i) \ldots$ ...
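The Lagrangian in this excerpt is cut off after the multiplier term. A minimal sketch, assuming the standard form in which that term is $\sum_{f}\sum_{i \in f}\sum_{\hat{x}_i}\delta_{fi}(\hat{x}_i)\big(\mathbf{1}[x_i = \hat{x}_i] - \mathbf{1}[x^f_i = \hat{x}_i]\big)$, evaluates it for a pairwise model stored as Python dicts. The function name `lagrangian` and the data layout are hypothetical. The point it illustrates: when every factor copy $x^f$ agrees with $x$, the multiplier terms cancel and the Lagrangian equals the original objective for any $\delta$.

```python
import numpy as np


def lagrangian(unary, pairwise, delta, x, x_copies):
    """Evaluate L(delta, x, x^F) for a pairwise model (hypothetical sketch).

    unary:    dict i -> 1-D array theta_i(x_i)
    pairwise: dict f=(i, j) -> 2-D array theta_f(x_i, x_j)
    delta:    dict (f, i) -> 1-D array delta_fi(.)
    x:        dict i -> state of variable i
    x_copies: dict f -> tuple (x^f_i, x^f_j), the factor's private copy
    """
    value = sum(unary[i][x[i]] for i in unary)
    value += sum(pairwise[f][x_copies[f]] for f in pairwise)
    for f in pairwise:
        for k, i in enumerate(f):
            # sum over x_hat of delta_fi(x_hat) * (1[x_i = x_hat] - 1[x^f_i = x_hat])
            # collapses to two table lookups.
            value += delta[(f, i)][x[i]] - delta[(f, i)][x_copies[f][k]]
    return float(value)
```

Maximizing this function separately over $x$ and over each copy $x^f$ gives the decomposed dual, which is why the relaxation can be solved factor by factor.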

132 | MAP estimation via agreement on trees: message-passing and linear programming - Wainwright, Jaakkola, et al.

122 | Lagrangian relaxation for integer programming - Geoffrion - 1974

Citation Context: ...y decompose into independent maximizations for each factor, each of which we assume can be done efficiently. To remove these “complicating” constraints, we use the technique of Lagrangian relaxation (Geoffrion, 1974; Schlesinger, 1976; Fisher, 1981; Lemaréchal, 2001; Guignard, 2003). First, introduce Lagrange multipliers $\delta = \{\delta_{fi}(x_i) : f \in F, i \in f, x_i\}$, and define the Lagrangian: $L(\delta, x, x^F) = \sum_{i \in V} \theta_i(x_i) + \ldots$ ...

102 | Tree-based reparameterization framework for analysis of sum-product and related algorithms - Wainwright, Jaakkola, et al.

77 | Fixing max-product: convergent message passing algorithms for MAP LP-relaxations - Globerson, Jaakkola - 2008

Citation Context: ...cantly larger block of coordinates than that of Section 1.5.1. The Max Product Linear Programming (MPLP) algorithm was introduced as a coordinate descent algorithm for LP relaxations of MAP problems (Globerson and Jaakkola, 2008). Here we show that it can also be interpreted as a block coordinate descent algorithm for Eq. 1.2. Assume we fix all the variables $\delta$ except $\delta_{fi}(x_i)$ for a specific $f$ and all $i$ (note this differs from...
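The block update described in this excerpt can be sketched for pairwise factors. This is a minimal sketch under stated assumptions, not the authors' code. It assumes the dual has the form $\sum_{i} \max_{x_i}\big[\theta_i(x_i) + \sum_{f \ni i} \delta_{fi}(x_i)\big] + \sum_{f} \max_{x_f}\big[\theta_f(x_f) - \sum_{i \in f} \delta_{fi}(x_i)\big]$. It also uses the closed-form block minimizer $\delta_{fi}(x_i) = -\delta_i^{-f}(x_i) + \frac{1}{|f|}\max_{x_{f \setminus i}}\big[\theta_f(x_f) + \sum_{\hat{i} \in f}\delta_{\hat{i}}^{-f}(x_{\hat{i}})\big]$, where $\delta_i^{-f}(x_i) = \theta_i(x_i) + \sum_{\hat{f} \neq f,\, i \in \hat{f}}\delta_{\hat{f}i}(x_i)$. The function and helper names are hypothetical. Because each update exactly minimizes the dual over one block, the dual value never increases.

```python
import numpy as np


def mplp(unary, pairwise, n_sweeps=50):
    """Hypothetical sketch of MPLP-style block coordinate descent on the dual.

    unary:    dict i -> 1-D array theta_i(x_i)
    pairwise: dict f=(i, j) -> 2-D array theta_f(x_i, x_j)
    Returns the dual value after each sweep and the final multipliers.
    """
    delta = {(f, i): np.zeros(len(unary[i])) for f in pairwise for i in f}

    def belief_excluding(i, f):
        # delta_i^{-f}: theta_i plus multipliers from every other factor containing i.
        # Passing f=None includes all factors.
        out = np.array(unary[i], dtype=float)
        for g in pairwise:
            if g != f and i in g:
                out += delta[(g, i)]
        return out

    def dual_value():
        total = sum(belief_excluding(i, None).max() for i in unary)
        for f, th in pairwise.items():
            i, j = f
            total += (th - delta[(f, i)][:, None] - delta[(f, j)][None, :]).max()
        return float(total)

    history = [dual_value()]
    for _ in range(n_sweeps):
        for f, th in pairwise.items():
            i, j = f
            d_i, d_j = belief_excluding(i, f), belief_excluding(j, f)
            m = th + d_i[:, None] + d_j[None, :]
            # Closed-form block minimizer with |f| = 2 for pairwise factors.
            delta[(f, i)] = -d_i + m.max(axis=1) / 2.0
            delta[(f, j)] = -d_j + m.max(axis=0) / 2.0
        history.append(dual_value())
    return history, delta
```

The returned `history` should be non-increasing and bounded below by the MAP value. It need not reach the MAP value, since block coordinate descent on this non-smooth dual can stop at a non-optimal fixed point.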

73 | Auction Algorithms for Network Flow Problems: A - Bertsekas

66 | Tightening LP relaxations for MAP using message passing - Sontag, Meltzer, et al. - 2008

Citation Context: ... frequently found using dual decomposition, in spite of the corresponding optimization problems being NP-complete (Koo et al., 2010; Sontag et al., 2008; Yanover et al., 2006). We show in Section 1.6 that Eq. 1.2 is the dual of an LP relaxation of the original problem. When the conditions of Theorem 1.1 are satisfied, it means that the LP relaxation ...

57 | Linear programming relaxations and belief propagation – an empirical study - Yanover, Meltzer, et al.

51 | On the Optimality of Tree-reweighted Max-product Message-passing - Kolmogorov, Wainwright - 2007

Citation Context: ...rdinate descent algorithms do not hold (Bertsekas, 1995). Interestingly, for pairwise MRFs with binary variables, the fixed points of the coordinate descent algorithms do correspond to global optima (Kolmogorov and Wainwright, 2005; Globerson and Jaakkola, 2008). One strategy to avoid the above problem is to replace the max function in the objective of Eq. 1.2 with a soft-max function (e.g., see Johnson, 2008; Hazan and Shashua...

51 | Dual decomposition for parsing with non-projective head automata - Koo, Rush, et al. - 2010

Citation Context: ...on the modifier selections. A natural dual decomposition in this case will be to break the problem into these two manageable components, which are then forced to agree on the arc selection variables (Koo et al., 2010). 1.3 Dual Decomposition and Lagrangian Relaxation. The previous section described several problems where we wish to maximize a sum over factors, each defined on some subset of the variables. Here we ...

49 | Minimizing sparse higher order energy functions of discrete variables - Rother, Kohli, et al. - 2009

Citation Context: ... Duchi et al., 2007; Yarkony et al., 2010), Supermodular functions (Komodakis et al., 2010), Cardinality and order constraints (Gupta et al., 2007; Tarlow et al., 2010), Functions with small support (Rother et al., 2009). Consider, for example, a factor which enforces a cardinality constraint over binary variables: $\theta_f(x_f) = 0$ if $\sum_{i \in f} x_i = L$, and $\theta_f(x_f) = -\infty$ otherwise. ...
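For binary variables, maximizing such a cardinality factor plus per-variable terms (for example, the $-\delta_{fi}(x_i)$ terms that appear in the factor's subproblem) needs only a sort rather than enumeration of all $2^{|f|}$ assignments. Set to 1 the $L$ variables whose gain from switching 0 to 1 is largest. The sketch below assumes the per-variable terms arrive as an `(n, 2)` array; the function name is hypothetical.

```python
import numpy as np


def max_cardinality_subproblem(scores, L):
    """Maximize sum_i scores[i, x_i] over binary x subject to sum_i x_i == L.

    scores: array of shape (n, 2); scores[i, b] is the term for x_i = b.
    Returns (best value, maximizing assignment); (-inf, None) if infeasible.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    if not 0 <= L <= n:
        return -np.inf, None  # theta_f is -inf for every assignment
    gain = scores[:, 1] - scores[:, 0]
    order = np.argsort(-gain, kind="stable")  # largest gains first
    x = np.zeros(n, dtype=int)
    x[order[:L]] = 1
    value = scores[np.arange(n), x].sum()
    return float(value), x
```

The cost is $O(n \log n)$ for the sort, which is what makes this factor cheap to handle inside a dual decomposition even when $|f|$ is large.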

47 | High-arity interactions, polyhedral relaxations, and cutting plane algorithm for soft constraint optimisation (MAP-MRF) - Werner

45 | MAP estimation, linear programming and belief propagation with convex free energies - Weiss, Yanover, et al. - 2007

Citation Context: ... protein side-chain placement problems exact solutions (with certificates of optimality) are frequently found using dual decomposition, in spite of the corresponding optimization problems being NP-complete (Koo et al., 2010; Sontag et al. ... Footnote 4: Versions of this theorem appear in multiple papers (e.g., see Geoffrion, 1974; Wainwright et al., 2005; Weiss et al., 2007).

39 | Using combinatorial optimization within max-product belief propagation - Duchi, Tarlow, et al. - 2007

Citation Context: ... are frequently found in inference problems, all of which can be efficiently maximized over: Tree structures (Wainwright et al., 2005; Komodakis et al., 2010), Matchings (Lacoste-Julien et al., 2006; Duchi et al., 2007; Yarkony et al., 2010), Supermodular functions (Komodakis et al., 2010), Cardinality and order constraints (Gupta et al., 2007; Tarlow et al., 2010), Functions with small support (Rother et al., 2009...

35 | MRF energy minimization and beyond via dual decomposition - Komodakis, Paragios, et al.

Citation Context: ...empt to factor the problem into more independent subproblems, resulting in a tractable approximation of the original one. Two closely related relaxation schemes are dual decomposition (Johnson, 2008; Komodakis et al., 2010) and linear programming (LP) relaxations (Schlesinger, 1976; Wainwright et al., 2005). Although the approaches use a different derivation of the approximation, they result in equivalent optimization ...

35 | Syntactic analysis of two-dimensional visual signals in noisy conditions (in Russian) - Schlesinger - 1976

Citation Context: ...ulting in a tractable approximation of the original one. Two closely related relaxation schemes are dual decomposition (Johnson, 2008; Komodakis et al., 2010) and linear programming (LP) relaxations (Schlesinger, 1976; Wainwright et al., 2005). Although the approaches use a different derivation of the approximation, they result in equivalent optimization problems. Practical uses of MAP relaxations involve models w...

34 | Word alignment via quadratic assignment - Lacoste-Julien, Taskar, et al. - 2006

Citation Context: ...me of the sparse factors that are frequently found in inference problems, all of which can be efficiently maximized over: Tree structures (Wainwright et al., 2005; Komodakis et al., 2010), Matchings (Lacoste-Julien et al., 2006; Duchi et al., 2007; Yarkony et al., 2010), Supermodular functions (Komodakis et al., 2010), Cardinality and order constraints (Gupta et al., 2007; Tarlow et al., 2010), Functions with small support ...

27 | Approximate primal solutions and rate analysis for dual subgradient methods - Nedic, Ozdaglar - 2009

26 | On the complexity of non-projective data-driven dependency parsing - McDonald, Satta - 2007

Citation Context: ...d i, expressed by a function $\theta_{i|}(x_{|i})$. Finding the maximizing non-projective parse tree in a model that includes such higher order couplings, without additional restrictions, is known to be NP-hard (McDonald and Satta, 2007). We consider here models where $\theta_{i|}(x_{|i})$ can be individually maximized by dynamic programming algorithms (e.g., head-automata models) but become challenging as part of the overall dependency tree mo...

26 | Tree block coordinate descent for MAP in graphical models - Sontag, Jaakkola

24 | Efficient inference with cardinality-based clique potentials - Gupta, Diwan, et al. - 2007

Citation Context: ..., 2005; Komodakis et al., 2010), Matchings (Lacoste-Julien et al., 2006; Duchi et al., 2007; Yarkony et al., 2010), Supermodular functions (Komodakis et al., 2010), Cardinality and order constraints (Gupta et al., 2007; Tarlow et al., 2010), Functions with small support (Rother et al., 2009). Consider, for example, a factor which enforces a cardinality constraint over bina...

23 | Lagrangean relaxation - Lemaréchal - 2001

Citation Context: ...h factor, each of which we assume can be done efficiently. To remove these “complicating” constraints, we use the technique of Lagrangian relaxation (Geoffrion, 1974; Schlesinger, 1976; Fisher, 1981; Lemaréchal, 2001; Guignard, 2003). First, introduce Lagrange multipliers $\delta = \{\delta_{fi}(x_i) : f \in F, i \in f, x_i\}$, and define the Lagrangian: $L(\delta, x, x^F) = \sum_{i \in V} \theta_i(x_i) + \sum_{f \in F} \theta_f(x^f_f) + \sum_{f \in F} \sum_{i \in f} \sum_{\hat{x}_i} \delta_{fi}(\hat{x}_i) \big(\mathbf{1}[\ldots$ ...

22 | Minimizing and learning energy functions for side-chain prediction - Yanover, Schueler-Furman, et al. - 2008

21 | Norm-product belief propagation: Primal-dual message-passing for approximate inference - Hazan, Shashua - 2010

Citation Context: ... Wainwright, 2005; Globerson and Jaakkola, 2008). One strategy to avoid the above problem is to replace the max function in the objective of Eq. 1.2 with a soft-max function (e.g., see Johnson, 2008; Hazan and Shashua, 2010) which is smooth and strictly convex. As a result, coordinate descent converges globally. Footnote 9: An alternative approach are the auction algorithms proposed by Bertsekas (1992). However, currently there d...
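The smoothing idea can be illustrated with a temperature-scaled log-sum-exp, one common soft-max. This specific form and the temperature parameter `tau` are assumptions here; the cited works may parameterize the smoothing differently, and the function name is hypothetical. The surrogate is differentiable everywhere, its gradient is a probability vector over the entries, and it exceeds the true max by at most `tau * log(n)`, so shrinking `tau` recovers the original objective.

```python
import numpy as np


def soft_max(values, tau):
    """Smooth surrogate for max(values): tau * log(sum(exp(values / tau))).

    Computed stably by factoring out the largest entry. Returns the value
    and its gradient, which is the softmax distribution over the entries.
    """
    v = np.asarray(values, dtype=float)
    m = v.max()
    w = np.exp((v - m) / tau)
    value = m + tau * np.log(w.sum())
    return float(value), w / w.sum()
```

In a smoothed dual, each max in the objective would be replaced by such a soft-max, which is what makes the objective differentiable and lets coordinate updates avoid the non-smooth fixed points described above.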

19 | HOP-MAP: Efficient Message Passing with High Order Potentials - Tarlow, Givoni, et al. - 2010

Citation Context: ... al., 2010), Matchings (Lacoste-Julien et al., 2006; Duchi et al., 2007; Yarkony et al., 2010), Supermodular functions (Komodakis et al., 2010), Cardinality and order constraints (Gupta et al., 2007; Tarlow et al., 2010), Functions with small support (Rother et al., 2009). Consider, for example, a factor which enforces a cardinality constraint over binary variables: $\theta_f(x_f)$ ...

9 | Two “well-known” properties of subgradient optimization - Anstreicher, Wolsey - 2007

9 | A Diffusion Algorithm for Decreasing Energy of Max-sum Labeling Problem - Kovalevsky, Koval - 1975

7 | Convex relaxation methods for graphical models: Lagrangian and maximum entropy approaches - Johnson - 2008

Citation Context: ...aints in an attempt to factor the problem into more independent subproblems, resulting in a tractable approximation of the original one. Two closely related relaxation schemes are dual decomposition (Johnson, 2008; Komodakis et al., 2010) and linear programming (LP) relaxations (Schlesinger, 1976; Wainwright et al., 2005). Although the approaches use a different derivation of the approximation, they result in equivalent optimization ...

2 | Lagrangean relaxation. TOP: An Official Journal of the Spanish Society of Statistics and - Guignard

2 | An application-oriented guide for designing Lagrangean dual ascent algorithms - Guignard, Rosenwein - 1989

Citation Context: ...solving the optimization problem in Eq. 1.2 is via coordinate descent. Coordinate descent algorithms have a long history of being used to optimize Lagrangian relaxations (e.g., see Erlenkotter, 1978; Guignard and Rosenwein, 1989). Such algorithms work by fixing the values of all dual variables except for a set of variables, and then minimizing the objective as much as possible with respect to that set. The two key design cho...