## An Alternating Direction Method for Dual MAP LP Relaxation

Citations: 10 (1 self)

### BibTeX

```bibtex
@MISC{Meshi_analternating,
  author = {Ofer Meshi and Amir Globerson},
  title = {An Alternating Direction Method for Dual MAP LP Relaxation},
  year = {}
}
```

### Abstract

Maximum a-posteriori (MAP) estimation is an important task in many applications of probabilistic graphical models. Although finding an exact solution is generally intractable, approximations based on linear programming (LP) relaxation often provide good approximate solutions. In this paper we present an algorithm for solving the LP relaxation optimization problem. In order to overcome the lack of strict convexity, we apply an augmented Lagrangian method to the dual LP. The algorithm, based on the alternating direction method of multipliers (ADMM), is guaranteed to converge to the global optimum of the LP relaxation objective. Our experimental results show that this algorithm is competitive with other state-of-the-art algorithms for approximate MAP estimation.
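To make the alternating scheme the abstract describes concrete, here is a minimal sketch of the generic ADMM iteration on a toy consensus problem. This is not the paper's dual-LP algorithm; the problem, function name, and constants are illustrative only.

```python
# Generic ADMM (scaled form) on a toy consensus problem:
#   minimize  0.5*(x - a)^2 + 0.5*(z - b)^2   subject to  x = z
# Both subproblems have closed-form minimizers, so each step is exact.

def admm_consensus(a, b, rho=1.0, iters=50):
    x = z = u = 0.0                      # u is the scaled dual variable
    for _ in range(iters):
        # x-update: argmin_x 0.5*(x - a)^2 + (rho/2)*(x - z + u)^2
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: argmin_z 0.5*(z - b)^2 + (rho/2)*(x - z + u)^2
        z = (b + rho * (x + u)) / (1.0 + rho)
        # dual (multiplier) update on the constraint residual x - z
        u = u + x - z
    return x, z

x, z = admm_consensus(0.0, 2.0)          # optimum is x = z = (a + b)/2 = 1.0
```

On this toy problem the iterates reach consensus after only a couple of steps; in the paper's setting the same alternation is applied to an augmented Lagrangian of the dual MAP-LP.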

### Citations

615 | Parallel and Distributed Computation: Numerical Methods
- Bertsekas, Tsitsiklis
- 1989
Citation Context: ...lgorithm for projecting onto the local polytope. More recently, Martins et al. [17] proposed a globally convergent algorithm for MAP-LP based on the alternating direction method of multipliers (ADMM) [8, 5, 4, 2]. This method proceeds by iteratively updating primal and dual variables in order to find a saddle point of an augmented Lagrangian for the problem. They suggest to use an augmented Lagrangian of the ...

427 | Graphical models, exponential families, and variational inference
- Wainwright, Jordan
- 2003
Citation Context: ...), termed factors. The factors depend only on (small) subsets of the variables (Xc ⊆ X) and model the direct interactions between them (to simplify notation we drop the variable name in Xc = xc; see [27]). The joint distribution is then given by: P(x) ∝ exp(∑_i θ_i(x_i) + ∑_{c∈C} θ_c(x_c)), where we have included also singleton factors over individual variables [27]. In many applications of MRFs we are...

296 | Convergent tree-reweighted message passing for energy minimization
- Kolmogorov

251 | Smooth minimization of nonsmooth functions
- Nesterov
Citation Context: ...on method have been proposed [9]. These faster algorithms are based on linearization and come with improved convergence rate of O(1/ɛ), achieving the theoretical lower bound for first-order methods [19]. In this paper we focus on the basic ADMM formulation and leave derivation of accelerated variants to future work. ... 4 The Augmented Dual LP...

165 | On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators
- Eckstein, Bertsekas
- 1992
Citation Context: ...lgorithm for projecting onto the local polytope. More recently, Martins et al. [17] proposed a globally convergent algorithm for MAP-LP based on the alternating direction method of multipliers (ADMM) [8, 5, 4, 2]. This method proceeds by iteratively updating primal and dual variables in order to find a saddle point of an augmented Lagrangian for the problem. They suggest to use an augmented Lagrangian of the ...

95 | A linear programming approach to max-sum problem: A review
- Werner
- 2007
Citation Context: ...or large models since the resulting LPs have too many constraints and variables [29]. This has led researchers to seek optimization algorithms that are tailored to the specific structure of the MAP-LP [7, 13, 14, 16, 20, 28]. The advantage of such methods is that they work with very simple local updates and are therefore easy to implement in the large scale setting. The suggested algorithms fall into several classes, dep...

89 | A dual algorithm for the solution of nonlinear variational problems via finite element approximation
- Gabay, Mercier
- 1976
Citation Context: ...lgorithm for projecting onto the local polytope. More recently, Martins et al. [17] proposed a globally convergent algorithm for MAP-LP based on the alternating direction method of multipliers (ADMM) [8, 5, 4, 2]. This method proceeds by iteratively updating primal and dual variables in order to find a saddle point of an augmented Lagrangian for the problem. They suggest to use an augmented Lagrangian of the ...

74 | Fixing maxproduct: Convergent message passing algorithms for MAP LP-relaxations
- Globerson, Jaakkola
- 2007
Citation Context: ...or large models since the resulting LPs have too many constraints and variables [29]. This has led researchers to seek optimization algorithms that are tailored to the specific structure of the MAP-LP [7, 13, 14, 16, 20, 28]. The advantage of such methods is that they work with very simple local updates and are therefore easy to implement in the large scale setting. The suggested algorithms fall into several classes, dep...

65 | Tightening LP relaxations for MAP using message passing
- Sontag, Meltzer, et al.
- 2008
Citation Context: ...m which cannot be solved exactly for many problems of interest. It has turned out that linear programming (LP) relaxations provide effective approximations to the MAP problem in many cases (e.g., see [15, 21, 24]). Despite the theoretical computational tractability of MAP-LP relaxations, solving them in practice is a challenge for real world problems. Using off-the-shelf LP solvers is typically inadequate for ...

64 | Efficient projections onto the L1-ball for learning in high dimensions
- Duchi, Shalev-Shwartz, et al.
- 2008
Citation Context: ...t some threshold t (i.e., w_i = min{v_i, t}) such that the sum of removed parts equals d > 0 (i.e., ∑_i (v_i − w_i) = d). This can be carried out efficiently in linear time (in expectation) by partitioning [3]. Notice that all updates can be computed efficiently so the cost of each iteration is similar to that of message passing algorithms like MPLP [7] or MSD [28], and to that of dual decomposition [13, 1...

63 | Sur l'approximation par éléments finis d'ordre un, et la résolution par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires
- Glowinski, Marroco
- 1975

57 | Linear programming relaxations and belief propagation - an empirical study
- Yanover, Meltzer, et al.
Citation Context: ...solving them in practice is a challenge for real world problems. Using off-the-shelf LP solvers is typically inadequate for large models since the resulting LPs have too many constraints and variables [29]. This has led researchers to seek optimization algorithms that are tailored to the specific structure of the MAP-LP [7, 13, 14, 16, 20, 28]. The advantage of such methods is that they work with very s...

48 | On dual decomposition and linear programming relaxations for natural language processing
- Rush, Sontag, et al.
Citation Context: ...m which cannot be solved exactly for many problems of interest. It has turned out that linear programming (LP) relaxations provide effective approximations to the MAP problem in many cases (e.g., see [15, 21, 24]). Despite the theoretical computational tractability of MAP-LP relaxations, solving them in practice is a challenge for real world problems. Using off-the-shelf LP solvers is typically inadequate for ...

45 | Fast image recovery using variable splitting and constrained optimization
- Afonso, Bioucas-Dias, et al.
- 2010
Citation Context: ...r rather mild conditions [2]. However, in terms of convergence rate, the worst case complexity of ADMM is O(1/ɛ²). Despite this potential caveat, ADMM has been shown to work well in practice (e.g., [1, 26]). Recently, accelerated variants on the basic alternating direction method have been proposed [9]. These faster algorithms are based on linearization and come with improved convergence rate of O(1/ɛ...

37 | Max-margin Markov networks
- Taskar, et al.
Citation Context: ...P estimation. In this paper, we assumed that the model parameters were given. However, in many cases one wishes to learn these from data, for example by minimizing a prediction loss (e.g., hinge loss [25]). We have recently shown how to incorporate dual relaxation algorithms into such learning problems [18]. It will be interesting to apply our ADMM approach in this setting to yield an efficient learni...

35 | Beyond Loose LP-Relaxations: Optimizing MRFs by Repairing Cycles
- Komodakis, Paragios
- 2008
(Show Context)
Citation Context ...m which cannot be solved exactly for many problems of interest. It has turned out that linear programming (LP) relaxations provide effective approximations to the MAP problem in many cases (e.g., see =-=[15, 21, 24]-=-). Despite the theoretical computational tractability of MAP-LP relaxations, solving them in practice is a challenge for real world problems. Using off-theshelf LP solvers is typically inadequate for ... |

35 | Syntactic analysis of two-dimensional visual signals in noisy conditions (in Russian)
- Schlesinger
- 1976
Citation Context: ...ch takes the form: min_δ ∑_i max_{x_i} ( θ_i(x_i) + ∑_{c:i∈c} δ_{ci}(x_i) ) + ∑_c max_{x_c} ( θ_c(x_c) − ∑_{i:i∈c} δ_{ci}(x_i) ) (2), where δ are dual variables corresponding to the marginalization constraints in L(G) (see [22, 28, 23]). This formulation offers several advantages. First, it minimizes an upper bound on the true MAP value. Second, it provides an optimality certificate through the duality gap w.r.t. a decoded primal...

33 | MRF Energy Minimization and Beyond via Dual Decomposition
- Komodakis, Paragios, et al.
Citation Context: ...or large models since the resulting LPs have too many constraints and variables [29]. This has led researchers to seek optimization algorithms that are tailored to the specific structure of the MAP-LP [7, 13, 14, 16, 20, 28]. The advantage of such methods is that they work with very simple local updates and are therefore easy to implement in the large scale setting. The suggested algorithms fall into several classes, dep...

30 | Message-passing for graph-structured linear programs: Proximal methods and rounding schemes
- Ravikumar, Agarwal, et al.

25 | Introduction to dual decomposition for inference
- Sontag, Globerson, et al.
- 2011
Citation Context: ...ch takes the form: min_δ ∑_i max_{x_i} ( θ_i(x_i) + ∑_{c:i∈c} δ_{ci}(x_i) ) + ∑_c max_{x_c} ( θ_c(x_c) − ∑_{i:i∈c} δ_{ci}(x_i) ) (2), where δ are dual variables corresponding to the marginalization constraints in L(G) (see [22, 28, 23]). This formulation offers several advantages. First, it minimizes an upper bound on the true MAP value. Second, it provides an optimality certificate through the duality gap w.r.t. a decoded primal...

21 | Norm-product belief propagation: Primal-dual message-passing for approximate inference
- Hazan, Shashua
- 2010
Citation Context: ...tuck in suboptimal points under these conditions. One way to avoid this problem is to use a soft-max function which is smooth and strictly convex, hence this results in globally convergent algorithms [6, 10, 12]. Another class of algorithms [13, 16] uses the same dual objective, but employs variants of subgradient descent to it. While these methods are guaranteed to converge globally, they are typically slow...

19 | Fast alternating linearization methods for minimizing the sum of two convex functions
- Goldfarb, Ma, et al.
- 2010
Citation Context: ... is O(1/ɛ²). Despite this potential caveat, ADMM has been shown to work well in practice (e.g., [1, 26]). Recently, accelerated variants on the basic alternating direction method have been proposed [9]. These faster algorithms are based on linearization and come with improved convergence rate of O(1/ɛ), achieving the theoretical lower bound for first-order methods [19]. In this paper we focus on ...

17 | Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities
- He, Wang
- 2000
Citation Context: ...e, however, that for this scheme to work well, the Lagrange multipliers γ and µ should be also initialized accordingly. Another potential improvement is to use an adaptive penalty parameter ρ_t (e.g., [11]). This may improve convergence in practice, as well as reduce sensitivity to the initial choice of ρ. On the downside, the theoretical convergence guarantees of ADMM no longer hold in this case. Mart...
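Adaptive penalty schemes of the kind mentioned above are commonly implemented by residual balancing: grow ρ when the primal residual dominates, shrink it when the dual residual does. A minimal sketch, assuming a scaled-form ADMM loop; the constants mu and tau are conventional illustrative choices, not values taken from [11]:

```python
def update_penalty(rho, primal_res, dual_res, mu=10.0, tau=2.0):
    """Residual-balancing update for the ADMM penalty parameter rho.

    primal_res, dual_res are the norms of the primal and dual residuals
    at the current iterate; mu and tau are illustrative tuning constants.
    """
    if primal_res > mu * dual_res:
        return rho * tau      # primal residual too large: penalize violations harder
    if dual_res > mu * primal_res:
        return rho / tau      # dual residual too large: relax the penalty
    return rho                # residuals balanced: leave rho unchanged
```

When rho changes, the scaled dual variable must be rescaled accordingly (u ← u · rho_old / rho_new) for the iteration to stay consistent; as the quoted context notes, standard ADMM convergence guarantees need not hold once rho varies across iterations.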

16 | An augmented Lagrangian relaxation for analytical target cascading using the alternating direction method of multipliers
- Tosserams, Etman, et al.
- 2006
Citation Context: ...r rather mild conditions [2]. However, in terms of convergence rate, the worst case complexity of ADMM is O(1/ɛ²). Despite this potential caveat, ADMM has been shown to work well in practice (e.g., [1, 26]). Recently, accelerated variants on the basic alternating direction method have been proposed [9]. These faster algorithms are based on linearization and come with improved convergence rate of O(1/ɛ...

14 | Softmax-margin crfs: Training log-linear models with loss functions
- Gimpel, Smith
- 2010
Citation Context: ...tuck in suboptimal points under these conditions. One way to avoid this problem is to use a soft-max function which is smooth and strictly convex, hence this results in globally convergent algorithms [6, 10, 12]. Another class of algorithms [13, 16] uses the same dual objective, but employs variants of subgradient descent to it. While these methods are guaranteed to converge globally, they are typically slow...

13 | Learning Efficiently with Approximate Inference via Dual Losses
- Meshi, Sontag, et al.
- 2010
Citation Context: ... wishes to learn these from data, for example by minimizing a prediction loss (e.g., hinge loss [25]). We have recently shown how to incorporate dual relaxation algorithms into such learning problems [18]. It will be interesting to apply our ADMM approach in this setting to yield an efficient learning algorithm for structured prediction problems. Acknowledgments. We thank Ami Wiesel and Elad Eban for ...

10 | An augmented Lagrangian approach to constrained MAP inference
- Martins, Aguiar, et al.
Citation Context: ...lso globally convergent, it has the disadvantage of using a double loop scheme where every update involves an iterative algorithm for projecting onto the local polytope. More recently, Martins et al. [17] proposed a globally convergent algorithm for MAP-LP based on the alternating direction method of multipliers (ADMM) [8, 5, 4, 2]. This method proceeds by iteratively updating primal and dual variable...

8 | Fast and smooth: Accelerated dual decomposition for MAP inference
- Jojic, Gould, et al.
- 2010

7 | Convex relaxation methods for graphical models: Lagrangian and maximum entropy approaches
- Johnson
- 2008
Citation Context: ...tuck in suboptimal points under these conditions. One way to avoid this problem is to use a soft-max function which is smooth and strictly convex, hence this results in globally convergent algorithms [6, 10, 12]. Another class of algorithms [13, 16] uses the same dual objective, but employs variants of subgradient descent to it. While these methods are guaranteed to converge globally, they are typically slow...