## Augmenting Dual Decomposition for MAP Inference

### BibTeX

    @MISC{Aguiar_augmentingdual,
      author = {Pedro M. Q. Aguiar and Mário A. T. Figueiredo},
      title = {Augmenting Dual Decomposition for MAP Inference},
      year = {}
    }

### Abstract

In this paper, we propose combining augmented Lagrangian optimization with the dual decomposition method to obtain a fast algorithm for approximate MAP (maximum a posteriori) inference on factor graphs. We also show how the proposed algorithm can efficiently handle problems with (possibly global) structural constraints. The reported experimental results testify to the state-of-the-art performance of the proposed approach.

### Citations

794 | Nonlinear Programming. Athena Scientific
- Bertsekas
- 1999
Citation Context: ...uration of a probabilistic graphical model (e.g., a factor graph – FG [27]) is in general an NP-hard problem, a fact which has stimulated much work on approximate methods. The dual decomposition (DD) [3, 4] is one such approximate method, which has been recently used in computer vision [17] and natural language parsing [18]. In a nutshell, DD works by breaking the original hard problem into a set of sma...

263 | Decomposition principle for linear programs
- Dantzig, Wolfe
- 1960
Citation Context: ...uration of a probabilistic graphical model (e.g., a factor graph – FG [27]) is in general an NP-hard problem, a fact which has stimulated much work on approximate methods. The dual decomposition (DD) [3, 4] is one such approximate method, which has been recently used in computer vision [17] and natural language parsing [18]. In a nutshell, DD works by breaking the original hard problem into a set of sma...

230 | Numerical methods for nonlinear variational problems
- Glowinski
- 1984
Citation Context: ...N(a), φ_a), thus Alg. 2 approaches Alg. 1. However, we will see that for a proper choice of η_t, Alg. 2 converges faster. In practice, it is common to use a fixed η_t = η and 1 < τ ≤ (√5 + 1)/2 ≃ 1.618 [10]. With these choices, and assuming that QUAD_η_t is computed exactly, convergence is guaranteed, since in (4), both the objective function (which is linear) and the feasible set are convex [10]. Under c...
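As an illustration of the step-size rule quoted in the excerpt above (a fixed penalty η and a multiplier step enlarged by a factor 1 < τ ≤ (√5 + 1)/2), here is a minimal sketch of the classic two-block ADMM update on a toy consensus problem. The problem, constants, and variable names are hypothetical, chosen only to make the τη multiplier step concrete; this is not the paper's algorithm.

```python
import math

# Toy consensus problem: minimize 0.5*(x - a)^2 + 0.5*(z - b)^2  s.t.  x = z.
# The optimum is x = z = (a + b) / 2 = 2.0.  We run the two-block ADMM
# iteration with penalty eta and an enlarged multiplier step tau * eta,
# with 1 < tau <= (sqrt(5) + 1) / 2 as in the excerpt.
a, b = 4.0, 0.0
eta = 1.0
tau = (math.sqrt(5) + 1) / 2  # upper end of the admissible range, ~1.618

x = z = u = 0.0  # u is the Lagrange multiplier for the constraint x = z
for _ in range(200):
    # x-update: argmin_x 0.5*(x - a)^2 + u*x + 0.5*eta*(x - z)^2
    x = (a - u + eta * z) / (1 + eta)
    # z-update: argmin_z 0.5*(z - b)^2 - u*z + 0.5*eta*(x - z)^2
    z = (b + u + eta * x) / (1 + eta)
    # multiplier ascent with the enlarged step tau * eta
    u += tau * eta * (x - z)

print(round(x, 4), round(z, 4))  # both copies approach 2.0
```

Taking τ at the upper end of the range makes the multiplier ascend faster per iteration than plain ADMM (τ = 1), which is exactly the speed-up the excerpt alludes to.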

194 | Multiplier and gradient methods
- Hestenes
- 1969
Citation Context: ...lerated methods worthy of study [14]. Here, we propose to ally the strength of DD with the effectiveness of augmented Lagrangian (AL) methods, which have a long and successful history in optimization [13, 21], and which have recently been shown to be extremely competitive for some large scale problems [1, 7, 12]. Specifically, we use the alternating direction method of multipliers (ADMM) [8, 11, 6] to han...

186 | On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators
- Eckstein, Bertsekas
- 1992
Citation Context: ...mization [13, 21], and which have recently been shown to be extremely competitive for some large scale problems [1, 7, 12]. Specifically, we use the alternating direction method of multipliers (ADMM) [8, 11, 6] to handle the dual of the constrained problem resulting from the DD, and show that the resulting method has state-of-the-art performance. Being interested in problems with (possibly global) structura...

106 | A dual algorithm for the solution of nonlinear variational problems via element approximations
- Gabay, Mercier
- 1976
Citation Context: ...mization [13, 21], and which have recently been shown to be extremely competitive for some large scale problems [1, 7, 12]. Specifically, we use the alternating direction method of multipliers (ADMM) [8, 11, 6] to handle the dual of the constrained problem resulting from the DD, and show that the resulting method has state-of-the-art performance. Being interested in problems with (possibly global) structura...

84 | Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations
- Globerson, Jaakkola
- 2007
Citation Context: ...n: OPT′ ≜ max_{(µ,ν)∈L(G)} ∑_i θ_i⊤µ_i + ∑_a φ_a⊤ν_a, (3) which will be our main focus throughout. Obviously, OPT′ ≥ OPT, since L(G) ⊇ M(G). §3 (Dual Decomposition): Several message passing algorithms [9, 15] are derived via some reformulation of (3) followed by dualization. The DD method [17] reformulates (3) by adding new variables ν_a^i (for each factor a and i ∈ N(a)) that are local "replicas" of the m...

80 | MRF optimization via dual decomposition: Message-passing revisited
- Komodakis, Paragios, et al.
- 2007
Citation Context: ...al an NP-hard problem, a fact which has stimulated much work on approximate methods. The dual decomposition (DD) [3, 4] is one such approximate method, which has been recently used in computer vision [17] and natural language parsing [18]. In a nutshell, DD works by breaking the original hard problem into a set of smaller (slave) subproblems. This set of subproblems, together with the constraints that...
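To make the "nutshell" description in the excerpt concrete, the following sketch runs projected-subgradient dual decomposition on a toy problem: two slave subproblems hold replicas of one shared binary variable, and a Lagrange multiplier prices their disagreement. All scores and names are hypothetical, invented for illustration.

```python
# Two slave subproblems share one binary variable x:
#   slave 1 has scores f1 = [0, 3]   (prefers x = 1)
#   slave 2 has scores f2 = [0, -2]  (prefers x = 0)
# The joint objective f1[x] + f2[x] is maximized at x = 1 (value 1).
f1, f2 = [0.0, 3.0], [0.0, -2.0]

lam = 0.0  # Lagrange multiplier for the agreement constraint x1 = x2
for t in range(1, 101):
    # Each slave solves its own tiny problem given the current multiplier.
    x1 = max((0, 1), key=lambda v: f1[v] + lam * v)
    x2 = max((0, 1), key=lambda v: f2[v] - lam * v)
    # Subgradient step on the dual, with diminishing step size 1/t.
    lam -= (1.0 / t) * (x1 - x2)

print(x1, x2)  # the replicas agree on the jointly optimal x = 1
```

The multiplier drifts until the slave that is "wrong" for the joint objective flips its replica, at which point the disagreement subgradient vanishes and the iterates stay put.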

77 | Sur l'approximation par éléments finis d'ordre un, et la résolution par pénalisation–dualité d'une classe de problèmes de Dirichlet non linéaires
- Glowinski, Marroco
- 1975
Citation Context: ...mization [13, 21], and which have recently been shown to be extremely competitive for some large scale problems [1, 7, 12]. Specifically, we use the alternating direction method of multipliers (ADMM) [8, 11, 6] to handle the dual of the constrained problem resulting from the DD, and show that the resulting method has state-of-the-art performance. Being interested in problems with (possibly global) structura...

70 | Efficient projections onto the l1-ball for learning in high dimensions
- Duchi, Shalev-Shwartz, et al.
- 2008
Citation Context: ...hat of minimizing ‖z − c‖² subject to z ∈ conv S_a, which is a Euclidean projection onto a polyhedron: • conv S_XOR is the probability simplex; the projection can be computed efficiently using a sort [5]. • conv S_OR is a hypercube with a vertex removed, and conv S_OR-OUT is a pyramid whose base is a hypercube with a vertex removed; in both cases, the projections can be efficiently computed using one ...
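The sort-based simplex projection mentioned in the excerpt can be sketched as follows; this is the standard O(n log n) routine in the spirit of [5], with variable names of our own choosing.

```python
def project_simplex(c):
    """Euclidean projection of c onto the probability simplex
    {z : z_i >= 0, sum(z) = 1}, via the classic sort-based routine."""
    u = sorted(c, reverse=True)          # sort in decreasing order
    css, theta = 0.0, 0.0
    for k in range(1, len(c) + 1):
        css += u[k - 1]                  # cumulative sum of the k largest
        t = (css - 1.0) / k
        if u[k - 1] > t:                 # k is still feasible
            theta = t                    # keep the last feasible threshold
    # Shift by theta and clip at zero.
    return [max(ci - theta, 0.0) for ci in c]

print(project_simplex([2.0, 0.0, 0.0]))  # -> [1.0, 0.0, 0.0]
print(project_simplex([0.5, 0.3, 0.2]))  # already a distribution: unchanged
```

A point already on the simplex is left (numerically) unchanged, and the cost is dominated by the sort, matching the excerpt's "using a sort" remark.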

67 | Dependency Parsing by Belief Propagation
- Smith, Eisner
- 2008
Citation Context: ...factor graph corresponding to a second-order dependency parsing model with sibling and grandparent features [25]. Note the TREE hard constraint factor, which enforces the overall variable assignment to encode a valid ...

63 | Predicting Structured Data
- Bakir, Hofmann, et al.
- 2007
Citation Context: ...resulting from the DD, and show that the resulting method has state-of-the-art performance. Being interested in problems with (possibly global) structural constraints (common in structured prediction [2]), we show that the proposed method can handle this class of problems efficiently. This efficiency is rooted in the fact that the slave problems associated with the hard factors enforcing some of thes...

61 | Dual decomposition for parsing with non-projective head automata
- Koo, Rush, et al.
- 2010
Citation Context: ...h has stimulated much work on approximate methods. The dual decomposition (DD) [3, 4] is one such approximate method, which has been recently used in computer vision [17] and natural language parsing [18]. In a nutshell, DD works by breaking the original hard problem into a set of smaller (slave) subproblems. This set of subproblems, together with the constraints that they should agree on the variable...

58 | Beyond pairwise energies: Efficient optimization for higher-order MRFs
- Komodakis, Paragios
Citation Context: ...averaged over the 2,399 test sentences. §6 (Related Work and Final Remarks): The DD method for MAP inference was first proposed for image segmentation using pairwise [17] and higher order factor graphs [16]. It was recently adopted for natural language parsing [24, 18], with only a couple of slave subproblems handled with dynamic programming. Accelerated DD was first considered in [14], where the appro...

54 | On optimality of tree-reweighted max-product message-passing
- Kolmogorov, Wainwright
- 2005
Citation Context: ...n: OPT′ ≜ max_{(µ,ν)∈L(G)} ∑_i θ_i⊤µ_i + ∑_a φ_a⊤ν_a, (3) which will be our main focus throughout. Obviously, OPT′ ≥ OPT, since L(G) ⊇ M(G). §3 (Dual Decomposition): Several message passing algorithms [9, 15] are derived via some reformulation of (3) followed by dualization. The DD method [17] reformulates (3) by adding new variables ν_a^i (for each factor a and i ∈ N(a)) that are local "replicas" of the m...

53 | Fast image recovery using variable splitting and constrained optimization
- Afonso, Bioucas-Dias, et al.
- 2010
Citation Context: ...s of augmented Lagrangian (AL) methods, which have a long and successful history in optimization [13, 21], and which have recently been shown to be extremely competitive for some large scale problems [1, 7, 12]. Specifically, we use the alternating direction method of multipliers (ADMM) [8, 11, 6] to handle the dual of the constrained problem resulting from the DD, and show that the resulting method has sta...

50 | On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing
- Rush, Sontag, et al.
- 2010
Citation Context: ...d Final Remarks: The DD method for MAP inference was first proposed for image segmentation using pairwise [17] and higher order factor graphs [16]. It was recently adopted for natural language parsing [24, 18], with only a couple of slave subproblems handled with dynamic programming. Accelerated DD were first considered in [14], where the approach is to individually smooth each slave subproblem with the ad...

34 | Message-passing for graph-structured linear programs: Proximal projections, convergence and rounding schemes
- Ravikumar, Agarwal, et al.
- 2008
Citation Context: ...ls. Larger slaves: Finally, note that it is simple to address general, larger subgraphs using algorithms with fast convergence guarantees: e.g., a vanilla primal-dual scheme, like the one proposed in [22], resulting in cyclic projection algorithms. These can be useful to tackle coarser decompositions, in which each subgraph is a chain or a tree. Even if each projection is not computed exactly, converg...
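As a toy illustration of the cyclic projection algorithms the excerpt alludes to, the sketch below alternates Euclidean projections between two convex sets in the plane until the iterates land in their intersection. The sets are hypothetical, and this is plain alternating projections, not the specific primal-dual scheme of [22].

```python
# Alternating (cyclic) projections between two convex sets in the plane:
#   A = closed unit disk centered at the origin,
#   B = the vertical line x = 0.5.
# Repeatedly projecting onto B, then onto A, converges to a point in
# A ∩ B, here (0.5, sqrt(3)/2).

def proj_line(p):
    # Euclidean projection onto the line x = 0.5.
    return (0.5, p[1])

def proj_disk(p):
    # Euclidean projection onto the unit disk: rescale if outside.
    n = (p[0] ** 2 + p[1] ** 2) ** 0.5
    return p if n <= 1.0 else (p[0] / n, p[1] / n)

p = (3.0, 3.0)  # arbitrary starting point outside both sets
for _ in range(200):
    p = proj_disk(proj_line(p))

print(round(p[0], 3), round(p[1], 3))  # converges to (0.5, 0.866)
```

Each cycle is cheap because each individual projection is trivial, which is the appeal of such schemes for coarse decompositions.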

29 | Turbo Parsers: Dependency Parsing by Approximate Variational Inference
- Martins, Smith, et al.
- 2010
Citation Context: ...e case of binary pairwise factors, and an exact and efficient (O(|N(a)| log |N(a)|), since it is based on sorting) algorithm for several hard constraint factors that arise in practice (see, e.g., [19]). Up to log terms, this cost is the same as that of computing the MAP for those factors. Binary pairwise factors: If factor a is binary and pairwise (|N(a)| = 2), problem (7) can be re-written as the...

23 | Accelerated dual decomposition for MAP inference
- Jojic, Gould, et al.
- 2010
Citation Context: ...this dual problem efficiently for lightweight decompositions (i.e., with few slaves), in the presence of a large number of slaves its performance degrades, making accelerated methods worthy of study [14]. Here, we propose to ally the strength of DD with the effectiveness of augmented Lagrangian (AL) methods, which have a long and successful history in optimization [13, 21], and which have recently be...

22 | Restoration of poissonian images using alternating direction optimization
- Figueiredo, Bioucas-Dias
- 2010
Citation Context: ...s of augmented Lagrangian (AL) methods, which have a long and successful history in optimization [13, 21], and which have recently been shown to be extremely competitive for some large scale problems [1, 7, 12]. Specifically, we use the alternating direction method of multipliers (ADMM) [8, 11, 6] to handle the dual of the constrained problem resulting from the DD, and show that the resulting method has sta...

22 | HOP-MAP: efficient message passing with high order potentials
- Tarlow, Givoni, et al.
- 2010
Citation Context: ...tial functions: φ_a(x_a) = 0 if x_a ∈ S_a, and −∞ otherwise, where S_a is an acceptance set. This type of factor has several applications, such as error-correcting decoders [23], named entity resolution [26], and dependency parsing [19]. For binary variables, hard factors impose logical constraints; e.g., • the one-hot XOR factor, for which S_XOR = {(x_1, …, x_n) ∈ {0, 1}^n | ∑_{i=1}^n x_i = 1}, • the OR f...
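The logical constraints in the excerpt can be written as simple indicator checks. Below is a hedged sketch: the OR factor is assumed to accept any binary vector with at least one active variable, since its definition is truncated in the excerpt, and the function names are our own.

```python
def in_xor(x):
    """One-hot XOR acceptance set: binary vector with exactly one 1."""
    return all(v in (0, 1) for v in x) and sum(x) == 1

def in_or(x):
    """OR acceptance set (assumed): binary vector with at least one 1."""
    return all(v in (0, 1) for v in x) and sum(x) >= 1

def log_potential(x, accept):
    """Indicator log-potential: 0 inside the acceptance set, -inf outside."""
    return 0.0 if accept(x) else float("-inf")

print(in_xor((0, 1, 0)), in_xor((1, 1, 0)))  # True False
print(log_potential((0, 0, 0), in_or))       # -inf
```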

21 | Fast alternating linearization methods for minimizing the sum of two convex functions
- Goldfarb, Ma, et al.
- 2012
Citation Context: ...s of augmented Lagrangian (AL) methods, which have a long and successful history in optimization [13, 21], and which have recently been shown to be extremely competitive for some large scale problems [1, 7, 12]. Specifically, we use the alternating direction method of multipliers (ADMM) [8, 11, 6] to handle the dual of the constrained problem resulting from the DD, and show that the resulting method has sta...

16 | Learning Efficiently with Approximate Inference via Dual Losses
- Meshi, Sontag, et al.
- 2010
Citation Context: ...his deserves further study. §5 (Experiments): We compare DD-ADMM (Alg. 2) with two other approximate MAP inference algorithms: DD-subgradient ([17], Alg. 1) and Star-MSD (max-sum diffusion with star updates, [20]), which performs dual block coordinate descent message-passing. Fig. 1 shows a typical plot for an Ising model (binary pairwise MRF) on a random grid. We observe that DD-subgradient is the slowest, t...

3 | Modern Coding Theory. Cambridge University Press
- Richardson, Urbanke
- 2008
Citation Context: ...ctors have indicator log-potential functions: φ_a(x_a) = 0 if x_a ∈ S_a, and −∞ otherwise, where S_a is an acceptance set. This type of factor has several applications, such as error-correcting decoders [23], named entity resolution [26], and dependency parsing [19]. For binary variables, hard factors impose logical constraints; e.g., • the one-hot XOR factor, for which S_XOR = {(x_1, …, x_n) ∈ {0, 1}...

1 | A method for nonlinear constraints in minimization problems
- Powell
- 1969
Citation Context: ...lerated methods worthy of study [14]. Here, we propose to ally the strength of DD with the effectiveness of augmented Lagrangian (AL) methods, which have a long and successful history in optimization [13, 21], and which have recently been shown to be extremely competitive for some large scale problems [1, 7, 12]. Specifically, we use the alternating direction method of multipliers (ADMM) [8, 11, 6] to han...