## Learning Message-Passing Inference Machines for Structured Prediction

Citations: 7 (4 self)

### BibTeX

    @MISC{Ross_learningmessage-passing,
      author = {Stéphane Ross and Daniel Munoz and Martial Hebert and J. Andrew Bagnell},
      title = {Learning Message-Passing Inference Machines for Structured Prediction},
      year = {}
    }

### Abstract

Nearly every structured prediction problem in computer vision requires approximate inference due to large and complex dependencies among output labels. While graphical models provide a clean separation between modeling and inference, learning these models with approximate inference is not well understood. Furthermore, even if a good model is learned, predictions are often inaccurate due to approximations. In this work, instead of performing inference over a graphical model, we consider the inference procedure as a composition of predictors. Specifically, we focus on message-passing algorithms, such as Belief Propagation, and show how they can be viewed as procedures that sequentially predict label distributions at each node of a graph. Given labeled graphs, we can then train the sequence of predictors to output the correct labelings. The result no longer corresponds to a graphical model but simply defines an inference procedure, with strong theoretical properties, that can be used to classify new graphs. We demonstrate the scalability and efficacy of our approach on 3D point cloud classification and 3D surface estimation from single images.
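The core idea of the abstract — treating message-passing inference as a sequence of learned predictors that repeatedly re-estimate each node's label distribution from local features plus its neighbors' current beliefs — can be sketched as follows. This is a minimal illustration, not the paper's implementation: `inference_machine`, the per-iteration linear predictors, and the mean-of-neighbor-beliefs context feature are all illustrative assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def inference_machine(features, adjacency, weights, n_iters=5, n_labels=3):
    """Run a synchronous message-passing inference machine.

    Each pass re-predicts every node's label distribution from that
    node's own features concatenated with the mean of its neighbors'
    current beliefs. `weights` holds one trained linear predictor per
    pass, of shape (n_features + n_labels, n_labels); here they stand
    in for the predictors the paper's sequential training would fit.
    """
    n_nodes, _ = features.shape
    beliefs = np.full((n_nodes, n_labels), 1.0 / n_labels)  # uniform init
    for t in range(n_iters):
        # contextual input: average belief of each node's neighbors
        context = np.stack([
            beliefs[adjacency[v]].mean(axis=0) if adjacency[v]
            else np.full(n_labels, 1.0 / n_labels)
            for v in range(n_nodes)
        ])
        inputs = np.hstack([features, context])   # (n_nodes, d + n_labels)
        beliefs = softmax(inputs @ weights[t])    # synchronous update of all nodes
    return beliefs
```

In the paper's setting the per-pass predictors would be trained on labeled graphs so that, under the belief distributions induced by earlier passes, each pass moves the marginals toward the true labels; the sketch only shows the test-time forward computation.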

### Citations

7556 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference - Pearl - 1988

Citation Context: ...s (e.g., one for each (super)pixel). In order to cope with this problem, inference inevitably relies on approximate methods: Monte-Carlo, loopy belief propagation, graph-cuts, and variational methods [16, 2, 23]. Unfortunately, learning these models with approximate inference is not well understood [9, 6]. Additionally, it has been observed that it is important to tie the graphical model to the specific appr...

2541 | Conditional random fields: Probabilistic models for segmenting and labeling sequence data - Lafferty, McCallum, et al. - 2001

Citation Context: ...ty and efficacy of our approach on 3D point cloud classification and 3D surface estimation from single images. 1. Introduction Probabilistic graphical models, such as Conditional Random Fields (CRFs) [11], have proven to be a remarkably successful tool for structured prediction that, in principle, provide a clean separation between modeling and inference. However, exact inference for problems in compu...

1509 | Fast approximate energy minimization via graph cuts - Boykov, Veksler, et al. - 2001

Citation Context: ...s (e.g., one for each (super)pixel). In order to cope with this problem, inference inevitably relies on approximate methods: Monte-Carlo, loopy belief propagation, graph-cuts, and variational methods [16, 2, 23]. Unfortunately, learning these models with approximate inference is not well understood [9, 6]. Additionally, it has been observed that it is important to tie the graphical model to the specific appr...

829 | Gradient-based learning applied to document recognition - LeCun, Bottou, et al. - 1998

Citation Context: ... rigorous performance bounds on the loss of the final predictions; however, they do not optimize it directly. If the predictors learned are differentiable functions, a procedure like back-propagation [12] makes it possible to identify local optima of the objective (minimizing the loss of the final marginals). As this optimization problem is non-convex and there are potentially many local minima, it can be ...

466 | Max-margin Markov networks - Taskar, Guestrin, et al. - 2004

Citation Context: ...negligible when looking at the sum of loss over the whole graph, we can choose N to be on the order of the number of nodes in a graph. Though in practice, a much smaller number of iterations (N ∈ [10, 20]) is often sufficient to obtain good predictors under their induced distributions. 3.3. Global Training via Back-Propagation In both synchronous and asynchronous approaches, the local training procedures p...

146 | Robust higher order potentials for enforcing label consistency - Kohli, Ladický, et al. - 2008

137 | Recovering surface layout from an image - Hoiem, Efros, et al.

Citation Context: ...6 test scans. 3D Surface Layout Estimation. We also evaluate our approach on the problem of estimating the 3D surface layout from single images, using the Geometric Context Dataset from Hoiem et al. [7]. In this dataset, the problem is to assign 3D geometric surface labels to pixels in the image (see Fig. 1). This task can be viewed as a 3-class or 7-class labeling problem. In the 3-class case, ...

106 | Learning deep architectures for AI (Foundations and Trends) - Bengio

Citation Context: ...ight relationship between learning and inference. To enable this combination, we propose an alternate view of the approximate inference process as a long sequence of computational modules to optimize [1] such that the sequence results in correct predictions. We focus on message-passing inference procedures, such as Belief Propagation, which compute marginal distributions over output variables by iter...

93 | Training structural SVMs when exact inference is intractable - Finley, Joachims - 2008

Citation Context: ... on approximate methods: Monte-Carlo, loopy belief propagation, graph-cuts, and variational methods [16, 2, 23]. Unfortunately, learning these models with approximate inference is not well understood [9, 6]. Additionally, it has been observed that it is important to tie the graphical model to the specific approximate inference procedure used at test time to obtain better predictions [10, 22]. Figure 1: ...

89 | Residual belief propagation: Informed scheduling for asynchronous message passing - Elidan, McGraw, et al. - 2006

Citation Context: ...ral times or until convergence. The final marginals at each variable are computed using the last equation. Asynchronous message passing often allows faster convergence, and methods such as Residual BP [5] have been developed to achieve still faster convergence by prioritizing the messages to compute. 2.1. Understanding Message Passing as Sequential Probabilistic Classification By definition of P (v = ...

89 | Auto-context and its application to high-level vision tasks and 3D brain image segmentation - Tu, Bai

Citation Context: ...irectly by learning a graphical model? Perhaps it is possible to optimize the inference procedure more directly, without building an explicit probabilistic model over the data. Some recent approaches [4, 21] eschew the probabilistic graphical model entirely with notable successes. However, we would ideally like to have the best of both worlds: the proven success of error-correcting iterative decoding met...

52 | Structured Learning with Approximate Inference (NIPS) - Kulesza, Pereira - 2007

Citation Context: ... on approximate methods: Monte-Carlo, loopy belief propagation, graph-cuts, and variational methods [16, 2, 23]. Unfortunately, learning these models with approximate inference is not well understood [9, 6]. Additionally, it has been observed that it is important to tie the graphical model to the specific approximate inference procedure used at test time to obtain better predictions [10, 22]. Figure 1: ...

51 | Advanced Mean Field Methods: Theory and Practice - Opper, Saad - 2001

48 | Tree-based reparameterization for approximate estimation on loopy graphs - Wainwright, Jaakkola, et al. - 2001

Citation Context: ...s (e.g., one for each (super)pixel). In order to cope with this problem, inference inevitably relies on approximate methods: Monte-Carlo, loopy belief propagation, graph-cuts, and variational methods [16, 2, 23]. Unfortunately, learning these models with approximate inference is not well understood [9, 6]. Additionally, it has been observed that it is important to tie the graphical model to the specific appr...

41 | Search-based structured prediction - Langford, Marcu - 2009

Citation Context: ...irectly by learning a graphical model? Perhaps it is possible to optimize the inference procedure more directly, without building an explicit probabilistic model over the data. Some recent approaches [4, 21] eschew the probabilistic graphical model entirely with notable successes. However, we would ideally like to have the best of both worlds: the proven success of error-correcting iterative decoding met...

38 | Estimating the “wrong” graphical model: Benefits in the computation-limited setting - Wainwright

Citation Context: ...ll understood [9, 6]. Additionally, it has been observed that it is important to tie the graphical model to the specific approximate inference procedure used at test time to obtain better predictions [10, 22]. Figure 1: Applications of structured prediction in computer vision. Left: 3D surface layout estimation. Right: 3D point cloud classification. When the learned graphical model is tied to the inferenc...

32 | Stacked hierarchical labeling - Munoz, Bagnell, et al. - 2010

Citation Context: ...lications. The technique of [21] can be understood as using forward training on a synchronous message passing using only marginals, similar to mean-field inference. Similarly, from our point of view, [13] implements a "half-pass" of hierarchical mean-field message passing by descending once down a hierarchy making contextual predictions. We demonstrate in our experiments the benefits of enabling more g...

31 | A reduction of imitation learning and structured prediction to no-regret online learning - Ross, Bagnell - 2011

Citation Context: ...lly trade-off accuracy versus speed of inference in real-time settings. Furthermore, in contrast with most approaches to learning inference, we are able to provide rigorous reduction-style guarantees [18] on the performance of the resulting inference procedure. Training such a predictor, however, is non-trivial as the interdependencies in the sequence of predictions make global optimization difficult...

30 | Contextual classification with functional max-margin Markov networks - Munoz, Bagnell, et al. - 2009

Citation Context: ...) and pairwise potentials (factors connected between 2 nodes). It is also possible to consider higher order potentials by having a factor connecting many nodes (e.g., cluster/segment potentials as in [14]). Training a graphical model is achieved by optimizing the potentials φf on an objective function (e.g., margin, pseudo-likelihood, etc.) defined over training data. To classify a new scene, an (appr...

26 | Exploiting inference for approximate parameter learning in discriminative fields - Kumar, August, et al. - 2005

Citation Context: ...ll understood [9, 6]. Additionally, it has been observed that it is important to tie the graphical model to the specific approximate inference procedure used at test time to obtain better predictions [10, 22]. Figure 1: Applications of structured prediction in computer vision. Left: 3D surface layout estimation. Right: 3D point cloud classification. When the learned graphical model is tied to the inferenc...

22 | Efficient reductions for imitation learning - Ross, Bagnell - 2010

Citation Context: ...ce in practice in this non-i.i.d. setting (as predictions are interdependent), we also leverage key iterative training methods developed in prior work for imitation learning and structured prediction [17, 18, 4]. These techniques allow us to iteratively train probabilistic predictors that predict the ideal variable marginals under the distribution of inputs the learned predictors induce during inference. Opt...

11 | TAP Gibbs free energy, belief propagation and sparsity - Csató, Opper, et al. - 2002

Citation Context: ... the message m_vf can be interpreted as the marginal of variable v when the factor f (and its influence) is removed from the graph. This is often referred to as the cavity method in statistical mechanics [3], and the m_vf are known as cavity marginals. By expanding the definition of m_vf, we can see that it may depend only on the messages m_v′f′ sent by all variables v′ connected to v by a factor f′ ≠ f: m...
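The cavity recursion quoted in this context — a message out of a variable is built from everything arriving at it except the influence of the factor it is being sent through — can be sketched for a pairwise model with sum-product BP. The function name `bp_pass` and the dictionary-based data layout are illustrative assumptions, not the paper's code.

```python
import numpy as np

def bp_pass(unary, pairwise, edges, n_iters=10):
    """Loopy sum-product BP on a pairwise factor graph.

    `unary[v]` is node v's local potential, `pairwise[(u, v)]` the
    factor table for edge (u, v) (axis 0 indexes u's states), and
    `edges` lists the (u, v) pairs. The message u -> v is a cavity
    quantity: it combines u's unary with every incoming message
    EXCEPT the one coming back from v.
    """
    msgs = {(u, v): np.ones_like(unary[v]) for u, v in edges}
    msgs.update({(v, u): np.ones_like(unary[u]) for u, v in edges})
    for _ in range(n_iters):
        new = {}
        for (u, v) in msgs:
            # cavity marginal of u: exclude the message from v
            cavity = unary[u].copy()
            for (s, t) in msgs:
                if t == u and s != v:
                    cavity *= msgs[(s, t)]
            psi = pairwise[(u, v)] if (u, v) in pairwise else pairwise[(v, u)].T
            m = psi.T @ cavity          # marginalize out u's states
            new[(u, v)] = m / m.sum()   # normalize for stability
        msgs = new
    # final beliefs: unary potential times all incoming messages
    beliefs = {}
    for v in unary:
        b = unary[v].copy()
        for (s, t) in msgs:
            if t == v:
                b *= msgs[(s, t)]
        beliefs[v] = b / b.sum()
    return beliefs
```

On a tree this recursion yields the exact marginals; the paper's reinterpretation keeps the same message schedule but replaces the fixed potential-based update with a learned predictor.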

7 | Sequential learning of classifiers for structured prediction problems - Roth, Small, et al. - 2009 |