## Structured learning with approximate inference


### Other Repositories/Bibliography

- Venue: Advances in Neural Information Processing Systems
- Citations: 52 (1 self)

### BibTeX

@INPROCEEDINGS{Kulesza_structuredlearning,
  author    = {Alex Kulesza and Fernando Pereira},
  title     = {Structured learning with approximate inference},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2007}
}

### Abstract

In many structured prediction problems, the highest-scoring labeling is hard to compute exactly, leading to the use of approximate inference methods. However, when inference is used inside a learning algorithm, a good approximation of the score may not be sufficient. In particular, we show that learning can fail even with an approximate inference method that has rigorous approximation guarantees. There are two reasons for this. First, approximate methods can effectively reduce the expressivity of an underlying model by making it impossible to choose parameters that reliably give good predictions. Second, approximations can respond to parameter changes in ways that mislead standard learning algorithms. In contrast, we give two positive results in the form of learning bounds for the use of LP-relaxed inference in structured perceptron and empirical risk minimization settings. We argue that without appropriately compatible combinations of inference and learning such as these, learning performance under approximate inference cannot be guaranteed.
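The setting the abstract describes — a linear scoring model, arg-max inference, and perceptron-style updates when the prediction is wrong — can be sketched minimally. All names and the toy feature map below are hypothetical; exact enumeration over a tiny label set stands in for Viterbi inference, which is the step where the paper substitutes an approximate arg-max:

```python
# Minimal structured-perceptron sketch (hypothetical toy setup).
# Exact enumeration plays the role of Viterbi inference; the paper's
# subject is what goes wrong when this arg-max is only approximate.

NUM_LABELS = 3
DIM = 3  # input dimension of the toy examples

def feature(x, y):
    # Toy joint feature map f(x, y): copy x into the weight block for label y.
    f = [0.0] * (DIM * NUM_LABELS)
    for i, xi in enumerate(x):
        f[y * DIM + i] = xi
    return f

def score(w, x, y):
    # Linear score S(y|x) = w . f(x, y).
    return sum(wi * fi for wi, fi in zip(w, feature(x, y)))

def predict(w, x):
    # Exact inference: h(x) = argmax_y S(y|x).
    return max(range(NUM_LABELS), key=lambda y: score(w, x, y))

def train(data, epochs=10):
    w = [0.0] * (DIM * NUM_LABELS)
    for _ in range(epochs):
        for x, y in data:
            y_hat = predict(w, x)
            if y_hat != y:
                # Structured perceptron update: w += f(x, y) - f(x, y_hat).
                for i, fi in enumerate(feature(x, y)):
                    w[i] += fi
                for i, fi in enumerate(feature(x, y_hat)):
                    w[i] -= fi
    return w

# Linearly separable toy data: the label is the index of the largest coordinate.
data = [([3.0, 1.0, 0.0], 0), ([0.0, 2.0, 1.0], 1), ([1.0, 0.0, 4.0], 2)]
w = train(data)
print([predict(w, x) for x, _ in data])  # separable data, so training labels are recovered
```

On separable data with exact inference, the classical perceptron convergence argument applies; the paper's counterexamples show that replacing `predict` with an approximation can break exactly this guarantee, even when the approximate score is provably close to optimal.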

### Citations

7446 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
- Pearl
- 1988

2493 | Conditional random fields: Probabilistic models for segmenting and labeling sequence data
- Lafferty, McCallum, et al.
- 2001

604 | The computational complexity of probabilistic inference using Bayesian belief networks
- Cooper
- 1990

517 | Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms
- Collins
- 2002

493 | Loopy belief propagation for approximate inference: an empirical study
- Murphy, Weiss, et al.
- 1999

140 | MAP estimation via agreement on trees: Message passing and linear programming
- Wainwright, Jaakkola, et al.

117 | A linear programming formulation for global inference in natural language tasks
- Roth, Yih
- 2004

83 | Piecewise training of undirected models
- Sutton, McCallum
- 2005

78 | Learning associative Markov networks
- Taskar, Chatalbashev, et al.
- 2004

78 | Collective segmentation and labeling of distant entities in information extraction
- Sutton, McCallum
- 2004

72 | Learning as search optimization: Approximate large margin methods for structured prediction
- Daumé, Marcu
- 2005

63 | PAC-Bayesian stochastic model selection
- McAllester
- 2003

45 | Learning and inference over constrained output
- Punyakanok, Roth, et al.
- 2005

38 | Estimating the wrong graphical model: Benefits in the computation-limited setting
- Wainwright
- 2006

22 | Generalization bounds and consistency for structured labeling
- McAllester
- 2007
