Results 1 - 4 of 4
Collective stability in structured prediction: Generalization from one example.
In International Conference on Machine Learning, 2013
"... Abstract Structured predictors enable joint inference over multiple interdependent output variables. These models are often trained on a small number of examples with large internal structure. Existing distributionfree generalization bounds do not guarantee generalization in this setting, though t ..."
Abstract

Cited by 9 (4 self)
Structured predictors enable joint inference over multiple interdependent output variables. These models are often trained on a small number of examples with large internal structure. Existing distribution-free generalization bounds do not guarantee generalization in this setting, though this contradicts a large body of empirical evidence from computer vision, natural language processing, social networks, and other fields. In this paper, we identify a set of natural conditions (weak dependence, hypothesis complexity, and a new measure, collective stability) that are sufficient for generalization from even a single example, without imposing an explicit generative model of the data. We then demonstrate that the complexity and stability conditions are satisfied by a broad class of models, including marginal inference in templated graphical models. We thus obtain uniform convergence rates that can decrease significantly faster than previous bounds, particularly when each structured example is sufficiently large and the number of training examples is constant, even one.
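As a rough illustration (schematic only, not drawn verbatim from the paper), classical distribution-free bounds decrease only with the number m of training examples, while the collective-stability analysis also exploits the internal size n of each structured example, so the rate can shrink even when m is fixed at one:

```latex
% Schematic rate comparison: constants and hypothesis-complexity
% terms omitted; m = number of examples, n = size of each example.
\underbrace{O\!\left(\frac{1}{\sqrt{m}}\right)}_{\text{classical bound}}
\qquad \text{vs.} \qquad
\underbrace{O\!\left(\frac{1}{\sqrt{mn}}\right)}_{\text{collective-stability bound}}
```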
PAC-Bayesian Collective Stability
"... Recent results have shown that the generalization error of structured predictors decreases with both the number of examples and the size of each example, provided the data distribution has weak dependence and the predictor exhibits a smoothness property called collective stability. These results u ..."
Abstract

Cited by 1 (1 self)
Recent results have shown that the generalization error of structured predictors decreases with both the number of examples and the size of each example, provided the data distribution has weak dependence and the predictor exhibits a smoothness property called collective stability. These results use an especially strong definition of collective stability that must hold uniformly over all inputs and all hypotheses in the class. We investigate whether weaker definitions of collective stability suffice. Using the PAC-Bayes framework, which is particularly amenable to our new definitions, we prove that generalization is indeed possible when uniform collective stability holds with high probability over draws of predictors (and inputs). We then derive a generalization bound for a class of structured predictors with variably convex inference, which suggests a novel learning objective that optimizes collective stability.
Stability and Generalization in Structured Prediction
2016
"... Abstract Structured prediction models have been found to learn effectively from a few large examplessometimes even just one. Despite empirical evidence, canonical learning theory cannot guarantee generalization in this setting because the error bounds decrease as a function of the number of example ..."
Abstract
Structured prediction models have been found to learn effectively from a few large examples, sometimes even just one. Despite empirical evidence, canonical learning theory cannot guarantee generalization in this setting because the error bounds decrease as a function of the number of examples. We therefore propose new PAC-Bayesian generalization bounds for structured prediction that decrease as a function of both the number of examples and the size of each example. Our analysis hinges on the stability of joint inference and the smoothness of the data distribution. We apply our bounds to several common learning scenarios, including max-margin and softmax training of Markov random fields. Under certain conditions, the resulting error bounds can be far more optimistic than previous results and can even guarantee generalization from a single large example.