Collective stability in structured prediction: Generalization from one example. In International Conference on Machine Learning, 2013.
"... Abstract Structured predictors enable joint inference over multiple interdependent output variables. These models are often trained on a small number of examples with large internal structure. Existing distribution-free generalization bounds do not guarantee generalization in this setting, though t ..."
Cited by 9 (4 self)
Abstract: Structured predictors enable joint inference over multiple interdependent output variables. These models are often trained on a small number of examples with large internal structure. Existing distribution-free generalization bounds do not guarantee generalization in this setting, though this contradicts a large body of empirical evidence from computer vision, natural language processing, social networks and other fields. In this paper, we identify a set of natural conditions (weak dependence, hypothesis complexity, and a new measure, collective stability) that are sufficient for generalization from even a single example, without imposing an explicit generative model of the data. We then demonstrate that the complexity and stability conditions are satisfied by a broad class of models, including marginal inference in templated graphical models. We thus obtain uniform convergence rates that can decrease significantly faster than previous bounds, particularly when each structured example is sufficiently large and the number of training examples is constant, even one.
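As a rough sketch of the stability notion this abstract refers to (illustrative notation, not quoted from the paper): a class of vector-valued predictors has uniform collective stability \(\beta\) if changing a single component of the input moves the joint output by at most \(\beta\) in 1-norm,

\[
\forall h \in \mathcal{H},\ \forall x, x' \text{ differing in one component:} \quad \| h(x) - h(x') \|_1 \le \beta .
\]

Together with weak dependence among the variables within an example and bounded hypothesis complexity, this smoothness property is what allows the uniform convergence rates described above.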
PAC-Bayesian Collective Stability
"... Recent results have shown that the gener-alization error of structured predictors de-creases with both the number of examples and the size of each example, provided the data distribution has weak dependence and the predictor exhibits a smoothness property called collective stability. These results u ..."
Cited by 1 (1 self)
Abstract: Recent results have shown that the generalization error of structured predictors decreases with both the number of examples and the size of each example, provided the data distribution has weak dependence and the predictor exhibits a smoothness property called collective stability. These results use an especially strong definition of collective stability that must hold uniformly over all inputs and all hypotheses in the class. We investigate whether weaker definitions of collective stability suffice. Using the PAC-Bayes framework, which is particularly amenable to our new definitions, we prove that generalization is indeed possible when uniform collective stability happens with high probability over draws of predictors (and inputs). We then derive a generalization bound for a class of structured predictors with variably convex inference, which suggests a novel learning objective that optimizes collective stability.
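A hedged sketch of the relaxation described here (notation mine, not the paper's): rather than requiring \(\| h(x) - h(x') \|_1 \le \beta\) for every hypothesis and every input, it only needs to hold with high probability over a draw of the predictor from the PAC-Bayes posterior \(Q\) (and over the inputs),

\[
\Pr_{h \sim Q,\ x}\Big[\ \| h(x) - h(x') \|_1 \le \beta \ \text{ for all } x' \text{ differing from } x \text{ in one component} \ \Big] \ \ge\ 1 - \delta .
\]

On this reading, the learning objective mentioned at the end of the abstract would trade off empirical risk against a term favoring posteriors concentrated on predictors with small \(\beta\).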
Stability and Generalization in Structured Prediction, 2016.
"... Abstract Structured prediction models have been found to learn effectively from a few large examplessometimes even just one. Despite empirical evidence, canonical learning theory cannot guarantee generalization in this setting because the error bounds decrease as a function of the number of example ..."
Abstract: Structured prediction models have been found to learn effectively from a few large examples, sometimes even just one. Despite empirical evidence, canonical learning theory cannot guarantee generalization in this setting because the error bounds decrease as a function of the number of examples. We therefore propose new PAC-Bayesian generalization bounds for structured prediction that decrease as a function of both the number of examples and the size of each example. Our analysis hinges on the stability of joint inference and the smoothness of the data distribution. We apply our bounds to several common learning scenarios, including max-margin and soft-max training of Markov random fields. Under certain conditions, the resulting error bounds can be far more optimistic than previous results and can even guarantee generalization from a single large example.
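To make the claimed scaling concrete (an illustrative form under my reading, not the paper's exact statement): with \(m\) training examples, each containing \(n\) interdependent output variables, bounds of this type shrink roughly like

\[
O\!\left( \frac{1}{\sqrt{m\,n}} \right),
\]

whereas classical bounds scale as \(O(1/\sqrt{m})\); when \(n\) is large, even \(m = 1\) can therefore give a nontrivial guarantee.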