Results 1–10 of 14
Prediction of time series by statistical learning: general losses and fast rates
, 2012
"... Abstract: We establish rates of convergences in time series forecasting using the statistical learning approach based on oracle inequalities. A series of papers (e.g. [MM98, Mei00, BCV01, AW12]) extends the oracle inequalities obtained for iid observations to time series under weak dependence condit ..."
Abstract

Cited by 5 (2 self)
 Add to MetaCart
Abstract: We establish rates of convergence in time series forecasting using the statistical learning approach based on oracle inequalities. A series of papers (e.g. [MM98, Mei00, BCV01, AW12]) extends the oracle inequalities obtained for iid observations to time series under weak dependence conditions. Given a family of predictors and n observations, oracle inequalities state that a predictor forecasts the series as well as the best predictor in the family up to a remainder term ∆n. Using the PAC-Bayesian approach, we establish, under weak dependence conditions, oracle inequalities with optimal rates of convergence ∆n. We extend results given in [AW12] for the absolute loss function to any Lipschitz loss function with rates ∆n ∼ √(c(Θ)/n), where c(Θ) measures the complexity of the model. We apply the method with quantile loss functions to forecast the French GDP. Under additional conditions on the loss functions (satisfied by the quadratic loss function) and on the time series, we refine the rates of convergence to ∆n ∼ c(Θ)/n. We achieve for the first time these fast rates for uniformly mixing processes. These rates are known to be optimal in the iid case, see [Tsy03], and for individual sequences, see [CBL06]. In particular, we generalize the results of [DT08] on sparse regression estimation to the case of autoregression.
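The Gibbs-estimator mechanism behind these oracle inequalities can be illustrated with a minimal sketch: exponentially weighted aggregation over a finite family of one-step predictors on a simulated AR(1) series. The predictor family, the temperature lam, and the data here are hypothetical toy choices, not the construction from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy AR(1) series x_t = 0.6 x_{t-1} + noise (simulated, not from the paper).
n = 500
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + 0.3 * rng.normal()

# Finite family of one-step predictors f_theta(x) = theta * x.
thetas = np.linspace(-1.0, 1.0, 21)

# Cumulative absolute loss of each predictor over the observed past.
losses = np.array([np.abs(x[1:] - th * x[:-1]).sum() for th in thetas])

# Gibbs posterior: weight each predictor by exp(-lambda * cumulative loss).
lam = 1.0
w = np.exp(-lam * (losses - losses.min()))  # shift by the min for stability
w /= w.sum()

# Aggregated one-step forecast of the next value.
forecast = float((w * thetas).sum() * x[-1])
best_theta = float(thetas[np.argmin(losses)])
```

The weights concentrate on predictors with small cumulative loss, which is why the aggregate forecasts nearly as well as the best theta in the family, up to the remainder term ∆n.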
Prediction of quantiles by statistical learning and application to GDP forecasting
"... Abstract. In this paper, we tackle the problem of prediction and confidence intervals for time series using a statistical learning approach and quantile loss functions. In a first time, we show that the Gibbs estimator is able to predict as well as the best predictor in a given family for a wide se ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
Abstract: In this paper, we tackle the problem of prediction and confidence intervals for time series using a statistical learning approach and quantile loss functions. First, we show that the Gibbs estimator is able to predict as well as the best predictor in a given family for a wide set of loss functions. In particular, using the quantile loss function of [1], this allows one to build confidence intervals. We apply these results to the problem of prediction and confidence regions for the French Gross Domestic Product (GDP) growth, with promising results.
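The quantile (pinball) loss underlying these confidence intervals is easy to sketch. The check below, that a constant prediction minimizing the pinball loss is the empirical tau-quantile, uses simulated Gaussian data as a toy illustration, not the GDP application.

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Quantile (pinball) loss: penalizes under- and over-prediction
    asymmetrically, with weights tau and (1 - tau)."""
    diff = y - q
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

# The constant minimizing the pinball loss is the empirical tau-quantile,
# which is what makes this loss suitable for building prediction intervals.
rng = np.random.default_rng(1)
y = rng.normal(size=10_000)
grid = np.linspace(-3.0, 3.0, 601)
risks = [pinball_loss(y, q, 0.9) for q in grid]
q90 = float(grid[int(np.argmin(risks))])
```

Predicting the 0.05- and 0.95-quantiles this way yields an interval that covers roughly 90% of outcomes, which is how quantile losses turn point forecasts into confidence regions.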
PAC-Bayes Generalization Bounds for Randomized Structured Prediction
"... We present a new PACBayes generalization bound for structured prediction that is applicable to perturbationbased probabilistic models. Our analysis explores the relationship between perturbationbased modeling and the PACBayes framework, and connects to recently introduced generalization bounds f ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
We present a new PAC-Bayes generalization bound for structured prediction that is applicable to perturbation-based probabilistic models. Our analysis explores the relationship between perturbation-based modeling and the PAC-Bayes framework, and connects to recently introduced generalization bounds for structured prediction. We obtain the first PAC-Bayes bounds that guarantee better generalization as the size of each structured example grows.
PAC-Bayesian Collective Stability
"... Recent results have shown that the generalization error of structured predictors decreases with both the number of examples and the size of each example, provided the data distribution has weak dependence and the predictor exhibits a smoothness property called collective stability. These results u ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
Recent results have shown that the generalization error of structured predictors decreases with both the number of examples and the size of each example, provided the data distribution has weak dependence and the predictor exhibits a smoothness property called collective stability. These results use an especially strong definition of collective stability that must hold uniformly over all inputs and all hypotheses in the class. We investigate whether weaker definitions of collective stability suffice. Using the PAC-Bayes framework, which is particularly amenable to our new definitions, we prove that generalization is indeed possible when uniform collective stability holds with high probability over draws of predictors (and inputs). We then derive a generalization bound for a class of structured predictors with variably convex inference, which suggests a novel learning objective that optimizes collective stability.
Computing Centre Russian Academy of Sciences
"... We present a PACBayesEmpiricalBernstein inequality. The inequality is based on a combination of the PACBayesian bounding technique with an Empirical Bernstein bound. We show that when the empirical variance is significantly smaller than the empirical loss the PACBayesEmpiricalBernstein inequa ..."
Abstract
 Add to MetaCart
(Show Context)
We present a PAC-Bayes-Empirical-Bernstein inequality. The inequality is based on a combination of the PAC-Bayesian bounding technique with an Empirical Bernstein bound. We show that when the empirical variance is significantly smaller than the empirical loss, the PAC-Bayes-Empirical-Bernstein inequality is significantly tighter than the PAC-Bayes-kl inequality of Seeger (2002), and otherwise it is comparable. Our theoretical analysis is confirmed empirically on a synthetic example and several UCI datasets. The PAC-Bayes-Empirical-Bernstein inequality is an interesting example of an application of the PAC-Bayesian bounding technique to self-bounding functions.
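To see why a variance-sensitive bound can be much tighter, one can compare an empirical-Bernstein-style deviation term with a variance-independent Hoeffding-style term on low-variance losses. This is a generic sketch using constants of the Maurer–Pontil form for [0, 1]-valued samples; it is not the PAC-Bayesian bound of the paper.

```python
import numpy as np

def empirical_bernstein_deviation(sample, delta):
    """Empirical-Bernstein-style deviation term for [0, 1]-valued samples:
    sqrt(2 * Var * ln(2/delta) / n) + 7 * ln(2/delta) / (3 * (n - 1))."""
    n = len(sample)
    var = np.var(sample, ddof=1)
    log_term = np.log(2.0 / delta)
    return np.sqrt(2.0 * var * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))

def hoeffding_deviation(n, delta):
    """Hoeffding-style deviation term; ignores the variance entirely."""
    return np.sqrt(np.log(2.0 / delta) / (2.0 * n))

rng = np.random.default_rng(2)
low_variance_losses = 0.05 * rng.random(1000)  # small losses, small variance
eb = empirical_bernstein_deviation(low_variance_losses, delta=0.05)
hoef = hoeffding_deviation(1000, delta=0.05)
```

When the empirical variance is small relative to the range, the first (variance) term of the Bernstein bound shrinks while Hoeffding's term does not, which is exactly the regime the abstract describes.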
Stability and Generalization in Structured Prediction
, 2016
"... Abstract Structured prediction models have been found to learn effectively from a few large examplessometimes even just one. Despite empirical evidence, canonical learning theory cannot guarantee generalization in this setting because the error bounds decrease as a function of the number of example ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract: Structured prediction models have been found to learn effectively from a few large examples, sometimes even just one. Despite empirical evidence, canonical learning theory cannot guarantee generalization in this setting because the error bounds decrease as a function of the number of examples. We therefore propose new PAC-Bayesian generalization bounds for structured prediction that decrease as a function of both the number of examples and the size of each example. Our analysis hinges on the stability of joint inference and the smoothness of the data distribution. We apply our bounds to several common learning scenarios, including max-margin and soft-max training of Markov random fields. Under certain conditions, the resulting error bounds can be far more optimistic than previous results and can even guarantee generalization from a single large example.
ON SOME RECENT ADVANCES ON HIGH DIMENSIONAL BAYESIAN STATISTICS (A. Garivier et al., Editors)
"... Abstract. This paper proposes to review some recent developments in Bayesian statistics for high dimensional data. After giving some brief motivations in a short introduction, we describe new advances in the understanding of Bayes posterior computation as well as theoretical contributions in non pa ..."
Abstract
 Add to MetaCart
Abstract: This paper reviews some recent developments in Bayesian statistics for high dimensional data. After giving some brief motivations in a short introduction, we describe new advances in the understanding of Bayes posterior computation as well as theoretical contributions in nonparametric and high dimensional Bayesian approaches. From an applied point of view, we describe the so-called SQMC particle method for computing the Bayesian posterior law, and provide a nonparametric analysis of the widespread ABC method. On the theoretical side, we describe some recent advances in Bayesian consistency for a nonparametric hidden Markov model as well as new PAC-Bayesian results for different models of high dimensional regression.
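Of the computational methods mentioned, the ABC method is the simplest to sketch: plain ABC rejection sampling keeps prior draws whose simulated summary statistic lands near the observed one. The Gaussian model, the choice of the sample mean as summary statistic, and the tolerance epsilon below are hypothetical toy choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)

# Observed data (toy): draws from N(theta_true, 1) with theta_true = 2.
theta_true = 2.0
observed = rng.normal(theta_true, 1.0, size=100)
obs_mean = observed.mean()

# ABC rejection sampling: simulate data from the model under a prior draw,
# and accept the draw if the simulated summary statistic (here, the sample
# mean) falls within epsilon of the observed one.
epsilon = 0.1
prior_draws = rng.uniform(-10.0, 10.0, size=20_000)
accepted = []
for theta in prior_draws:
    simulated = rng.normal(theta, 1.0, size=100)
    if abs(simulated.mean() - obs_mean) < epsilon:
        accepted.append(theta)

posterior_sample = np.array(accepted)  # approximate posterior draws for theta
```

Shrinking epsilon trades acceptance rate for fidelity to the true posterior, which is the tension the nonparametric analyses of ABC make precise.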
Multilabel Structured Output Learning with Random Spanning Trees of Max-Margin Markov Networks
"... We show that the usual score function for conditional Markov networks can be written as the expectation over the scores of their spanning trees. We also show that a small random sample of these output trees can attain a significant fraction of the margin obtained by the complete graph and we provide ..."
Abstract
 Add to MetaCart
We show that the usual score function for conditional Markov networks can be written as the expectation over the scores of their spanning trees. We also show that a small random sample of these output trees can attain a significant fraction of the margin obtained by the complete graph, and we provide conditions under which we can perform tractable inference. Experimental results confirm that practical learning with this approach scales to realistic datasets.
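The expectation-over-spanning-trees identity can be checked numerically: in the complete graph K_n, each edge lies in a uniform random spanning tree with probability 2/n, so scaling sampled tree scores by n/2 recovers the full pairwise score in expectation. This minimal sketch samples uniform labeled trees via random Prüfer sequences; the symmetric pairwise scores are random toy potentials, not the conditional Markov network scores of the paper.

```python
import heapq
import numpy as np

rng = np.random.default_rng(4)

def random_spanning_tree(n):
    """Uniform random labeled tree on {0..n-1}, decoded from a random
    Prüfer sequence (the decoding is a bijection, so the tree is uniform)."""
    if n == 2:
        return [(0, 1)]
    prufer = list(rng.integers(0, n, size=n - 2))
    degree = [1] * n
    for v in prufer:
        degree[v] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in prufer:
        leaf = heapq.heappop(leaves)   # smallest current leaf
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

n = 8
scores = rng.random((n, n))
scores = (scores + scores.T) / 2.0  # symmetric toy pairwise potentials
full_score = sum(scores[i, j] for i in range(n) for j in range(i + 1, n))

# Monte Carlo check: each K_n edge is in a uniform spanning tree w.p. 2/n,
# so (n/2) * E[tree score] equals the complete-graph score.
num_samples = 5000
estimate = np.mean([
    (n / 2.0) * sum(scores[u, v] for u, v in random_spanning_tree(n))
    for _ in range(num_samples)
])
```

This is the mechanism that lets a small random sample of output trees stand in for the full complete-graph score during learning.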
Lifelong Learning with Non-i.i.d. Tasks
"... Abstract In this work we aim at extending the theoretical foundations of lifelong learning. Previous work analyzing this scenario is based on the assumption that learning tasks are sampled i.i.d. from a task environment or limited to strongly constrained data distributions. Instead, we study two sc ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract: In this work we aim at extending the theoretical foundations of lifelong learning. Previous work analyzing this scenario is based on the assumption that learning tasks are sampled i.i.d. from a task environment, or is limited to strongly constrained data distributions. Instead, we study two scenarios in which lifelong learning is possible even though the observed tasks do not form an i.i.d. sample: first, when they are sampled from the same environment, but possibly with dependencies, and second, when the task environment is allowed to change over time in a consistent way. In the first case we prove a PAC-Bayesian theorem that can be seen as a direct generalization of the analogous previous result for the i.i.d. case. For the second scenario we propose to learn an inductive bias in the form of a transfer procedure. We present a generalization bound and show on a toy example how it can be used to identify a beneficial transfer algorithm.
Supplementary material: Curriculum Learning of Multiple Tasks
"... We apply PACBayesian theory to prove a generalization bound for the case of sequential task solving. For more details on it see [1, 6, 9]. Assume that the learner observes a sequence of tasks in a fixed order, t1,..., tn, with corresponding training sets, S1,..., Sn, where Si = {(xi1, yi1),..., (x ..."
Abstract
 Add to MetaCart
(Show Context)
We apply PAC-Bayesian theory to prove a generalization bound for the case of sequential task solving; for more details see [1, 6, 9]. Assume that the learner observes a sequence of tasks in a fixed order, t_1, ..., t_n, with corresponding training sets S_1, ..., S_n, where S_i = {(x_{i1}, y_{i1}), ..., (x_{i m_i}, y_{i m_i})} consists of m_i i.i.d. samples from a task-specific data distribution D_i. We assume that all tasks share the same input space X and output space Y, and that the learner uses the same loss function l: Y × Y → [0, 1] and hypothesis set H ⊂ {h: X → Y} for solving these tasks. The learner solves only one task at a time, using some arbitrary but fixed deterministic algorithm A that produces a posterior distribution Q_i over H based on the training data S_i and some prior knowledge P_i, which is also expressed as a probability distribution over the hypothesis set. Moreover, we assume that the solution Q_i plays the role of a prior for the next task, i.e. P_{i+1} = Q_i (P_1 is just some fixed distribution, Q_0). For making predictions for task t_i the learner uses the Gibbs predictor associated with the corresponding posterior distribution Q_i: for an input x ∈ X, this randomized predictor samples h ∈ H according to Q_i and returns h(x). The goal of the learner is to perform well on all tasks t_1, ..., t_n, i.e. to minimize the average expected error of the Gibbs classifiers defined by Q_1, ..., Q_n:

er = (1/n) ∑_{i=1}^n er_i(Q_i(Q_{i−1}, S_i)) = (1/n) ∑_{i=1}^n E_{(x,y)∼D_i} E_{h∼Q_i} l(h(x), y).  (1)

Since the data distributions of the tasks t_1, ..., t_n are unknown, one cannot compute (1) directly. However, it can be approximated by the empirical error based on the observed data:

êr = (1/n) ∑_{i=1}^n êr_i(Q_i(Q_{i−1}, S_i)) =
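The prior-transfer chain P_{i+1} = Q_i can be sketched for a finite hypothesis set, with a Gibbs posterior standing in for the generic algorithm A. The threshold classifiers, the temperature lam, and the task distributions below are hypothetical illustrations, not constructions from the supplementary material.

```python
import numpy as np

rng = np.random.default_rng(5)

# Finite hypothesis set: threshold classifiers h_c(x) = 1[x > c].
thresholds = np.linspace(-2.0, 2.0, 41)

def solve_task(prior, X, y, lam=2.0):
    """Algorithm A as a Gibbs posterior: Q(h) ∝ P(h) exp(-lam * m * emp_loss(h)).
    Returns the posterior and the expected Gibbs 0-1 training loss E_{h~Q}[loss]."""
    preds = (X[:, None] > thresholds[None, :]).astype(float)
    emp_loss = np.abs(preds - y[:, None]).mean(axis=0)   # 0-1 loss of each h
    log_q = np.log(prior + 1e-300) - lam * len(y) * emp_loss
    q = np.exp(log_q - log_q.max())                      # stabilized softmax
    q /= q.sum()
    return q, float((q * emp_loss).sum())

# A sequence of related tasks whose true thresholds cluster near 0.5; each
# task's posterior becomes the prior for the next one (P_{i+1} = Q_i).
posterior = np.full(len(thresholds), 1.0 / len(thresholds))  # P_1: uniform
errors = []
for i in range(10):
    true_c = 0.5 + 0.05 * rng.normal()
    X = rng.uniform(-2.0, 2.0, size=30)
    y = (X > true_c).astype(float)
    posterior, err = solve_task(posterior, X, y)
    errors.append(err)

avg_error = float(np.mean(errors))  # empirical counterpart of (1)
```

Because the tasks are related, each transferred prior already places mass near good hypotheses, so the average Gibbs error over the sequence stays small, which is the behavior the bound controls.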