Results 1-10 of 13
PAC-Bayesian bound for Gaussian process regression and multiple kernel additive model
In COLT, arXiv:1102.3616v1 [math.ST], 2012
"... We develop a PACBayesian bound for the convergence rate of a Bayesian variant of Multiple Kernel Learning (MKL) that is an estimation method for the sparse additive model. Standard analyses for MKL require a strong condition on the design analogous to the restricted eigenvalue condition for the ana ..."
Abstract

Cited by 6 (0 self)
We develop a PAC-Bayesian bound for the convergence rate of a Bayesian variant of Multiple Kernel Learning (MKL), an estimation method for the sparse additive model. Standard analyses of MKL require a strong condition on the design analogous to the restricted eigenvalue condition used in the analysis of the Lasso and the Dantzig selector. In this paper, we apply the PAC-Bayesian technique to show that the Bayesian variant of MKL achieves the optimal convergence rate without such strong conditions on the design. Our approach is essentially a combination of PAC-Bayes and recently developed theories of nonparametric Gaussian process regression. Our bound is developed in a fixed design situation. Our analysis includes the existing Gaussian process result as a special case, and the proof is much simpler by virtue of the PAC-Bayesian technique. We also give the convergence rate of the Bayesian variant of the Group Lasso as a finite-dimensional special case.
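For orientation, the sparse additive (multiple kernel) model referred to above is usually written as follows; this is the standard formulation, with notation assumed here rather than taken from the paper:

```latex
y_i = \sum_{m=1}^{M} f_m(x_i) + \xi_i, \qquad f_m \in \mathcal{H}_{K_m}, \quad i = 1, \dots, n,
```

where each $\mathcal{H}_{K_m}$ is the RKHS associated with kernel $K_m$ and only a few components $f_m$ are nonzero; the Bayesian variant places a Gaussian process prior on each component.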
HIGH-DIMENSIONAL ESTIMATION WITH GEOMETRIC CONSTRAINTS
"... Abstract. Consider measuring a vector x ∈ Rn through the inner product with several measurement vectors, a1, a2,..., am. It is common in both signal processing and statistics to assume the linear response model yi = 〈ai, x〉+ εi, where εi is a noise term. However, in practice the precise relationshi ..."
Abstract

Cited by 5 (2 self)
Consider measuring a vector x ∈ R^n through inner products with several measurement vectors a_1, a_2, ..., a_m. It is common in both signal processing and statistics to assume the linear response model y_i = ⟨a_i, x⟩ + ε_i, where ε_i is a noise term. However, in practice the precise relationship between the signal x and the observations y_i may not follow the linear model, and in some cases it may not even be known. To address this challenge, in this paper we propose a general model in which it is only assumed that each observation y_i may depend on a_i only through ⟨a_i, x⟩. We do not assume that the dependence is known. This is a form of the semiparametric single-index model, and it includes the linear model as well as many forms of the generalized linear model as special cases. We further assume that the signal x has some structure, formulated as the general assumption that x belongs to some known (but arbitrary) feasible set K ⊆ R^n. We carefully detail the benefit of using the signal structure to improve estimation. The theory is based on the mean width of K, a geometric parameter which can be used to understand its effective dimension in estimation problems. We determine a simple, efficient two-step procedure for estimating the signal based on this model: a linear estimation followed by metric projection onto K. We give general conditions under which the estimator is minimax optimal up to a constant. This leads to the intriguing conclusion that in the high-noise regime, an unknown nonlinearity in the observations does not significantly reduce one's ability to determine the signal, even when the nonlinearity may be non-invertible. Our results may be specialized to understand the effect of nonlinearities in compressed sensing.
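The two-step procedure described above is simple enough to sketch numerically. The snippet below is a minimal illustration, not the paper's implementation: it assumes Gaussian measurements, a sign nonlinearity, and takes K to be the unit Euclidean ball (any known set K with a computable metric projection would do).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 2000                          # ambient dimension, number of measurements
x = np.zeros(n)
x[:5] = 1.0 / np.sqrt(5)                 # unit-norm signal to recover

A = rng.standard_normal((m, n))          # Gaussian measurement vectors a_i (rows)
# Unknown nonlinearity: y_i depends on a_i only through <a_i, x>.
y = np.sign(A @ x) + 0.1 * rng.standard_normal(m)

# Step 1: linear estimation, x_lin = (1/m) * sum_i y_i a_i.
x_lin = A.T @ y / m

# Step 2: metric (Euclidean) projection onto the feasible set K.
# Here K is the unit Euclidean ball -- a hypothetical choice for illustration.
def project_ball(v, radius=1.0):
    nv = np.linalg.norm(v)
    return v if nv <= radius else v * (radius / nv)

x_hat = project_ball(x_lin)
cosine = x_hat @ x / (np.linalg.norm(x_hat) * np.linalg.norm(x))
```

Even though the sign nonlinearity is non-invertible, the estimate is closely aligned with the true direction of x, which is the phenomenon the abstract describes.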
COBRA: A Nonlinear Aggregation Strategy
, 2013
"... A new method for combining several initial estimators of the regression function is introduced. Instead of building a linear or convex optimized combination over a collection of basic estimators r1,..., rM, we use them as a collective indicator of the distance between the training data and a test ob ..."
Abstract

Cited by 1 (0 self)
A new method for combining several initial estimators of the regression function is introduced. Instead of building a linear or convex optimized combination over a collection of basic estimators r_1, ..., r_M, we use them as a collective indicator of the distance between the training data and a test observation. This local distance approach is model-free and extremely fast. Most importantly, the resulting collective estimator is shown to perform asymptotically at least as well, in the L2 sense, as the best basic estimator in the collective. Moreover, it does so without having to declare which might be the best basic estimator for the given data set. A companion R package called COBRA (standing for COmBined Regression Alternative) is presented.
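The aggregation rule can be sketched in a few lines. The following is a toy illustration with hypothetical choices (two polynomial basic estimators and a hand-picked consensus threshold eps), not the COBRA package itself:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, 200)
Y = np.sin(X) + 0.1 * rng.standard_normal(200)

# Split the sample: the first half trains the basic estimators r_1, ..., r_M,
# the second half drives the aggregation.
X1, Y1, X2, Y2 = X[:100], Y[:100], X[100:], Y[100:]

# Two deliberately simple basic estimators (hypothetical choices):
# least-squares polynomial fits of degree 1 and degree 5.
machines = [np.poly1d(np.polyfit(X1, Y1, d)) for d in (1, 5)]

def cobra(x, eps=0.2):
    """Average the responses of second-half points whose basic predictions all
    agree, within eps, with the basic predictions at the query point x."""
    mask = np.ones(len(X2), dtype=bool)
    for r in machines:
        mask &= np.abs(r(X2) - r(x)) <= eps
    return Y2[mask].mean() if mask.any() else Y2.mean()

pred = cobra(0.5)
```

The basic estimators never vote directly on the prediction; they only decide which training points count as "close" to the query, which is what makes the approach model-free.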
Model Selection for Likelihood-free Bayesian Methods Based on Moment Conditions: Theory and Numerical Examples. ArXiv e-prints
, 2014
"... An important practice in statistics is to use robust likelihoodfree methods, such as the estimating equations, which only require assumptions on the moments instead of specifying the full probabilistic model. We propose a Bayesian flavored model selection approach for such likelihoodfree methods, ..."
Abstract

Cited by 1 (0 self)
An important practice in statistics is to use robust likelihood-free methods, such as estimating equations, which only require assumptions on the moments instead of specifying a full probabilistic model. We propose a Bayesian-flavored model selection approach for such likelihood-free methods, based on (quasi-)posterior probabilities from the Bayesian Generalized Method of Moments (BGMM). This novel concept allows us to incorporate two important advantages of a Bayesian approach: the expressiveness of posterior distributions and the convenient computational machinery of MCMC. Many applications are possible, including modeling correlated longitudinal data, quantile regression, and graphical models based on partial correlation. We demonstrate numerically how our method works in these applications. Under mild conditions, we show that the BGMM achieves posterior consistency for selecting the unknown true model, and that it possesses a Bayesian version of the oracle property, i.e. the posterior distribution for the parameter of interest is asymptotically normal and is as informative as if the true model were known. In addition, we show that the proposed quasi-posterior can validly be interpreted as an approximate conditional distribution given a data summary.
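As a minimal sketch of how a BGMM-style quasi-posterior can be sampled (a toy scalar moment condition and hand-picked tuning constants are assumed here; the paper's setup is far more general):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(1.5, 1.0, 200)            # data; the moment condition targets the mean
n = len(y)

# Toy moment condition g(theta; y_i) = y_i - theta with identity weighting.
# The quasi-likelihood replaces the true likelihood by exp(-n/2 * gbar' W gbar).
def log_quasi_posterior(theta):
    gbar = np.mean(y) - theta            # sample moment
    log_quasi_lik = -0.5 * n * gbar**2   # GMM criterion
    log_prior = -0.5 * theta**2 / 100.0  # diffuse N(0, 10^2) prior
    return log_quasi_lik + log_prior

# Random-walk Metropolis targeting the quasi-posterior.
theta, lp, samples = 0.0, log_quasi_posterior(0.0), []
for _ in range(5000):
    prop = theta + 0.3 * rng.standard_normal()
    lp_prop = log_quasi_posterior(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta)

post_mean = np.mean(samples[1000:])      # discard burn-in
```

No likelihood is ever written down: only the moment condition enters the sampler, which is the point of the likelihood-free construction.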
On the properties of variational approximations of Gibbs posteriors
, 2016
"... Abstract The PACBayesian approach is a powerful set of techniques to derive nonasymptotic risk bounds for random estimators. The corresponding optimal distribution of estimators, usually called the Gibbs posterior, is unfortunately often intractable. One may sample from it using Markov chain Mont ..."
Abstract
The PAC-Bayesian approach is a powerful set of techniques for deriving non-asymptotic risk bounds for random estimators. The corresponding optimal distribution of estimators, usually called the Gibbs posterior, is unfortunately often intractable. One may sample from it using Markov chain Monte Carlo, but this is usually too slow for big datasets. We consider instead variational approximations of the Gibbs posterior, which are fast to compute. We undertake a general study of the properties of such approximations. Our main finding is that such a variational approximation often has the same rate of convergence as the original PAC-Bayesian procedure it approximates. In addition, we show that, when the risk function is convex, a variational approximation can be obtained in polynomial time using a convex solver. We give finite-sample oracle inequalities for the corresponding estimator. We specialize our results to several learning tasks (classification, ranking, matrix completion), discuss how to implement a variational approximation in each case, and illustrate the good properties of said approximations on real datasets.
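In the standard notation (assumed here: $r_n$ the empirical risk, $\lambda > 0$ an inverse temperature, $\pi$ a prior, and $\mathcal{F}$ the variational family), the Gibbs posterior and its variational approximation are

```latex
\hat{\pi}_{\lambda}(\mathrm{d}\theta) \propto \exp\{-\lambda r_n(\theta)\}\,\pi(\mathrm{d}\theta),
\qquad
\tilde{\pi}_{\lambda} = \operatorname*{arg\,min}_{\rho \in \mathcal{F}}
\left\{ \lambda \int r_n \,\mathrm{d}\rho + \mathrm{KL}(\rho \,\|\, \pi) \right\}.
```

When $\mathcal{F}$ contains all probability measures, the minimizer of the right-hand objective is exactly the Gibbs posterior (by the Donsker-Varadhan variational formula); restricting $\mathcal{F}$ to a tractable family, e.g. Gaussians, is what makes the approximation fast to compute.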
A. Garivier et al., Editors. ON SOME RECENT ADVANCES ON HIGH DIMENSIONAL BAYESIAN STATISTICS
"... Abstract. This paper proposes to review some recent developments in Bayesian statistics for high dimensional data. After giving some brief motivations in a short introduction, we describe new advances in the understanding of Bayes posterior computation as well as theoretical contributions in non pa ..."
Abstract
This paper reviews some recent developments in Bayesian statistics for high-dimensional data. After giving brief motivations in a short introduction, we describe new advances in the understanding of Bayesian posterior computation as well as theoretical contributions in nonparametric and high-dimensional Bayesian approaches. From an applied point of view, we describe the so-called SQMC particle method for computing Bayesian posterior distributions, and provide a nonparametric analysis of the widespread ABC method. On the theoretical side, we describe recent advances in Bayesian consistency for a nonparametric hidden Markov model, as well as new PAC-Bayesian results for different models of high-dimensional regression.
Bayesian matrix completion: prior specification
"... Lowrank matrix estimation from incomplete measurements recently received increased attention due to the emergence of several challenging applications, such as recommender systems; see in particular the famous Netflix challenge. While the behaviour of algorithms based on nuclear norm minimization is ..."
Abstract
Low-rank matrix estimation from incomplete measurements has recently received increased attention due to the emergence of several challenging applications, such as recommender systems; see in particular the famous Netflix challenge. While the behaviour of algorithms based on nuclear norm minimization is now well understood [SRJ05, SS05, CP09, CT09, CR09, Gro11, RT11, Klo11, KLT11], an as yet unexplored avenue of research is the behaviour of Bayesian algorithms in this context. In this paper, we briefly review the priors used in the Bayesian literature for matrix completion. A standard approach is to assign an inverse gamma prior to the singular values of a certain singular value decomposition of the matrix of interest; this prior is conjugate. However, we show that two other types of priors (again on the singular values) may be conjugate for this model: a gamma prior and a discrete prior. Conjugacy is very convenient, as it makes it possible to implement either Gibbs sampling or Variational Bayes. Interestingly enough, the maximum a posteriori estimators for these different priors are related to nuclear norm minimization problems. Our main contribution is to prove the consistency of the posterior expectation when the discrete prior is used. We also compare all these priors on simulated datasets, and on the classical MovieLens and Netflix datasets.
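To illustrate the MAP/nuclear-norm connection mentioned above in one simple case (an exponential-type prior on the singular values $d_k(M)$ and Gaussian observation noise of variance $\sigma^2$ are assumed here for illustration; the priors actually studied in the paper are gamma, inverse gamma, and discrete), note that $\sum_k d_k(M) = \|M\|_*$, so

```latex
\hat{M}_{\mathrm{MAP}}
= \operatorname*{arg\,min}_{M}
\left\{ \frac{1}{2\sigma^2}\|Y - M\|_F^2 + \lambda \sum_{k} d_k(M) \right\}
= \operatorname*{arg\,min}_{M}
\left\{ \frac{1}{2\sigma^2}\|Y - M\|_F^2 + \lambda \|M\|_* \right\},
```

which is exactly nuclear norm penalized least squares.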
A Bayesian Approach for Noisy Matrix Completion: Optimal Rate under General Sampling Distribution
, 2014
"... Bayesian methods for lowrank matrix completion with noise have been shown to be very efficient computationally [3, 17, 18, 23, 26]. While the behaviour of penalized minimization methods is well understood both from the theoretical and computational points of view (see [7, 9, 16, 22] among others) i ..."
Abstract
Bayesian methods for low-rank matrix completion with noise have been shown to be very efficient computationally [3, 17, 18, 23, 26]. While the behaviour of penalized minimization methods in this problem is well understood from both the theoretical and computational points of view (see [7, 9, 16, 22] among others), the theoretical optimality of Bayesian estimators has not yet been explored. In this paper, we propose a Bayesian estimator for matrix completion under a general sampling distribution. We also provide an oracle inequality for this estimator. This inequality proves that, whatever the rank of the matrix to be estimated, our estimator reaches the minimax-optimal rate of convergence (up to a logarithmic factor). We end the paper with a short simulation study.