Results 1–10 of 21
Sparse Regression Learning by Aggregation and Langevin Monte Carlo, 2009
Abstract

Cited by 15 (1 self)
We consider the problem of regression learning for deterministic design and independent random errors. We start by proving a sharp PAC-Bayesian type bound for the exponentially weighted aggregate (EWA) under the expected squared empirical loss. For a broad class of noise distributions the presented bound is valid whenever the temperature parameter β of the EWA is larger than or equal to 4σ², where σ² is the noise variance. A remarkable feature of this result is that it is valid even for unbounded regression functions and the choice of the temperature parameter depends exclusively on the noise level. Next, we apply this general bound to the problem of aggregating the elements of a finite-dimensional linear space spanned by a dictionary of functions φ1,...,φM. We allow M to be much larger than the sample size n but we assume that the true regression function can be well approximated by a sparse linear combination of functions φj. Under this sparsity scenario, we propose an EWA with a heavy-tailed prior and we show that it satisfies a sparsity oracle inequality with leading constant one. Finally, we propose several Langevin Monte Carlo algorithms to approximately compute such an EWA when the number M of aggregated functions can be large. We discuss in some detail the convergence of these algorithms and present numerical experiments that confirm our theoretical findings.
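As a rough illustration of the two ingredients above, the following sketch computes EWA weights over a finite dictionary and runs an unadjusted Langevin update; the losses, prior, and potential are toy values, not the paper's construction:

```python
import numpy as np

def ewa_weights(emp_losses, prior, beta):
    """EWA: w_j ∝ π_j exp(-(empirical loss of f_j) / β)."""
    logw = np.log(prior) - emp_losses / beta
    logw -= logw.max()                      # numerical stability
    w = np.exp(logw)
    return w / w.sum()

def langevin_step(x, grad_V, step, rng):
    """One unadjusted Langevin update: x ← x − h·∇V(x) + √(2h)·ξ."""
    return x - step * grad_V(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)

# toy dictionary of 3 functions: the one with the smallest loss gets the largest weight
losses = np.array([10.0, 2.0, 5.0])
w = ewa_weights(losses, np.full(3, 1.0 / 3.0), beta=4.0)
```

In the paper the posterior is continuous and high-dimensional, which is exactly why the Langevin iteration (rather than exact enumeration) is needed.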
Sparse recovery in convex hulls via entropy penalization, Ann. Statist., 2008
Abstract

Cited by 9 (1 self)
Let (X,Y) be a random couple in S × T with unknown distribution P and let (X1,Y1),...,(Xn,Yn) be i.i.d. copies of (X,Y). Denote by Pn the empirical distribution of (X1,Y1),...,(Xn,Yn). Let h1,...,hN : S → [−1,1] be a dictionary consisting of N functions. For λ ∈ R^N, denote f_λ := ∑_{j=1}^N λ_j h_j. Let ℓ : T × R → R be a given loss function, assumed convex with respect to its second variable, and let (ℓ • f)(x,y) := ℓ(y; f(x)). Finally, let Λ ⊂ R^N be the simplex of all probability distributions on {1,...,N}. Consider the penalized empirical risk minimization problem

λ̂_ε := argmin_{λ ∈ Λ} [ Pn(ℓ • f_λ) + ε ∑_{j=1}^N λ_j log λ_j ],

along with its distribution-dependent version

λ_ε := argmin_{λ ∈ Λ} [ P(ℓ • f_λ) + ε ∑_{j=1}^N λ_j log λ_j ],

where ε ≥ 0 is a regularization parameter. It is proved that the “approximate sparsity” of λ_ε implies the “approximate sparsity” of λ̂_ε, and the impact of “sparsity” on bounding the excess risk of the empirical solution is explored. Similar results are also discussed in the case of entropy-penalized density estimation.
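A minimal numerical sketch of the entropy-penalized problem above, using exponentiated-gradient (mirror descent) updates on the simplex; the linear risk and all constants below are illustrative choices, not from the paper:

```python
import numpy as np

def entropy_penalized_erm(grad_risk, N, eps, lr=0.2, n_iter=2000):
    """min over the simplex of  R(λ) + ε Σ_j λ_j log λ_j,
    via multiplicative (exponentiated-gradient) updates."""
    lam = np.full(N, 1.0 / N)
    for _ in range(n_iter):
        g = grad_risk(lam) + eps * (np.log(lam) + 1.0)   # gradient of penalized objective
        lam = lam * np.exp(-lr * g)
        lam /= lam.sum()
    return lam

# with a linear risk R(λ) = <c, λ>, the minimizer is the Gibbs weight exp(-c_j/ε)/Z
c = np.array([1.0, 0.0, 2.0])
lam = entropy_penalized_erm(lambda l: c, N=3, eps=0.5)
```

The entropy term keeps λ strictly inside the simplex, which is what makes the multiplicative update well defined at every step.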
Closed-Form MMSE Estimation for Signal Denoising Under Sparse Representation Modeling Over a Unitary Dictionary
Abstract

Cited by 4 (0 self)
This paper deals with the Bayesian signal denoising problem, assuming a prior based on sparse representation modeling over a unitary dictionary. It is well known that the Maximum A-Posteriori Probability (MAP) estimator in such a case has a closed-form solution based on a simple shrinkage. The focus in this paper is on the better performing and less familiar Minimum Mean-Squared-Error (MMSE) estimator. We show that this estimator also leads to a simple formula, in the form of a plain recursive expression for evaluating the contribution of every atom in the solution. An extension of the model to real-world signals is also offered, considering heteroscedastic nonzero entries in the representation, and allowing varying probabilities for the chosen atoms and the overall cardinality of the sparse representation. The MAP and MMSE estimators are re-developed for this extended model, again resulting in closed-form simple algorithms. Finally, the superiority of the MMSE estimator is demonstrated both on synthetically generated signals and on real-world signals (image patches).
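The shrinkage structure mentioned above can be sketched as follows; the soft-threshold rule is a generic stand-in for the paper's MAP/MMSE per-coefficient formulas (not reproduced here), and the dictionary is just a random orthonormal matrix:

```python
import numpy as np

def unitary_denoise(y, U, shrink):
    """For a unitary dictionary U, denoising reduces to: transform to the
    coefficient domain, shrink each coefficient independently, transform back."""
    return U @ shrink(U.T @ y)

def soft(c, t):
    """Soft threshold, a typical shrinkage rule (illustrative stand-in)."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((8, 8)))   # random orthonormal dictionary
y = rng.standard_normal(8)
x_hat = unitary_denoise(y, U, lambda c: soft(c, 0.5))
```

The unitarity of U is what decouples the coefficients and makes a per-atom rule exact rather than approximate.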
Regularization with the smooth-lasso procedure, 2008
Abstract

Cited by 4 (2 self)
We consider the linear regression problem. We propose the S-Lasso procedure to estimate the unknown regression parameters. This estimator enjoys sparsity of the representation while taking into account correlation between successive covariates (or predictors). The study covers the case when p ≫ n, i.e. the number of covariates is much larger than the number of observations. From a theoretical point of view, for fixed p, we establish asymptotic normality and consistency in variable selection results for our procedure. When p ≥ n, we provide variable selection consistency results and show that the S-Lasso achieves a Sparsity Inequality, i.e., a bound in terms of the number of non-zero components of the oracle vector. It appears that the S-Lasso has nice variable selection properties compared with its challengers. Furthermore, we provide an estimator of the effective degrees of freedom of the S-Lasso estimator. A simulation study shows that the S-Lasso performs better than the Lasso as far as variable selection is concerned, especially when high correlations between successive covariates exist. This procedure also appears to be a good challenger to the Elastic-Net [36].
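The criterion combining sparsity with correlation between successive covariates can be sketched as a least-squares loss plus an ℓ1 penalty plus a quadratic penalty on differences of successive coefficients; the encoding and tuning constants below are illustrative, not lifted from the paper:

```python
import numpy as np

def s_lasso_objective(beta, X, y, lam1, lam2):
    """||y − Xβ||² + λ1 Σ_j |β_j| + λ2 Σ_j (β_{j+1} − β_j)²:
    sparsity plus smoothness across successive covariates."""
    resid = y - X @ beta
    return float(resid @ resid
                 + lam1 * np.abs(beta).sum()
                 + lam2 * np.sum(np.diff(beta) ** 2))

# tiny worked example: β = (1, −1), X = I, y = 0, λ1 = λ2 = 1
val = s_lasso_objective(np.array([1.0, -1.0]), np.eye(2),
                        np.zeros(2), lam1=1.0, lam2=1.0)
```

Here the fit term contributes 2, the ℓ1 term 2, and the difference term (−2)² = 4, so the sign flip between neighbors is penalized most heavily, which is the intended effect under correlated successive covariates.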
Online-to-confidence-set conversions and application to sparse stochastic bandits, In Conference on Artificial Intelligence and Statistics (AISTATS), 2012
Abstract

Cited by 3 (0 self)
We introduce a novel technique, which we call online-to-confidence-set conversion. The technique allows us to construct high-probability confidence sets for linear prediction with correlated inputs given the predictions of any algorithm (e.g., online LASSO, exponentiated gradient algorithm, online least-squares, p-norm algorithm) targeting online learning with linear predictors and the quadratic loss. By construction, the size of the confidence set is directly governed by the regret of the online learning algorithm. Constructing tight confidence sets is interesting in its own right, but the new technique is given extra weight by the fact that access to tight confidence sets underlies a number of important problems. The advantage of our construction is that progress in constructing better algorithms for online prediction problems directly translates into tighter confidence sets. In this paper, this is demonstrated in the case of linear stochastic bandits. In particular, we introduce the sparse variant of linear stochastic bandits and show that a recent online algorithm together with our online-to-confidence-set conversion allows one to derive algorithms that can exploit the setting in which the reward is a function of a sparse linear combination of the components of the chosen action.
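A minimal sketch of the resulting object: given the online learner's predictions ŷ_t and a radius driven by its regret bound, a confidence set of the assumed form {θ : Σ_t (⟨x_t, θ⟩ − ŷ_t)² ≤ β_n} supports a simple membership test; the data and radius below are invented toy values:

```python
import numpy as np

def in_confidence_set(theta, X, y_hat, radius):
    """Membership test for {θ : Σ_t (<x_t, θ> − ŷ_t)² ≤ radius}, where ŷ_t are
    the online algorithm's predictions and `radius` comes from its regret bound."""
    return float(np.sum((X @ theta - y_hat) ** 2)) <= radius

X = np.eye(2)                      # two rounds with unit inputs (toy data)
y_hat = np.array([1.0, 2.0])       # hypothetical online predictions
```

A smaller regret yields a smaller `radius`, hence a tighter set, which is the mechanism the bandit application exploits.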
NL-MEANS AND AGGREGATION PROCEDURES
Abstract

Cited by 2 (1 self)
Patch-based denoising methods, such as the NL-Means, have emerged recently as simple and efficient denoising methods. This paper provides new insight into those methods by showing their connection with recent statistical aggregation techniques. Within this aggregation framework, we propose some novel patch-based denoising methods. We provide some theoretical justification and then explain how to implement them with a Monte Carlo based algorithm.
Hyper-sparse optimal aggregation, 2010
Abstract

Cited by 2 (0 self)
Given a finite set F of functions and a learning sample, the aim of an aggregation procedure is to achieve a risk as close as possible to the risk of the best function in F. Up to now, optimal aggregation procedures have been convex combinations of all the elements of F. In this paper, we prove that optimal aggregation procedures combining only two functions in F exist. Such algorithms are of particular interest when F contains many irrelevant functions that should not appear in the aggregation procedure. Since selectors are suboptimal aggregation procedures, this proves that two is the minimal number of elements of F required for the construction of an optimal aggregation procedure in every situation. We then perform a numerical study of the problem of selecting the regularization parameters of the Lasso and the Elastic-net estimators. We compare, on simulated examples, our aggregation algorithms to aggregation with exponential weights, to Mallows' Cp, and to cross-validation selection procedures.
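One way to sketch the two-function idea: take the empirical risk minimizer f₁ over F, then search for the best convex combination of f₁ with one other element. This is an illustrative reading, not the paper's exact procedure:

```python
import numpy as np

def two_point_aggregate(preds, y):
    """preds[k] holds the predictions of the k-th function in F on the sample.
    Returns (risk, index of ERM f1, index of partner, mixing weight alpha)."""
    risks = np.mean((preds - y) ** 2, axis=1)
    i1 = int(np.argmin(risks))                     # empirical risk minimizer
    best = (float(risks[i1]), i1, i1, 1.0)
    for j in range(preds.shape[0]):
        if j == i1:
            continue
        # closed-form minimizer of α ↦ mean((α·p1 + (1−α)·pj − y)²), clipped to [0,1]
        d = preds[i1] - preds[j]
        denom = np.mean(d * d)
        alpha = 1.0 if denom == 0 else float(np.clip(np.mean(d * (y - preds[j])) / denom, 0.0, 1.0))
        mix = alpha * preds[i1] + (1 - alpha) * preds[j]
        r = float(np.mean((mix - y) ** 2))
        if r < best[0]:
            best = (r, i1, j, alpha)
    return best

preds = np.array([[0.0, 0.0], [2.0, 2.0]])
y = np.ones(2)
best_risk, i1, j, alpha = two_point_aggregate(preds, y)
```

On this toy sample the midpoint of the two functions fits exactly, even though neither function does alone, illustrating why a two-element convex combination can beat every selector.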
On the optimality of the empirical risk minimization procedure for the Convex Aggregation problem, 2011
Abstract

Cited by 2 (2 self)
We study the performance of empirical risk minimization (ERM), with respect to the quadratic risk, in the context of convex aggregation, in which one wants to construct a procedure whose risk is as close as possible to that of the best function in the convex hull of an arbitrary finite class F. We show that ERM performed in the convex hull of F is an optimal aggregation procedure for the convex aggregation problem. We also show that if this procedure is used for the problem of model selection aggregation, in which one wants to mimic the performance of the best function in F itself, then its rate is the same as the one achieved for the convex aggregation problem, and thus is far from optimal. These results are obtained in deviation and are sharp up to logarithmic factors.
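ERM over the convex hull of a finite class amounts to minimizing the empirical quadratic risk over the simplex of mixture weights; a minimal projected-gradient sketch (the class, step size, and iteration count are toy choices):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based rule)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def erm_convex_hull(preds, y, lr=0.1, n_iter=500):
    """Projected gradient descent for min over the simplex of
    mean((λᵀ·preds − y)²): ERM in the convex hull of F."""
    N, n = preds.shape
    lam = np.full(N, 1.0 / N)
    for _ in range(n_iter):
        resid = lam @ preds - y
        grad = 2.0 * preds @ resid / n
        lam = project_simplex(lam - lr * grad)
    return lam

# toy class of two functions; the second one matches y exactly
preds = np.array([[0.0, 0.0], [1.0, 1.0]])
lam = erm_convex_hull(preds, np.ones(2))
```

The iterates stay in the simplex by construction, so the output is always a valid convex combination of the class.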
An aggregator point of view on NL-Means
Abstract

Cited by 1 (0 self)
Patch-based methods give some of the best denoising results. Their theoretical performances are still unexplained mathematically. We propose novel insight into NL-Means based on an aggregation point of view. More precisely, we describe the framework of PAC-Bayesian aggregation, show how it allows one to derive some new patch-based methods and to characterize their theoretical performances, and present some numerical experiments.

Keywords: Denoising, NL-Means, Aggregation, PAC-Bayesian, Patch

Some of the best denoising results are obtained by the patch-based NL-means method proposed by Buades et al. [1] or by some of its variants [2]. These methods are based on a simple idea: consider the image not as a collection of pixels but as a collection of sub-images, the “patches”, centered on those pixels, and estimate each patch as a weighted average of patches. These weights take into account the similarities of the patches and are often chosen proportional to the exponential of the quadratic difference between the patches, with a renormalization so that they sum to 1. Understanding why these methods are so efficient is a challenging task. In their seminal paper, Buades et al. [1] show the consistency of their method under a strong technical β-mixing assumption on the image. NL-Means methods can also be seen as a smoothing in a patch space with a Gaussian kernel, and their performances are related to the regularity of the underlying patch manifold (see for instance
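The weighting scheme described above (exponential of the quadratic patch difference, renormalized to sum to one) in a minimal form; patch extraction is left out and the bandwidth h is a made-up parameter:

```python
import numpy as np

def nl_means_pixel(patches, ref_patch, noisy_centers, h):
    """Estimate one pixel as an exponentially weighted average of the noisy
    center pixels of all candidate patches; weights use the quadratic
    distance between each patch and the reference patch."""
    d2 = np.sum((patches - ref_patch) ** 2, axis=1)
    w = np.exp(-d2 / (h * h))
    w /= w.sum()                     # renormalize so the weights sum to 1
    return float(w @ noisy_centers)

# toy check: two identical patches reduce to a plain average of their centers
patches = np.zeros((2, 4))
est = nl_means_pixel(patches, np.zeros(4), np.array([1.0, 3.0]), h=1.0)
```

When all patch distances are equal the estimator degenerates to a plain mean, and h controls how quickly dissimilar patches are downweighted.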
How to SAIFly Boost Denoising Performance, 2012
Abstract

Cited by 1 (1 self)
Spatial domain image filters (e.g. bilateral filter, NLM, LARK) have achieved great success in denoising. However, their overall performance has not generally surpassed the leading transform domain based filters (such as BM3D). One important reason is that spatial domain filters lack an efficient way to adaptively fine-tune their denoising strength; something that is relatively easy to do in transform domain methods with shrinkage operators. In the pixel domain, the smoothing strength is usually controlled globally by, for example, tuning a regularization parameter. In this paper, we propose SAIF (Spatially Adaptive Iterative Filtering), a new strategy to control the denoising strength locally for any spatial domain method. This approach is capable of filtering local image content iteratively using the given base filter, while the type of iteration and the iteration number are automatically optimized with respect to estimated risk (i.e. mean-squared error). In exploiting the estimated local SNR, we also present a new risk estimator which differs from the often-employed SURE method and exceeds its performance in many cases. Experiments illustrate that our strategy can significantly relax the base algorithm's sensitivity to its tuning (smoothing) parameters, and effectively boost the performance of several existing denoising filters to generate state-of-the-art results under both simulated and practical conditions.
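The iterate-and-stop idea can be sketched as follows; the `risk` callback stands in for the paper's plug-in MSE estimate (here an oracle using the true signal, purely for illustration), and the base filter is a toy global average rather than a real spatial filter:

```python
import numpy as np

def iterate_filter(W, y, max_iter, risk):
    """Apply a base smoothing filter W repeatedly (diffusion-style iteration)
    and keep the iterate with the smallest estimated risk."""
    z, best, best_r = y.copy(), y.copy(), risk(y)
    for _ in range(max_iter):
        z = W @ z
        r = risk(z)
        if r < best_r:
            best, best_r = z.copy(), r
    return best, best_r

W = np.full((3, 3), 1.0 / 3.0)            # toy base filter: global averaging
y = np.array([0.5, 1.0, 1.5])             # noisy observation of a constant signal
truth = np.ones(3)                         # oracle ground truth, illustration only
best, best_r = iterate_filter(W, y, 5, lambda z: float(np.mean((z - truth) ** 2)))
```

The paper replaces the oracle risk with an estimator computed from the data itself, which is what makes the stopping rule usable in practice.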