## Sparse Empirical Bayes Analysis (SEBA) (2010)

### BibTeX

```bibtex
@MISC{Bochkina10sparseempirical,
  author = {Natalia Bochkina},
  title  = {Sparse Empirical Bayes Analysis (SEBA)},
  year   = {2010}
}
```

### Abstract

We consider the joint processing of $n$ independent sparse regression problems. Each is based on a sample $(y_{i1}, x_{i1}), \ldots, (y_{im}, x_{im})$ of $m$ i.i.d. observations from $y_{i1} = x_{i1}^T \beta_i + \varepsilon_{i1}$, with $y_{i1} \in \mathbb{R}$, $x_{i1} \in \mathbb{R}^p$, $i = 1, \ldots, n$, and, say, $\varepsilon_{i1} \sim N(0, \sigma^2)$. The dimension $p$ is large enough that the empirical risk minimizer is not consistent. We consider three possible extensions of the lasso estimator to deal with this problem, the lassoes, the group lasso and the RING lasso, each utilizing a different assumption about how these problems are related. For each estimator we give a Bayesian interpretation, and we present both a persistency analysis and non-asymptotic error bounds based on restricted-eigenvalue-type assumptions.

> "...and only a star or two set sparsedly in the vault of heaven; and you will find a sight as stimulating as the hoariest summit of the Alps." (R. L. Stevenson)
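The three estimators named in the abstract are extensions of the plain lasso. As a point of reference, a minimal lasso fit by cyclic coordinate descent can be sketched as follows; this is an illustrative sketch in plain NumPy, and the data, penalty level, and variable names are ours, not the paper's.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the scalar building block of the lasso."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize sum_j (y_j - x_j^T beta)^2 + lam * ||beta||_1 by cyclic
    coordinate descent over the p coordinates of beta."""
    m, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for k in range(p):
            # Partial residual with coordinate k removed from the fit.
            r = y - X @ beta + X[:, k] * beta[k]
            beta[k] = soft_threshold(X[:, k] @ r, lam / 2.0) / col_sq[k]
    return beta

# Toy data: 2 active coefficients out of 10, light Gaussian noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
beta_true = np.zeros(10)
beta_true[:2] = [3.0, -2.0]
y = X @ beta_true + 0.1 * rng.standard_normal(50)
beta_hat = lasso_cd(X, y, lam=5.0)
```

With this penalty level the inactive coordinates are thresholded to (near) zero while the two active ones are recovered up to a small shrinkage bias.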

### Citations

329 citations
Regression shrinkage and selection via the lasso
- Tibshirani
- 1996

Citation context: "... are related. They are considered as related only because our loss function is additive. One of the standard tools for finding sparse solutions in a large-$p$, small-$m$ situation is the lasso (Tibshirani [13]), and the methods we consider are its extensions. We will make use of the following notation. Introduce the $\ell_{p,q}$ norm of a set of vectors $z_1, \ldots, z_n$, not necessarily of the same length, with entries $z_{ij}$, $i = 1, \ldots, n$, ..."
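The excerpt truncates before the paper's definition of the $\ell_{p,q}$ norm. A common convention, assumed here rather than taken from the paper, is the $\ell_q$ norm of the vector of blockwise $\ell_p$ norms:

```python
import numpy as np

def lpq_norm(vectors, p, q):
    """l_{p,q} norm of a collection of vectors (possibly of different
    lengths): the l_q norm of the vector of per-block l_p norms.
    This convention is an assumption; the quoted excerpt truncates
    before the paper's own definition."""
    block_norms = np.array([np.linalg.norm(np.asarray(z), ord=p) for z in vectors])
    return np.linalg.norm(block_norms, ord=q)

# Blocks of different lengths, as the excerpt allows.
vectors = [[3.0, 4.0], [5.0], [0.0, 0.0, 12.0]]
val = lpq_norm(vectors, p=2, q=1)  # block norms 5, 5, 12, so val = 22
```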

325 citations
Exact matrix completion via convex optimization
- Candes, Recht

Citation context: "... to this method as the rotation-invariant lasso, or shortly as the RING lasso. This is not surprising, as under some conditions this penalty also solves the minimum-rank problem (see Candes and Recht [4] for the noiseless case, and Bach [1] for some asymptotic results). By analogy with the lassoes argument, a higher power of the trace norm as a penalty may be more intuitive to a Bayesian. For both procedu..."
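The trace norm mentioned in this excerpt is the sum of the singular values of the coefficient matrix, and the "rotation-invariant" name reflects its invariance under orthogonal changes of basis. A quick numerical check of that invariance, with arbitrary example matrices of our choosing:

```python
import numpy as np

def trace_norm(B):
    """Trace (nuclear) norm: the sum of singular values of B."""
    return np.linalg.svd(B, compute_uv=False).sum()

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 3))

# An orthogonal change of basis (rotation/reflection of the rows)
# leaves the penalty unchanged, hence "RING" (Rotation INvariant Group).
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
invariant = abs(trace_norm(Q @ B) - trace_norm(B)) < 1e-8
```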

196 citations
On the distribution of the largest eigenvalue in principal component analysis
- Johnstone
- 2001

184 citations
Simultaneous analysis of Lasso and Dantzig selector
- Bickel, Ritov, et al.

Citation context: "... with some abuse of notation, a matrix $\Delta = (\Delta_1, \ldots, \Delta_n)$ may be considered as the vector $(\Delta_1^T, \ldots, \Delta_n^T)^T$. Finally, recall the notation $B = (\beta_1, \ldots, \beta_n)$. The restricted eigenvalue assumption of Bickel et al. [2] (and Lounici et al. [10]) can be generalized to incorporate unequal subsets $J_i$. In the assumption below, the restriction is given in terms of the $\ell_{q,1}$ norm, $q \geq 1$. Assumption $RE_q(s, c_0, \kappa)$: $\|X^T \Delta\|_2 \, \kappa = \ldots$"

74 citations
Persistency in high dimensional linear predictor-selection and the virtue of over-parametrization
- Greenshtein, Ritov

Citation context: "... is given by: minimize $\sum_{j=1}^m (y_j - x_j^T \beta)^2 + \lambda \|\beta\|_1^\alpha$, where $\alpha$ can be any arbitrarily chosen positive number. In the literature one can find almost only $\alpha = 1$. One exception is Greenshtein and Ritov [5], where $\alpha = 2$ was found more natural, although it was just a matter of aesthetics. We would argue that $\alpha > 2$ may be more intuitive. Our first algorithm generalizes this representation of the lasso directly..."
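The penalized criterion quoted in this excerpt is easy to state in code. The following evaluates it for a given $\beta$; this is an illustrative sketch of the objective only, not the paper's fitting algorithm, and the test data are our own.

```python
import numpy as np

def lassoes_objective(beta, X, y, lam, alpha):
    """sum_j (y_j - x_j^T beta)^2 + lam * ||beta||_1 ** alpha.
    alpha = 1 recovers the usual lasso criterion; the excerpt above
    argues that powers alpha >= 2 can be more natural."""
    resid = y - X @ beta
    return resid @ resid + lam * np.abs(beta).sum() ** alpha

# Tiny worked example: residuals (0, 2, 2) give SSE 8, penalty 1 * 1^2 = 1.
X = np.eye(3)
y = np.array([1.0, 2.0, 2.0])
beta = np.array([1.0, 0.0, 0.0])
val = lassoes_objective(beta, X, y, lam=1.0, alpha=2)  # 8 + 1 = 9
```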

70 citations
An empirical Bayes approach to statistics
- Robbins
- 1955

Citation context: "... $x_{ij\ell} = h_\ell(z_{ij})$. This approach is in the spirit of the empirical Bayes approach (or compound decision theory; note, however, that the term "empirical Bayes" has a few other meanings in the literature), cf. [11, 12, 8]. The empirical Bayes approach to sparsity was considered before, e.g., [15, 3, 7, 6]. However, in these discussions the compound decision problem was within a single vector, while we consider the compound dec..."

49 citations
Taking advantage of sparsity in multi-task learning
- Lounici, Pontil, et al.
- 2009

Citation context: "... that in this case the sparsity pattern of variables is the same (with probability 1). Non-asymptotic inequalities under a restricted eigenvalue type condition for the group lasso are given by Lounici et al. [10]. Now, the standard notion of sparsity, as captured by the $L_0$ norm, or by the standard lasso and group lasso, is basis dependent. Consider the model of (2). If, for example, $g(z) = 1(a < z \leq b)$, then ..."
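The shared sparsity pattern mentioned in this excerpt is produced by the group lasso penalty: the sum of groupwise Euclidean norms, which drives whole groups to zero together. One common layout, assumed here rather than taken from the paper, groups each variable's coefficients across the $n$ regression problems:

```python
import numpy as np

def group_lasso_penalty(B):
    """Sum of row-wise Euclidean norms of the coefficient matrix B
    (variables in rows, regression problems in columns). Entire rows
    are zeroed together, which is why the selected variables coincide
    across problems."""
    return np.sqrt((B ** 2).sum(axis=1)).sum()

# Rows = variables, columns = two regression problems.
B = np.array([[3.0, 4.0],   # row norm 5: variable active in both problems
              [0.0, 0.0],   # row norm 0: variable dropped everywhere
              [0.0, 5.0]])  # row norm 5: active, so kept in both problems
pen = group_lasso_penalty(B)  # 5 + 0 + 5 = 10
```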

40 citations
Consistency of trace norm minimization
- Bach
- 2008

Citation context: "... i.i.d. random functions, and each group can be considered as $m$ noisy observations, each one on the value of these functions at a given value of the argument. Thus, $y_{ij} = g_i(z_{ij}) + \varepsilon_{ij}$, (2) where $z_{ij} \in [0,1]$. The model fits the regression setup of (1) if $g(z) = \sum_{\ell=1}^p \beta_\ell h_\ell(z)$, where $h_1, \ldots, h_p$ are in $L_2(0,1)$, and $x_{ij\ell} = h_\ell(z_{ij})$. This approach is in the spirit of the empirical Bayes approach (or compound ..."
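The basis-expansion step in this excerpt, $x_{ij\ell} = h_\ell(z_{ij})$, is just the construction of a design matrix from basis functions, which turns the functional model into an ordinary linear regression. A sketch with an illustrative cosine basis on $[0,1]$; the basis choice is ours, not the paper's:

```python
import numpy as np

def design_matrix(z, basis):
    """Build X with entries x_{j,l} = h_l(z_j), so that the functional
    model g(z) = sum_l beta_l h_l(z) becomes the linear model
    y = X beta + eps evaluated at the observed arguments z_j."""
    return np.column_stack([h(z) for h in basis])

# Illustrative basis in L2(0,1): constant plus two cosines.
basis = [np.ones_like,
         lambda z: np.cos(np.pi * z),
         lambda z: np.cos(2 * np.pi * z)]
z = np.linspace(0.0, 1.0, 8)
X = design_matrix(z, basis)  # shape (8, 3): 8 observations, 3 basis functions
```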

40 citations
Asymptotically subminimax solutions of compound statistical decision problems
- Robbins
- 1951

18 citations
General empirical Bayes wavelet methods and exactly adaptive minimax estimation
- Zhang
- 2005

Citation context: "... approach (or compound decision theory; note, however, that the term "empirical Bayes" has a few other meanings in the literature), cf. [11, 12, 8]. The empirical Bayes approach to sparsity was considered before, e.g., [15, 3, 7, 6]. However, in these discussions the compound decision problem was within a single vector, while we consider the compound decision to be between the vectors, where the vectors are the basic units. The ..."

11 citations
Nonparametric empirical Bayes and compound decision approaches to estimation of a high-dimensional vector of means
- Brown, Greenshtein

Citation context: "... approach (or compound decision theory; note, however, that the term "empirical Bayes" has a few other meanings in the literature), cf. [11, 12, 8]. The empirical Bayes approach to sparsity was considered before, e.g., [15, 3, 7, 6]. However, in these discussions the compound decision problem was within a single vector, while we consider the compound decision to be between the vectors, where the vectors are the basic units. The ..."

10 citations
Compound decision theory and empirical Bayes methods (Ann. Statist. 31, 379–390)
- Zhang
- 2003

Citation context: "... $x_{ij\ell} = h_\ell(z_{ij})$. This approach is in the spirit of the empirical Bayes approach (or compound decision theory; note, however, that the term "empirical Bayes" has a few other meanings in the literature), cf. [11, 12, 8]. The empirical Bayes approach to sparsity was considered before, e.g., [15, 3, 7, 6]. However, in these discussions the compound decision problem was within a single vector, while we consider the compound dec..."

3 citations
Estimating the mean of high valued observations in high dimensions
- Greenshtein, Park, et al.
- 2008

Citation context: "... approach (or compound decision theory; note, however, that the term "empirical Bayes" has a few other meanings in the literature), cf. [11, 12, 8]. The empirical Bayes approach to sparsity was considered before, e.g., [15, 3, 7, 6]. However, in these discussions the compound decision problem was within a single vector, while we consider the compound decision to be between the vectors, where the vectors are the basic units. The ..."

1 citation
Asymptotic efficiency of simple decisions for the compound decision problem
- Greenshtein, Ritov
- 2008