Results 1 - 10 of 284
Bayesian inference and optimal design in the sparse linear model
- Workshop on Artificial Intelligence and Statistics
"... The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of ..."
Abstract - Cited by 111 (13 self)
The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007).
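For orientation, the model this line of work builds on can be written in a standard form (notation assumed here, not quoted from the paper):

    y = Xβ + ε,   ε ~ N(0, σ²I),   p(β) = ∏_i (τ/2) exp(−τ |β_i|).

The maximum a posteriori estimate under this Laplace prior coincides with the lasso; expectation propagation instead approximates the full posterior over β, and it is the resulting posterior covariance that the optimal-design step uses to score candidate measurements.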
The Horseshoe Estimator for Sparse Signals
, 2008
"... This paper proposes a new approach to sparsity called the horseshoe estimator. The horseshoe is a close cousin of other widely used Bayes rules arising from, for example, double-exponential and Cauchy priors, in that it is a member of the same family of multivariate scale mixtures of normals. But th ..."
Abstract - Cited by 77 (13 self)
This paper proposes a new approach to sparsity called the horseshoe estimator. The horseshoe is a close cousin of other widely used Bayes rules arising from, for example, double-exponential and Cauchy priors, in that it is a member of the same family of multivariate scale mixtures of normals. But the horseshoe enjoys a number of advantages over existing approaches, including its robustness, its adaptivity to different sparsity patterns, and its analytical tractability. We prove two theorems that formally characterize both the horseshoe’s adeptness at handling large outlying signals, and its super-efficient rate of convergence to the correct estimate of the sampling density in sparse situations. Finally, using a combination of real and simulated data, we show that the horseshoe estimator corresponds quite closely to the answers one would get by pursuing a full Bayesian model-averaging approach using a discrete-mixture prior to model signals and noise.
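For reference, the horseshoe prior discussed here is usually written as a global-local scale mixture of normals (standard notation, not quoted from the paper):

    β_i | λ_i, τ ~ N(0, λ_i² τ²),   λ_i ~ C⁺(0, 1),

with half-Cauchy local scales λ_i and a global scale τ. With unit global scale and unit noise variance, the shrinkage weight κ_i = 1/(1 + λ_i²) has a Beta(1/2, 1/2) prior; its horseshoe shape, with mass piled up near 0 (signals left nearly untouched) and near 1 (noise shrunk towards zero), is what gives the estimator its name.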
A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms
- Management Science
"... doi 10.1287/mnsc.1080.0986 ..."
Shrink Globally, Act Locally: Sparse Bayesian Regularization and Prediction
, 2010
"... We use Lévy processes to generate joint prior distributions for a location parameter β = (β1,..., βp) as p grows large. This approach, which generalizes normal scale-mixture priors to an infinite-dimensional setting, has a number of connections with mathematical finance and Bayesian nonparametrics. ..."
Abstract - Cited by 49 (5 self)
We use Lévy processes to generate joint prior distributions for a location parameter β = (β1,..., βp) as p grows large. This approach, which generalizes normal scale-mixture priors to an infinite-dimensional setting, has a number of connections with mathematical finance and Bayesian nonparametrics. We argue that it provides an intuitive framework for generating new regularization penalties and shrinkage rules; for performing asymptotic analysis on existing models; and for simplifying proofs of some classic results on normal scale mixtures.
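As a point of reference (standard notation, not taken from the paper), a normal scale-mixture prior for a single coefficient has the form

    p(β_i) = ∫ N(β_i | 0, t) dG(t),

and the construction sketched in this abstract replaces the fixed mixing measure G with the increments of a Lévy process (a subordinator), so that shrinkage rules and penalties, essentially −log p(β), can be generated and analysed through the Lévy measure of the process as p grows large.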
Bayesian Robust Principal Component Analysis
, 2010
"... A hierarchical Bayesian model is considered for decomposing a matrix into low-rank and sparse components, assuming the observed matrix is a superposition of the two. The matrix is assumed noisy, with unknown and possibly non-stationary noise statistics. The Bayesian framework infers an approximate r ..."
Abstract - Cited by 42 (4 self)
A hierarchical Bayesian model is considered for decomposing a matrix into low-rank and sparse components, assuming the observed matrix is a superposition of the two. The matrix is assumed noisy, with unknown and possibly non-stationary noise statistics. The Bayesian framework infers an approximate representation for the noise statistics while simultaneously inferring the low-rank and sparse-outlier contributions; the model is robust to a broad range of noise levels, without having to change model hyperparameter settings. In addition, the Bayesian framework allows exploitation of additional structure in the matrix. For example, in video applications each row (or column) corresponds to a video frame, and we introduce a Markov dependency between consecutive rows in the matrix (corresponding to consecutive frames in the video). The properties of this Markov process are also inferred based on the observed matrix, while simultaneously denoising and recovering the low-rank and sparse components. We compare the Bayesian model to a state-of-the-art optimization-based implementation of robust PCA; considering several examples, we demonstrate competitive performance of the proposed model.
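In generative terms, the decomposition described here can be summarised schematically (notation assumed rather than quoted) as

    Y = L + S + E,   rank(L) small,   S sparse,   E_ij ~ N(0, σ_ij²),

with priors placed on a low-rank factorisation of L, on the sparsity pattern of S, and on the possibly non-stationary noise variances σ_ij², so that all three components are inferred jointly from the observed matrix; the Markov dependency mentioned above links consecutive rows (consecutive video frames).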
Variance estimation using refitted cross-validation in ultrahigh dimensional regression (forthcoming)
, 2011
Bayesian generalized double Pareto shrinkage
, 2010
"... We propose a generalized double Pareto prior for shrinkage estimation in linear models. The prior can be obtained via a scale mixture of Laplace or normal distributions, while forming a bridge between the Laplace and Normal-Jeffreys ’ priors. While it has a spike at zero like the Laplace density, it ..."
Abstract - Cited by 23 (4 self)
We propose a generalized double Pareto prior for shrinkage estimation in linear models. The prior can be obtained via a scale mixture of Laplace or normal distributions, while forming a bridge between the Laplace and Normal-Jeffreys' priors. While it has a spike at zero like the Laplace density, it also has a Student-t-like tail behavior. We show strong consistency of the posterior in regression models with a diverging number of parameters, providing a template to be used for other priors in similar settings. Bayesian computation is straightforward via a simple Gibbs sampling algorithm. We also investigate the properties of the maximum a posteriori estimator and reveal connections with some well-established regularization procedures. The performance of the new prior is tested through simulations.
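For concreteness (parameterisation assumed here, not copied from the paper), the generalized double Pareto density can be written as

    f(β | ξ, α) = (1/(2ξ)) (1 + |β| / (α ξ))^(−(α+1)),

and it arises by marginalising the hierarchy β | τ ~ N(0, τ), τ | λ ~ Exp(λ²/2), λ ~ Gamma(α, η). It is this scale-mixture representation that makes the Gibbs sampler mentioned in the abstract straightforward.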
VAR forecasting using Bayesian variable selection
- Journal of Applied Econometrics
, 2012
"... VAR forecasting using Bayesian variable selection ..."
(Show Context)
Adaptive MultiTask Lasso: with Application to eQTL Detection.
- The 24th Annual Conference on Neural Information Processing Systems (NIPS),
, 2010
"... Abstract To understand the relationship between genomic variations among population and complex diseases, it is essential to detect eQTLs which are associated with phenotypic effects. However, detecting eQTLs remains a challenge due to complex underlying mechanisms and the very large number of gene ..."
Abstract - Cited by 19 (2 self)
To understand the relationship between genomic variations among a population and complex diseases, it is essential to detect eQTLs, which are associated with phenotypic effects. However, detecting eQTLs remains a challenge due to complex underlying mechanisms and the very large number of genetic loci involved compared to the number of samples. Thus, to address the problem, it is desirable to take advantage of the structure of the data and prior information about genomic locations such as conservation scores and transcription factor binding sites. In this paper, we propose a novel regularized regression approach for detecting eQTLs that takes related traits into account simultaneously while incorporating many regulatory features. We first present a Bayesian network for a multi-task learning problem that includes priors on SNPs, making it possible to estimate the significance of each covariate adaptively. Then we find the maximum a posteriori (MAP) estimate of the regression coefficients and estimate the weights of the covariates jointly. This optimization procedure is efficient since it can be carried out by applying a projected gradient descent and a coordinate descent procedure iteratively. Experimental results on simulated and real yeast datasets confirm that our model outperforms previous methods for finding eQTLs.
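Schematically (notation assumed here, not taken from the paper), the MAP problem described above has the flavour of an adaptively weighted multi-task lasso:

    minimize over B and θ ≥ 0:   Σ_k ||y_k − X β_k||² + Σ_j ρ_j(θ) Σ_k |β_jk| + Σ_j ω_j(θ) ||β_j·||₂,

where β_j· collects the coefficients of SNP j across the K traits and the weights ρ_j(θ), ω_j(θ) are functions of the regulatory features of locus j. One natural split, assumed here, is to update the coefficients by coordinate descent for fixed θ and the constrained weights θ by projected gradient descent, alternating until convergence.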
Compressibility of Deterministic and Random Infinite Sequences
"... Abstract—We introduce a definition of the notion of compressibility for infinite deterministic and i.i.d. random sequences which is based on the asymptotic behavior of truncated subsequences. For this purpose, we use asymptotic results regarding the distribution of order statistics for heavy-tail di ..."
Abstract - Cited by 18 (15 self)
We introduce a definition of the notion of compressibility for infinite deterministic and i.i.d. random sequences which is based on the asymptotic behavior of truncated subsequences. For this purpose, we use asymptotic results regarding the distribution of order statistics for heavy-tail distributions and their link with α-stable laws. In many cases, our proposed definition of compressibility coincides with intuition. In particular, we prove that heavy-tail (polynomially decaying) distributions fulfill the requirements of compressibility, whereas exponentially decaying distributions such as the Laplace and Gaussian do not. The results are such that two compressible distributions can be compared with each other in terms of their degree of compressibility. Index Terms—Compressible prior, heavy-tail distribution, order statistics, stable law.
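As a rough numerical illustration of the intuition (this is not the paper's formal definition; the distributions, tail index, and 5% threshold below are chosen arbitrarily), i.i.d. draws from a heavy-tail law concentrate most of their l1 norm in a small fraction of large entries, while Laplace and Gaussian draws do not:

    import numpy as np

    def top_fraction(x, frac=0.05):
        # Share of the total l1 norm carried by the largest `frac` of entries (by magnitude).
        mags = np.sort(np.abs(x))[::-1]
        k = max(1, int(frac * len(mags)))
        return mags[:k].sum() / mags.sum()

    rng = np.random.default_rng(0)
    n = 100_000
    heavy = rng.standard_t(df=1.2, size=n)    # polynomially decaying (heavy) tails
    laplace = rng.laplace(size=n)             # exponentially decaying tails
    gauss = rng.standard_normal(size=n)       # exponentially decaying tails

    for name, x in [("Student-t(1.2)", heavy), ("Laplace", laplace), ("Gaussian", gauss)]:
        print(f"top 5% share of l1 norm, {name}: {top_fraction(x):.3f}")

The heavy-tail sample should put a dominant share of its l1 norm in the top 5% of entries, while the Laplace and Gaussian samples put far less there, which is the qualitative ordering the abstract describes.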