Results 1 - 10 of 12
Mixtures of g-priors for Bayesian variable selection
Journal of the American Statistical Association, 2008
Cited by 14 (4 self)
Zellner’s g-prior remains a popular conventional prior for use in Bayesian variable selection, despite several undesirable consistency issues. In this paper, we study mixtures of g-priors as an alternative to default g-priors that resolve many of the problems with the original formulation, while maintaining the computational tractability that has made the g-prior so popular. We present theoretical properties of the mixture g-priors and provide real and simulated examples to compare the mixture formulation with fixed g-priors, Empirical Bayes approaches and other default procedures.
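For orientation, the fixed-g case that the mixture priors generalize can be sketched in a few lines. This is only an illustration, assuming the standard closed form for the Bayes factor of a model with p predictors against the intercept-only model under Zellner's g-prior, BF = (1 + g)^((n - 1 - p)/2) / (1 + g(1 - R^2))^((n - 1)/2). The function name and arguments are hypothetical and are not taken from the paper.

```python
import math

def log_bf_gprior(r2, n, p, g):
    """Log Bayes factor of a p-predictor model versus the intercept-only model.

    Assumes Zellner's g-prior with a fixed g, n observations, and
    r2 equal to the model's ordinary coefficient of determination.
    """
    return (0.5 * (n - 1 - p) * math.log1p(g)
            - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2)))

# A perfect fit (r2 = 1) still yields a finite Bayes factor when g is fixed.
bounded = log_bf_gprior(1.0, n=20, p=3, g=20.0)
```

With g held fixed, the value at r2 = 1 equals 0.5 * (n - 1 - p) * log(1 + g), so the evidence stays bounded however well the model fits. This is one commonly cited example of the kind of consistency issue the abstract alludes to.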
Fast Bayesian Matching Pursuit: Model Uncertainty and Parameter Estimation for Sparse Linear Models
IEEE Transactions on Signal Processing, 2009
Cited by 9 (2 self)
A low-complexity recursive procedure is presented for model selection and minimum mean squared error (MMSE) estimation in linear regression. Emphasis is given to the case of a sparse parameter vector and fewer observations than unknown parameters. A Gaussian mixture is chosen as the prior on the unknown parameter vector. The algorithm returns both a set of high posterior probability mixing parameters and an approximate MMSE estimate of the parameter vector. Exact ratios of posterior probabilities serve to reveal potential ambiguity among multiple candidate solutions that are ambiguous due to observation noise or correlation among columns in the regressor matrix. Algorithm complexity is linear in the number of unknown coefficients, the number of observations and the number of nonzero coefficients. If hyperparameters are unknown, a maximum likelihood estimate is found by a generalized expectation maximization algorithm. Numerical simulations demonstrate estimation performance and illustrate the distinctions between MMSE estimation and maximum a posteriori probability model selection.
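The distinction between MMSE estimation and MAP model selection can be illustrated with a brute-force sketch. It is not the paper's recursive algorithm: it enumerates every support, which is feasible only for a handful of coefficients. The prior is an assumed Bernoulli-Gaussian special case of a Gaussian mixture (each coefficient is nonzero with probability p1 and then drawn from N(0, sigma_x2)), the noise variance is taken as known, and all names and default values are hypothetical.

```python
import itertools
import numpy as np

def bma_mmse(A, y, p1=0.1, sigma_x2=1.0, sigma_n2=0.01):
    """Exhaustive Bayesian model averaging for y = A x + noise.

    Returns the list of supports, their posterior probabilities,
    and the model-averaged (MMSE) estimate of x.
    """
    m, n = A.shape
    supports, log_post, cond_means = [], [], []
    for bits in itertools.product([False, True], repeat=n):
        s = np.array(bits, dtype=bool)
        A_s = A[:, s]
        # Covariance of y given the support: signal part plus noise part.
        Phi = sigma_x2 * A_s @ A_s.T + sigma_n2 * np.eye(m)
        _, logdet = np.linalg.slogdet(Phi)
        Phi_inv_y = np.linalg.solve(Phi, y)
        log_lik = -0.5 * (y @ Phi_inv_y + logdet + m * np.log(2 * np.pi))
        k = int(s.sum())
        log_prior = k * np.log(p1) + (n - k) * np.log(1 - p1)
        # Posterior mean of x given this support (zero off the support).
        x_mean = np.zeros(n)
        x_mean[s] = sigma_x2 * A_s.T @ Phi_inv_y
        supports.append(s)
        log_post.append(log_lik + log_prior)
        cond_means.append(x_mean)
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())  # subtract max for stability
    post /= post.sum()
    x_mmse = post @ np.array(cond_means)
    return supports, post, x_mmse
```

The MAP model is `supports[int(np.argmax(post))]`, while `x_mmse` blends the conditional means of all models, so the two answers can differ when several supports have comparable posterior probability.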
Computational advances for and from Bayesian analysis
Statist. Sci., 2004
Cited by 6 (0 self)
The emergence in the past years of Bayesian analysis in many methodological and applied fields as the solution to the modeling of complex problems cannot be dissociated from major changes in its computational implementation. We show in this review how the advances in Bayesian analysis and statistical computation are intermingled. Key words and phrases: Monte Carlo methods, importance sampling, Markov chain Monte Carlo (MCMC) algorithms.
The Strength of Statistical Evidence for Composite Hypotheses: Inference to the Best Explanation
2010
Cited by 4 (3 self)
A general function to quantify the weight of evidence in a sample of data for one hypothesis over another is derived from the law of likelihood and from a statistical formalization of inference to the best explanation. For a fixed parameter of interest, the resulting weight of evidence that favors one composite hypothesis over another is the likelihood ratio using the parameter value consistent with each hypothesis that maximizes the likelihood function over the parameter of interest. Since the weight of evidence is generally only known up to a nuisance parameter, it is approximated by replacing the likelihood function with a reduced likelihood function on the interest parameter space. Unlike the Bayes factor and unlike the p-value under interpretations that extend its scope, the weight of evidence is coherent in the sense that it cannot support a hypothesis over any hypothesis that it entails. Further, when comparing the hypothesis that the parameter lies outside a non-trivial interval to the hypothesis that it lies within the interval, the proposed method of weighing evidence almost always asymptotically favors the correct hypothesis.
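The interval comparison described above can be sketched for one simple case. The sketch assumes i.i.d. normal data with known standard deviation and no nuisance parameter, so the likelihood is maximized directly over each hypothesis; the function name, arguments, and the choice of model are illustrative assumptions rather than the paper's own formulation.

```python
def log_weight_of_evidence(x, a, b, sigma=1.0):
    """Log likelihood ratio for H1: theta in [a, b] versus H2: theta outside [a, b].

    Each hypothesis is represented by the mean value, among those it allows,
    that maximizes the normal likelihood (sigma assumed known).
    """
    n = len(x)
    xbar = sum(x) / n

    def loglik(theta):
        # Normal log-likelihood up to a term that does not depend on theta.
        return -0.5 * n * (xbar - theta) ** 2 / sigma ** 2

    # Best value inside the interval: the sample mean clipped to [a, b].
    theta_in = min(max(xbar, a), b)
    # Best value outside: the sample mean itself, or the nearer endpoint
    # (approached as a limit) when the sample mean falls inside.
    if a < xbar < b:
        theta_out = a if (xbar - a) <= (b - xbar) else b
    else:
        theta_out = xbar
    return loglik(theta_in) - loglik(theta_out)
```

A positive value favors the interval hypothesis and a negative value favors its complement. For example, four observations all equal to 0 with the interval [-1, 1] give a log ratio of 2.0.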
On communication over unknown sparse frequency-selective block-fading channels, arXiv:1006.1548
2010
Cited by 4 (3 self)
This paper considers the problem of reliable communication over discrete-time channels whose impulse responses have length and exactly non-zero coefficients, and whose support and coefficients remain fixed over blocks of channel uses but change independently from block to block. Here, it is assumed that the channel’s support and coefficient realizations are both unknown, although their statistics are known. Assuming Gaussian non-zero-coefficients and noise, and focusing on the high-SNR regime, it is first shown that the ergodic noncoherent channel capacity has pre-log factor 1 for any. It is then shown that, to communicate with arbitrarily small error probability at rates in accordance with the capacity pre-log factor, it suffices to use pilot-aided orthogonal frequency-division multiplexing (OFDM) with pilots per fading block, in conjunction with an appropriate noncoherent decoder. Since the achievability result is proven using a noncoherent decoder whose complexity grows exponentially in the number of fading blocks, a simpler decoder, based on +1 pilots, is also proposed. Its-achievable +1 rate is shown to have pre-log factor equal to 1 with the previously considered channel, while its achievable rate is shown to +1 have pre-log factor 1 when the support of the block-fading channel remains fixed over time. Index Terms: Bayes model averaging, compressed sensing, fading channels, noncoherent capacity, noncoherent communication, sparse channels.
A conjugate prior for discrete hierarchical loglinear models. Available from http://arxiv.org/abs/0711.1609
2008
Cited by 3 (2 self)
In Bayesian analysis of multi-way contingency tables, the selection of a prior distribution for either the log-linear parameters or the cell probabilities parameters is a major challenge. In this paper, we define a flexible family of conjugate priors for the wide class of discrete hierarchical log-linear models, which includes the class of graphical models. These priors are defined as the Diaconis–Ylvisaker conjugate priors on the log-linear parameters subject to “baseline constraints” under multinomial sampling. We also derive the induced prior on the cell probabilities and show that the induced prior is a generalization of the hyper Dirichlet prior. We show that this prior has several desirable properties and illustrate its usefulness by identifying the most probable decomposable, graphical and hierarchical log-linear models for a six-way contingency table.
Credal Model Averaging: an Extension of Bayesian Model Averaging to Imprecise Probabilities
Cited by 3 (3 self)
We deal with the arbitrariness in the choice of the prior over the models in Bayesian model averaging (BMA), by modelling prior knowledge by a set of priors (i.e., a prior credal set). We consider Dash and Cooper’s BMA applied to naive Bayesian networks, replacing the single prior over the naive models by a credal set; this models a condition close to prior ignorance about the models, which leads to credal model averaging (CMA). CMA returns an indeterminate classification, i.e., multiple classes, on the instances for which the learning set is not informative enough to smooth the effect of the choice of the prior. We give an algorithm to compute exact credal model averaging for naive networks. Extensive experiments show that indeterminate classifications preserve the reliability of CMA on the instances which are classified in a prior-dependent way by BMA.
Bayesian Networks with Imprecise Probabilities: Theory and Application to Classification
2010
Cited by 2 (1 self)
Bayesian networks are powerful probabilistic graphical models for modelling uncertainty. Among others, classification represents an important application: some of the most used classifiers are based on Bayesian networks. Bayesian networks are precise models: exact numeric values should be provided for quantification. This requirement is sometimes too narrow. Sets instead of single distributions can provide a more realistic description in these cases. Bayesian networks can be generalized to cope with sets of distributions. This leads to a novel class of imprecise probabilistic graphical models, called credal networks. In particular, classifiers based on Bayesian networks are generalized to so-called credal classifiers. Unlike Bayesian classifiers, which always detect a single class as the one maximizing the posterior class probability, a credal classifier may eventually be unable to discriminate a single class. In other words, if the available information is not sufficient, credal classifiers allow for indecision between two or more classes, thus providing a less informative but more robust conclusion than Bayesian classifiers.
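The indecision behaviour described above can be illustrated with a toy sketch. It assumes the credal set is given by a finite list of candidate class priors and uses a dominance rule: a class is discarded only if some other class has a higher posterior under every prior in the list. The function name and inputs are hypothetical, and this is not claimed to be the algorithm of the work summarized here.

```python
def credal_classify(likelihoods, priors):
    """Return the indices of all classes that are not dominated.

    likelihoods: list of p(x | c), one entry per class.
    priors: list of candidate prior vectors over the classes (assumed finite).
    """
    n_classes = len(likelihoods)
    # Unnormalized posteriors; the normalizer is shared within each prior,
    # so it does not affect comparisons between classes.
    post = [[p[c] * likelihoods[c] for c in range(n_classes)] for p in priors]
    kept = []
    for c in range(n_classes):
        dominated = any(
            all(row[d] > row[c] for row in post)
            for d in range(n_classes) if d != c
        )
        if not dominated:
            kept.append(c)
    return kept

# Uninformative evidence: the answer depends on the prior, so both classes remain.
undecided = credal_classify([0.5, 0.5], [[0.6, 0.4], [0.4, 0.6]])
# Strong evidence for class 0: a single class is returned under every prior.
decided = credal_classify([0.9, 0.1], [[0.6, 0.4], [0.4, 0.6]])
```

Because the comparison between two classes is linear in the prior, checking the listed priors also covers every mixture of them, which is why a finite list of extreme priors is enough for this sketch.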
MODEL SELECTION, COVARIANCE SELECTION AND BAYES CLASSIFICATION VIA SHRINKAGE
2006
The naive Bayes classifier (NB) has exhibited its “mysterious” but outstanding classification ability in practice, in spite of its often unrealistic conditional independence assumption. This simple assumption implies the adoption of a diagonal structure for the underlying class-specific precision matrices. However, the NB leaves covariate interrelationships unrevealed. In this dissertation, we will extend the NB from the perspectives of covariance modeling and classification. Due to the positive definiteness constraint and the rapidly-growing number of parameters with dimensions, covariance estimation in a multivariate normal population has been a classic but challenging statistical problem. Sparse shrinkage covariance/precision matrix estimation has been followed as an important principle in covariance/precision matrix modeling. However, many existing models can only shrink the covariance/precision matrix toward a predefined diagonal structure. We model a precision matrix via its Cholesky decomposition in terms of compositional regression coefficient matrix and error precisions. Our approach aims at estimating
A Fast Posterior Update for Sparse Underdetermined Linear Models
A Bayesian approach is adopted for linear regression, and a fast algorithm is given for updating posterior probabilities. Emphasis is given to the underdetermined and sparse case, i.e., fewer observations than regression coefficients and the belief that only a few regression coefficients are non-zero. The fast update allows for a low-complexity method of reporting a set of models with high posterior probability and their exact posterior odds. As a byproduct, this Bayesian model averaged approach yields the minimum mean squared error estimate of unknown coefficients. Algorithm complexity is linear in the number of unknown coefficients, the number of observations and the number of nonzero coefficients. For the case in which hyperparameters are unknown, a maximum likelihood estimate is found by a generalized expectation maximization algorithm.

