## Feature-inclusion stochastic search for Gaussian graphical models (2007)

Citations: 20 (3 self-citations)

### BibTeX

@TECHREPORT{Scott07feature-inclusionstochastic,

author = {G. Scott and Carlos M. Carvalho},

title = {Feature-inclusion stochastic search for gaussian graphical models},

institution = {},

year = {2007}

}

### Abstract

We describe a serial algorithm called feature-inclusion stochastic search, or FINCS, that uses online estimates of edge-inclusion probabilities to inform the process of Bayesian model determination in Gaussian graphical models. FINCS is compared to Metropolis-based search methods and found to be superior along a variety of dimensions, leading to more accurate and less volatile estimates of edge-inclusion probabilities and greater speed in finding good models. Though FINCS is conceived as a method for characterizing model uncertainty in moderate-dimensional problems, we also find that it performs well as a stochastic hill-climber in bigger problems. We illustrate its use on an example involving mutual-fund data, where we compare the model-averaged predictive performance of models discovered with FINCS to those discovered with the Metropolis algorithm.
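The abstract describes FINCS only at a high level. What follows is a minimal sketch of one plausible reading of "online estimates of edge-inclusion probabilities": score every graph the search has visited, renormalize those scores over the visited set, and sum the weight of the graphs that contain each edge. This is a hypothetical illustration, not the authors' algorithm. The frozenset graph representation, the function names `edge_inclusion_probs` and `propose_graph`, and the independent-edge proposal are all assumptions. In particular, the sketch ignores decomposability, so a graph proposed this way would generally need repairing (for example, by triangulation) before a hyper-inverse Wishart marginal likelihood could be evaluated on it.

```python
import math
import random
from itertools import combinations

# Hypothetical representation (not from the paper): a graph on p nodes is a
# frozenset of undirected edges (i, j) with i < j.


def edge_inclusion_probs(visited, p):
    """Estimate P(edge in G | data) from the graphs visited so far.

    `visited` maps each visited graph to its unnormalized log posterior score
    (log marginal likelihood plus log prior). Scores are renormalized over the
    visited set only, so the estimates are conditional on what the search found.
    """
    top = max(visited.values())  # shift by the max score to avoid overflow in exp
    weights = {g: math.exp(s - top) for g, s in visited.items()}
    total = sum(weights.values())
    return {
        (i, j): sum(w for g, w in weights.items() if (i, j) in g) / total
        for i, j in combinations(range(p), 2)
    }


def propose_graph(probs, rng):
    """Hypothetical global move: include each edge independently with its
    current estimated inclusion probability. The result is not guaranteed to
    be decomposable."""
    return frozenset(e for e, q in probs.items() if rng.random() < q)


# Usage with made-up, purely illustrative scores for three graphs on p = 3 nodes.
visited = {
    frozenset(): -10.0,
    frozenset({(0, 1)}): -9.0,
    frozenset({(0, 1), (1, 2)}): -9.5,
}
probs = edge_inclusion_probs(visited, p=3)
print(probs)  # edge (0, 2) gets 0.0 because no visited graph contains it
candidate = propose_graph(probs, random.Random(0))
print(candidate)
```

Renormalizing over visited graphs only is what makes such estimates cheap to update online in a serial search: each newly scored graph adds one term to each sum. The flip side is that the estimates say nothing about regions of model space the search has not yet reached.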

### Citations

2024 | Regression shrinkage and selection via the LASSO
- Tibshirani
- 1996
Citation Context ... subset of $\{x_j : j \neq i\}$. Dobra et al. (2004) perform a Bayesian selection procedure to get each neighborhood, while Meinshausen and Bühlmann (2006) and Yuan and Lin (2007) use variants of the lasso (Tibshirani 1996). Regardless of the variable-selection method used, the resulting set of regressions implicitly defines a graph. Such procedures do not, in general, yield a valid joint distribution: often $i \in \mathrm{ne}(j)$ ...

915 | Reversible jump Markov chain Monte Carlo computation and Bayesian model determination
- Green
- 1995
Citation Context ...ty. Specifically, we seek to improve upon the standard Metropolis-Hastings algorithm. Giudici and Green (1999) first applied these ideas to graphical models, implementing a reversible-jump sampler (Green 1995) over all model parameters including graphical constraints. Jones et al. (2005) considered a version of MCMC that eliminated the complexities of reversible-jump by explicitly marginalizing over most ...

407 | High-dimensional graphs and variable selection with the Lasso. The Annals of Statistics 34
- Meinshausen, Bühlmann
- 2006
Citation Context ... nodes. At the high-dimensional end of the spectrum where there might be many thousands of variables, model-selection techniques based on L1-regularization allow for tractable non-Bayesian solutions (Meinshausen and Bühlmann 2006; Yuan and Lin 2007). The only Bayesian candidate for such problems can be found in Dobra et al. (2004), whose methods require a series of assumptions necessary for scalability but likely to yield s...

361 | Variable selection via Gibbs sampling
- George, McCulloch
- 1993

201 | Covariance selection
- Dempster
- 1972
Citation Context ... A sequence of subgraphs that cannot be decomposed further are the prime components of a graph; if every prime component is complete, the graph is said to be decomposable. A Gaussian graphical model (Dempster 1972) uses such a graphical structure to define a set of pairwise conditional independence relationships on a $p$-dimensional normally distributed vector $x \sim N(0, \Sigma)$. With precision matrix $\Omega = \Sigma^{-1}$, elem...

133 | Sparse graphical models for exploring gene expression data
- Dobra, Hans, et al.
- 2004

125 | Calibration and empirical Bayes variable selection
- George, Foster
- 2000

123 | Hyper Markov laws in the statistical analysis of decomposable graphical models
- Dawid, Lauritzen
- 1993
Citation Context ... The hyper-inverse Wishart distribution over a decomposable graph $G$ is the unique strong hyper-Markov distribution for $\Sigma \in M(G)$ with consistent clique marginals that are inverse Wishart distributed (Dawid and Lauritzen 1993). For a decomposable graph $G$, writing $(\Sigma \mid G) \sim \mathrm{HIW}_G(b, D)$ means first that the density of $\Sigma$ decomposes as in (1). It also means that for each clique $C$, $\Sigma_C \sim \mathrm{IW}(b, D_C)$ with density $p(\Sigma_C \mid b, D_C) \propto$ |...

119 | Model selection and estimation in the Gaussian graphical model. Biometrika 94
- Yuan, Lin
- 2007
Citation Context ...nal relationships into a graph to yield a valid joint distribution. Several methods for choosing each regression model are available, some based upon L1-regularization (Meinshausen and Bühlmann 2006; Yuan and Lin 2007) and others based upon stepwise selection (Dobra et al. 2004). Like MCMC, direct-search methods operate in the space of graphs rather than the space of conditional regressions. Unlike MCMC, however, ...

113 | Fractional Bayes factors for model comparison (with discussion)
- O’Hagan
- 1995

80 | On assessing prior distributions and Bayesian regression analysis with g-prior distributions
- Zellner
- 1986
Citation Context ... $(2\pi)^{-np/2} \, \dfrac{h(G, gn, gH(X'X))}{h(G, n, H(X'X))}$ with $h(G, b, D)$ defined as in (4), and for some suitable choice of $g$ that is $O(n^{-1})$. These marginal likelihoods are akin to using a set of g-priors (Zellner 1986) for doing variable selection on each univariate conditional regression, and have a number of desirable properties relating to the notion of information consistency, as defined by Liang et al. (200...

66 | Decomposable graphical Gaussian model determination. Biometrika 86: 785–801
- Giudici, Green
- 1999

44 | Efficient estimation of covariance selection models
- Wong, Carter, et al.
- 2002

40 | Mixtures of g-priors for Bayesian variable selection. Unpublished manuscript
- Liang, Paulo, et al.
- 2007

33 | Maximum cardinality search for computing minimal triangulations of graphs
- Berry, Blair, et al.
- 2004

32 | Decomposition of maximum likelihood in mixed graphical interaction models
- Frydenberg, Lauritzen
- 1989
Citation Context ...pwise-decomposable paths in model space between two far-flung graphs become vanishingly small as a proportion of all possible paths between them. The theory guarantees that such a path always exists (Frydenberg and Lauritzen 1989), but it may be very difficult to find. These principles suggest that a sound computational strategy must include a blend of local and global moves: local moves to explore concentrated regions of good...

32 | An exploration of aspects of Bayesian multiple testing
- Scott, Berger
- 2003

30 | Efficient stepwise selection in decomposable models
- Deshpande, Garofalakis, et al.
- 2001
Citation Context ...ric). If this fact is not accounted for, then the proposal densities h(·) in (9) will be wrong, and the chain will not converge to a stationary distribution. This enumeration is possible, but costly (Deshpande et al. 2001). The more important reason, however, is the lack of MCMC convergence diagnostics on such complex, multimodal problems. The model spaces are simply too large to trust the usual rules of thumb, and it...

14 | Objective Bayesian model selection in Gaussian graphical models
- Carvalho, Scott
- 2009
Citation Context ...nal likelihoods, and indeed has been the prior of choice in many previous studies due to the lack of a reasonable alternative. Yet it tends to perform very poorly in contrast to the fractional prior (Carvalho and Scott 2007), leading to an unintuitively high level of uncertainty and an artificial flattening of modes in model space. Hence we use the fractional prior, with the understanding that its sharper characterizati...

12 | Empirical Bayes vs. fully Bayes variable selection
- Cui, George
- 2008

10 | Posterior model probabilities via path-based pairwise priors
- Berger, Molina
- 2005

10 | A vertex incremental approach for maintaining chordality
- Berry, Heggernes, et al.

10 | Dynamic matrix-variate graphical models. Bayesian Analysis 2
- Carvalho, West
- 2007

7 | Shotgun stochastic search for regression with many candidate predictors
- Hans, Dobra, et al.
- 2007
Citation Context ...2004), whose methods require a series of assumptions necessary for scalability but likely to yield suboptimal performance on smaller problems. Parallel stochastic search methods (Jones et al. 2005; Hans et al. 2007) offer a solution in moderate dimensions, yet are prohibitive for those without access to a large computing cluster. We propose FINCS as a serial algorithm for filling this gap, and proceed as follow...

6 | Optimal predictive model selection. The Annals of Statistics
- Barbieri, Berger
- 2004

4 | Conditions for rapid and torpid mixing of parallel and simulated tempering on multimodal distributions
- Woodard
- 2007
Citation Context ...authors note, assessing whether a Markov chain over a multimodal space has converged to a stationary distribution is devilishly tough, and theoretical results exist only for the smallest of problems (Woodard 2007). Even when state-of-the-art “mixing” tactics are used (simulated tempering, parallel chains, adaptive proposal distributions), apparent finite-time convergence can prove to be a mirage. Two alternative...