## Prediction via Orthogonalized Model Mixing (1994)

Venue: | JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION |

Citations: | 50 - 9 self |

### BibTeX

@ARTICLE{Clyde94predictionvia,

author = {Merlise Clyde and Heather DeSimone and Giovanni Parmigiani},

title = {Prediction via Orthogonalized Model Mixing},

journal = {JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION},

year = {1994},

volume = {91},

pages = {1197--1208}

}

### Abstract

In this paper we introduce an approach and algorithms for model mixing in large prediction problems with correlated predictors. We focus on the choice of predictors in linear models, and mix over possible subsets of candidate predictors. Our approach is based on expressing the space of models in terms of an orthogonalization of the design matrix. Advantages are both statistical and computational. Statistically, orthogonalization often leads to a reduction in the number of competing models by eliminating correlations. Computationally, large model spaces cannot be enumerated; recent approaches are based on sampling models with high posterior probability via Markov chains. Based on orthogonalization of the space of candidate predictors, we can approximate the posterior probabilities of models by products of predictor-specific terms. This leads to an importance sampling function for sampling directly from the joint distribution over the model space, without resorting to Markov chains. Comp...
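The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a known error variance `sigma2`, independent normal priors with variance `tau2` on the orthogonalized coefficients, and a common prior inclusion probability `prior`. All function and parameter names are hypothetical. Under these simplifying assumptions the posterior over models factorizes exactly into predictor-specific terms, whereas the paper treats the product form as an approximation.

```python
import numpy as np

def orthogonalize(X):
    """Orthonormal basis for the centered predictors (principal-component directions via SVD)."""
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U  # columns satisfy U.T @ U = I when Xc has full column rank

def inclusion_probs(W, y, sigma2=1.0, tau2=10.0, prior=0.5):
    """Per-predictor inclusion probabilities (assumed known-variance normal model)."""
    z = W.T @ (y - y.mean())
    # z_j ~ N(0, sigma2 + tau2) if predictor j is in the model, N(0, sigma2) otherwise
    log_in = -0.5 * np.log(sigma2 + tau2) - 0.5 * z**2 / (sigma2 + tau2)
    log_out = -0.5 * np.log(sigma2) - 0.5 * z**2 / sigma2
    log_odds = np.log(prior) - np.log1p(-prior) + log_in - log_out
    p = 1.0 / (1.0 + np.exp(-log_odds))
    return np.clip(p, 1e-12, 1.0 - 1e-12)  # keep logs finite downstream

def sample_models(p, n_draws, rng):
    """Draw models as independent Bernoulli indicators; no Markov chain is involved."""
    gammas = rng.random((n_draws, p.size)) < p
    # log of the sampling probability q(gamma), a product of predictor-specific terms
    log_q = np.where(gammas, np.log(p), np.log1p(-p)).sum(axis=1)
    return gammas, log_q

# Toy data: four strongly correlated predictors, signal along the first orthogonal direction
rng = np.random.default_rng(0)
n, k = 50, 4
X = rng.normal(size=(n, 1)) + 0.3 * rng.normal(size=(n, k))
W = orthogonalize(X)
y = 20.0 * W[:, 0] + rng.normal(size=n)

p = inclusion_probs(W, y)
gammas, log_q = sample_models(p, n_draws=1000, rng=rng)
```

In this toy setting the sampling distribution coincides with the assumed posterior, so every draw would receive the same importance weight. When the product form is only an approximation, `log_q` is the quantity one would subtract from the unnormalized log posterior to obtain importance weights.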

### Citations

331 |
Variable Selection via Gibbs Sampling
- George, McCulloch
- 1993
Citation Context ...ability, without enumerating the model space. For example, variable selection methods based on sampling from the model space using Markov chains are the Stochastic Search Variable Selection, or SSVS (=-=George and McCulloch 1993-=-, 1994), the Markov chain Monte Carlo Model Composition, or MC$^3$ (Madigan and York 1993) and the methods of Carlin and Chib (1995), Geweke (1994), Phillips and Smith (1994) and Green (1995). A determi... |

265 | Model selection and accounting for model uncertainty in graphical models using Occam's window
- Madigan, Raftery
- 1994
Citation Context ...o Model Composition, or MC$^3$ (Madigan and York 1993) and the methods of Carlin and Chib (1995), Geweke (1994), Phillips and Smith (1994) and Green (1995). A deterministic algorithm is Occam's window (=-=Madigan and Raftery 1994-=-). The resulting collection of models, or sometimes a further subset, can then be used for model mixing. Examples include applications of Occam's Window and MC$^3$ to linear models (Raftery, Madigan and... |

226 | Bayesian graphical models for discrete data - Madigan, York - 1995 |

205 | La prévision: ses lois logiques, ses sources subjectives, Annales de l'Institut H. Poincaré 7 - de Finetti - 1937 |

188 |
Sliced inverse regression for dimension reduction (with discussion)
- Li
- 1991
Citation Context ...o achieve closeness to "target" or "optimal" subspaces. This may be better achieved by an orthogonalization based on $Y$, such as partial least squares (Wold et al. 1984) or sliced =-=inverse regression (Li 1991-=-). This, however, means introducing uncertainty due to sampling variation in the orthogonal basis, and, in this setup, data dependence in the prior distribution on $\alpha$. In Clyde and Parmigiani (1995) w... |

149 |
Bayesian Model Choice via Markov Chain Monte Carlo
- Carlin, Chib
- 1995
Citation Context ...1. Monte Carlo estimator. The weight $w_\gamma$ is the relative frequency $f_\gamma / N$ of model $\gamma$ in the $N$ draws. This approach is appropriate for Markov chain output when $q_\gamma$ are not available (Geweke, 1994, =-=Carlin and Chib, 1995-=-). In our formulation, using a simple Monte-Carlo average ignores the information contained in $q_\gamma$ and in the sampling mechanism. As a result, one can construct more efficient estimators. 2. Window e... |

123 | Reversible jump MCMC computation and Bayesian model determination - Green - 1995 |

111 | Assessment and propagation of model uncertainty (with discussion) - Draper - 1995 |

102 | Bayesian variable selection in linear regression - Mitchell, Beauchamp - 1988 |

73 | Bayesian Model Comparison via Jump Diffusions - Phillips, Smith - 1996 |

57 | Bayesian variable selection with related predictors, The Canadian - Chipman - 1996 |

47 | Model selection and accounting for model uncertainty in linear regression models - Raftery, Madigan, et al. - 1997 |

39 | Accounting for model uncertainty in survival analysis improves predictive performance (with discussion) - Raftery, E, et al. - 1996 |

35 | Bayesian comparison of econometric models
- Geweke
- 1994
Citation Context ...e as follows: 1. Monte Carlo estimator. The weight $w_\gamma$ is the relative frequency $f_\gamma / N$ of model $\gamma$ in the $N$ draws. This approach is appropriate for Markov chain output when $q_\gamma$ are not available (=-=Geweke, 1994-=-, Carlin and Chib, 1995). In our formulation, using a simple Monte-Carlo average ignores the information contained in $q_\gamma$ and in the sampling mechanism. As a result, one can construct more efficient ... |

22 |
Minimax multiple shrinkage estimation
- George
- 1986
Citation Context ...y assessment, as well as ways of incorporating information from all predictors without over fitting the data. The latter is achieved by a data-based shrinkage of the regression coefficients (see also =-=George, 1986-=-a, 1986b). In this paper we propose to approach model mixing by expressing the model space in terms of an orthogonal transformation of the matrix of predictors. This strategy defines a new class of mi... |

22 | Eliciting prior information to enhance the predictive performance of Bayesian graphical models - Madigan, Garvin, et al. - 1995 |

19 | Spatial Applications of Markov Chain Monte Carlo for Bayesian Inference - Higdon - 1994 |

19 |
Applied Linear Regression, 2nd Edition
- Weisberg
- 1985
Citation Context ...hed the mass media. In such complex modeling problems, predictions based on choosing a single model are often not satisfactory, a fact that has been long recognized in the literature (see for example =-=Weisberg, 1985-=-). Bayesian methods offer a very effective and conceptually appealing alternative: predictions can be based on a set of plausible models rather than a single model; each model contributes to the predi... |

17 | Elicitation of Prior Distributions for Variable-Selection Problems in Regression, Annals of Statistics - Garthwaite, Dickey - 1992 |

15 |
Combining minimax shrinkage estimators
- George
- 1986
Citation Context ...y assessment, as well as ways of incorporating information from all predictors without over fitting the data. The latter is achieved by a data-based shrinkage of the regression coefficients (see also =-=George, 1986-=-a, 1986b). In this paper we propose to approach model mixing by expressing the model space in terms of an orthogonal transformation of the matrix of predictors. This strategy defines a new class of mi... |

14 | Participation in illegitimate activities: Ehrlich revisited - Vandaele - 1978 |

13 | Two approaches to Bayesian model selection with applications - George, McCulloch, et al. - 1996 |

5 | Discovery sampling and selection models
- West
- 1994
Citation Context ...hought of as a sample without replacement from a finite population, with sampling proportional to the size of ~ ��. Methods for analyzing similar data are discussed in the literature (see for exam=-=ple West, 1994-=-) and can be used to derive posterior distributions for $\phi$ based on the discovered models. These typically require additional simulation to make inference about $\phi$. In this context we seek approaches ... |

4 | Space filling experimental design for determining protein construct storage conditions - Menius, Rocque, et al. - 1994 |

3 |
The Use and Interpretation of Principal Components in Applied Research. Sankhya A 26
- Rao
- 1964
Citation Context ...esent several alternative orthogonalization strategies in detail. Further discussion is also in Section 7. The remainder of this paper is based on constructing W via generalized principal components (=-=Rao 1964-=-), a well understood basis, for which computing routines are readily available. The resulting orthogonal variables are invariant under reordering and rescaling the original predictors, and do not requ... |

2 |
Covariate modelling in population pharmacokinetics models
- Bennett, Wakefield
- 1994
Citation Context ... and applications of SSVS to designed experiments (Chipman, 1994 and Clyde and Parmigiani, 1994) generalized linear models (George, McCulloch and Tsay 1994) and population models in pharmacokinetics (=-=Bennett and Wakefield, 1994-=-); related analyses are also discussed by Draper (1994) and Higdon (1994). The focus of this paper is on model mixing for prediction. In this context, there are opportunities for constructing models a... |

2 |
Orthogonalizations and Priors for Orthogonalized Model Mixing
- Clyde, Parmigiani
- 1995
Citation Context ...the mean response or by the resulting amount of shrinkage, as discussed further in section 2.3. In particular, taking $\theta_i$'s less than .5 enforces a penalty for each additional term in the model (see =-=Clyde and Parmigiani, 1995-=-). Our prior specification identifies prior distributions for the coefficients $\alpha$ and $\beta$ given any $\gamma$. Importantly, even if, as we suggest, one first assigns the prior distribution on $\beta$ given $\gamma = 1$,... |

2 | Fast Bayes Variable Selection. Graduate - George, McCulloch - 1994 |