## Optimal Predictive Model Selection (2002)

Venue: Ann. Statist.

Citations: 54 (2 self)

### BibTeX

```bibtex
@ARTICLE{Barbieri02optimalpredictive,
  author  = {Maria Maddalena Barbieri and James O. Berger},
  title   = {Optimal Predictive Model Selection},
  journal = {Ann. Statist.},
  year    = {2002},
  volume  = {32},
  pages   = {870--897}
}
```

### Abstract

Often the goal of model selection is to choose a model for future prediction, and it is natural to measure the accuracy of a future prediction by squared error loss.
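The paper's central recommendation, visible throughout the citation contexts below, is the *median probability model*: include exactly those variables whose posterior inclusion probability is at least 1/2, rather than picking the single highest-probability model. A minimal sketch, using made-up posterior model probabilities (not data from the paper):

```python
# Median probability model selection: given posterior probabilities for
# variable-subset models, keep the variables whose posterior inclusion
# probability is >= 1/2. Model probabilities below are illustrative only.

def inclusion_probabilities(model_probs):
    """model_probs: dict mapping frozensets of variable names to P(M | y)."""
    variables = set().union(*model_probs)
    return {v: sum(p for model, p in model_probs.items() if v in model)
            for v in variables}

def median_probability_model(model_probs):
    """Set of variables with posterior inclusion probability >= 1/2."""
    incl = inclusion_probabilities(model_probs)
    return frozenset(v for v, p in incl.items() if p >= 0.5)

# Hypothetical posterior over four subset models of {x1, x2, x3}.
probs = {
    frozenset({"x1"}): 0.30,
    frozenset({"x1", "x2"}): 0.25,
    frozenset({"x2", "x3"}): 0.25,
    frozenset({"x3"}): 0.20,
}
print(sorted(median_probability_model(probs)))  # → ['x1', 'x2']
```

Note the contrast with highest-posterior-probability selection: the single most probable model here is {x1} (0.30), while the inclusion probabilities are x1: 0.55, x2: 0.50, x3: 0.45, so the median probability model is {x1, x2}.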

### Citations

1135 | Bayesian Theory - Bernardo, Smith - 1994
Citation context: "...satisfy (5) and the posterior distribution is as in (11), is the model that minimizes R(Ml) ... (16), where ... is defined in (15). Proof. For fixed x*, a standard result (see, e.g., Bernardo and Smith, 1994, p. 398) is that E[(... − y*)²] = C + (...)², where C does not depend on l and the expectation is with respect to the predictive distribution of y* given y. Since taking the expectation over x* and usi..."

700 | Applied Regression Analysis - Draper, Smith - 1966
Citation context: "...increasing in j, moving to a larger model will reduce the risk until (1 − 2 ∑_{i=j+1}^k p_{l(i)}) first turns positive. The conclusion is immediate. ✷ Example 3. Consider Hald's regression data (Draper and Smith, 1981), consisting of n = 13 observations on a dependent variable y, with four potential regressors: x1, x2, x3, x4. Suppose that the following nested models, all including a constant term c, are under con..."

559 | Theory of Probability - Jeffreys - 1961
Citation context: "...is given by (24). Two choices of model prior probabilities are considered, P(Ml(i)) = 1/4, i = 1, 2, 3, 4, and P*(Ml(i)) = i^{-1} / ∑_{j=1}^4 j^{-1} (the latter type of choice being discussed in, e.g., Jeffreys, 1961). Default posterior probabilities of each model are then obtained using the Encompassing Arithmetic Intrinsic Bayes Factor, recommended in Berger and Pericchi (1996a, 1996b) for linear models. The re..."

361 | Variable selection via Gibbs sampling - George, McCulloch - 1993

253 | The Analysis of Variance - Scheffé - 1959
Citation context: "...1000, M1100, M1010, M1110, M1111}. Note that this set of models has graphical structure. Scenario 3 - An analogue of an unusual classical test: In classical ANOVA testing, it is sometimes argued (cf. Scheffé, 1959, pp. 94 and 110) that one might be interested in testing for no interaction effect followed by testing for the main effects, even if the no-interaction test rejected. (It is argued that the hypothese..."

163 | Bayesian model averaging: A tutorial - Raftery, Volinsky - 1999 |

148 | The intrinsic Bayes factor for model selection and prediction - Berger, Pericchi - 1996
Citation context: "...uncommon to use separate methodologies to arrive at the pl and the πl(βl, σ | y), the pl being determined through use of a default model selection tool such as BIC, Intrinsic Bayes Factors (cf. Berger and Pericchi, 1996a), or Fractional Bayes Factors (cf. O'Hagan, 1995); and the πl(βl, σ | y) being determined from ordinary noninformative priors, typically the reference priors, which are either constant in the know..."

125 | Calibration and empirical Bayes variable selection - George, Foster - 2000

124 | Multiple shrinkage and subset selection in wavelets - Clyde, Parmigiani, et al. - 1998 |

113 | Fractional Bayes factors for model comparison (with discussion) - O'Hagan - 1995
Citation context: "...and the πl(βl, σ | y), the pl being determined through use of a default model selection tool such as BIC, Intrinsic Bayes Factors (cf. Berger and Pericchi, 1996a), or Fractional Bayes Factors (cf. O'Hagan, 1995); and the πl(βl, σ | y) being determined from ordinary noninformative priors, typically the reference priors, which are either constant in the known variance case or given by ..."

107 | Bayesian variable selection in linear regression - Mitchell, Beauchamp - 1988 |

90 | The Practical Implementation of Bayesian Model Selection - Chipman, George, et al. - 2001 |

73 | Flexible empirical Bayes estimation for wavelets - Clyde, George - 2000
Citation context: "...al conditions if only two models are being entertained (see Berger, 1997) and is often true in the variable selection problem for linear models having orthogonal design matrices (cf. Clyde, 1999, and Clyde and George, 2000), but is not generally true. Indeed, even when only three models are being entertained, essentially nothing can be said about which model is best if one knows only the posterior probabilities of the ..."

58 | Bayesian model averaging: A tutorial, Statistical Science 14(4): 382–417 - Hoeting, Madigan, et al. - 1999 |

54 | Prediction via orthogonalized model mixing - Clyde, Desimone, et al. - 1996 |

49 | Bayesian model averaging and model search strategies (with discussion) - Clyde - 1999
Citation context: "...under very general conditions if only two models are being entertained (see Berger, 1997) and is often true in the variable selection problem for linear models having orthogonal design matrices (cf. Clyde, 1999, and Clyde and George, 2000), but is not generally true. Indeed, even when only three models are being entertained, essentially nothing can be said about which model is best if one knows only the pos..."

42 | Objective Bayesian methods for model selection: introduction and comparison (with discussion) - Berger, Pericchi - 2001

34 | Empirical Bayes estimation in wavelet nonparametric regression - Clyde, George - 1999
Citation context: "...al conditions if only two models are being entertained (see Berger, 1997) and is often true in the variable selection problem for linear models having orthogonal design matrices (cf. Clyde, 1999, and Clyde and George, 1999, 2000), but is not generally true. Indeed, even when only three models are being entertained, essentially nothing can be said about which model is best if one knows only the posterior probabilities o..."

33 | Design and Analysis of Experiments. 3rd Ed - MONTGOMERY - 1991 |

32 | Bayes factors and marginal distributions in invariant situations, Sankhya: The Indian - Berger, Pericchi, et al. - 1998
Citation context: "...eration have 'common' unknown parameters; it is then typical to utilize noninformative priors for the common parameters, while using independent conjugate normal priors for the other parameters. (See Berger, Pericchi, and Varshavsky, 1998, for justification of this practice.) Lemma 2. If Q is diagonal with diagonal elements qi ... and (17) holds, then ... = ∑_{i=1}^k qi(li − pi)² (18), where pi is as in (6). Proof. From (17) it follows tha..."

27 | Approximations and Consistency of Bayes Factors as Model Dimension Grows - Berger, Ghosh, et al. - 2003
Citation context: "...ve model in the situation of Shibata (1983) is actually the median probability model. (There are also concerns with the applicability of BIC as an approximation to log posterior probability here; see Berger, Ghosh, and Mukhopadhyay, 1999, for further discussion.) 2 Preliminaries. 2.1 Posterior inputs to the prediction problem. Information from the data and prior is summarized by providing, for all l, pl = P(Ml | y), the posterior proba..."

24 | The intrinsic Bayes factor for linear models - Berger, Pericchi - 1995
Citation context: "...uncommon to use separate methodologies to arrive at the pl and the πl(βl, σ | y), the pl being determined through use of a default model selection tool such as BIC, Intrinsic Bayes Factors (cf. Berger and Pericchi, 1996a), or Fractional Bayes Factors (cf. O'Hagan, 1995); and the πl(βl, σ | y) being determined from ordinary noninformative priors, typically the reference priors, which are either constant in the know..."

21 | Simulation Based Optimal Design - Müller - 1998
Citation context: "...schemes have been developed that can effectively determine the posterior model probabilities, P(Ml | y), but adding an expectation over x* and a minimization over l can be prohibitive (although see Müller, 1999). We thus sought to determine if there are situations in which it is possible to give the optimal predictive model solely in terms of the posterior model probabilities. Rather general characterizatio..."

18 | Asymptotic mean efficiency of a selection of regression variables - Shibata - 1983 |

11 | The Analysis of Variance - Scheffé - 1959
Citation context: "...32 32 Alow, Bhigh 18 19 23 Ahigh, Bhigh 31 30 29 Table 2: Data for the 2² ANOVA example. Scenario 3 - An analogue of an unusual classical test: In classical ANOVA testing, it is sometimes argued (cf. Scheffé, 1959, pp. 94 and 110) that one might be interested in testing for no interaction effect followed by testing for the main effects, even if the no-interaction test rejected. (It is argued that the hypothese..."

7 | Orthogonalizations and prior distributions for orthogonalized model mixing - Clyde, Parmigiani - 1996
Citation context: "...are the posterior inclusion probabilities in (6). This will be seen to occur in the problem of variable selection under an orthogonal design matrix, certain prior structures, and known variance σ². (Clyde and Parmigiani, 1996, and Clyde, DeSimone and Parmigiani, 1996, show that (10) can often be approximately satisfied when σ² is unknown, and it is likely that the median probability model will equal the maximum probabili..."

5 | Bayes factors - Berger - 1997
Citation context: "...l selection, it is commonly perceived that the best model will be that with the highest posterior probability. This is true under very general conditions if only two models are being entertained (see Berger, 1997) and is often true in the variable selection problem for linear models having orthogonal design matrices (cf. Clyde, 1999, and Clyde and George, 2000), but is not generally true. Indeed, even when on..."

4 | El análisis de varianza basado en los factores de Bayes intrínsecos - NADAL - 1999 |

2 | Applied Regression Analysis - Draper, Smith - 1981
Citation context: "...increasing in j, moving to a larger model will reduce the risk until (1 − 2 ∑_{i=j+1}^k p_{l(i)}) first turns positive. The conclusion is immediate. ✷ Example 3. Consider Hald's regression data (Draper and Smith, 1981), consisting of n = 13 observations on a dependent variable y, with four potential regressors: x1, x2, x3, x4. Suppose that the following nested models, all including a constant term c, are under con..."

2 | Bayesian and Empirical Bayesian Model Selection - Mukhopadhyay - 2000

2 | Discussion of "A case study in model selection" - Berger, M - 2002

1 | Simulation based optimal design - Müller - 1999
Citation context: "...MCMC schemes have been developed that can effectively determine the posterior model probabilities, P(Ml | y), but adding an expectation over x* and a minimization over l can be prohibitive (although see Müller, 1999). We thus sought to determine if there are situations in which it is possible to give the optimal predictive model solely in terms of the posterior model probabilities. Rather general characterizat..."

1 | El Análisis de Varianza Basado en los Factores de Bayes Intrínsecos - Nadal - 1999

1 | The intrinsic Bayes factor for model selection and prediction - Viele, Tarr, et al. - 1996 |