Results 1–7 of 7
Training Samples in Objective Bayesian Model Selection
Ann. Statist., 2004
"... Central to several objective approaches to Bayesian model selection is the use of training samples (subsets of the data), so as to allow utilization of improper objective priors. The most common prescription for choosing training samples is to choose them to be as small as possible, subject to yield ..."
Abstract
Cited by 8 (2 self)
Central to several objective approaches to Bayesian model selection is the use of training samples (subsets of the data), so as to allow utilization of improper objective priors. The most common prescription for choosing training samples is to choose them to be as small as possible, subject to yielding proper posteriors; these are called minimal training samples.
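To make the minimal-training-sample idea concrete, here is a small sketch (my own illustration, not from the paper): in a normal location model with known variance and the improper flat prior π(μ) ∝ 1, a single observation already yields a proper posterior, so a minimal training sample has size one. Using it in a partial Bayes factor cancels the arbitrary constant in the improper prior. The function names below are hypothetical.

```python
import math

def log_m1_flat(x, sigma=1.0):
    """Log 'marginal likelihood' of N(mu, sigma^2) data under the
    improper flat prior pi(mu) = 1 (defined only up to the arbitrary
    constant in that prior, which cancels in partial Bayes factors)."""
    n = len(x)
    xbar = sum(x) / n
    s = sum((xi - xbar) ** 2 for xi in x)
    return (-(n - 1) / 2.0 * math.log(2 * math.pi * sigma ** 2)
            - 0.5 * math.log(n) - s / (2 * sigma ** 2))

def log_m0(x, sigma=1.0):
    """Log marginal likelihood under the point null M0: mu = 0."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - xi ** 2 / (2 * sigma ** 2) for xi in x)

def partial_log_bf10(x, train_idx=0):
    """Partial Bayes factor of M1 (mu unknown, flat prior) vs M0.
    One observation is a minimal training sample here, since it
    already makes the posterior for mu proper; dividing by the
    Bayes factor of the training sample removes the arbitrary
    constant of the improper prior."""
    xl = [x[train_idx]]
    return (log_m1_flat(x) - log_m0(x)) - (log_m1_flat(xl) - log_m0(xl))
```

Data clustered near zero then yield a negative partial log Bayes factor (evidence for the null), while data far from zero yield a positive one.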
Compatibility of prior specifications across linear models
Statistical Science, 2008
"... Abstract. Bayesian model comparison requires the specification of a prior distribution on the parameter space of each candidate model. In this connection two concerns arise: on the one hand the elicitation task rapidly becomes prohibitive as the number of models increases; on the other hand numerous ..."
Abstract
Cited by 1 (1 self)
Bayesian model comparison requires the specification of a prior distribution on the parameter space of each candidate model. In this connection two concerns arise: on the one hand, the elicitation task rapidly becomes prohibitive as the number of models increases; on the other hand, numerous prior specifications can only exacerbate the well-known sensitivity to prior assignments, thus producing less dependable conclusions. Within the subjective framework, both difficulties can be counteracted by linking priors across models in order to achieve simplification and compatibility; we discuss links with related objective approaches. Given an encompassing, or full, model together with a prior on its parameter space, we review and summarize a few procedures for deriving priors under a submodel, namely marginalization, conditioning, and Kullback–Leibler projection. These techniques are illustrated and discussed with reference to variable selection in linear models adopting a conventional g-prior; comparisons with existing standard approaches are provided. Finally, the relative merits of each procedure are evaluated through simulated and real data sets.
Key words and phrases: Bayes factor, compatible prior, conjugate prior, g-prior, hypothesis testing, Kullback–Leibler projection, nested model, variable selection.
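As a hedged illustration of the conventional g-prior mentioned in this abstract: for a Gaussian linear model with one predictor (plus intercept) compared against the intercept-only null, one standard closed form of the Bayes factor under Zellner's g-prior on the slope is BF = (1+g)^((n-1-p)/2) / (1+g(1-R²))^((n-1)/2). The sketch below implements that formula for p = 1; the function names and the unit-information default g = n are my assumptions, not taken from the paper.

```python
import math

def r_squared(y, x):
    """R^2 of the least-squares fit of y on a single predictor x
    (the square of the sample correlation)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def g_prior_log_bf(y, x, g=None):
    """Log Bayes factor of the one-predictor model (with intercept)
    against the intercept-only null under Zellner's g-prior on the
    slope, using the closed form
        BF = (1 + g)^((n-1-p)/2) / (1 + g (1 - R^2))^((n-1)/2)
    with p = 1.  Defaults to g = n, a unit-information choice."""
    n, p = len(y), 1
    if g is None:
        g = n
    r2 = r_squared(y, x)
    return ((n - 1 - p) / 2) * math.log(1 + g) \
         - ((n - 1) / 2) * math.log(1 + g * (1 - r2))
```

A strong linear relationship gives a positive log Bayes factor, and an unrelated predictor a negative one, which is the behavior the formula is designed to produce.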
Discussion on the paper: Catching up faster by switching sooner..., by Erven
"... I’d like to thank the authors for their catchup description. In particular, the example illustrated in Fig. 1 has useful tutorial value, and I regular refer people to it. The models used in the example are deliberately crude, to construct a clear example. Nevertheless, I think it is worth explicitl ..."
Abstract
I’d like to thank the authors for their catch-up description. In particular, the example illustrated in Fig. 1 has useful tutorial value, and I regularly refer people to it. The models used in the example are deliberately crude, to construct a clear example. Nevertheless, I think it is worth explicitly reviewing why the more powerful model suffers from the catch-up phenomenon, and how it might be avoided through hierarchical modelling. The 2nd-order Markov model in the example (a ‘trigram model’) performs worse than the 1st-order model for small datasets. This result still holds for text actually generated from a 2nd-order Markov model, when the trigram statistics are matched to English characters. The subjective Bayesian demands an explanation: we should use the model we believe, regardless of how much data we have. The trigram model does poorly whenever the two characters providing context have rarely been seen before: its uniform prior does not allow generalization from past experience with other contexts. In real-world language modelling applications, predictions are ‘smoothed’ with statistics from shorter contexts (Chen and Goodman, 1998). I ran a ‘Witten–Bell’ smoothed trigram model on the Alice text: it outperformed both the other Markov models across the
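Since Witten–Bell smoothing is the key ingredient in the experiment above, here is a minimal character n-gram sketch of it (my own reconstruction, not the discussant's code): the probability mass reserved for unseen continuations of a context h is T(h)/(c(h)+T(h)), where T(h) is the number of distinct characters ever observed after h, and the model backs off recursively to shorter contexts, ending at a uniform distribution over the alphabet.

```python
from collections import defaultdict

class WittenBellLM:
    """Character n-gram language model with Witten-Bell smoothing."""

    def __init__(self, order=3):
        self.order = order                    # 3 -> trigram model
        self.counts = defaultdict(lambda: defaultdict(int))
        self.alphabet = []

    def train(self, text):
        self.alphabet = sorted(set(text))
        for i, ch in enumerate(text):
            # count ch after every context of length 0 .. order-1
            for k in range(self.order):
                if i - k >= 0:
                    self.counts[text[i - k:i]][ch] += 1

    def prob(self, ch, context=""):
        start = max(0, len(context) - (self.order - 1))
        return self._prob(ch, context[start:])

    def _prob(self, ch, context):
        if context and context not in self.counts:
            return self._prob(ch, context[1:])   # unseen context: back off
        dist = self.counts[context]
        c = sum(dist.values())                   # c(h): total count of context h
        t = len(dist)                            # T(h): distinct continuations
        # recurse to the next shorter context; uniform once context is empty
        lower = self._prob(ch, context[1:]) if context else 1.0 / len(self.alphabet)
        return (dist.get(ch, 0) + t * lower) / (c + t)
```

The smoothed probabilities sum to one over the alphabet for any context, and a frequently observed continuation receives more mass than the uniform baseline — which is exactly what rescues the trigram model on sparse contexts.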
Incorporating External Evidence in Reinforcement Learning via Power Prior Bayesian Analysis
"... Power priors allow us to introduce into a Bayesian algorithm a relative precision parameter that controls the influence of external evidence on a new task. Such evidence, often available as historical data, can be quite useful when learning a new task from reinforcement. In this paper, we study the ..."
Abstract
Power priors allow us to introduce into a Bayesian algorithm a relative precision parameter that controls the influence of external evidence on a new task. Such evidence, often available as historical data, can be quite useful when learning a new task from reinforcement. In this paper, we study the use of power priors in Bayesian reinforcement learning. We start by describing the basics of power prior distributions. We then develop power priors for unknown Markov decision processes incorporating historical data. Finally, we apply the power priors approach to learning an intervention timing task.
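The basic power-prior mechanics described in this abstract are especially transparent in a conjugate Bernoulli example (a sketch under my own assumptions, not the paper's MDP construction): raising a Bernoulli likelihood for historical data to the power a0 and multiplying by a Beta(α, β) initial prior yields another Beta distribution, with the historical counts discounted by a0.

```python
def beta_power_posterior(s, f, s0, f0, a0, alpha=1.0, beta=1.0):
    """Posterior Beta parameters for a Bernoulli success probability
    when historical data (s0 successes, f0 failures) enter through a
    power prior L(theta | D0)^a0 * Beta(alpha, beta).  The relative
    precision parameter a0 in [0, 1] controls the influence of the
    external evidence: 0 discards the historical data entirely,
    1 pools it fully with the new data (s successes, f failures)."""
    assert 0.0 <= a0 <= 1.0
    return alpha + a0 * s0 + s, beta + a0 * f0 + f
```

With a0 = 0 the posterior uses only the new counts; with a0 = 1 the historical counts are added at full weight, and intermediate values interpolate between the two.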
The Whetstone and the Alum Block: Balanced Objective Bayesian Comparison of Nested Models for Discrete Data
"... Abstract. When two nested models are compared, using a Bayes factor, from an objective standpoint, two seemingly conflicting issues emerge at the time of choosing parameter priors under the two models. On the one hand, for moderate sample sizes, the evidence in favor of the smaller model can be infl ..."
Abstract
When two nested models are compared using a Bayes factor from an objective standpoint, two seemingly conflicting issues emerge when choosing parameter priors under the two models. On the one hand, for moderate sample sizes, the evidence in favor of the smaller model can be inflated by diffuseness of the prior under the larger model. On the other hand, asymptotically, the evidence in favor of the smaller model typically accumulates at a slower rate. With reference to finitely discrete data models, we show that these two issues can be dealt with jointly by combining intrinsic priors and nonlocal priors in a new unified class of priors. We illustrate our ideas in a running Bernoulli example, then apply them to test the equality of two proportions, and finally deal with the more general case of logistic regression models.
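The running Bernoulli example can be sketched numerically (an illustration under my own assumptions, not the authors' unified prior class): under H1, take a nonlocal 'moment' prior proportional to (θ − θ0)² on (0, 1), which vanishes at the null value θ0 and therefore lets evidence for the null accumulate faster than a local (e.g. uniform) prior would. Marginal likelihoods below are approximated by simple trapezoidal quadrature.

```python
def bf10_moment(s, n, theta0=0.5, grid=20001):
    """Bayes factor for H1 vs H0: theta = theta0 in a Bernoulli model
    with s successes in n trials, using a nonlocal 'moment' prior
        pi(theta) proportional to (theta - theta0)^2 on (0, 1)
    under H1.  Because the prior vanishes at theta0, evidence for H0
    accumulates faster than under a local (uniform) prior.  Marginals
    are computed by trapezoidal quadrature; purely illustrative."""
    h = 1.0 / (grid - 1)
    num = 0.0   # integral of (t - theta0)^2 * t^s * (1-t)^(n-s)
    den = 0.0   # normalizer: integral of (t - theta0)^2
    for i in range(grid):
        t = i * h
        w = h if 0 < i < grid - 1 else h / 2   # trapezoid weights
        q = (t - theta0) ** 2
        den += w * q
        num += w * q * t ** s * (1 - t) ** (n - s)
    m1 = num / den                              # marginal under H1
    m0 = theta0 ** s * (1 - theta0) ** (n - s)  # likelihood at the null
    return m1 / m0
```

Data at the null rate give a Bayes factor well below one, while data far from it give a factor well above one, matching the balanced behavior the abstract describes.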
Integrative Analysis of Data, Literature, and Expert Knowledge by Bayesian Networks
"... 1. The ovarian cancer problem, the IOTA project ..."