Results 1-10 of 122
Bayesian Experimental Design: A Review
Statistical Science, 1995
Cited by 310 (1 self)
This paper reviews the literature on Bayesian experimental design, both for linear and nonlinear models. A unified view of the topic is presented by putting experimental design in a decision theoretic framework. This framework justifies many optimality criteria, and opens new possibilities. Various design criteria become part of a single, coherent approach.
Approaches for Bayesian variable selection
Statistica Sinica, 1997
Cited by 227 (6 self)
This paper describes and compares various hierarchical mixture prior formulations of variable selection uncertainty in normal linear regression models. These include the nonconjugate SSVS formulation of George and McCulloch (1993), as well as conjugate formulations which allow for analytical simplification. Hyperparameter settings which base selection on practical significance, and the implications of using mixtures with point priors are discussed. Computational methods for posterior evaluation and exploration are considered. Rapid updating methods are seen to provide feasible methods for exhaustive evaluation using Gray Code sequencing in moderately sized problems, and fast Markov Chain Monte Carlo exploration in large problems. Estimation of normalization constants is seen to provide improved posterior estimates of individual model probabilities and the total visited probability. Various procedures are illustrated on simulated sample problems and on a real problem concerning the construction of financial index tracking portfolios.
Logistic Regression in Rare Events Data
1999
Cited by 152 (4 self)
We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory
The practical implementation of Bayesian model selection
Institute of Mathematical Statistics, 2001
Cited by 128 (3 self)
In principle, the Bayesian approach to model selection is straightforward. Prior probability distributions are used to describe the uncertainty surrounding all unknowns. After observing the data, the posterior distribution provides a coherent post data summary of the remaining uncertainty which is relevant for model selection. However, the practical implementation of this approach often requires carefully tailored priors and novel posterior calculation methods. In this article, we illustrate some of the fundamental practical issues that arise for two different model selection problems: the variable selection problem for the linear model and the CART model selection problem.
How Many Genes Are Needed for a Discriminant Microarray Data Analysis
Proc. Critical Assessment of Techniques for Microarray Data Mining Workshop, 2000
Cited by 50 (2 self)
The analysis of the leukemia data from the Whitehead/MIT group is a discriminant analysis (also called supervised learning). Among thousands of genes whose expression levels are measured, not all are needed for discriminant analysis: a gene may either not contribute to the separation of two types of tissues/cancers, or it may be redundant because it is highly correlated with other genes. There are two theoretical frameworks in which variable selection (or gene selection in our case) can be addressed. The first is model selection, and the second is model averaging. We have carried out model selection using Akaike information criterion and Bayesian information criterion with logistic regression (discrimination, prediction, or classification) to determine the number of genes that provide the best model. These model selection criteria set upper limits of 22-25 and 12-13 genes for this data set with 38 samples, and the best model consists of only one (no. 4847, zyxin) or two genes. We have also carried out model averaging over the best single-gene logistic predictors using three different weights: maximized likelihood, prediction rate on training set, and equal weight. We have observed that the performance of most of these weighted predictors on the testing set is gradually reduced as more genes are included, but a clear cutoff that separates good and bad prediction performance is not found.
Inference and Hierarchical Modeling in the Social Sciences
1995
Cited by 44 (6 self)
In this paper I (1) examine three levels of inferential strength supported by typical social science data-gathering methods, and call for a greater degree of explicitness, when HMs and other models are applied, in identifying which level is appropriate; (2) reconsider the use of HMs in school effectiveness studies and meta-analysis from the perspective of causal inference; and (3) recommend the increased use of Gibbs sampling and other Markov-chain Monte Carlo (MCMC) methods in the application of HMs in the social sciences, so that comparisons between MCMC and better-established fitting methods, including full or restricted maximum likelihood estimation based on the EM algorithm, Fisher scoring or iterative generalized least squares, may be more fully informed by empirical practice.
On adaptive decision rules and decision parameter adaptation for automatic speech recognition
Proc. IEEE, 2000
Cited by 35 (4 self)
Recent advances in automatic speech recognition are accomplished by designing a plug-in maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevailing training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variabilities in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with the changing speakers and speaking conditions in real operational conditions for high-performance speech recognition, such paradigms incorporate a small amount of speaker- and environment-specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine
Robust speech recognition based on Bayesian prediction approach
Submitted to IEEE Trans. on Speech and Audio Processing, 1997
Cited by 29 (7 self)
recognition problem in which mismatches exist between training and testing conditions, and no accurate knowledge of the mismatch mechanism is available. The only available information is the test data along with a set of pretrained Gaussian mixture continuous density hidden Markov models (CDHMM’s). We investigate the problem from the viewpoint of Bayesian prediction. A simple prior distribution, namely constrained uniform distribution, is adopted to characterize the uncertainty of the mean vectors of the CDHMM’s. Two methods, namely a model compensation technique based on Bayesian predictive density and a robust decision strategy called Viterbi Bayesian predictive classification, are studied. The proposed methods are compared with the conventional Viterbi decoding algorithm in speaker-independent recognition experiments on isolated digits and TI connected digit strings (TIDIGITS), where the mismatches between training and testing conditions are caused by: 1) additive Gaussian white noise, 2) each of 25 types of actual additive ambient noises, and 3) gender difference. The experimental results show that the adopted prior distribution and the proposed techniques help to improve the performance robustness under the examined mismatch conditions. Index Terms: Bayesian predictive classification, minimax decision, plug-in maximum a posteriori decision, predictive density, Viterbi Bayesian predictive classification.
Zipf's Law in Importance of Genes for Cancer Classification Using Microarray Data
2002
Cited by 25 (6 self)
Using a measure of how differentially expressed a gene is in two biochemically/phenotypically different conditions, we can rank all genes in a microarray dataset. We have shown that the falling-off of this measure (normalized maximum likelihood in a classification model such as logistic regression) as a function of the rank is typically a power-law function. This power-law function in other similar ranked plots is known as Zipf's law, observed in many natural and social phenomena. The presence of this power-law function prevents an intrinsic cutoff point between the "important" genes and "irrelevant" genes. We have shown that similar power-law functions are also present in permuted datasets, and provide an explanation from the well-known $\chi^2$ distribution of likelihood ratios. We discuss the implication of this Zipf's law on gene selection in a microarray data analysis, as well as other characterizations of the ranked likelihood plots such as the rate of fall-off of the likelihood.