Results 21–30 of 95
A comparison of Bayesian and likelihood-based methods for fitting multilevel models
Cited by 20 (4 self)
Abstract
This paper focuses on the likelihood-based (and approximate likelihood) methods most readily available (given current usage patterns of existing software) to statisticians and substantive researchers making frequent use of multilevel models: ML and REML in VC models, and MQL and PQL in RELR models. Other promising likelihood-based approaches, including (a) methods based on Gaussian quadrature (e.g., Pinheiro and Bates 1995); (b) the nonparametric maximum likelihood methods of Aitkin (1999a); (c) the Laplace-approximation approach of Raudenbush et al. (1999); (d) the work on hierarchical generalised linear models of Lee and Nelder (2000); and (e) interval estimation based on ranges of values of the parameters for which the log likelihood is within a certain distance of its maximum, for instance using profile likelihood (e.g., Longford 2000), are not addressed here. It is evident from the recent applied literature that, from the point of view of multilevel analyses currently being conducted to inform educational and health policy choices and other substantive decisions, the use of methods (a)-(e) is not (yet) as widespread as REML and quasi-likelihood approaches. Statisticians are well aware that the highly skewed repeated-sampling distributions of ML estimators of random-effects variances in multilevel models with small sample sizes are not likely to lead to good coverage properties for large-sample Gaussian approximate interval estimates of the form σ̂² ± 1.96·SE(σ̂²), but with few exceptions the profession has not (yet) responded to this by making software for improved likelihood interval estimates widely available to multilevel modellers. In Sections 3 and 4 we document the extent of the poor coverage behaviour of the Gaussian approach, and we offer several simple approximation ...
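The coverage problem this abstract describes can be seen in a toy simulation (not from the paper; the sample size, true variance, and the Wald-type standard error SE = σ̂²·√(2/(n−1)) are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_var, reps = 10, 1.0, 20_000
covered = 0
for _ in range(reps):
    x = rng.normal(0.0, np.sqrt(true_var), size=n)
    s2 = x.var(ddof=1)                       # variance estimate
    se = s2 * np.sqrt(2.0 / (n - 1))         # large-sample Gaussian-approximation SE
    lo, hi = s2 - 1.96 * se, s2 + 1.96 * se  # Wald interval of the form sigma^2 +/- 1.96 SE
    covered += lo <= true_var <= hi
print(f"coverage: {covered / reps:.3f}  (nominal 0.95)")
```

With n = 10 the skewed sampling distribution of the variance estimate drags the Wald interval's coverage well below the nominal 95%, which is the phenomenon the paper documents.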
Variable selection and Bayesian model averaging in case-control studies
, 1998
Cited by 19 (7 self)
Abstract
Covariate and confounder selection in case-control studies is most commonly carried out using either a two-step method or a stepwise variable selection method in logistic regression. Inference is then carried out conditionally on the selected model, but this ignores the model uncertainty implicit in the variable selection process, and so underestimates uncertainty about relative risks. We report on a simulation study designed to be similar to actual case-control studies. This shows that p-values computed after variable selection can greatly overstate the strength of conclusions. For example, for our simulated case-control studies with 1,000 subjects, of variables declared to be "significant" with p-values between .01 and .05, only 49% actually were risk factors when stepwise variable selection was used. We propose Bayesian model averaging as a formal way of taking account of model uncertainty in case-control studies. This yields an easily interpreted summary, the posterior probability that a variable is a risk factor, and our simulation study indicates this to be reasonably well calibrated in the situations simulated. The methods are applied and compared ...
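As a rough sketch of the model-averaging idea: posterior inclusion probabilities can be computed by enumerating candidate models and weighting them. The made-up linear-regression data and BIC-approximated model weights below are simplifying assumptions, not the paper's logistic-regression setup:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 4
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] + rng.normal(size=n)       # only variable 0 is a true "risk factor"

def bic(cols):
    """BIC of a Gaussian linear model on the given columns (plus intercept)."""
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = np.sum((y - Z @ beta) ** 2)
    return n * np.log(rss / n) + Z.shape[1] * np.log(n)

# Enumerate all 2^p candidate models and form approximate posterior weights.
models = [cols for r in range(p + 1) for cols in itertools.combinations(range(p), r)]
bics = np.array([bic(m) for m in models])
w = np.exp(-0.5 * (bics - bics.min()))
w /= w.sum()                                 # approximate posterior model probabilities

# Posterior probability that each variable is a risk factor = total weight of models containing it.
incl = [sum(w[i] for i, m in enumerate(models) if j in m) for j in range(p)]
print("posterior inclusion probabilities:", np.round(incl, 3))
```

The true predictor receives inclusion probability near 1, while the noise variables do not, which is the "easily interpreted summary" the abstract refers to.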
Bayesian Wavelet Networks for Nonparametric Regression
, 1997
Cited by 19 (6 self)
Abstract
Radial wavelet networks have recently been proposed as a method for nonparametric regression. In this paper we analyse their performance within a Bayesian framework. We derive probability distributions over both the dimension of the networks and the network coefficients by placing a prior on the degrees of freedom of the model. This process bypasses the need to test or select a finite number of networks during the modelling process. Predictions are formed by mixing over many models of varying dimension and parameterization. We show that the complexity of the models adapts to the complexity of the data and produces good results on a number of benchmark test series. Keywords: Wavelets, radial basis functions, model choice, Bayesian neural networks, reversible jump Markov chain Monte Carlo, nonparametric regression, splines.

1 Introduction

Wavelet networks have previously been studied in relation to nonparametric regression by Zhang (1997), Kugarajah and Zhang (1995), Zhang and Benveni...
Statistical Ideas for Selecting Network Architectures
 Invited Presentation, Neural Information Processing Systems 8
, 1995
Cited by 18 (3 self)
Abstract
Choosing the architecture of a neural network is one of the most important problems in making neural networks practically useful, but accounts of applications usually sweep these details under the carpet. How many hidden units are needed? Should weight decay be used, and if so how much? What type of output units should be chosen? And so on. We address these issues within the framework of statistical theory for model choice, which provides a number of workable approximate answers. This paper is principally concerned with architecture selection issues for feedforward neural networks (also known as multilayer perceptrons). Many of the same issues arise in selecting radial basis function networks, recurrent networks and more widely. These problems occur in a much wider context within statistics, and applied statisticians have been selecting and combining models for decades. Two recent discussions are [4, 5]. References [3, 20, 21, 22] discuss neural networks from a statistical perspecti...
Implementation and Performance Issues in the Bayesian And Likelihood . . .
 COMPUTATIONAL STATISTICS
, 2000
A Bayesian formulation of exploratory data analysis and goodness-of-fit testing
, 2003
Cited by 16 (8 self)
Abstract
Exploratory data analysis (EDA) and Bayesian inference (or, more generally, complex statistical modeling), which are generally considered as unrelated statistical paradigms, can be particularly effective in combination. In this paper, we present a Bayesian framework for EDA based on posterior predictive checks. We explain how posterior predictive simulations can be used to create reference distributions for EDA graphs, and how this approach resolves some theoretical problems in Bayesian data analysis. We show how the generalization of Bayesian inference to include replicated data and replicated parameters follows a long tradition of generalizations in Bayesian theory.
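A minimal sketch of a posterior predictive check, assuming a toy normal model with known variance, a flat prior, and the sample maximum as the discrepancy measure (none of these specific choices come from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(0.0, 1.0, size=50)            # observed data; model: y ~ N(mu, 1), flat prior on mu
n = y.size

# Under this conjugate setup the posterior of mu is N(ybar, 1/n).
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=4000)

# Replicate a data set for each posterior draw and compare a test statistic,
# giving a reference distribution of the kind an EDA graph would be checked against.
T = lambda data: data.max()
T_rep = np.array([T(rng.normal(mu, 1.0, size=n)) for mu in mu_draws])
ppp = (T_rep >= T(y)).mean()                 # posterior predictive p-value
print(f"T(y) = {T(y):.2f}, posterior predictive p-value = {ppp:.3f}")
```

Plotting T(y) against the histogram of T_rep is the graphical version of this check; an extreme posterior predictive p-value flags a misfit between model and data.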
Linearly combining density estimators via stacking
 Machine Learning
, 1999
Cited by 15 (4 self)
Abstract
This paper presents experimental results with both real and artificial data on using the technique of stacking to combine unsupervised learning algorithms. Specifically, stacking is used to form a linear combination of finite mixture model and kernel density estimators for nonparametric multivariate density estimation. The method is found to outperform other strategies such as choosing the single best model based on cross-validation, combining with uniform weights, and even using the single best model chosen by "cheating" and examining the test set. We also investigate in detail how the utility of stacking changes when one of the models being combined generated the data; how the stacking coefficients of the models compare to the relative frequencies with which cross-validation chooses among the models; visualization of combined "effective" kernels; and the sensitivity of stacking to overfitting as model complexity increases. In an extended version of this paper we also investigate how stacking performs using L1 and L2 performance measures (for which one must know the true density) rather than log-likelihood (Smyth and Wolpert 1998).
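The stacking idea can be sketched for two univariate density estimators; the single held-out split, Silverman bandwidth, and grid search over convex weights are simplifying assumptions, not the paper's cross-validation procedure:

```python
import numpy as np

rng = np.random.default_rng(3)
train = np.r_[rng.normal(-2, 1, 300), rng.normal(2, 1, 300)]   # bimodal data
held = np.r_[rng.normal(-2, 1, 200), rng.normal(2, 1, 200)]    # held-out set

def gaussian_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Estimator 1: single Gaussian fit by ML.  Estimator 2: kernel density estimate.
mu, sd = train.mean(), train.std()
h = 1.06 * sd * len(train) ** -0.2                             # Silverman-style bandwidth
def kde(x):
    return gaussian_pdf(x[:, None], train[None, :], h).mean(axis=1)

p1, p2 = gaussian_pdf(held, mu, sd), kde(held)

# Stack: choose the convex combination maximizing held-out log-likelihood.
weights = np.linspace(0, 1, 101)
loglik = [np.sum(np.log(w * p2 + (1 - w) * p1)) for w in weights]
best = weights[int(np.argmax(loglik))]
print(f"stacking weight on the KDE: {best:.2f}")
```

On this bimodal sample the held-out likelihood pushes nearly all the weight onto the KDE, since a single Gaussian cannot capture the two modes.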
Model averaging and value-at-risk based evaluation of large multi-asset volatility models for risk management
, 2005
Cited by 13 (2 self)
Abstract
This paper considers the problem of model uncertainty in the case of multi-asset volatility models and discusses the use of model averaging techniques as a way of dealing with the risk of inadvertently using false models in portfolio management. Evaluation of volatility models is then considered and a simple Value-at-Risk (VaR) diagnostic test is proposed for individual as well as ‘average’ models. The asymptotic as well as the exact finite-sample distributions of the test statistic, allowing for the possibility of parameter uncertainty, are established. The model averaging idea and the VaR diagnostic tests are illustrated by an application to portfolios of daily returns based on twenty-two of Standard & Poor’s 500 industry group indices over the period 1995–2003. We find strong evidence in support of ‘thick’ modelling proposed in the forecasting literature by Granger and Jeon (2004).
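A hedged sketch of the kind of VaR diagnostic the abstract alludes to: an unconditional-coverage check that compares the realized violation count with its expected value (a simple binomial z-test; the paper's actual test statistic and its finite-sample distribution are more elaborate):

```python
import numpy as np

rng = np.random.default_rng(4)
alpha = 0.01                                   # 1% VaR level
T = 1000
returns = rng.normal(0.0, 0.02, size=T)        # simulated daily returns
var_forecast = 0.02 * 2.326                    # model's 1% VaR (normal quantile)

hits = returns < -var_forecast                 # VaR violations
n_hits = hits.sum()

# z-test of the hit rate against alpha: under a correct model, about
# T * alpha = 10 violations are expected.
z = (n_hits - T * alpha) / np.sqrt(T * alpha * (1 - alpha))
print(f"violations: {n_hits}/{T}, z = {z:.2f}")
```

A large |z| would indicate that the volatility model (or the ‘average’ model) systematically mis-states tail risk.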
Population Markov Chain Monte Carlo
 Machine Learning
, 2003
Cited by 12 (2 self)
Abstract
Stochastic search algorithms inspired by physical and biological systems are applied to the problem of learning directed graphical probability models in the presence of missing observations and hidden variables. For this class of problems, deterministic search algorithms tend to halt at local optima, requiring random restarts to obtain solutions of acceptable quality. We compare three stochastic search algorithms: a Metropolis-Hastings Sampler (MHS), an Evolutionary Algorithm (EA), and a new hybrid algorithm called Population Markov Chain Monte Carlo, or popMCMC. PopMCMC uses statistical information from a population of MHSs to inform the proposal distributions for individual samplers in the population. Experimental results show that popMCMC and EAs learn more efficiently than the MHS with no information exchange. Populations of MCMC samplers exhibit more diversity than populations evolving according to EAs not satisfying physics-inspired local reversibility conditions. KEY WORDS: Markov Chain Monte Carlo, Metropolis-Hastings Algorithm, Graphical Probabilistic Models, Bayesian Networks, Bayesian Learning, Evolutionary Algorithms
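A crude, illustrative analogue of the population idea on a toy continuous target: a population of Metropolis-Hastings chains whose proposal scale is informed by the population's spread. This adaptive scaling is a simplification for illustration only, not the paper's popMCMC algorithm (which learns graphical-model structure):

```python
import numpy as np

rng = np.random.default_rng(5)
log_target = lambda x: -0.5 * x ** 2           # standard normal, up to a constant

def mh_step(x, scale):
    """One Metropolis-Hastings step with a Gaussian random-walk proposal."""
    prop = x + rng.normal(0.0, scale)
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        return prop
    return x

# Population of samplers: each chain's proposal scale is set from the spread
# of the whole population, so chains share statistical information.
pop = rng.normal(0.0, 5.0, size=20)
draws = []
for _ in range(2000):
    scale = max(pop.std(), 0.1)
    pop = np.array([mh_step(x, scale) for x in pop])
    draws.append(pop.copy())

samples = np.concatenate(draws[500:])          # drop burn-in
print(f"mean = {samples.mean():.2f}, sd = {samples.std():.2f}")
```

The pooled post-burn-in draws recover the target's mean and standard deviation; the population-informed proposal scale is the toy counterpart of popMCMC's shared proposal distributions.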