Statistical Ideas for Selecting Network Architectures
- Invited Presentation, Neural Information Processing Systems 8, 1995
Cited by 16 (2 self)
Choosing the architecture of a neural network is one of the most important problems in making neural networks practically useful, but accounts of applications usually sweep these details under the carpet. How many hidden units are needed? Should weight decay be used, and if so how much? What type of output units should be chosen? And so on. We address these issues within the framework of statistical theory for model choice, which provides a number of workable approximate answers. This paper is principally concerned with architecture selection issues for feed-forward neural networks (also known as multi-layer perceptrons). Many of the same issues arise in selecting radial basis function networks, recurrent networks and more widely. These problems occur in a much wider context within statistics, and applied statisticians have been selecting and combining models for decades. Two recent discussions are [4, 5]. References [3, 20, 21, 22] discuss neural networks from a statistical perspecti...
Linearly combining density estimators via stacking
- Machine Learning, 1999
Cited by 15 (4 self)
This paper presents experimental results with both real and artificial data on using the technique of stacking to combine unsupervised learning algorithms. Specifically, stacking is used to form a linear combination of finite mixture model and kernel density estimators for non-parametric multivariate density estimation. The method is found to outperform other strategies such as choosing the single best model based on cross-validation, combining with uniform weights, and even using the single best model chosen by “cheating” and examining the test set. We also investigate in detail how the utility of stacking changes when one of the models being combined generated the data; how the stacking coefficients of the models compare to the relative frequencies with which cross-validation chooses among the models; visualization of combined “effective” kernels; and the sensitivity of stacking to overfitting as model complexity increases. In an extended version of this paper we also investigate how stacking performs using L1 and L2 performance measures (for which one must know the true density) rather than log-likelihood (Smyth and Wolpert 1998).
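The idea of a linear combination of density estimators can be illustrated with a minimal sketch. It assumes that each candidate estimator has already been evaluated on held-out points by cross-validation, giving a matrix of strictly positive density values, and it fits non-negative weights that sum to one by maximizing the held-out log-likelihood with EM updates. The function name `stacking_weights` and its parameters are hypothetical, and this is one common way to fit such weights, not necessarily the procedure used in the paper.

```python
import numpy as np

def stacking_weights(cv_density, n_iter=200, tol=1e-10):
    """Fit simplex weights for a linear combination of density estimators.

    cv_density[i, k] is the density that model k assigns to held-out
    point i (assumed strictly positive). Hypothetical helper.
    """
    cv_density = np.asarray(cv_density, dtype=float)
    n_points, n_models = cv_density.shape
    w = np.full(n_models, 1.0 / n_models)   # start from uniform weights
    prev_ll = -np.inf
    for _ in range(n_iter):
        mix = cv_density @ w                     # combined density at each point
        resp = cv_density * w / mix[:, None]     # E-step: model responsibilities
        w = resp.mean(axis=0)                    # M-step: average responsibility
        ll = np.log(cv_density @ w).sum()        # held-out log-likelihood
        if ll - prev_ll < tol:                   # stop once the gain is negligible
            break
        prev_ll = ll
    return w
```

Given a new point, the stacked density is the weighted sum of the individual model densities at that point, using the weights returned above.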
Implementation and Performance Issues in the Bayesian And Likelihood . . .
- Computational Statistics, 2000
Model averaging and value-at-risk based evaluation of large multi-asset volatility models for risk management
- 2005
Cited by 11 (1 self)
This paper considers the problem of model uncertainty in the case of multi-asset volatility models and discusses the use of model averaging techniques as a way of dealing with the risk of inadvertently using false models in portfolio management. Evaluation of volatility models is then considered and a simple Value-at-Risk (VaR) diagnostic test is proposed for individual as well as ‘average’ models. Both the asymptotic and the exact finite-sample distributions of the test statistic, allowing for parameter uncertainty, are established. The model averaging idea and the VaR diagnostic tests are illustrated by an application to portfolios of daily returns based on twenty-two of Standard & Poor’s 500 industry group indices over the period 1995-2003. We find strong evidence in support of ‘thick’ modelling proposed in the forecasting literature by Granger and Jeon (2004).
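To make the two ingredients concrete, here is a minimal sketch of equal-weight averaging of VaR forecasts across models, followed by a generic exceedance-rate check. The check is the standard normal approximation to the binomial count of VaR violations, used here only as an illustration; it is not the diagnostic test proposed in the paper, and all function names are hypothetical. VaR is assumed to be quoted as a positive loss.

```python
import math

def equal_weight_average(forecasts_by_model):
    """Average per-period forecasts across models with equal weights."""
    n_models = len(forecasts_by_model)
    return [sum(period) / n_models for period in zip(*forecasts_by_model)]

def var_exceedance_z(returns, var_forecasts, alpha):
    """Count VaR violations and standardize the count.

    A violation occurs when the return falls below minus the VaR.
    Under correct coverage the count is Binomial(n, alpha); the z-score
    uses its normal approximation (an illustrative, generic check).
    """
    if len(returns) != len(var_forecasts):
        raise ValueError("returns and var_forecasts must have equal length")
    n = len(returns)
    hits = sum(1 for r, v in zip(returns, var_forecasts) if r < -v)
    z = (hits - n * alpha) / math.sqrt(n * alpha * (1.0 - alpha))
    return hits, z
```

A large absolute z-score suggests that the violation rate is inconsistent with the nominal level alpha, either for a single model or for the averaged forecast.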
A Discussion of Parameter and Model Uncertainty in Insurance
- Insurance: Mathematics and Economics, 2000
Cited by 11 (5 self)
In this paper we consider the process of modelling uncertainty. In particular we are concerned with making inferences about some quantity of interest which, at present, has been unobserved. Examples of such a quantity include the probability of ruin of a surplus process, the accumulation of an investment, the level of surplus or deficit in a pension fund and the future volume of new business in an insurance company. Uncertainty in this quantity of interest, y, arises from three sources: (1) uncertainty due to the stochastic nature of a given model; (2) uncertainty in the values of the parameters in a given model; (3) uncertainty in the model underlying what we are able to observe and determining the quantity of interest. It is common in actuarial science to find that the first source of uncertainty is the only one which receives rigorous attention. A limited amount of research in recent years has considered the effect of parameter uncertainty, while there is still considerable scope ...
A Bayesian formulation of exploratory data analysis and goodness-of-fit testing
- 2003
Cited by 11 (6 self)
Exploratory data analysis (EDA) and Bayesian inference (or, more generally, complex statistical modeling), which are generally considered as unrelated statistical paradigms, can be particularly effective in combination. In this paper, we present a Bayesian framework for EDA based on posterior predictive checks. We explain how posterior predictive simulations can be used to create reference distributions for EDA graphs, and how this approach resolves some theoretical problems in Bayesian data analysis. We show how the generalization of Bayesian inference to include replicated data y^rep and replicated parameters θ^rep follows a long tradition of generalizations in Bayesian theory.
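A posterior predictive check can be sketched in a few lines. The example below is an illustration under strong assumptions that are not taken from the paper: the data are modelled as normal with unknown mean and known unit variance, the prior on the mean is flat, and the test statistic defaults to the sample maximum. Replicated data sets are simulated from the posterior predictive distribution and their statistics form the reference distribution for the observed one. The function name and parameters are hypothetical.

```python
import numpy as np

def posterior_predictive_pvalue(y, statistic=np.max, n_rep=2000, seed=0):
    """Posterior predictive p-value for a toy normal model.

    Assumed model: y_i ~ N(mu, 1) with a flat prior on mu, so the
    posterior is mu | y ~ N(mean(y), 1/n).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    rng = np.random.default_rng(seed)
    # Draw replicated parameters from the posterior.
    mu_rep = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=n_rep)
    # Draw one replicated data set per posterior draw.
    y_rep = rng.normal(mu_rep[:, None], 1.0, size=(n_rep, n))
    # Reference distribution of the statistic under the model.
    t_rep = np.apply_along_axis(statistic, 1, y_rep)
    # Share of replications at least as extreme as the observed statistic.
    return float(np.mean(t_rep >= statistic(y)))
```

A p-value close to 0 or 1 indicates that the observed statistic sits in the tail of its reference distribution, which is the kind of discrepancy an EDA graph would show visually.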
A comparison of Bayesian and likelihood-based methods for fitting multilevel models
Cited by 10 (2 self)
... this paper on the likelihood-based (and approximate likelihood) methods most readily available (given current usage patterns of existing software) to statisticians and substantive researchers making frequent use of multilevel models: ML and REML in VC models, and MQL and PQL in RELR models. Other promising likelihood-based approaches -- including (a) methods based on Gaussian quadrature (e.g., Pinheiro and Bates 1995); (b) the nonparametric maximum likelihood methods of Aitkin (1999a); (c) the Laplace-approximation approach of Raudenbush et al. (1999); (d) the work on hierarchical generalised linear models of Lee and Nelder (2000); and (e) interval estimation based on ranges of values of the parameters for which the log likelihood is within a certain distance of its maximum, for instance using profile likelihood (e.g., Longford 2000) -- are not addressed here. It is evident from the recent applied literature that, from the point of view of multilevel analyses currently being conducted to inform educational and health policy choices and other substantive decisions, the use of methods (a-e) is not (yet) as widespread as REML and quasi-likelihood approaches. Statisticians are well aware that the highly skewed repeated-sampling distributions of ML estimators of random-effects variances in multilevel models with small sample sizes are not likely to lead to good coverage properties for large-sample Gaussian approximate interval estimates of the form $\hat{\sigma}^2 \pm 1.96\,\widehat{\mathrm{SE}}(\hat{\sigma}^2)$, but with few exceptions the profession has not (yet) responded to this by making software for improved likelihood interval estimates widely available to multilevel modellers. In Sections 3 and 4 we document the extent of the poor coverage behaviour of the Gaussian approach, and we offer several simple approximation ...
Enhancing the Predictive Performance of Bayesian Graphical Models
- Communications in Statistics – Theory and Methods, 1995
Cited by 7 (4 self)
Both knowledge-based systems and statistical models are typically concerned with making predictions about future observables. Here we focus on assessment of predictive performance and provide two techniques for improving the predictive performance of Bayesian graphical models. First, we present Bayesian model averaging, a technique for accounting for model uncertainty. Second, we describe a technique for eliciting a prior distribution for competing models from domain experts. We explore the predictive performance of both techniques in the context of a urological diagnostic problem.
KEYWORDS: Prediction; Bayesian graphical model; Bayesian network; Decomposable model; Model uncertainty; Elicitation.
1 Introduction
Both statistical methods and knowledge-based systems are typically concerned with combining information from various sources to make inferences about prospective measurements. Inevitably, to combine information, we must make modeling assumptions. It follows that we should car...
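Bayesian model averaging, as named in the abstract above, weights each model's prediction by its posterior model probability. The sketch below is a generic illustration, assuming that a log marginal likelihood and a strictly positive prior probability are already available for each competing model (for example, priors elicited from domain experts). The function names are hypothetical and nothing here is specific to graphical models.

```python
import math

def posterior_model_probs(log_marginal_liks, priors):
    """Posterior model probabilities from log marginal likelihoods.

    priors must be strictly positive. The maximum is subtracted before
    exponentiating to avoid numerical underflow.
    """
    log_post = [l + math.log(p) for l, p in zip(log_marginal_liks, priors)]
    top = max(log_post)
    unnorm = [math.exp(v - top) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def bma_prediction(log_marginal_liks, priors, predictions):
    """Average per-model predictions using posterior model probabilities."""
    probs = posterior_model_probs(log_marginal_liks, priors)
    return sum(p * q for p, q in zip(probs, predictions))
```

If one model has overwhelming support, the averaged prediction collapses to that model's prediction; otherwise the spread across models is carried into the final answer.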

