Results 1–10 of 21
A Hierarchical Dirichlet Language Model
Natural Language Engineering, 1994
Abstract

Cited by 89 (3 self)
We discuss a hierarchical probabilistic model whose predictions are similar to those of the popular language modelling procedure known as 'smoothing'. A number of interesting differences from smoothing emerge. The insights gained from a probabilistic view of this problem point towards new directions for language modelling. The ideas of this paper are also applicable to other problems such as the modelling of triphones in speech, and DNA and protein sequences in molecular biology. The new algorithm is compared with smoothing on a two-million-word corpus. The methods prove to be about equally accurate, with the hierarchical model using fewer computational resources.
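As a concrete point of comparison, Dirichlet smoothing of bigram counts toward a parent (unigram) distribution can be sketched in a few lines. This is only an illustration of the family of models the abstract discusses, not the paper's full hierarchical model; the function name, toy corpus, and the single concentration parameter `alpha` are assumptions of this sketch.

```python
from collections import Counter

def dirichlet_bigram(corpus, alpha=1.0):
    """Bigram model whose counts are smoothed toward the unigram
    distribution via a symmetric Dirichlet-style prior (a sketch,
    not the paper's hierarchical model)."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    total = len(corpus)

    def prob(w, prev):
        p_parent = unigrams[w] / total   # parent (unigram) distribution
        # context counts are pulled toward the parent by strength alpha
        return (bigrams[(prev, w)] + alpha * p_parent) / (unigrams[prev] + alpha)

    return prob

p = dirichlet_bigram("the cat sat on the mat the cat ran".split())
```

With small `alpha` the model trusts the bigram counts; with large `alpha` it backs off toward the unigram distribution, which is the qualitative behaviour smoothing procedures share.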
Ensemble Learning and Evidence Maximization
Proc. NIPS, 1995
Abstract

Cited by 19 (1 self)
Ensemble learning by variational free energy minimization is a tool introduced to neural networks by Hinton and van Camp in which learning is described in terms of the optimization of an ensemble of parameter vectors. The optimized ensemble is an approximation to the posterior probability distribution of the parameters. This tool has now been applied to a variety of statistical inference problems. In this paper I study a linear regression model with both parameters and hyperparameters. I demonstrate that the evidence approximation for the optimization of regularization constants can be derived in detail from a free energy minimization viewpoint.

1 Ensemble Learning by Free Energy Minimization

A new tool has recently been introduced into the field of neural networks and statistical inference. In traditional approaches to neural networks, a single parameter vector w is optimized by maximum likelihood or penalized maximum likelihood. In the Bayesian interpretation, these optimized param...
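A minimal sketch of the free-energy idea, assuming a conjugate one-dimensional model (prior w ~ N(0,1), likelihood y ~ N(w·x, σ²)) rather than the paper's regression model with hyperparameters: gradient descent on the free energy of a Gaussian ensemble q(w) = N(m, s²) recovers the exact posterior, because for a conjugate model the best Gaussian approximation is the posterior itself. All names, data, and step sizes are illustrative.

```python
import math

def free_energy_fit(xs, ys, sigma2=0.5, steps=2000, lr=0.01):
    """Fit q(w) = N(m, s^2) by gradient descent on the variational
    free energy F = E_q[log q(w) - log p(y, w)].  For this conjugate
    model the minimiser is the exact Gaussian posterior."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m, log_s = 0.0, 0.0
    for _ in range(steps):
        s = math.exp(log_s)
        dF_dm = -(sxy - m * sxx) / sigma2 + m     # expected misfit + prior pull
        dF_ds = s * sxx / sigma2 + s - 1.0 / s    # widening cost vs. entropy gain
        m -= lr * dF_dm
        log_s -= lr * s * dF_ds                   # chain rule: dF/dlog s = s * dF/ds
    return m, math.exp(2.0 * log_s)

m, s2 = free_energy_fit([0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.1, 2.9])
# exact posterior: variance 1/(1 + sxx/sigma2), mean (sxy/sigma2) * variance
```

The `1/s` term in the variance gradient is the entropy of the ensemble: it stops the approximation from collapsing to a point estimate, which is what distinguishes ensemble learning from penalized maximum likelihood.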
An empirical evaluation of Bayesian sampling with hybrid Monte Carlo for training neural network classifiers
Neural Networks, 1999
Bayesian Regression Filters and the Issue of Priors
Abstract

Cited by 14 (5 self)
We propose a Bayesian framework for regression problems, covering areas usually dealt with by function approximation. An online learning algorithm is derived which solves regression problems with a Kalman filter. Its solution always improves with increasing model complexity, without the risk of overfitting. In the infinite-dimension limit it approaches the true Bayesian posterior. The issues of prior selection and overfitting are also discussed, showing that some of the commonly held beliefs are misleading. The practical implementation is summarised. Simulations using 13 popular publicly available data sets are used to demonstrate the method and highlight important issues concerning the choice of priors.

Keywords: regression, Bayesian method, Kalman filter, approximation, prior selection, radial basis functions, online learning.
Running title: Bayesian Regression Filter.

1 Introduction

Neural network models such as multilayer perceptrons or radial basis function ne...
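The Kalman-filter view of regression can be sketched in its simplest form: a scalar weight w treated as a static state, updated once per observation. This is only a sketch of the idea, assuming a model y ~ N(w·x, σ²), and omits the paper's radial-basis-function formulation; the function name and constants are illustrative.

```python
def kalman_regression(stream, sigma2=0.1, prior_var=10.0):
    """Online scalar Bayesian regression run as a Kalman filter whose
    state is the static weight w.  Each observation updates the
    posterior N(m, P) in O(1), and the batch Bayesian posterior is
    recovered exactly for this linear-Gaussian model."""
    m, P = 0.0, prior_var
    for x, y in stream:
        k = P * x / (sigma2 + x * P * x)   # Kalman gain
        m += k * (y - x * m)               # correct the mean by the innovation
        P -= k * x * P                     # shrink the posterior variance
    return m, P

m, P = kalman_regression([(1.0, 2.0), (2.0, 4.0), (0.5, 1.0)])
# m is close to the generating slope 2; P is the remaining uncertainty
```

Because each update conditions on one more data point rather than re-optimizing a loss, running the filter longer can only refine the posterior, which matches the abstract's claim that the solution improves without a risk of overfitting.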
Bayesian Methods for Neural Networks: Theory and Applications
1995
Abstract

Cited by 13 (0 self)
… this document. Before these are discussed, however, perhaps we should have a tutorial on Bayesian probability theory and its application to model comparison problems.

2 Probability theory and Occam's razor
Interpolation Models with Multiple Hyperparameters
1997
Abstract

Cited by 13 (2 self)
A traditional interpolation model is characterized by the choice of regularizer applied to the interpolant, and the choice of noise model. Typically, the regularizer has a single regularization constant α, and the noise model has a single parameter β. The ratio α/β alone is responsible for determining globally all these attributes of the interpolant: its 'complexity', 'flexibility', 'smoothness', 'characteristic scale length', and 'characteristic amplitude'. We suggest that interpolation models should be able to capture more than just one flavour of simplicity and complexity. We describe Bayesian models in which the interpolant has a smoothness that varies spatially. We emphasize the importance, in practical implementation, of the concept of 'conditional convexity' when designing models with many hyperparameters.
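The point about the ratio α/β can be seen in a one-weight toy smoother (illustrative only, not the paper's interpolation model): the penalized least-squares minimiser depends on the two hyperparameters only through α/β, so a single ratio really does set the interpolant's character globally.

```python
def ridge_fit(xs, ys, alpha, beta):
    """Minimise beta * sum_i (y_i - w*x_i)^2 + alpha * w^2 for a scalar
    weight w.  Setting the derivative to zero gives
    w = sxy / (sxx + alpha/beta): the solution sees alpha and beta
    only through their ratio, the single global 'smoothness' knob."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + alpha / beta)

xs, ys = [0.0, 1.0, 2.0], [0.1, 1.1, 1.9]
w_a = ridge_fit(xs, ys, alpha=1.0, beta=2.0)   # ratio 0.5
w_b = ridge_fit(xs, ys, alpha=5.0, beta=10.0)  # same ratio, same fit
```

Models with spatially varying smoothness, as the abstract proposes, replace this single knob with many, which is what raises the convexity concerns the paper addresses.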
Gaussian process time series model for life prognosis of metallic structures
Journal of Intelligent Material Systems and Structures, 2009
Abstract

Cited by 4 (2 self)
Al 2024-T351 has been modeled using a kernel-based multivariate Gaussian Process approach. The Gaussian Process model projects fatigue-affecting input variables to output crack growth by probabilistically inferring the underlying nonlinear relationship between input and output. The Gaussian Process approach not only explicitly models the uncertainty due to scatter in material microstructure parameters, but also implicitly models the loading sequence effect due to variable loading. The loading sequence effect is modeled through the Gaussian Process optimal hyperparameters by using the crack length data observed over the entire domain of spectrum loading. The performance in crack growth prediction is evaluated for two covariance functions: a radial-basis-based anisotropic covariance function and a neural-network-based isotropic covariance function. Furthermore, the performance of different types of scaling, used to scale the input–output data space, is tested. It is found that for the radial-basis-based anisotropic covariance function with normalized scaling, the prediction error is consistently lower compared to other combinations. In addition, the Gaussian Process model allows determination of the collapse load condition, which is a desirable feature for online health monitoring and prognosis.

Key Words: prognosis, fatigue crack growth, 2024-T351 aluminum alloy, variable loading, Gaussian Process, covariance function, maximum likelihood optimization, hyperparameters.
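The first covariance choice can be sketched with a standard anisotropic (ARD) squared-exponential kernel: one length scale per input dimension, so the GP can weight fatigue-affecting inputs differently. This is a generic sketch, assuming such a kernel; the paper's crack-growth inputs, scalings, and hyperparameter optimization are omitted, and all names and numbers are illustrative.

```python
import math

def ard_rbf(a, b, scales):
    """Anisotropic squared-exponential covariance: one length scale
    per input dimension (a common 'ARD' kernel)."""
    return math.exp(-0.5 * sum(((x - y) / s) ** 2
                               for x, y, s in zip(a, b, scales)))

def solve(A, b):
    """Gaussian elimination with partial pivoting (tiny systems only)."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_mean(X, y, x_star, scales, noise=1e-6):
    """GP posterior mean  k_*^T (K + noise*I)^{-1} y  at a test input."""
    n = len(X)
    K = [[ard_rbf(X[i], X[j], scales) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    return sum(ard_rbf(x_star, xi, scales) * a for xi, a in zip(X, alpha))

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
y = [1.0, 2.0, 0.5]
pred = gp_mean(X, y, (1.0, 0.0), scales=(1.0, 2.0))
```

With a larger length scale on the second dimension the model treats that input as smoother and less influential, which is the mechanism an anisotropic kernel uses to down-weight inputs.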
Bayesian Multi-output Feedforward Neural Network Comparison: A Conjugate Prior Approach
Abstract

Cited by 2 (1 self)
A Bayesian method for the comparison and selection of multi-output feedforward neural network topology, based on predictive capability, is proposed. As a measure of prediction fitness potential, an expected utility criterion is considered, which is consistently estimated by a sample-reuse computation. As opposed to classic point-prediction-based cross-validation methods, this expected utility is defined from the logarithmic score of the neural model's predictive probability density. It is shown how the advocated choice of a conjugate probability distribution as prior for the parameters of a competing network allows a consistent approximation of the network's posterior predictive density. The performance of the proposed method is compared with that of the usual selection procedures based on classic cross-validation and information-theoretic criteria, first on a simulated case study and then on a well-known food analysis dataset.
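The contrast between a point-prediction score and the logarithmic predictive score can be sketched with a leave-one-out sample-reuse loop. The `fit` interface and the toy models below are assumptions of this sketch; the paper's estimator uses the network's conjugate-prior predictive density rather than these stand-ins.

```python
import math

def loo_log_score(xs, ys, fit):
    """Sample-reuse estimate of expected utility: the summed log
    predictive density over leave-one-out folds.  `fit(tx, ty, x)`
    must return a Gaussian predictive (mean, variance) pair."""
    total = 0.0
    for i in range(len(xs)):
        mu, var = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i])
        # log density of the held-out y under the model's prediction
        total += -0.5 * math.log(2 * math.pi * var) - (ys[i] - mu) ** 2 / (2 * var)
    return total

def mean_model(tx, ty, x):
    """Toy predictor: training mean with a fixed predictive variance."""
    return sum(ty) / len(ty), 0.1

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.1, -0.1, 0.05]
good = loo_log_score(xs, ys, mean_model)
bad = loo_log_score(xs, ys, lambda tx, ty, x: (sum(ty) / len(ty) + 5.0, 0.1))
```

Unlike squared error, the log score rewards both accuracy and calibration: an overconfident or biased predictive density is penalized even when its point prediction is usable.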
Model Selection For Inverse Problems: Best Choice Of Basis Function And Model Order Selection
Abstract

Cited by 1 (0 self)
A complete solution to an inverse problem requires five main steps: choice of basis functions for discretization, determination of the order of the model, estimation of the hyperparameters, estimation of the solution, and finally, characterisation of the proposed solution. Much work has been done on the last three steps. The first two have long been neglected, in part due to the complexity of the problem. However, in many inverse problems, particularly when the number of data points is very small, a good choice of basis functions and a good selection of the model order become crucial. In this paper, we first propose a complete solution within a Bayesian framework. We then apply the proposed method to an inverse elastic electron scattering problem.
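As a crude stand-in for the model-order step the abstract describes (the paper's criterion is fully Bayesian; BIC, the polynomial basis, and all names here are only illustrative), order selection can be sketched as penalizing goodness of fit by model size:

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting (tiny systems only)."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def poly_rss(xs, ys, order):
    """Least-squares polynomial fit via normal equations; returns the
    residual sum of squares."""
    k = order + 1
    X = [[x ** j for j in range(k)] for x in xs]
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * ys[i] for i in range(len(xs))) for a in range(k)]
    w = solve(XtX, Xty)
    return sum((ys[i] - sum(w[j] * X[i][j] for j in range(k))) ** 2
               for i in range(len(xs)))

def best_order(xs, ys, max_order=3):
    """Pick the order minimising BIC = n*log(rss/n) + (order+1)*log(n);
    rss is floored to avoid log(0) on exact fits."""
    n = len(xs)
    scores = {}
    for order in range(max_order + 1):
        rss = max(poly_rss(xs, ys, order), 1e-9)
        scores[order] = n * math.log(rss / n) + (order + 1) * math.log(n)
    return min(scores, key=scores.get)
```

On data generated by a quadratic, the penalty rejects both underfitting (orders 0–1 leave large residuals) and overfitting (order 3 fits no better but costs an extra parameter).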