Results 1  10
of
35
Bayesian Interpolation
 Neural Computation
, 1991
"... Although Bayesian analysis has been in use since Laplace, the Bayesian method of modelcomparison has only recently been developed in depth. In this paper, the Bayesian approach to regularisation and modelcomparison is demonstrated by studying the inference problem of interpolating noisy data. T ..."
Abstract

Cited by 713 (17 self)
 Add to MetaCart
(Show Context)
Although Bayesian analysis has been in use since Laplace, the Bayesian method of modelcomparison has only recently been developed in depth. In this paper, the Bayesian approach to regularisation and modelcomparison is demonstrated by studying the inference problem of interpolating noisy data. The concepts and methods described are quite general and can be applied to many other problems. Regularising constants are set by examining their posterior probability distribution. Alternative regularisers (priors) and alternative basis sets are objectively compared by evaluating the evidence for them. `Occam's razor' is automatically embodied by this framework. The way in which Bayes infers the values of regularising constants and noise levels has an elegant interpretation in terms of the effective number of parameters determined by the data set. This framework is due to Gull and Skilling. 1 Data modelling and Occam's razor In science, a central task is to develop and compare models to a...
When Networks Disagree: Ensemble Methods for Hybrid Neural Networks
, 1993
"... This paper presents a general theoretical framework for ensemble methods of constructing significantly improved regression estimates. Given a population of regression estimators, we construct a hybrid estimator which is as good or better in the MSE sense than any estimator in the population. We argu ..."
Abstract

Cited by 347 (3 self)
 Add to MetaCart
(Show Context)
This paper presents a general theoretical framework for ensemble methods of constructing significantly improved regression estimates. Given a population of regression estimators, we construct a hybrid estimator which is as good or better in the MSE sense than any estimator in the population. We argue that the ensemble method presented has several properties: 1) It efficiently uses all the networks of a population  none of the networks need be discarded. 2) It efficiently uses all the available data for training without overfitting. 3) It inherently performs regularization by smoothing in functional space which helps to avoid overfitting. 4) It utilizes local minima to construct improved estimates whereas other neural network algorithms are hindered by local minima. 5) It is ideally suited for parallel computation. 6) It leads to a very useful and natural measure of the number of distinct estimators in a population. 7) The optimal parameters of the ensemble estimator are given in clo...
Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables
 Machine Learning
, 1997
"... We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MD ..."
Abstract

Cited by 193 (12 self)
 Add to MetaCart
(Show Context)
We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MDL approximation. We also consider approximations proposed by Draper (1993) and Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL, but their accuracy has not been studied in any depth. We compare the accuracy of these approximations under the assumption that the Laplace approximation is the most accurate. In experiments using synthetic data generated from discrete naiveBayes models having a hidden root node, we find that (1) the BIC/MDL measure is the least accurate, having a bias in favor of simple models, and (2) the Draper and CS measures are the most accurate. 1
Improving Regression Estimation: Averaging Methods for Variance Reduction with Extensions to General Convex Measure Optimization
, 1993
"... ..."
A comparison of algorithms for inference and learning in probabilistic graphical models
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2005
"... Computer vision is currently one of the most exciting areas of artificial intelligence research, largely because it has recently become possible to record, store and process large amounts of visual data. While impressive achievements have been made in pattern classification problems such as handwr ..."
Abstract

Cited by 69 (4 self)
 Add to MetaCart
(Show Context)
Computer vision is currently one of the most exciting areas of artificial intelligence research, largely because it has recently become possible to record, store and process large amounts of visual data. While impressive achievements have been made in pattern classification problems such as handwritten character recognition and face detection, it is even more exciting that researchers may be on the verge of introducing computer vision systems that perform scene analysis, decomposing image input into its constituent objects, lighting conditions, motion patterns, and so on. Two of the main challenges in computer vision are finding efficient models of the physics of visual scenes and finding efficient algorithms for inference and learning in these models. In this paper, we advocate the use of graphbased probability models and their associated inference and learning algorithms for computer vision and scene analysis. We review exact techniques and various approximate, computationally efficient techniques, including iterative conditional modes, the expectation maximization (EM) algorithm, the mean field method, variational techniques, structured variational techniques, Gibbs sampling, the sumproduct algorithm and “loopy ” belief propagation. We describe how each technique can be applied in a model of multiple, occluding objects, and contrast the behaviors and performances of the techniques using a unifying cost function, free energy.
Discovering latent classes in relational data
, 2004
"... We present a framework for learning abstract relational knowledge with the aim of explaining how people acquire intuitive theories of physical, biological, or social systems. Our approach is based on a generative relational model with latent classes, and simultaneously determines the kinds of entiti ..."
Abstract

Cited by 48 (6 self)
 Add to MetaCart
(Show Context)
We present a framework for learning abstract relational knowledge with the aim of explaining how people acquire intuitive theories of physical, biological, or social systems. Our approach is based on a generative relational model with latent classes, and simultaneously determines the kinds of entities that exist in a domain, the number of these latent classes, and the relations between classes that are possible or likely. This model goes beyond previous psychological models of category learning, which consider attributes associated with individual categories but not relationships between categories. We apply this domaingeneral framework to two specific problems: learning the structure of kinship systems and learning causal theories. 1 1
Bayesian Inference in MixturesofExperts and Hierarchical MixturesofExperts Models With an Application to Speech Recognition
, 1995
"... Machine classification of acoustic waveforms as speech events is often difficult due to contextdependencies. A vowel recognition task with multiple speakers is studied in this paper via the use of a class of modular and hierarchical systems referred to as mixturesofexperts and hierarchical mixtur ..."
Abstract

Cited by 30 (6 self)
 Add to MetaCart
Machine classification of acoustic waveforms as speech events is often difficult due to contextdependencies. A vowel recognition task with multiple speakers is studied in this paper via the use of a class of modular and hierarchical systems referred to as mixturesofexperts and hierarchical mixturesofexperts models. The statistical model underlying the systems is a mixture model in which both the mixture coefficients and the mixture components are generalized linear models. A full Bayesian approach is used as a basis of inference and prediction. Computations are performed using Markov chain Monte Carlo methods. A key benefit of this approach is the ability to obtain a sample from the posterior distribution of any functional of the parameters of the given model. In this way, more information is obtained than provided by a point estimate. Also avoided is the need to rely on a normal approximation to the posterior as the basis of inference. This is particularly important in cases wher...
Modular Neural Networks for MAP Classification of Time Series and the Partition Algorithm
, 1996
"... We apply the Partition Algorithm to the problem of time series classification. We assume that the source that generates the time series belongs to a finite set of candidate sources. Classification is based on the computation of posterior probabilities. Prediction error is used to adaptively update t ..."
Abstract

Cited by 14 (7 self)
 Add to MetaCart
We apply the Partition Algorithm to the problem of time series classification. We assume that the source that generates the time series belongs to a finite set of candidate sources. Classification is based on the computation of posterior probabilities. Prediction error is used to adaptively update the posterior probability of each source. The algorithm is implemented by a hierarchical, modular, recurrent network. The bottom (partition) level of the network consists of neural modules, each one trained to predict the output of one candidate source. The top (decision) level consists of a decision module, which computes posterior probabilities and classifies the time series to the source of maximum posterior probability. The classifier network is formed fi'om the composition of the partition and decision levels. This method applies to deterministic as well as probabilistic time series. Source switching can also be accommodated. We give some examples of application to problems of signal detection, phoneme and enzyme classification. In conclusion, the algorithm presented here gives a systematic method for the design of modular classification networks. The method can be extended by various choices of the partition and decision components.
Finding MixedMemberships in Social Networks
"... This paper addresses the problem of unsupervised group discovery in social networks. We adopt a nonparametric Bayesian framework that extends previous models to networks where the interacting objects can simultaneously belong to several groups (i.e., mixed membership). For this purpose, a hierarchic ..."
Abstract

Cited by 11 (0 self)
 Add to MetaCart
(Show Context)
This paper addresses the problem of unsupervised group discovery in social networks. We adopt a nonparametric Bayesian framework that extends previous models to networks where the interacting objects can simultaneously belong to several groups (i.e., mixed membership). For this purpose, a hierarchical nonparametric prior is utilized and inference is performed using Gibbs sampling. The resulting mixedmembership model combines the usual advantages of nonparametric models, such as inference of the total number of groups from the data, and provides a more flexible modeling environment by quantifying the degrees of membership to the various groups. Such models are useful for social information processing because they can capture a user’s multiple interests and hobbies.
An introduction to nonparametric hierarchical bayesian modelling with a focus on multiagent learning
 In Proceedings of the Hamilton Summer School on Switching and Learning in Feedback Systems. Lecture Notes in Computing Science
, 2004
"... ..."
(Show Context)