Optimization by learning and simulation of Bayesian and Gaussian networks
, 1999
Cited by 43 (6 self)
Estimation of Distribution Algorithms (EDAs) are stochastic heuristics based on populations of individuals, each of which encodes a possible solution to the optimization problem. These populations evolve in successive generations as the search progresses, organized in the same way as most evolutionary computation heuristics. In contrast to most evolutionary computation paradigms, which consider the crossover and mutation operators essential tools for generating new populations, EDAs replace those operators with the estimation and simulation of the joint probability distribution of the selected individuals. In this work, after reviewing the different EDA-based approaches to combinatorial optimization problems as well as to optimization problems in continuous domains, we propose new approaches based on the theory of probabilistic graphical models to solve problems in both domains. More precisely, we propose to adapt algorit...
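The estimate-then-simulate loop described in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible model, independent per-bit marginals (a univariate EDA), rather than the Bayesian or Gaussian networks the paper proposes, and it uses a toy "count the ones" fitness. The function name `umda_onemax` and all parameter values are hypothetical choices for illustration.

```python
import random

def umda_onemax(n_bits=20, pop_size=100, n_selected=50, generations=30, seed=0):
    """Univariate EDA sketch: maximise the number of ones in a bit string.

    Assumption: the joint distribution is approximated by independent
    per-bit probabilities, not by a learned graphical model.
    """
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # initial model: independent fair bits
    best = None
    for _ in range(generations):
        # Simulation step: sample a new population from the current model.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        # Selection step: keep the fittest individuals (fitness = sum of bits).
        population.sort(key=sum, reverse=True)
        selected = population[:n_selected]
        if best is None or sum(selected[0]) > sum(best):
            best = selected[0]
        # Estimation step: re-fit the per-bit marginals from the selected set,
        # clamped away from 0 and 1 so that no bit value becomes impossible.
        for i in range(n_bits):
            freq = sum(ind[i] for ind in selected) / n_selected
            probs[i] = min(max(freq, 0.05), 0.95)
    return best, probs
```

Calling `umda_onemax()` returns the best bit string found and the final marginals. No crossover or mutation operator appears anywhere: new individuals come only from sampling the fitted model.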
Bayesian Estimation and Testing of Structural Equation Models
 Psychometrika
, 1999
Cited by 27 (8 self)
The Gibbs sampler can be used to obtain samples of arbitrary size from the posterior distribution over the parameters of a structural equation model (SEM) given covariance data and a prior distribution over the parameters. Point estimates, standard deviations and interval estimates for the parameters can be computed from these samples. If the prior distribution over the parameters is uninformative, the posterior is proportional to the likelihood, and asymptotically the inferences based on the Gibbs sample are the same as those based on the maximum likelihood solution, e.g., output from LISREL or EQS. In small samples, however, the likelihood surface is not Gaussian and in some cases contains local maxima. Nevertheless, the Gibbs sample comes from the correct posterior distribution over the parameters regardless of the sample size and the shape of the likelihood surface. With an informative prior distribution over the parameters, the posterior can be used to make inferences about the parameters of underidentified models, as we illustrate on a simple errors-in-variables model.
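How point estimates, standard deviations and interval estimates fall out of a Gibbs sample can be sketched on a much simpler model than an SEM. The example below is a stand-in, not the paper's sampler: it assumes a univariate normal with unknown mean and precision, a normal prior on the mean and a gamma prior on the precision. The function names and the near-uninformative prior values are hypothetical.

```python
import random
import statistics

def gibbs_normal(data, n_iter=4000, burn_in=1000, seed=1,
                 m0=0.0, p0=1e-3, a0=1e-3, b0=1e-3):
    """Gibbs sampler for (mu, tau) of a normal model, tau being the precision.

    Assumed priors: mu ~ Normal(m0, 1/p0), tau ~ Gamma(shape=a0, rate=b0).
    """
    rng = random.Random(seed)
    n = len(data)
    total = sum(data)
    mu, tau = statistics.mean(data), 1.0
    draws = []
    for it in range(n_iter):
        # Full conditional of mu given tau: normal with precision p0 + n*tau.
        prec = p0 + n * tau
        mu = rng.gauss((p0 * m0 + tau * total) / prec, prec ** -0.5)
        # Full conditional of tau given mu: gamma with updated shape and rate.
        rate = b0 + 0.5 * sum((y - mu) ** 2 for y in data)
        tau = rng.gammavariate(a0 + n / 2, 1.0 / rate)  # second argument is a scale
        if it >= burn_in:
            draws.append((mu, tau))
    return draws

def summarise(values, level=0.95):
    """Posterior mean, standard deviation and equal-tailed interval from draws."""
    ordered = sorted(values)
    lo = ordered[int((1 - level) / 2 * len(ordered))]
    hi = ordered[int((1 + level) / 2 * len(ordered)) - 1]
    return statistics.mean(values), statistics.stdev(values), (lo, hi)
```

With a prior this weak, `summarise([mu for mu, _ in draws])` should give a posterior mean close to the sample mean of the data, in line with the abstract's remark that an uninformative prior leads to inferences close to maximum likelihood.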
Averaging, Maximum Penalized Likelihood and Bayesian Estimation for Improving Gaussian Mixture Probability Density Estimates
 In
, 1996
Cited by 19 (0 self)
We apply the idea of averaging ensembles of estimators to probability density estimation.
On supervised selection of Bayesian networks
 In UAI99
, 1999
Cited by 19 (6 self)
Given a set of possible models (e.g., Bayesian network structures) and a data sample, in the unsupervised model selection problem the task is to choose the most accurate model with respect to the domain joint probability distribution. In contrast to this, in supervised model selection it is known a priori that the chosen model will be used in the future for prediction tasks involving more "focused" predictive distributions. Although focused predictive distributions can be produced from the joint probability distribution by marginalization, in practice the best model in the unsupervised sense does not necessarily perform well in supervised domains. In particular, the standard marginal likelihood score is a criterion for the unsupervised task and, although frequently used for supervised model selection as well, does not perform well in such tasks. In this paper we study the performance of the marginal likelihood score empirically in supervised Bayesian network selection tasks by using a large number of publicly available classification data sets, and compare the results to those obtained by alternative model selection criteria, including empirical cross-validation methods, an approximation of a supervised marginal likelihood measure, and a supervised version of Dawid's prequential (predictive sequential) principle. The results demonstrate that the marginal likelihood score does not perform well for supervised model selection, while the best results are obtained by using Dawid's prequential approach.
Reconstruction of gene networks using Bayesian learning and manipulation experiments
 Bioinformatics
, 2004
Dimensionality Reduction in Unsupervised Learning of Conditional Gaussian Networks
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2001
Cited by 16 (2 self)
This paper introduces a novel enhancement for unsupervised learning of conditional Gaussian networks that benefits from feature selection. Our proposal is based on the assumption that, in the absence of labels reflecting the cluster membership of each case of the database, those features that exhibit low correlation with the rest of the features can be considered irrelevant for the learning process. Thus, we suggest performing this process using only the relevant features. Then, every irrelevant feature is added to the learnt model to obtain an explanatory model for the original database, which is our primary goal. A simple and, thus, efficient measure to assess the relevance of the features for the learning process is presented. Additionally, the form of this measure allows us to calculate a relevance threshold to automatically identify the relevant features. The experimental results reported for synthetic and real-world databases show the ability of our proposal to distinguish between relevant and irrelevant features and to accelerate learning, while still obtaining good explanatory models for the original database.
Comparing Prequential Model Selection Criteria in Supervised Learning of Mixture Models
 Proceedings of the Eighth International Conference on Artificial Intelligence and Statistics
, 2001
Cited by 6 (3 self)
In this paper we study prequential model selection criteria in supervised learning domains. The main problem with this approach is that the criterion is sensitive to the order in which the data is processed. We discuss several approaches for addressing the ordering problem, and compare their performance empirically in real-world supervised model selection tasks. The empirical results demonstrate that with the prequential approach it is quite easy to find predictive models that are significantly more accurate classifiers than the models found by the standard unsupervised marginal likelihood criterion. The results also suggest that averaging over random orderings may be a more sensible strategy for solving the ordering problem than trying to find the ordering optimizing the prequential model selection criterion.
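The idea of scoring a model prequentially and averaging over random orderings can be sketched as follows. This is an illustrative assumption, not the paper's setup: the model is a naive Bayes classifier over binary features with Laplace smoothing instead of the mixture models the paper studies, and the function names, the `alpha` smoothing constant and the number of orderings are all hypothetical.

```python
import math
import random
from collections import defaultdict

def prequential_score(rows, classes, alpha=1.0):
    """Sum of log p(label_i | features_i, cases before i) for one ordering.

    Assumption: rows are (features, label) pairs with binary feature values.
    """
    class_counts = defaultdict(float)
    feat_counts = defaultdict(float)  # key: (label, feature index, value)
    score = 0.0
    for features, label in rows:
        seen = sum(class_counts.values())
        # Predict the label using only the cases processed so far.
        joint = {}
        for c in classes:
            p = (class_counts[c] + alpha) / (seen + alpha * len(classes))
            for j, v in enumerate(features):
                p *= (feat_counts[(c, j, v)] + alpha) / (class_counts[c] + 2 * alpha)
            joint[c] = p
        score += math.log(joint[label] / sum(joint.values()))
        # Only after predicting is the revealed case added to the counts.
        class_counts[label] += 1
        for j, v in enumerate(features):
            feat_counts[(label, j, v)] += 1
    return score

def averaged_prequential_score(rows, classes, n_orderings=20, seed=0):
    """Average the prequential score over randomly shuffled orderings."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_orderings):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        scores.append(prequential_score(shuffled, classes))
    return sum(scores) / len(scores)
```

To compare candidate models in this style, one would compute `averaged_prequential_score` for each and prefer the one with the higher (less negative) value.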
Inexact graph matching using learning and simulation of Bayesian networks. An empirical comparison between different approaches with synthetic data
, 2000
Cited by 6 (1 self)
Estimation of Distribution Algorithms (EDAs) are a fairly recent topic in optimisation techniques. Several algorithms and approaches have already been proposed by different authors, but up to now very few papers have shown their potential or compared them to other evolutionary computation methods and algorithms such as Genetic Algorithms (GAs). Inexact graph matching is an NP-hard problem and requires techniques that approximate an acceptable solution. This problem arises when a non-bijective correspondence is sought between two graphs G1 and G2. A typical instance of this problem corresponds to the case where G1 is a model of the scene, and G2 is a graph derived from data (e.g. an image of the scene). EDAs are well suited to this type of problem. This paper proposes to use EDAs as a new approach for inexact graph matching. Also, two adaptations of the EDA approach to problems with constraints are described in the form of two techniques to cont...
Incremental Methods for Bayesian Network Learning
 Department de
, 1999
Cited by 6 (1 self)
In this work we analyze what are, in our opinion, the most relevant algorithms for learning Bayesian networks. We analyze methods that use goodness-of-fit tests between tentative networks and data. Among these learning algorithms we distinguish batch and incremental methods. Finally, we propose a system, called BANDOLER, that incrementally learns Bayesian networks from data and prior knowledge. The incremental nature of the system allows one to modify the learning strategy and to introduce new prior knowledge during the learning process in the light of the structure already learnt.
1 Introduction
The aim of this work is twofold. On the one hand, we introduce the state of the art in learning Bayesian networks. It is intended to be a tutorial on the learning methods based on goodness-of-fit tests. We present what are, in our opinion, the most significant learning algorithms found in the literature, as well as the theory they are based on. On the other hand, we propose a research framework. The fiel...