Results 11  20
of
51
Optimization by learning and simulation of Bayesian and Gaussian networks
, 1999
"... Estimation of Distribution Algorithms (EDA) constitute an example of stochastics heuristics based on populations of individuals every of which encode the possible solutions to the optimization problem. These populations of individuals evolve in succesive generations as the search progresses  organ ..."
Abstract

Cited by 54 (7 self)
 Add to MetaCart
Estimation of Distribution Algorithms (EDA) constitute an example of stochastics heuristics based on populations of individuals every of which encode the possible solutions to the optimization problem. These populations of individuals evolve in succesive generations as the search progresses  organized in the same way as most evolutionary computation heuristics. In opposition to most evolutionary computation paradigms which consider the crossing and mutation operators as essential tools to generate new populations, EDA replaces those operators by the estimation and simulation of the joint probability distribution of the selected individuals. In this work, after making a review of the different approaches based on EDA for problems of combinatorial optimization as well as for problems of optimization in continuous domains, we propose new approaches based on the theory of probabilistic graphical models to solve problems in both domains. More precisely, we propose to adapt algorit...
Asymptotic Model Selection for Naive Bayesian Networks
 In Proc. of the 18th Conference on Uncertainty in Artificial Intelligence (UAI02
, 2002
"... We develop a closed form asymptotic formula to compute the marginal likelihood of data given a naive Bayesian network model with two hidden states and binary features. ..."
Abstract

Cited by 49 (3 self)
 Add to MetaCart
We develop a closed form asymptotic formula to compute the marginal likelihood of data given a naive Bayesian network model with two hidden states and binary features.
Bayesian Estimation and Testing of Structural Equation Models
 Psychometrika
, 1999
"... The Gibbs sampler can be used to obtain samples of arbitrary size from the posterior distribution over the parameters of a structural equation model (SEM) given covariance data and a prior distribution over the parameters. Point estimates, standard deviations and interval estimates for the parameter ..."
Abstract

Cited by 44 (10 self)
 Add to MetaCart
The Gibbs sampler can be used to obtain samples of arbitrary size from the posterior distribution over the parameters of a structural equation model (SEM) given covariance data and a prior distribution over the parameters. Point estimates, standard deviations and interval estimates for the parameters can be computed from these samples. If the prior distribution over the parameters is uninformative, the posterior is proportional to the likelihood, and asymptotically the inferences based on the Gibbs sample are the same as those based on the maximum likelihood solution, e.g., output from LISREL or EQS. In small samples, however, the likelihood surface is not Gaussian and in some cases contains local maxima. Nevertheless, the Gibbs sample comes from the correct posterior distribution over the parameters regardless of the sample size and the shape of the likelihood surface. With an informative prior distribution over the parameters, the posterior can be used to make inferences about the parameters of underidentified models, as we illustrate on a simple errorsinvariables model.
Statistical Themes and Lessons for Data Mining
, 1997
"... Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field that has attracted much attention in a very short period of time. This article highlights some statist ..."
Abstract

Cited by 36 (3 self)
 Add to MetaCart
Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field that has attracted much attention in a very short period of time. This article highlights some statistical themes and lessons that are directly relevant to data mining and attempts to identify opportunities where close cooperation between the statistical and computational communities might reasonably provide synergy for further progress in data analysis.
Learning mixtures of DAG models
, 1997
"... We describe computationally efficient methods for learning mixtures in which each component is a directed acyclic graphical model (mixtures of DAGs or MDAGs). We argue that simple searchandscore algorithms are infeasible for a variety of problems, and introduce a feasible approach in which paramet ..."
Abstract

Cited by 31 (2 self)
 Add to MetaCart
We describe computationally efficient methods for learning mixtures in which each component is a directed acyclic graphical model (mixtures of DAGs or MDAGs). We argue that simple searchandscore algorithms are infeasible for a variety of problems, and introduce a feasible approach in which parameter and structure search is interleaved and expected data is treated as real data. Our approach can be viewed as a combination of (1) the Cheeseman–Stutz asymptotic approximation for model posterior probability and (2) the Expectation–Maximization algorithm. We evaluate our procedure for selecting among MDAGs on synthetic and real examples. 1
Models and Selection Criteria for Regression and Classification
 Uncertainty in Arificial Intelligence 13
, 1997
"... When performing regression or classification, we are interested in the conditional probability distribution for an outcome or class variable Y given a set of explanatory or input variables X. We consider Bayesian models for this task. In particular, we examine a special class of models, which we ca ..."
Abstract

Cited by 25 (2 self)
 Add to MetaCart
(Show Context)
When performing regression or classification, we are interested in the conditional probability distribution for an outcome or class variable Y given a set of explanatory or input variables X. We consider Bayesian models for this task. In particular, we examine a special class of models, which we call Bayesian regression/classification (BRC) models, that can be factored into independent conditional (yjx) and input (x) models. These models are convenient, because the conditional model (the portion of the full model that we care about) can be analyzed by itself. We examine the practice of transforming arbitrary Bayesian models to BRC models, and argue that this practice is often inappropriate because it ignores prior knowledge that may be important for learning. In addition, we examine Bayesian methods for learning models from data. We discuss two criteria for Bayesian model selection that are appropriate for repression/classification: one described by Spiegelhalter et al. (1993), and an...
Graphical models and exponential families
 In Proceedings of the 14th Annual Conference on Uncertainty in Arti cial Intelligence (UAI98
, 1998
"... We provide a classification of graphical models according to their representation as subfamilies of exponential families. Undirected graphical models with no hidden variables are linear exponential families (LEFs), directed acyclic graphical models and chain graphs with no hidden variables, includin ..."
Abstract

Cited by 23 (1 self)
 Add to MetaCart
(Show Context)
We provide a classification of graphical models according to their representation as subfamilies of exponential families. Undirected graphical models with no hidden variables are linear exponential families (LEFs), directed acyclic graphical models and chain graphs with no hidden variables, including Bayesian networks with several families of local distributions, are curved exponential families (CEFs) and graphical models with hidden variables are stratified exponential families (SEFs). An SEF is a finite union of CEFs satisfying a frontier condition. In addition, we illustrate how one can automatically generate independence and nonindependence constraints on the distributions over the observable variables implied by a Bayesian network with hidden variables. The relevance of these results for model selection is examined. 1
The Posterior Probability of Bayes Nets with Strong Dependences
 Soft Computing
, 1999
"... Stochastic independence is an idealized relationship located at one end of a continuum of values measuring degrees of dependence. Modeling real world systems, we are often not interested in the distinction between exact independence and any degree of dependence, but between weak ignorable and strong ..."
Abstract

Cited by 17 (1 self)
 Add to MetaCart
Stochastic independence is an idealized relationship located at one end of a continuum of values measuring degrees of dependence. Modeling real world systems, we are often not interested in the distinction between exact independence and any degree of dependence, but between weak ignorable and strong substantial dependence. Good models map significant deviance from independence and neglect approximate independence or dependence weaker than a noise threshold. This intuition is applied to learning the structure of Bayes nets from data. We determine the conditional posterior probabilities of structures given that the degree of dependence at each of their nodes exceeds a critical noise level. Deviance from independence is measured by mutual information. Arc probabilities are determined by the amount of mutual information the neighbors contribute to a node, is greater than a critical minimum deviance from independence. A Ø 2 approximation for the probability density function of mutual info...
Dimension Correction for Hierarchical Latent Class Models
, 2002
"... Model complexity is an important factor to consider when selecting among graphical models. When all variables are observed, the complexity of a model can be measured by its standard dimension, i.e. the number of independent parameters. When hidden variables are present, however, standard dime ..."
Abstract

Cited by 15 (5 self)
 Add to MetaCart
(Show Context)
Model complexity is an important factor to consider when selecting among graphical models. When all variables are observed, the complexity of a model can be measured by its standard dimension, i.e. the number of independent parameters. When hidden variables are present, however, standard dimension might no longer be appropriate.
Population Markov Chain Monte Carlo
 Machine Learning
, 2003
"... Stochastic search algorithms inspired by physical and biological systems are applied to the problem of learning directed graphical probability models in the presence of missing observations and hidden variables. For this class of problems, deterministic search algorithms tend to halt at local optima ..."
Abstract

Cited by 15 (2 self)
 Add to MetaCart
(Show Context)
Stochastic search algorithms inspired by physical and biological systems are applied to the problem of learning directed graphical probability models in the presence of missing observations and hidden variables. For this class of problems, deterministic search algorithms tend to halt at local optima, requiring random restarts to obtain solutions of acceptable quality. We compare three stochastic search algorithms: a MetropolisHastings Sampler (MHS), an Evolutionary Algorithm (EA), and a new hybrid algorithm called Population Markov Chain Monte Carlo, or popMCMC. PopMCMC uses statistical information from a population of MHSs to inform the proposal distributions for individual samplers in the population. Experimental results show that popMCMC and EAs learn more efficiently than the MHS with no information exchange. Populations of MCMC samplers exhibit more diversity than populations evolving according to EAs not satisfying physicsinspired local reversibility conditions. KEY WORDS: Markov Chain Monte Carlo, MetropolisHastings Algorithm, Graphical Probabilistic Models, Bayesian Networks, Bayesian Learning, Evolutionary Algorithms Machine Learning MCMC Issue 1 5/16/01 1.