Results 11 - 20 of 63
Asymptotic model selection for directed networks with hidden variables
1996
Cited by 47 (13 self)
We extend the Bayesian Information Criterion (BIC), an asymptotic approximation for the marginal likelihood, to Bayesian networks with hidden variables. This approximation can be used to select models given large samples of data. The standard BIC as well as our extension punishes the complexity of a model according to the dimension of its parameters. We argue that the dimension of a Bayesian network with hidden variables is the rank of the Jacobian matrix of the transformation between the parameters of the network and the parameters of the observable variables. We compute the dimensions of several networks including the naive Bayes model with a hidden root node.
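The rank-of-the-Jacobian idea in the abstract above can be illustrated numerically. The following is only a sketch under stated assumptions, not the authors' code: it assumes a naive Bayes model with a binary hidden root and n binary features, parameterized by t = P(H=1), a_i = P(X_i=1 | H=0) and b_i = P(X_i=1 | H=1), and it evaluates the Jacobian at one arbitrarily chosen parameter point. The function names are hypothetical.

```python
import itertools
import numpy as np

def naive_bayes_jacobian(t, a, b):
    """Jacobian of the map (t, a, b) -> P(x) over all 2^n observable outcomes x.

    Assumed model: P(x) = (1 - t) * prod_i f(a_i, x_i) + t * prod_i f(b_i, x_i),
    where f(p, 1) = p and f(p, 0) = 1 - p.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rows = []
    for x in itertools.product([0, 1], repeat=len(a)):
        x = np.array(x)
        fa = np.where(x == 1, a, 1 - a)      # per-feature factors given H = 0
        fb = np.where(x == 1, b, 1 - b)      # per-feature factors given H = 1
        A, B = fa.prod(), fb.prod()
        sign = np.where(x == 1, 1.0, -1.0)   # derivative of f(p, x_i) with respect to p
        d_t = B - A
        d_a = (1 - t) * A * sign / fa
        d_b = t * B * sign / fb
        rows.append(np.concatenate(([d_t], d_a, d_b)))
    return np.array(rows)

def model_dimension(n):
    """Rank of the Jacobian at an arbitrary (assumed generic) parameter point."""
    t = 0.3
    a = np.linspace(0.2, 0.4, n)
    b = np.linspace(0.6, 0.9, n)
    return int(np.linalg.matrix_rank(naive_bayes_jacobian(t, a, b)))

for n in (2, 3, 4):
    print(n, "features:", 2 * n + 1, "network parameters, Jacobian rank", model_dimension(n))
```

The rank can never exceed min(2n + 1, 2^n - 1), because the network has 2n + 1 parameters and the observable probabilities sum to one. A rank below 2n + 1 means that counting network parameters overstates the dimension that enters the penalty term.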
The variable selection problem
Journal of the American Statistical Association, 2000
Cited by 39 (2 self)
The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. This vignette reviews some of the key developments which have led to the wide variety of approaches for this problem.
Asymptotic Model Selection for Naive Bayesian Networks
In Proc. of the 18th Conference on Uncertainty in Artificial Intelligence (UAI02), 2002
Cited by 31 (3 self)
We develop a closed form asymptotic formula to compute the marginal likelihood of data given a naive Bayesian network model with two hidden states and binary features.
LASSOPatternsearch Algorithm with Application to Ophthalmology and Genomic Data
2008
Cited by 29 (22 self)
The LASSOPatternsearch algorithm is proposed to efficiently identify patterns of multiple dichotomous risk factors for outcomes of interest in demographic and genomic studies. The patterns considered are those that arise naturally from the log linear expansion of the multivariate Bernoulli density. The method is designed for the case where there is a possibly very large number of candidate patterns but it is believed that only a relatively small number are important. A LASSO is used to greatly reduce the number of candidate patterns, using a novel computational algorithm that can handle an extremely large number of unknowns simultaneously. The patterns surviving the LASSO are further pruned in the framework of (parametric) generalized linear models. A novel tuning procedure based on the GACV for Bernoulli outcomes, modified to act
On the Dirichlet Prior and Bayesian Regularization
In Advances in Neural Information Processing Systems 15, 2002
Cited by 22 (2 self)
A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a Dirichlet prior over the model parameters affects the learned model structure in a domain with discrete variables. Surprisingly, a weak prior in the sense of smaller equivalent sample size leads to a strong regularization of the model structure (sparse graph) given a sufficiently large data set. In particular, the empty graph is obtained in the limit of a vanishing strength of prior belief. This is diametrically opposite to what one may expect in this limit, namely the complete graph from an (unregularized) maximum likelihood estimate. Since the prior affects the parameters as expected, the prior strength balances a "tradeoff" between regularizing the parameters or the structure of the model. We demonstrate the benefits of optimizing this tradeoff in the sense of predictive accuracy.
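The effect of the equivalent sample size described above can be seen in a toy calculation. This sketch is an illustration under assumptions, not the paper's experiment: it assumes the uniform (BDeu-style) Dirichlet prior, two binary variables X and Y, and a made-up contingency table; the function names are hypothetical. It compares the log marginal likelihood of the graph X -> Y with that of the empty graph.

```python
from math import lgamma

def bdeu_family_score(counts, ess):
    """Log marginal likelihood of one node under a uniform Dirichlet prior.

    counts[j][k] is the number of cases with parent configuration j and child state k;
    ess is the equivalent sample size, spread evenly over all cells (BDeu assumption).
    """
    q = len(counts)        # number of parent configurations
    r = len(counts[0])     # number of child states
    a_j = ess / q
    a_jk = ess / (q * r)
    score = 0.0
    for row in counts:
        score += lgamma(a_j) - lgamma(a_j + sum(row))
        for n_jk in row:
            score += lgamma(a_jk + n_jk) - lgamma(a_jk)
    return score

def edge_minus_empty(table, ess):
    """Log Bayes factor of X -> Y against the empty graph.

    table[x][y] holds joint counts. X has no parents in either graph,
    so its family score cancels and only Y's family is compared.
    """
    marginal_y = [[sum(col) for col in zip(*table)]]
    return bdeu_family_score(table, ess) - bdeu_family_score(marginal_y, ess)

# Made-up counts with a clear dependence between X and Y.
TABLE = [[30, 10], [10, 30]]

for ess in (1.0, 1e-2, 1e-8):
    print(ess, edge_minus_empty(TABLE, ess))
```

A positive value favors the edge and a negative value favors the empty graph. For this table the sign flips from positive to negative as the equivalent sample size shrinks, even though the data are unchanged, which matches the limiting behavior the abstract describes.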
Learning Mixtures of Bayesian Networks
in Cooper & Moral, 1997
Cited by 22 (1 self)
We describe a heuristic method for learning mixtures of Bayesian Networks (MBNs) from possibly incomplete data. The considered class of models is mixtures in which each mixture component is a Bayesian network encoding a conditional Gaussian distribution over a fixed set of variables. Some variables may be hidden or otherwise have missing observations. A key idea in our approach is to treat expected data as real data. This allows us to interleave structure and parameter search and to take advantage of closed form approximations for marginal likelihood. In addition, by treating expected data as real data, the search criterion factors by variable, making the search processes more efficient. We evaluate our approach on synthetic and real-world data sets.
Keywords: Mixture models, Bayesian networks, structure learning, parameter learning, hidden variables, EM algorithm.
1 Introduction
There is growing interest in a class of models for density estimation known as Bayesian networks. In the...
Graphical models and exponential families
In Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence (UAI98), 1998
Cited by 21 (1 self)
We provide a classification of graphical models according to their representation as subfamilies of exponential families. Undirected graphical models with no hidden variables are linear exponential families (LEFs); directed acyclic graphical models and chain graphs with no hidden variables, including Bayesian networks with several families of local distributions, are curved exponential families (CEFs); and graphical models with hidden variables are stratified exponential families (SEFs). An SEF is a finite union of CEFs satisfying a frontier condition. In addition, we illustrate how one can automatically generate independence and nonindependence constraints on the distributions over the observable variables implied by a Bayesian network with hidden variables. The relevance of these results for model selection is examined.
Regression And Time Series Model Selection Using Variants Of The Schwarz Information Criterion
1997
Cited by 16 (1 self)
The Schwarz (1978) information criterion, SIC, is a widely used tool in model selection, largely due to its computational simplicity and effective performance in many modeling frameworks. The derivation of SIC (Schwarz, 1978) establishes the criterion as an asymptotic approximation to a transformation of the Bayesian posterior probability of a candidate model. In this paper, we investigate the derivation for the identification of terms which are discarded as being asymptotically negligible, but which may be significant in small to moderate sample-size applications. We suggest several SIC variants based on the inclusion of these terms. The results of a simulation study show that the variants improve upon the performance of SIC in two important areas of application: multiple linear regression and time series analysis.
1. Introduction
One of the most important problems confronting an investigator in statistical modeling is the choice of an appropriate model to characterize the underlyin...
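For orientation, the standard criterion that this entry starts from can be written in a few lines. The sketch below shows only the usual SIC for a Gaussian linear regression, n ln(RSS/n) + k ln(n) up to an additive constant; it does not implement the variants the paper proposes. The function names and the residual sums of squares are hypothetical, and k is assumed to count the fitted regression coefficients.

```python
import math

def sic(rss, n, k):
    """Standard Schwarz criterion for a Gaussian linear model, up to an additive constant.

    rss: residual sum of squares, n: sample size, k: number of fitted parameters.
    """
    return n * math.log(rss / n) + k * math.log(n)

def select_model(candidates, n):
    """candidates maps a label to (rss, k); returns the label with the lowest SIC."""
    return min(candidates, key=lambda name: sic(candidates[name][0], n, candidates[name][1]))

# Hypothetical residual sums of squares for three nested regressions on n = 50 points.
candidates = {"intercept": (120.0, 1), "linear": (40.0, 2), "quadratic": (39.5, 3)}
best = select_model(candidates, n=50)
print(best)
```

Here the quadratic fit lowers the residual sum of squares only slightly, so the ln(n) penalty on the extra parameter outweighs the gain and the linear model is selected. The terms dropped in deriving this form are the ones the abstract says may matter in small to moderate samples.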
Quantifier elimination for statistical problems
In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI99), 1999
Cited by 10 (0 self)
Recent improvements on Tarski's procedure for quantifier elimination in the first order theory of real numbers make it feasible to solve small instances of the following problems completely automatically:
1. Listing all equality and inequality constraints implied by a graphical model with hidden variables.
2. Comparing graphical models with hidden variables (i.e., model equivalence, inclusion, and overlap).
3. Answering questions about the identification of a model or portion of a model, and about bounds on quantities derived from a model.
4. Determining whether an independence assertion is implied by a given set of independence assertions.
We discuss the foundations of quantifier elimination and demonstrate its application to these problems.