The variable selection problem
Journal of the American Statistical Association, 2000
Cited by 25 (1 self)
The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. This vignette reviews some of the key developments which have led to the wide variety of approaches for this problem.
Asymptotic Model Selection for Naive Bayesian Networks
In Proc. of the 18th Conference on Uncertainty in Artificial Intelligence (UAI-02), 2002
Cited by 23 (1 self)
We develop a closed form asymptotic formula to compute the marginal likelihood of data given a naive Bayesian network model with two hidden states and binary features.
On the Dirichlet Prior and Bayesian Regularization
In Advances in Neural Information Processing Systems 15, 2002
Cited by 20 (2 self)
A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a Dirichlet prior over the model parameters affects the learned model structure in a domain with discrete variables. Surprisingly, a weak prior in the sense of smaller equivalent sample size leads to a strong regularization of the model structure (sparse graph) given a sufficiently large data set. In particular, the empty graph is obtained in the limit of a vanishing strength of prior belief. This is diametrically opposite to what one may expect in this limit, namely the complete graph from an (unregularized) maximum likelihood estimate. Since the prior affects the parameters as expected, the prior strength balances a "trade-off" between regularizing the parameters or the structure of the model. We demonstrate the benefits of optimizing this trade-off in the sense of predictive accuracy.
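The small-prior-strength effect described in this abstract can be checked numerically on a toy case. The sketch below is only an illustration, assuming the BDeu form of the Dirichlet marginal likelihood (the equivalent sample size spread evenly over all cells), which the abstract itself does not spell out. The function names `bdeu_family_score` and `edge_gain` and the 2x2 count table are hypothetical.

```python
from math import lgamma

def bdeu_family_score(counts, ess):
    """Log marginal likelihood of one node under an assumed BDeu prior.

    counts[j][k] is the number of cases with parent configuration j
    and child state k; ess is the equivalent sample size.
    """
    q = len(counts)        # number of parent configurations
    r = len(counts[0])     # number of child states
    score = 0.0
    for row in counts:
        n_j = sum(row)
        score += lgamma(ess / q) - lgamma(ess / q + n_j)
        for n_jk in row:
            score += lgamma(ess / (q * r) + n_jk) - lgamma(ess / (q * r))
    return score

def edge_gain(counts, ess):
    """Score of Y with parent X minus score of Y with no parent."""
    with_parent = bdeu_family_score(counts, ess)
    # Pool the counts over parent configurations to get the no-parent table.
    pooled = [[sum(col) for col in zip(*counts)]]
    without_parent = bdeu_family_score(pooled, ess)
    return with_parent - without_parent

# Illustrative data: two binary variables that are clearly dependent.
counts = [[40, 10], [10, 40]]
gain_moderate = edge_gain(counts, 1.0)    # moderate prior strength
gain_tiny = edge_gain(counts, 1e-12)      # nearly vanishing prior strength
```

With these illustrative counts, `gain_moderate` is positive (the edge is kept) while `gain_tiny` is negative (the empty graph scores higher), in line with the limit stated in the abstract.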
Latent Variable Models for Neural Data Analysis
1999
Cited by 17 (3 self)
The brain is perhaps the most complex system to have ever been subjected to rigorous scientific investigation. The scale is staggering: over 10^11 neurons, each making an average of 10^3 synapses, with computation occurring on scales ranging from a single dendritic spine, to an entire cortical area. Slowly, we are beginning to acquire experimental tools that can gather the massive amounts of data needed to characterize this system. However, to understand and interpret these data will also require substantial strides in inferential and statistical techniques. This dissertation attempts to meet this need, extending and applying the modern tools of latent variable modeling to problems in neural data analysis. It is divided
Graphical Models and Exponential Families
1998
Cited by 16 (1 self)
We provide a classification of graphical models according to their representation as subfamilies of exponential families. Undirected graphical models with no hidden variables are linear exponential families (LEFs), directed acyclic graphical models and chain graphs with no hidden variables, including Bayesian networks with several families of local distributions, are curved exponential families (CEFs) and graphical models with hidden variables are stratified exponential families (SEFs). An SEF is a finite union of CEFs satisfying a frontier condition. In addition, we illustrate how one can automatically generate independence and non-independence constraints on the distributions over the observable variables implied by a Bayesian network with hidden variables. The relevance of these results for model selection is examined.
1 Introduction: A graphical model is a family of probability distributions. The set of distributions associated with a graphical model are usually define...
Learning Mixtures of Bayesian Networks
In Cooper & Moral, 1997
Cited by 16 (0 self)
We describe a heuristic method for learning mixtures of Bayesian Networks (MBNs) from possibly incomplete data. The considered class of models is mixtures in which each mixture component is a Bayesian network encoding a conditional Gaussian distribution over a fixed set of variables. Some variables may be hidden or otherwise have missing observations. A key idea in our approach is to treat expected data as real data. This allows us to interleave structure and parameter search and to take advantage of closed form approximations for marginal likelihood. In addition, by treating expected data as real data, the search criterion factors by variable, making the search processes more efficient. We evaluate our approach on synthetic and real-world data sets.
Keywords: Mixture models, Bayesian networks, structure learning, parameter learning, hidden variables, EM algorithm.
1 Introduction: There is growing interest in a class of models for density estimation known as Bayesian networks. In the...
Regression And Time Series Model Selection Using Variants Of The Schwarz Information Criterion
1997
Cited by 14 (1 self)
The Schwarz (1978) information criterion, SIC, is a widely used tool in model selection, largely due to its computational simplicity and effective performance in many modeling frameworks. The derivation of SIC (Schwarz, 1978) establishes the criterion as an asymptotic approximation to a transformation of the Bayesian posterior probability of a candidate model. In this paper, we investigate the derivation for the identification of terms which are discarded as being asymptotically negligible, but which may be significant in small to moderate sample-size applications. We suggest several SIC variants based on the inclusion of these terms. The results of a simulation study show that the variants improve upon the performance of SIC in two important areas of application: multiple linear regression and time series analysis.
1. Introduction: One of the most important problems confronting an investigator in statistical modeling is the choice of an appropriate model to characterize the underlyin...
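For orientation, the standard criterion that this abstract takes as its starting point can be sketched in a few lines. The example below is a minimal sketch of the usual form SIC = -2 log L + k log n for Gaussian regression with the maximum-likelihood variance estimate. It does not implement the paper's variants, whose extra terms the abstract does not give. The helper names and the synthetic data are hypothetical.

```python
import math

def gaussian_sic(rss, n, k):
    """Standard SIC for a Gaussian model: -2 * max log-likelihood + k * log(n).

    rss is the residual sum of squares, n the sample size, and k the number
    of estimated parameters (regression coefficients plus the variance).
    """
    sigma2 = rss / n  # maximum-likelihood variance estimate
    neg2_loglik = n * (math.log(2 * math.pi * sigma2) + 1)
    return neg2_loglik + k * math.log(n)

def rss_mean_only(y):
    """Residual sum of squares for the intercept-only model."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)

def rss_simple_linear(x, y):
    """Residual sum of squares for least-squares y = a + b * x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

# Synthetic data (an assumption for illustration): a linear trend plus a
# small deterministic alternating perturbation.
x = [float(i) for i in range(20)]
y = [2.0 + 0.5 * xi + (0.3 if i % 2 == 0 else -0.3) for i, xi in enumerate(x)]

sic_mean = gaussian_sic(rss_mean_only(y), len(y), k=2)        # mean, variance
sic_line = gaussian_sic(rss_simple_linear(x, y), len(y), k=3) # a, b, variance
best = "linear" if sic_line < sic_mean else "mean-only"
```

The model with the smaller SIC is preferred; the `k * log(n)` term is the penalty that grows with both model size and sample size.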
LASSO-Patternsearch Algorithm with Application to Ophthalmology and Genomic Data
2008
Cited by 10 (8 self)
The LASSO-Patternsearch algorithm is proposed to efficiently identify patterns of multiple dichotomous risk factors for outcomes of interest in demographic and genomic studies. The patterns considered are those that arise naturally from the log linear expansion of the multivariate Bernoulli density. The method is designed for the case where there is a possibly very large number of candidate patterns but it is believed that only a relatively small number are important. A LASSO is used to greatly reduce the number of candidate patterns, using a novel computational algorithm that can handle an extremely large number of unknowns simultaneously. The patterns surviving the LASSO are further pruned in the framework of (parametric) generalized linear models. A novel tuning procedure based on the GACV for Bernoulli outcomes, modified to act
Embedded Bayesian Network Classifiers
1997
Cited by 8 (1 self)
Low-dimensional probability models for local distribution functions in a Bayesian network include decision trees, decision graphs, and causal independence models. We describe a new probability model for discrete Bayesian networks, which we call an embedded Bayesian network classifier or EBNC. The model for a node Y given parents X is obtained from a (usually different) Bayesian network for Y and X in which X need not be the parents of Y. We show that an EBNC is a special case of a softmax polynomial regression model. Also, we show how to identify a non-redundant set of parameters for an EBNC, and describe an asymptotic approximation for learning the structure of Bayesian networks that contain EBNCs. Unlike the decision tree, decision graph, and causal independence models, we are unaware of a semantic justification for the use of these models. Experiments are needed to determine whether the models presented in this paper are useful in practice.
Keywords: Bayesian networks, model dimen...

