Results 1-10 of 21
Dependency networks for inference, collaborative filtering, and data visualization
 Journal of Machine Learning Research
"... We describe a graphical model for probabilistic relationshipsan alternative tothe Bayesian networkcalled a dependency network. The graph of a dependency network, unlike aBayesian network, is potentially cyclic. The probability component of a dependency network, like aBayesian network, is a set of ..."
Abstract

Cited by 165 (10 self)
 Add to MetaCart
We describe a graphical model for probabilistic relationships, an alternative to the Bayesian network, called a dependency network. The graph of a dependency network, unlike a Bayesian network, is potentially cyclic. The probability component of a dependency network, like a Bayesian network, is a set of conditional distributions, one for each node given its parents. We identify several basic properties of this representation and describe a computationally efficient procedure for learning the graph and probability components from data. We describe the application of this representation to probabilistic inference, collaborative filtering (the task of predicting preferences), and the visualization of acausal predictive relationships.
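The cyclic representation described above can be sketched in a few lines. The example below is illustrative, not from the paper: two binary variables with hand-picked conditional distributions P(A | B) and P(B | A), a mutual dependence A &lt;-&gt; B that a Bayesian network could not encode, sampled with the ordered Gibbs-style procedure dependency networks use for approximate inference.

```python
import random

# Hypothetical conditionals for a two-node cyclic dependency network.
p_a_given_b = {0: 0.2, 1: 0.8}   # P(A=1 | B=b)
p_b_given_a = {0: 0.3, 1: 0.7}   # P(B=1 | A=a)

def ordered_gibbs(n_samples, burn_in=500, seed=0):
    """Ordered Gibbs-style sampling: resample each node from its
    conditional given the current values of its parents, in a fixed
    order, and keep the states visited after burn-in."""
    rng = random.Random(seed)
    a, b = 0, 0
    samples = []
    for t in range(burn_in + n_samples):
        a = 1 if rng.random() < p_a_given_b[b] else 0
        b = 1 if rng.random() < p_b_given_a[a] else 0
        if t >= burn_in:
            samples.append((a, b))
    return samples

samples = ordered_gibbs(10_000)
p_a1 = sum(a for a, _ in samples) / len(samples)  # estimated P(A=1)
```

The long-run frequencies of the sampler serve as the (approximate) joint distribution, which is how a dependency network with a cyclic graph can still answer probabilistic queries.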
A Bayesian Approach to Causal Discovery
, 1997
"... We examine the Bayesian approach to the discovery of directed acyclic causal models and compare it to the constraintbased approach. Both approaches rely on the Causal Markov assumption, but the two differ significantly in theory and practice. An important difference between the approaches is that t ..."
Abstract

Cited by 85 (1 self)
 Add to MetaCart
We examine the Bayesian approach to the discovery of directed acyclic causal models and compare it to the constraint-based approach. Both approaches rely on the Causal Markov assumption, but the two differ significantly in theory and practice. An important difference between the approaches is that the constraint-based approach uses categorical information about conditional-independence constraints in the domain, whereas the Bayesian approach weighs the degree to which such constraints hold. As a result, the Bayesian approach has three distinct advantages over its constraint-based counterpart. One, conclusions derived from the Bayesian approach are not susceptible to incorrect categorical decisions about independence facts that can occur with data sets of finite size. Two, using the Bayesian approach, finer distinctions among model structures, both quantitative and qualitative, can be made. Three, information from several models can be combined to make better inferences and to better ...
On the Dirichlet Prior and Bayesian Regularization
 In Advances in Neural Information Processing Systems 15
, 2002
"... A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from highthroughput data sources. In this paper we examine how Bayesian regularization using a Dirichle ..."
Abstract

Cited by 25 (2 self)
 Add to MetaCart
A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a Dirichlet prior over the model parameters affects the learned model structure in a domain with discrete variables. Surprisingly, a weak prior in the sense of smaller equivalent sample size leads to a strong regularization of the model structure (sparse graph) given a sufficiently large data set. In particular, the empty graph is obtained in the limit of a vanishing strength of prior belief. This is diametrically opposite to what one may expect in this limit, namely the complete graph from an (unregularized) maximum likelihood estimate. Since the prior affects the parameters as expected, the prior strength balances a "tradeoff" between regularizing the parameters or the structure of the model. We demonstrate the benefits of optimizing this tradeoff in the sense of predictive accuracy.
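The effect the abstract describes can be reproduced on a toy problem. The sketch below (hypothetical counts, two binary variables only) computes a BDeu-style marginal-likelihood score for the single edge X -&gt; Y versus the empty graph: as the equivalent sample size (ESS) shrinks, the edge's score advantage vanishes and the empty graph wins, even when the data are dependent.

```python
from math import lgamma

def log_beta_ml(n1, n0, a1, a0):
    """Log marginal likelihood of n1 ones and n0 zeros under a
    Beta(a1, a0) prior on the Bernoulli parameter."""
    return (lgamma(a1 + a0) - lgamma(a1) - lgamma(a0)
            + lgamma(a1 + n1) + lgamma(a0 + n0)
            - lgamma(a1 + a0 + n1 + n0))

def score_diff(counts, ess):
    """BDeu-style log score of (X -> Y) minus that of the empty graph.
    counts[x][y] are joint counts of the binary pair (X, Y). X's own
    family score is identical in both structures and cancels."""
    n_y1 = counts[0][1] + counts[1][1]
    n_y0 = counts[0][0] + counts[1][0]
    # Empty graph: Y has no parent, prior Beta(ess/2, ess/2).
    empty = log_beta_ml(n_y1, n_y0, ess / 2, ess / 2)
    # Edge X -> Y: one Bernoulli per parent state; BDeu divides the
    # ESS across the 2 x 2 parent/child configurations, Beta(ess/4, ess/4).
    edge = (log_beta_ml(counts[0][1], counts[0][0], ess / 4, ess / 4)
            + log_beta_ml(counts[1][1], counts[1][0], ess / 4, ess / 4))
    return edge - empty

# Mildly dependent (hypothetical) data: the weaker the prior, the more
# strongly the structure is regularized toward the empty graph.
counts = [[40, 30], [25, 45]]
diffs = {ess: score_diff(counts, ess) for ess in (10.0, 1.0, 0.01)}
```

With a vanishing ESS the per-configuration Dirichlet terms diverge like log(ESS), so each extra parent configuration is penalized without bound, which is the mechanism behind the paper's "weak prior, strong structural regularization" observation.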
A comparison of scientific and engineering criteria for Bayesian model selection
, 1996
"... Given a set of possible models for variables X and a set of possible parameters for each model, the Bayesian “estimate ” of the probability distribution for X given observed data is obtained by averaging over the possible models and their parameters. An oftenused approximation for this estimate is ..."
Abstract

Cited by 21 (0 self)
 Add to MetaCart
Given a set of possible models for variables X and a set of possible parameters for each model, the Bayesian “estimate” of the probability distribution for X given observed data is obtained by averaging over the possible models and their parameters. An often-used approximation for this estimate is obtained by selecting a single model and averaging over its parameters. The approximation is useful because it is computationally efficient, and because it provides a model that facilitates understanding of the domain. A common criterion for model selection is the posterior probability of the model. Another criterion for model selection, proposed by San Martini and Spezzaferri (1984), is the predictive performance of a model for the next observation to be seen. From the standpoint of domain understanding, both criteria are useful, because one identifies the model that is most likely, whereas the other identifies the model that is the best predictor of the next observation. To highlight the difference, we refer to the posterior-probability and alternative criteria as the scientific criterion (SC) and engineering criterion (EC), respectively. When we are interested in predicting the next observation, the model-averaged estimate is at least as good as that produced by EC, which itself is at least as good as the estimate produced by SC. We show experimentally that, for Bayesian-network models containing discrete variables only, the predictive performance of the model average can be significantly better than those of single models selected by either criterion, and that differences between models selected by the two criteria can be substantial.
Keywords: model selection, model averaging, Bayesian selection criteria
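The gap between selecting one model and averaging over all of them can be seen on a toy family of three fixed-parameter Bernoulli models (a hypothetical example, not the paper's Bayesian-network experiments). The scientific criterion picks the model with highest posterior probability; the model-averaged prediction for the next observation mixes all three and generally differs from any single model's prediction.

```python
# Three candidate models, each fixing the coin bias at one value,
# with a uniform prior over models. All numbers are illustrative.
thetas = [0.3, 0.5, 0.7]
prior = [1 / 3] * 3

data = [1, 1, 0, 1, 1, 0, 1, 1]   # observed coin flips (6 ones, 2 zeros)

def likelihood(theta, data):
    """Probability of the observed sequence under a fixed bias theta."""
    p = 1.0
    for x in data:
        p *= theta if x == 1 else 1 - theta
    return p

post_unnorm = [pr * likelihood(t, data) for pr, t in zip(prior, thetas)]
z = sum(post_unnorm)
posterior = [p / z for p in post_unnorm]

# Scientific criterion: the single most probable model a posteriori.
sc_model = max(range(3), key=lambda i: posterior[i])
p_next_sc = thetas[sc_model]

# Model-averaged prediction for the next flip: mixes all candidates
# by their posterior weights, and need not match any single model.
p_next_avg = sum(p * t for p, t in zip(posterior, thetas))
```

Here the model-averaged prediction sits strictly between the selected model's 0.7 and the runner-up's 0.5, illustrating why the abstract ranks the model average at least as good a predictor as either selection criterion.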
Transferring Prior Information Between Models Using Imaginary Data
, 2001
"... . Bayesian modeling is limited by our ability to formulate prior distributions that adequately represent our actual prior beliefs  a task that is especially difficult for realistic models with many interacting parameters. I show here how a prior distribution formulated for a simpler, more easily ..."
Abstract

Cited by 6 (0 self)
 Add to MetaCart
Bayesian modeling is limited by our ability to formulate prior distributions that adequately represent our actual prior beliefs, a task that is especially difficult for realistic models with many interacting parameters. I show here how a prior distribution formulated for a simpler, more easily understood model can be used to modify the prior distribution of a more complex model. This is done by generating imaginary data from the simpler "donor" model, which is conditioned on in the more complex "recipient" model, effectively transferring the donor model's well-specified prior information to the recipient model. Such prior information transfers are also useful when comparing two complex models for the same data. Bayesian model comparison based on the Bayes factor is very sensitive to the prior distributions for each model's parameters, with the result that the wrong model may be favoured simply because the prior for the right model was not carefully formulated. This problem can be alleviated by modifying each model's prior to potentially incorporate prior information transferred from the other model. I discuss how these techniques can be implemented by simple Monte Carlo and by Markov chain Monte Carlo with annealed importance sampling. Demonstrations on models for two-way contingency tables and on graphical models for categorical data show that prior information transfer can indeed overcome deficiencies in prior specification for complex models.
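The donor/recipient idea can be sketched on the simplest possible pair of models (a toy conjugate example, not the paper's Monte Carlo or annealed-importance-sampling implementation; the Beta(8, 2) donor prior and the sample sizes are invented). Imaginary observations are drawn from the donor's prior predictive and then conditioned on by the recipient, carrying the donor's informative prior into the recipient's otherwise vague one.

```python
import random

rng = random.Random(0)

def donor_prior_predictive(n_imaginary):
    """Imaginary data from the donor model: draw a bias from the
    donor's informative Beta(8, 2) prior, then flip that coin."""
    theta = rng.betavariate(8, 2)
    return [1 if rng.random() < theta else 0 for _ in range(n_imaginary)]

def recipient_posterior_mean(real_data, imaginary_data):
    """Recipient model: Bernoulli with a vague Beta(1, 1) prior,
    conditioned on both the real and the imaginary observations, so
    the imaginary counts act as transferred pseudo-counts."""
    ones = sum(real_data) + sum(imaginary_data)
    zeros = len(real_data) + len(imaginary_data) - ones
    return (1 + ones) / (2 + ones + zeros)

real = [1, 1, 0, 1]
imaginary = donor_prior_predictive(20)
mean = recipient_posterior_mean(real, imaginary)
```

In conjugate cases like this the transfer reduces to adding pseudo-counts; the paper's contribution is making the same move work between structurally different models, where the imaginary data must be conditioned on by simulation.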
First-Generation Undergraduate Students and the Impacts of the First Year of College: Some Additional Evidence
"... Firstgeneration students are making significant gains towards access in higher education with enrollment numbers increasing over the past decade (Strayhorn, 2006). Yet the literature examining firstgeneration students has primarily focused on three distinct outcome measures: 1) college choice deci ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
(Show Context)
First-generation students are making significant gains towards access in higher education with enrollment numbers increasing over the past decade (Strayhorn, 2006). Yet the literature examining first-generation students has primarily focused on three distinct outcome measures: 1) college choice decisions and aspirations (e.g., Bui, 2002, 2005; Ceja, 2006; Gibbons & Shoffner,
The Similarity of Causal Inference in Experimental and Non-Experimental Studies *
"... For nearly as long as the word “correlation ” has been part of statistical parlance, students have been warned that correlation does not prove causation, and that only experimental studies, e.g., randomized clinical trials, can establish the existence of a causal relationship. Over the last few deca ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
For nearly as long as the word “correlation” has been part of statistical parlance, students have been warned that correlation does not prove causation, and that only experimental studies, e.g., randomized clinical trials, can establish the existence of a causal relationship. Over the last few decades, somewhat of a consensus has emerged among statisticians, computer scientists, and philosophers on how to represent causal claims and connect them to probabilistic relations. One strand of this work studies the conditions under which evidence accumulated from non-experimental (observational) studies can be used to infer a causal relationship. In this paper, I compare the typical conditions required to infer that one variable is a direct cause of another in observational and experimental studies. I argue that they are essentially the same.
Ranking by Dependence—A Fair Criteria
"... Estimating the dependences between random variables, and ranking them accordingly, is a prevalent problem in machine learning. Pursuing frequentist and informationtheoretic approaches, we first show that the pvalue and the mutual information can fail even in simplistic situations. We then propose ..."
Abstract
 Add to MetaCart
(Show Context)
Estimating the dependences between random variables, and ranking them accordingly, is a prevalent problem in machine learning. Pursuing frequentist and information-theoretic approaches, we first show that the p-value and the mutual information can fail even in simplistic situations. We then propose two conditions for regularizing an estimator of dependence, which leads to a simple yet effective new measure. We discuss its advantages and compare it to well-established model-selection criteria. Apart from that, we derive a simple constraint for regularizing parameter estimates in a graphical model. This results in an analytical approximation for the optimal value of the equivalent sample size, which agrees very well with the more involved Bayesian approach in our experiments.