Results 1-10 of 53
A Bayesian method for the induction of probabilistic networks from data
 Machine Learning
, 1992
Cited by 1081 (27 self)
This paper presents a Bayesian method for constructing probabilistic networks from databases. In particular, we focus on constructing Bayesian belief networks. Potential applications include computer-assisted hypothesis testing, automated scientific discovery, and automated construction of probabilistic expert systems. We extend the basic method to handle missing data and hidden (latent) variables. We show how to perform probabilistic inference by averaging over the inferences of multiple belief networks. Results are presented of a preliminary evaluation of an algorithm for constructing a belief network from a database of cases. Finally, we relate the methods in this paper to previous work, and we discuss open problems.
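As an illustration of the kind of scoring such a Bayesian construction method relies on, the sketch below computes the log of the closed-form marginal likelihood term for a single node given a candidate parent set. It is a minimal sketch, not the paper's code: it assumes complete data, discrete variables, and uniform priors over the conditional probabilities, and the function name `log_k2_family_score`, the dict-based case format, and the `arity` argument are hypothetical names chosen for this example.

```python
from collections import Counter
from math import lgamma


def log_k2_family_score(cases, child, parents, arity):
    """Log marginal likelihood term for one node and one candidate parent set.

    Assumptions (not taken from the listing above): complete data, discrete
    variables, uniform parameter priors.

    cases:   list of dicts mapping variable name -> integer state
    child:   name of the node being scored
    parents: list of candidate parent names (may be empty)
    arity:   dict mapping variable name -> number of states
    """
    r = arity[child]
    n_ijk = Counter()  # counts per (parent configuration, child state)
    n_ij = Counter()   # counts per parent configuration
    for case in cases:
        config = tuple(case[p] for p in parents)
        n_ijk[(config, case[child])] += 1
        n_ij[config] += 1

    score = 0.0
    # Parent configurations that never occur contribute a factor of 1,
    # so iterating over the observed configurations is sufficient.
    for config, n in n_ij.items():
        # log[(r - 1)! / (N_ij + r - 1)!]
        score += lgamma(r) - lgamma(n + r)
        for k in range(r):
            # log(N_ijk!)
            score += lgamma(n_ijk[(config, k)] + 1)
    return score


# Toy database in which y always copies x.
cases = [{"x": 0, "y": 0}, {"x": 0, "y": 0}, {"x": 1, "y": 1}, {"x": 1, "y": 1}]
arity = {"x": 2, "y": 2}
score_without_parent = log_k2_family_score(cases, "y", [], arity)
score_with_parent = log_k2_family_score(cases, "y", ["x"], arity)
```

On this toy database the parent set {x} scores higher for y than the empty set, which is the behaviour a greedy structure search would exploit when deciding whether to add an arc.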
Model selection and accounting for model uncertainty in graphical models using Occam's window
, 1993
Cited by 266 (46 self)
We consider the problem of model selection and accounting for model uncertainty in high-dimensional contingency tables, motivated by expert system applications. The approach most used currently is a stepwise strategy guided by tests based on approximate asymptotic P-values leading to the selection of a single model; inference is then conditional on the selected model. The sampling properties of such a strategy are complex, and the failure to take account of model uncertainty leads to underestimation of uncertainty about quantities of interest. In principle, a panacea is provided by the standard Bayesian formalism which averages the posterior distributions of the quantity of interest under each of the models, weighted by their posterior model probabilities. Furthermore, this approach is optimal in the sense of maximising predictive ability. However, this has not been used in practice because computing the posterior model probabilities is hard and the number of models is very large (often greater than 10^11). We argue that the standard Bayesian formalism is unsatisfactory and we propose an alternative Bayesian approach that, we contend, takes full account of the true model uncertainty by averaging over a much smaller set of models. An efficient search algorithm is developed for finding these models. We consider two classes of graphical models that arise in expert systems: the recursive causal models and the decomposable
Propagation of Probabilities, Means and Variances in Mixed Graphical Association Models
 Journal of the American Statistical Association
, 1992
Cited by 143 (2 self)
A scheme is presented for modelling and local computation of exact probabilities, means and variances for mixed qualitative and quantitative variables. The models assume that the conditional distribution of the quantitative variables, given the qualitative, is multivariate Gaussian. The computational architecture is set up by forming a tree of belief universes, and the calculations are then performed by local message passing between universes. The asymmetry between the quantitative and qualitative variables sets some additional limitations for the specification and propagation structure. Approximate methods when these are not appropriately fulfilled are sketched. Lauritzen and Spiegelhalter (1988) showed how to exploit the local structure in the specification of a discrete probability model for fast and efficient computation, thereby paving the way for exploiting probability based models as parts of realistic systems for planning and decision support. The technique was subsequently imp...
Word-Sense Disambiguation Using Decomposable Models
 In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics
, 1994
Cited by 138 (19 self)
Most probabilistic classifiers used for word-sense disambiguation have either been based on only one contextual feature or have used a model that is simply assumed to characterize the interdependencies among multiple contextual features. In this paper, a different approach to formulating a probabilistic model is presented along with a case study of the performance of models produced in this manner for the disambiguation of the noun interest. We describe a method for formulating probabilistic models that use multiple contextual features for word-sense disambiguation, without requiring untested assumptions regarding the form of the model. Using this approach, the joint distribution of all variables is described by only the most systematic variable interactions, thereby limiting the number of parameters to be estimated, supporting computational efficiency, and providing an understanding of the data.
Causality: Models, Reasoning, and Inference
, 2000
Cited by 103 (15 self)
This paper explores the role of Directed Acyclic Graphs (DAGs) as a representation of conditional independence relationships. We show that DAGs offer polynomially sound and complete inference mechanisms for inferring conditional independence relationships from a given causal set of such relationships. As a consequence, d-separation, a graphical criterion for identifying independencies in a DAG, is shown to uncover more valid independencies than any other criterion. In addition, we employ the Armstrong property of conditional independence to show that the dependence relationships displayed by a DAG are inherently consistent, i.e. for every DAG D there exists some probability distribution P that embodies all the conditional independencies displayed in D and none other.
INTRODUCTION AND SUMMARY OF RESULTS
Networks employing Directed Acyclic Graphs (DAGs) have a long and rich tradition, starting with the geneticist Wright (1921). He developed a method called path analysis [Wright, 1934] which later became an established representation of causal models in economics [Wold, 1964], sociology [Blalock, 1971] and psychology [Duncan, 1975]. Influence diagrams represent another application of
A characterization of Markov equivalence classes for acyclic digraphs
, 1995
Cited by 92 (7 self)
Undirected graphs and acyclic digraphs (ADGs), as well as their mutual extension to chain graphs, are widely used to describe dependencies among variables in multivariate distributions. In particular, the likelihood functions of ADG models admit convenient recursive factorizations that often allow explicit maximum likelihood estimates and that are well suited to building Bayesian networks for expert systems. Whereas the undirected graph associated with a dependence model is uniquely determined, there may, however, be many ADGs that determine the same dependence (= Markov) model. Thus, the family of all ADGs with a given set of vertices is naturally partitioned into Markov-equivalence classes, each class being associated with a unique statistical model. Statistical procedures, such as model selection or model averaging, that fail to take into account these equivalence classes, may incur substantial computational or other inefficiencies. Here it is shown that each Markov-equivalence class is uniquely determined by a single chain graph, the essential graph, that is itself simultaneously Markov equivalent to all ADGs in the equivalence class. Essential graphs are characterized, a polynomial-time algorithm for their construction is given, and their applications to model selection and other statistical
ANCESTRAL GRAPH MARKOV MODELS
, 2002
Cited by 76 (18 self)
This paper introduces a class of graphical independence models that is closed under marginalization and conditioning but that contains all DAG independence models. This class of graphs, called maximal ancestral graphs, has two attractive features: there is at most one edge between each pair of vertices; every missing edge corresponds to an independence relation. These features lead to a simple parameterization of the corresponding set of distributions in the Gaussian case.
Causal Inference from Graphical Models
, 2001
Cited by 59 (4 self)
Introduction. The introduction of Bayesian networks (Pearl 1986b) and associated local computation algorithms (Lauritzen and Spiegelhalter 1988, Shenoy and Shafer 1990, Jensen, Lauritzen and Olesen 1990) has initiated a renewed interest in understanding causal concepts in connection with modelling complex stochastic systems. It has become clear that graphical models, in particular those based upon directed acyclic graphs, have natural causal interpretations and thus form a base for a language in which causal concepts can be discussed and analysed in precise terms. As a consequence there has been an explosion of writings, not primarily within mainstream statistical literature, concerned with the exploitation of this language to clarify and extend causal concepts. Among these we mention in particular books by Spirtes, Glymour and Scheines (1993), Shafer (1996), and Pearl (2000) as well as the collection of papers in Glymour and Cooper (1999). Very briefly, but fundamentally,
Chain Graph Models and their Causal Interpretations
 B
, 2001
Cited by 48 (4 self)
Chain graphs are a natural generalization of directed acyclic graphs (DAGs) and undirected graphs. However, the apparent simplicity of chain graphs belies the subtlety of the conditional independence hypotheses that they represent. There are a number of simple and apparently plausible, but ultimately fallacious interpretations of chain graphs that are often invoked, implicitly or explicitly. These interpretations also lead to flawed methods for applying background knowledge to model selection. We present a valid interpretation by showing how the distribution corresponding to a chain graph may be generated as the equilibrium distribution of dynamic models with feedback. These dynamic interpretations lead to a simple theory of intervention, extending the theory developed for DAGs. Finally, we contrast chain graph models under this interpretation with simultaneous equation models which have traditionally been used to model feedback in econometrics. Keywords: Causal model; cha...