Results 11–20 of 92
Stratified exponential families: Graphical models and model selection
Annals of Statistics, 2001
Abstract
Cited by 54 (6 self)
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org.
A linear non-Gaussian acyclic model for causal discovery
J. Machine Learning Research, 2006
Abstract
Cited by 54 (23 self)
In recent years, several methods have been proposed for the discovery of causal structure from non-experimental data. Such methods make various assumptions on the data generating process to facilitate its identification from purely observational data. Continuing this line of research, we show how to discover the complete causal structure of continuous-valued data, under the assumptions that (a) the data generating process is linear, (b) there are no unobserved confounders, and (c) disturbance variables have non-Gaussian distributions of nonzero variances. The solution relies on the use of the statistical method known as independent component analysis, and does not require any prespecified time-ordering of the variables. We provide a complete Matlab package for performing this LiNGAM analysis (short for Linear Non-Gaussian Acyclic Model), and demonstrate the effectiveness of the method using artificially generated data and real-world data.
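The core identifiability idea above can be sketched in a few lines: with linear, non-Gaussian data, the residual of a regression in the true causal direction is independent of the regressor, while the reverse regression leaves a dependent residual. The sketch below is illustrative only, with a crude squared-correlation proxy for the independence test; the paper's actual method is based on independent component analysis. All names here are hypothetical.

```python
import numpy as np

def dep(a, r):
    # Crude independence proxy: correlation between squared values
    # (LiNGAM itself uses ICA / proper independence measures instead).
    return abs(np.corrcoef(a ** 2, r ** 2)[0, 1])

def lingam_direction(x1, x2):
    """Regress in both directions and prefer the direction whose
    residual looks independent of the regressor."""
    r12 = x2 - (x1 @ x2 / (x1 @ x1)) * x1   # residual of model x1 -> x2
    r21 = x1 - (x2 @ x1 / (x2 @ x2)) * x2   # residual of model x2 -> x1
    return 'x1->x2' if dep(x1, r12) < dep(x2, r21) else 'x2->x1'

rng = np.random.default_rng(0)
e1, e2 = rng.uniform(-1, 1, (2, 20000))   # non-Gaussian disturbances
x1 = e1
x2 = 0.8 * x1 + e2                        # true structure: x1 -> x2
print(lingam_direction(x1, x2))           # recovers 'x1->x2'
```

With Gaussian disturbances both residuals would be independent of their regressors and the direction would not be identifiable, which is exactly why assumption (c) matters.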
Learning Bayesian Networks: A unification for discrete and Gaussian domains
PROCEEDINGS OF THE ELEVENTH CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 1995
Abstract
Cited by 45 (7 self)
We examine Bayesian methods for learning Bayesian networks from a combination of prior knowledge and statistical data. In particular, we unify the approaches we presented at last year's conference for discrete and Gaussian domains. We derive a general Bayesian scoring metric, appropriate for both domains. We then use this metric in combination with well-known statistical facts about the Dirichlet and normal-Wishart distributions to derive our metrics for discrete and Gaussian domains.
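In the discrete case, the building block of such a Bayesian scoring metric is the Dirichlet-multinomial marginal likelihood of the counts observed for a variable; the full metric combines one such term per variable and parent configuration. A minimal sketch of that single term (the function name is hypothetical, and no structure prior is included):

```python
from math import lgamma

def dirichlet_log_marginal(counts, alphas):
    """Log marginal likelihood of multinomial counts under a
    Dirichlet(alphas) prior -- the per-family term of a BD-style
    Bayesian scoring metric for discrete networks."""
    a, n = sum(alphas), sum(counts)
    score = lgamma(a) - lgamma(a + n)
    for n_k, a_k in zip(counts, alphas):
        score += lgamma(a_k + n_k) - lgamma(a_k)
    return score

# For a binary variable with a uniform Dirichlet(1, 1) prior, a run of
# 10 heads and 0 tails has marginal likelihood exactly 1/11.
print(dirichlet_log_marginal([10, 0], [1.0, 1.0]))   # → log(1/11) ≈ -2.3979
```

The Gaussian analogue replaces the Dirichlet terms with normal-Wishart ones, which is precisely the unification the abstract describes.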
Optimization by learning and simulation of Bayesian and Gaussian networks, 1999
Abstract
Cited by 43 (6 self)
Estimation of Distribution Algorithms (EDA) constitute an example of stochastic heuristics based on populations of individuals, each of which encodes a possible solution to the optimization problem. These populations of individuals evolve in successive generations as the search progresses, organized in the same way as most evolutionary computation heuristics. In contrast to most evolutionary computation paradigms, which consider the crossover and mutation operators as essential tools to generate new populations, EDA replaces those operators by the estimation and simulation of the joint probability distribution of the selected individuals. In this work, after making a review of the different approaches based on EDA for problems of combinatorial optimization as well as for problems of optimization in continuous domains, we propose new approaches based on the theory of probabilistic graphical models to solve problems in both domains. More precisely, we propose to adapt algorit...
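The select-estimate-simulate loop described above can be sketched with the simplest continuous model, an independent Gaussian per variable (a UMDA-style EDA); this is a toy illustration under that independence assumption, not the graphical-model-based algorithms the paper proposes:

```python
import random

def gaussian_eda(fitness, dim, pop=200, keep=50, gens=40, seed=1):
    """Minimal continuous EDA: each generation, select the best
    individuals, fit an independent Gaussian per coordinate, and
    sample the next population from that estimated distribution."""
    rng = random.Random(seed)
    mu, sigma = [5.0] * dim, [10.0] * dim
    for _ in range(gens):
        popn = [[rng.gauss(mu[i], sigma[i]) for i in range(dim)]
                for _ in range(pop)]
        popn.sort(key=fitness)                  # minimization
        sel = popn[:keep]                       # the selected individuals
        for i in range(dim):                    # estimate the distribution
            vals = [x[i] for x in sel]
            mu[i] = sum(vals) / keep
            var = sum((v - mu[i]) ** 2 for v in vals) / keep
            sigma[i] = max(var ** 0.5, 1e-6)    # avoid total collapse
    return mu

sphere = lambda x: sum(v * v for v in x)        # minimize the sphere function
best = gaussian_eda(sphere, dim=3)
print([round(v, 3) for v in best])              # close to the optimum [0, 0, 0]
```

Replacing the per-coordinate Gaussians with a learned Bayesian or Gaussian network is what turns this toy into the algorithms the abstract surveys.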
Nonlinear causal discovery with additive noise models
Abstract
Cited by 35 (16 self)
The discovery of causal relationships between a set of observed variables is a fundamental problem in science. For continuous-valued data, linear acyclic causal models with additive noise are often used because these models are well understood and there are well-known methods to fit them to data. In reality, of course, many causal relationships are more or less nonlinear, raising some doubts as to the applicability and usefulness of purely linear methods. In this contribution we show that in fact the basic linear framework can be generalized to nonlinear models. In this extended framework, nonlinearities in the data-generating process are in fact a blessing rather than a curse, as they typically provide information on the underlying causal system and allow more aspects of the true data-generating mechanisms to be identified. In addition to theoretical results we show simulations and some simple real data experiments illustrating the identification power provided by nonlinearities.
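The nonlinear criterion is the same as in the linear case: fit a regression in each direction and keep the direction whose residuals are independent of the input. A rough illustrative sketch, assuming a polynomial regressor and a squared-correlation proxy in place of the proper independence tests used in the paper (all names hypothetical):

```python
import numpy as np

def dep(a, r):
    # Proxy for dependence between input a and residual r:
    # correlation of their squares (near zero when independent).
    return abs(np.corrcoef(a ** 2, r ** 2)[0, 1])

def anm_direction(x, y, deg=3):
    """Fit an additive-noise model y = f(x) + e in both directions
    and prefer the one whose residuals look independent of the input."""
    r_fwd = y - np.polyval(np.polyfit(x, y, deg), x)   # model x -> y
    r_bwd = x - np.polyval(np.polyfit(y, x, deg), y)   # model y -> x
    return 'x->y' if dep(x, r_fwd) < dep(y, r_bwd) else 'y->x'

rng = np.random.default_rng(3)
x = rng.normal(0, 1, 20000)
y = x ** 3 + 0.2 * rng.normal(0, 1, 20000)   # true mechanism: x -> y
print(anm_direction(x, y))                   # typically recovers 'x->y'
```

In the linear-Gaussian special case both directions admit independent residuals, which is why the nonlinearity here is "a blessing rather than a curse."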
Optimization in continuous domains by learning and simulation of Gaussian networks
Abstract
Cited by 32 (4 self)
This paper shows how the Gaussian network paradigm can be used to solve optimization problems in continuous domains. Some methods of structure learning from data and simulation of Gaussian networks are applied in the Estimation of Distribution Algorithm (EDA), and new methods based on information theory are proposed. Experimental results are also presented.
1 Estimation of Distribution Algorithm approaches in continuous domains
Figure 1 shows a schematic of the EDA approach for continuous domains. We will use x = (x_1, ..., x_n) to denote individuals, and D_l to denote the population of N individuals in the l-th generation. Similarly, D^Se_l will represent the population of the Se individuals selected from D_l. In the EDA [9] our interest will be to estimate f(x | D^Se), that is, the joint probability density function over one individual x being among the selected individuals. We denote f_l(x) = f_l(x | D^Se_{l-1}), the joint density of the l-th genera...
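The estimation and simulation steps in that notation can be sketched directly: fit a density f_l(x) to the selected individuals D^Se, then sample the next population from it. As an illustrative stand-in for a learned Gaussian network, the sketch below fits a full multivariate Gaussian (the complete graph over the variables, i.e. no structure learning); the data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(7)
# Stand-in for D^Se_l: the Se = 50 individuals selected from the population
selected = rng.normal([1.0, -2.0, 0.5], 0.3, size=(50, 3))

# Estimate f_l(x): a full multivariate Gaussian, the simplest
# "Gaussian network" (a complete graph over the n = 3 variables).
mu = selected.mean(axis=0)
cov = np.cov(selected, rowvar=False)

# Simulate the next population D_{l+1} from the estimated density.
next_pop = rng.multivariate_normal(mu, cov, size=200)
print(next_pop.shape)   # → (200, 3)
```

Structure learning would replace the dense covariance with a sparser Gaussian network, reducing the number of parameters estimated each generation.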
Utilities as random variables: Density estimation and structure discovery
In UAI, 2000
Abstract
Cited by 31 (2 self)
Decision theory does not traditionally include uncertainty over utility functions. We argue that a person's utility value for a given outcome can be treated as we treat other domain attributes: as a random variable with a density function over its possible values. We show that we can apply statistical density estimation techniques to learn such a density function from a database of partially elicited utility functions. In particular, we define a Bayesian learning framework for this problem, assuming the distribution over utilities is a mixture of Gaussians, where the mixture components represent statistically coherent subpopulations. We can also extend our techniques to the problem of discovering generalized additivity structure in the utility functions in the population. We define a Bayesian model selection criterion for utility function structure and a search procedure over structures. The factorization of the utilities in the learned model, and the generalization obtained from density estimation, allows us to provide robust estimates of utilities using a significantly smaller number of utility elicitation questions. We experiment with our technique on synthetic utility data and on a real database of utility functions in the domain of prenatal diagnosis.
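The density-estimation core of this approach, a mixture of Gaussians fit to observed utility values, can be sketched with plain maximum-likelihood EM; this toy omits the Bayesian priors and the model-selection criterion the paper builds on top, and all data and names here are hypothetical:

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=100):
    """Plain EM for a 1-D Gaussian mixture: the mixture components
    play the role of statistically coherent subpopulations."""
    w = np.full(k, 1.0 / k)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread-out initial means
    var = np.full(k, x.var())
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        d = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = w * d
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and variances
        n = r.sum(axis=0)
        w, mu = n / len(x), (r * x[:, None]).sum(axis=0) / n
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n
    return w, mu, var

rng = np.random.default_rng(1)
# Two synthetic "subpopulations" of utility values, near 0.2 and 0.8
u = np.concatenate([rng.normal(0.2, 0.05, 300), rng.normal(0.8, 0.05, 300)])
w, mu, var = em_gmm_1d(u)
print(np.sort(mu.round(2)))   # component means recovered near 0.2 and 0.8
```

Once such a density is learned, a new person's utility can be estimated from a few elicited values by conditioning on their likely subpopulation, which is what reduces the number of elicitation questions.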
Parameter priors for directed acyclic graphical models and the characterization of several probability distributions
MICROSOFT RESEARCH, ADVANCED TECHNOLOGY DIVISION, 1999
Abstract
Cited by 26 (1 self)
We show that the only parameter prior for complete Gaussian DAG models that satisfies global parameter independence, complete model equivalence, and some weak regularity assumptions is the normal-Wishart distribution. Our analysis is based on the following new characterization of the Wishart distribution: let W be an n × n, n ≥ 3, positive-definite symmetric matrix of random variables and f(W) be a pdf of W. Then, f(W) is a Wishart distribution if and only if W_11 − W_12 W_22^{-1} W'_12 is independent of {W_12, W_22} for every block partitioning of W.
Inference and Learning in Hybrid Bayesian Networks, 1998
Abstract
Cited by 25 (2 self)
We survey the literature on methods for inference and learning in Bayesian Networks composed of discrete and continuous nodes, in which the continuous nodes have a multivariate Gaussian distribution, whose mean and variance depend on the values of the discrete nodes. We also briefly consider hybrid Dynamic Bayesian Networks, an extension of switching Kalman filters. This report is meant to summarize what is known at a sufficient level of detail to enable someone to implement the algorithms, but without dwelling on formalities.
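The conditional-Gaussian node at the heart of these hybrid networks is simple to state: the discrete parent's value selects the mean and variance of the continuous child. A minimal forward-sampling sketch of one such node (the parameter values are invented for illustration):

```python
import random

# Hypothetical conditional-Gaussian node: the discrete parent's value
# selects the mean and standard deviation of the continuous child.
params = {0: (0.0, 1.0), 1: (5.0, 0.5)}   # parent value -> (mean, std)

def sample_child(parent_value, rng):
    mean, std = params[parent_value]
    return rng.gauss(mean, std)

rng = random.Random(42)
# Forward sampling: draw the discrete parent, then the continuous child.
samples = [sample_child(int(rng.random() < 0.3), rng) for _ in range(1000)]
print(len(samples))   # → 1000
```

Exact inference in such networks must propagate mixtures of Gaussians indexed by discrete configurations, which is why the surveyed algorithms are considerably more involved than this sampling step.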
A characterization of the Dirichlet distribution with application to learning Bayesian networks
In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, 1995
Abstract
Cited by 24 (6 self)
We provide a new characterization of the Dirichlet distribution. This characterization implies that under assumptions made by several previous authors for learning belief networks, a Dirichlet prior on the parameters is inevitable.