## Mixed Cumulative Distribution Networks

### BibTeX

@MISC{Silva_mixedcumulative,

author = {Ricardo Silva and Charles Blundell and Yee Whye Teh},

title = {Mixed Cumulative Distribution Networks},

year = {}

}

### Abstract

Directed acyclic graphs (DAGs) are a popular framework for expressing multivariate probability distributions. Acyclic directed mixed graphs (ADMGs) are generalizations of DAGs that can succinctly capture much richer sets of conditional independencies, and are especially useful in modeling the effects of latent variables implicitly. Unfortunately, there are currently no parameterizations of general ADMGs. In this paper, we apply recent work on cumulative distribution networks and copulas to propose one general construction for ADMG models. We consider a simple parameter estimation approach, and report some encouraging experimental results.

... Reading off independence constraints from an ADMG can be done with a procedure essentially identical to d-separation (Pearl, 1988; Richardson and Spirtes, 2002). Given a graphical structure, the challenge is to provide a procedure to parameterize models that correspond to the independence constraints of the graph, as illustrated below.

Example 1: Bi-directed edges correspond to some hidden common parent that has been marginalized. In the Gaussian case, this has an easy interpretation as constraints in the marginal covariance matrix of the remaining variables. Consider the two graphs below.

### Citations

773 | Adaptive mixtures of local experts - Jacobs, Jordan, et al. - 1991 |
Citation Context ...bility table (CPT), as is standard in the Bayesian network literature (Pearl, 1988). For continuous data, we parametrize each univariate conditional density function as a mixture of Gaussian experts (Jacobs et al., 1991): $f_v(x_v \mid \mathrm{pa}_G(X_v)) = \sum_{z=1}^{K} \pi_{z;v}\, N(x_v; \mu_{z;v}, \sigma^2_{z;v})$ (10), with $\pi_{z;v}$ and $\mu_{z;v}$ depending on $\mathrm{pa}_G(X_v)$: $\mu_{z;v}(\mathrm{pa}_G(X_v)) = \theta_{v0} + \theta_v^T \mathrm{pa}_G(X_v)$ (11), $\pi_{z;v}(\mathrm{pa}_G(X_v)) \propto \exp(w_{v0} + w_v^T \mathrm{pa}_G(X_v))$. We use the bivariate Frank...
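
The mixture-of-Gaussian-experts density quoted in this excerpt can be evaluated numerically. The block below is a minimal Python sketch under stated assumptions, not the authors' implementation: the function names, the dictionary keys, and the choice to give every expert its own mean and gating parameters are all hypothetical (the excerpt does not show how the parameters are indexed by expert).

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def softmax(scores):
    """Turn unnormalized log-weights into weights that sum to one."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def mixture_of_experts_density(x_v, parents, experts):
    """Evaluate sum_z pi_z(parents) * N(x_v; mu_z(parents), var_z).

    Each expert is a dict with hypothetical keys:
      theta0, theta : intercept and weights of the linear mean
      w0, w         : intercept and weights of the gating score
      var           : variance of the expert
    """
    # Gating weights: proportional to exp(w0 + w^T parents).
    gate = softmax([e["w0"] + dot(e["w"], parents) for e in experts])
    density = 0.0
    for pi_z, e in zip(gate, experts):
        mu_z = e["theta0"] + dot(e["theta"], parents)  # linear mean
        density += pi_z * normal_pdf(x_v, mu_z, e["var"])
    return density
```

With a single expert the gating weight is one, so the function reduces to a linear-Gaussian conditional density; with several experts the result is still a proper density in `x_v` because the gating weights sum to one.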

737 | Structural Equations with Latent Variables - Bollen - 1989 |

646 | An Introduction to Copulas - Nelsen - 1999 |
Citation Context ...othesis testing of ADMG constraints), it has computational advantages. Our construction is done by exploiting recent work on cumulative distribution networks, CDNs (Huang and Frey, 2008) and copulas (Nelsen, 2007, Kirshner, 2007). The usefulness of such parameterizations can then be put to test via some parameter estimation procedure, which in our case will be based on Bayesian learning with Markov chain Mont...

639 | UCI Machine Learning Repository - Asuncion, Newman - 2007 |
Citation Context ...d) Continuous 11 1599 5.7 7.5 -13.72 -11.25 2.47±0.10⋆ Wine quality (white) Continuous 11 4898 7.3 14.5 -13.76 -12.11 1.65±0.09⋆ mani (2009). We used seven data sets from the UCI data set repository (Frank and Asuncion, 2010). Three of the data sets have only discrete variables, whilst four have just continuous variables. All discrete variables were removed from the continuous data sets, as was one variable from any pair...

496 | Causation, Prediction, and Search - Spirtes, Glymour, et al. - 1993 |
Citation Context ...everal directions for future work. While classical approaches for learning Markov equivalence classes of ADMGs have been developed by means of multiple hypothesis tests of conditional independencies (Spirtes et al., 2000), a model-based approach based on Bayesian or penalized likelihood functions can deliver more robust learning procedures and a more natural way of combining data with structural prior knowledge. ADMG...

197 | An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias - Zellner - 1962 |
Citation Context ...tructured prediction problems. For instance, Silva et al. (2007) introduced some simple models for relational classification inspired by ADMG models and by the link to seemingly unrelated regression (Zellner, 1962). However, efficient ADMG-structured prediction methods and new advanced structural learning procedures will need to be developed. Acknowledgements: We thank Thomas Richardson for several useful disc...

142 | Correlation and Causation - Wright - 1921 |

74 | Ancestral graph Markov models - Richardson, Spirtes - 2002 |
Citation Context ... approach, and report some encouraging experimental results. ... Reading off independence constraints from an ADMG can be done with a procedure essentially identical to d-separation (Pearl, 1988, Richardson and Spirtes, 2002). Given a graphical structure, the challenge is to provide a procedure to parameterize models that correspond to the independence constraints of the graph, as illustrated below. Example 1: Bi-directe...

36 | Markov properties for acyclic directed mixed graphs - Richardson - 2003 |

26 | Multivariate Models and Dependence Concepts. Chapman - Joe - 1997 |
Citation Context ...lustrated in the discrete case. Let each $X_i$ take values in $\{0, 1, 2, \dots\}$. Recall that the relationship between a CDF and a probability mass function is given by the following inclusion-exclusion formula (Joe, 1997): $P(x_1, \dots, x_d) = \sum_{z_1=0}^{1} \cdots \sum_{z_d=0}^{1} (-1)^{z_1 + z_2 + \dots + z_d} F(x_1 - z_1, \dots, x_d - z_d)$ (3), for $d = |X_V|$. In the binary case, since $q_A = P(X_A = 0) = P(X_A \le 0, X_{V \setminus A} \le 1) = F(x_A = 0, x_{V \setminus A} = 1)$, one can check that (...
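
The inclusion-exclusion relationship between a CDF and a probability mass function quoted in this excerpt can be checked on a small example. The following Python sketch is illustrative only: `pmf_from_cdf`, `TOY_PMF`, and `toy_cdf` are hypothetical names, and the two-variable binary table is an assumed toy distribution rather than anything taken from the paper.

```python
from itertools import product


def pmf_from_cdf(cdf, x):
    """Recover P(X = x) from a joint CDF over integer-valued variables.

    Sums (-1)^(z_1 + ... + z_d) * F(x_1 - z_1, ..., x_d - z_d)
    over all binary vectors z of length d.
    """
    total = 0.0
    for z in product((0, 1), repeat=len(x)):
        corner = tuple(xi - zi for xi, zi in zip(x, z))
        total += (-1) ** sum(z) * cdf(corner)
    return total


# Assumed toy joint distribution over two binary variables.
TOY_PMF = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}


def toy_cdf(point):
    """Joint CDF of TOY_PMF: total mass at or below `point` in every coordinate."""
    return sum(
        mass
        for value, mass in TOY_PMF.items()
        if all(a <= b for a, b in zip(value, point))
    )


if __name__ == "__main__":
    for value in sorted(TOY_PMF):
        print(value, round(pmf_from_cdf(toy_cdf, value), 10))
```

The same toy CDF also illustrates the binary-case identity in the excerpt: evaluating `toy_cdf((0, 1))` sets the first variable to 0 and the remaining variable to its maximum, which gives the marginal probability that the first variable equals 0.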

21 | Generalized Thresholding of Large Covariance Matrices - Rothman, Levina, et al. - 2008 |
Citation Context ...riations (e.g., mixture models and probit models) have been the most common families exploited in the literature (Richardson and Spirtes, 2002, Silva and Ghahramani, 2009, Khare and Rajaratnam, 2009, Rothman et al., 2009). More recently, important progress has been made in constructing binary ADMG models (Drton and Richardson, 2008, Richardson, 2009, Evans and Richardson, 2010), although it is not clear how to extend...

18 | Probabilistic Reasoning in Expert Systems: Networks of Plausible Inference - Pearl - 1988 |
Citation Context ...er estimation approach, and report some encouraging experimental results. ... Reading off independence constraints from an ADMG can be done with a procedure essentially identical to d-separation (Pearl, 1988, Richardson and Spirtes, 2002). Given a graphical structure, the challenge is to provide a procedure to parameterize models that correspond to the independence constraints of the graph, as illustrate...

16 | Binary models for marginal independence - Drton, Richardson - 2008 |
Citation Context ...ature (Richardson and Spirtes, 2002, Silva and Ghahramani, 2009, Khare and Rajaratnam, 2009, Rothman et al., 2009). More recently, important progress has been made in constructing binary ADMG models (Drton and Richardson, 2008, Richardson, 2009, Evans and Richardson, 2010), although it is not clear how to extend such models to infinite discrete spaces (such as treating Poisson random variables); also important, scalabilit...

15 | A new algorithm for maximum likelihood estimation in Gaussian graphical models for marginal independence - Richardson - 2003 |

14 | Learning with tree-averaged densities and distributions - Kirshner - 2007 |
Citation Context ...g of ADMG constraints), it has computational advantages. Our construction is done by exploiting recent work on cumulative distribution networks, CDNs (Huang and Frey, 2008) and copulas (Nelsen, 2007, Kirshner, 2007). The usefulness of such parameterizations can then be put to test via some parameter estimation procedure, which in our case will be based on Bayesian learning with Markov chain Monte Carlo (MCMC) W...

13 | Hidden common cause relations in relational learning - Silva, Chu, et al. - 2007 |

12 | Cumulative distribution networks and the derivative-sum-product algorithm - Huang, Frey - 2008 |
Citation Context ...e in applications such as joint hypothesis testing of ADMG constraints), it has computational advantages. Our construction is done by exploiting recent work on cumulative distribution networks, CDNs (Huang and Frey, 2008) and copulas (Nelsen, 2007, Kirshner, 2007). The usefulness of such parameterizations can then be put to test via some parameter estimation procedure, which in our case will be based on Bayesian lear...

7 | The hidden life of latent variables: Bayesian learning with mixed graph models - Silva, Ghahramani |
Citation Context ...t be connected. In the context of Bayesian inference, Markov chain Monte Carlo in ADMGs might have much better mixing properties compared to models where all latent variables are explicitly included (Silva and Ghahramani, 2009). However, it is hard in general to parameterize a likelihood function that obeys the independence constraints encoded in an ADMG. Gaussian likelihood functions and their variations (e.g., mixture mo...

6 | Causal reasoning with ancestral graphs - Zhang - 2008 |

4 | Cumulative Distribution Networks: Inference, Estimation and Applications of Graphical Models for Cumulative Distribution Functions - Huang - 2009 |

3 | Finding latent causes in causal networks: an efficient approach based on Markov blankets - Pellet, Elisseeff - 2008 |
Citation Context ...ows: First, for continuous data, the training and test data were normalized so that the training set has zero mean and unit standard deviation. Then we find a suitable ADMG using the MBCS* algorithm (Pellet, 2008), using the $\chi^2$ test for discrete data, and partial linear correlations for continuous data, both with p = 0.05. Finally, parameters for both the copula MCDN and the Gaussian/probit model are estimat...
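
The normalization step described in this excerpt (scaling both training and test data with statistics computed on the training set) can be sketched in a few lines of Python. This is an assumed reading of the procedure, not the authors' code: the helper names are hypothetical, and the use of the population standard deviation rather than the sample standard deviation is a choice the excerpt does not specify.

```python
import statistics


def fit_standardizer(train_column):
    """Return the mean and standard deviation of one training column."""
    mean = statistics.fmean(train_column)
    std = statistics.pstdev(train_column)  # assumption: population std
    if std == 0.0:
        raise ValueError("constant column cannot be scaled to unit variance")
    return mean, std


def standardize(column, mean, std):
    """Shift and scale a column using previously fitted statistics."""
    return [(value - mean) / std for value in column]


# Hypothetical data for one continuous variable.
train = [1.0, 2.0, 3.0, 4.0]
test = [2.5, 5.0]

mean, std = fit_standardizer(train)
train_scaled = standardize(train, mean, std)  # zero mean, unit std
test_scaled = standardize(test, mean, std)    # reuses training statistics
```

Fitting the statistics on the training column only keeps information from the test set out of the preprocessing; as a consequence the test column will generally not have exactly zero mean or unit standard deviation.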

2 | Maximum likelihood fitting of acyclic directed mixed graphs to binary data - Evans, Richardson - 2010 |

2 | Maximum-likelihood learning of cumulative distribution functions on graphs - Huang, Jojic - 2010 |
Citation Context ...ameterizations of Gaussian and discrete networks (Richardson and Spirtes, 2002, Drton and Richardson, 2008, Richardson, 2009). The framework of cumulative distribution networks (Huang and Frey, 2008, Huang and Jojic, 2010) introduced new approaches for flexible parameterizations of bidirected models. In this paper, we extended CDNs to the full ADMG case, introducing the most flexible class of parameterizations of ADMG...

2 | Exact inference and learning for cumulative distribution functions on loopy graphs - Huang, Jojic, et al. - 2010 |
Citation Context ...uous, ordinal and unbounded discrete variables as well. Finally, in graphs with low tree-widths, probability densities/masses can be computed efficiently by dynamic programming (Huang and Frey, 2008, Huang et al., 2010). To summarize, CDNs provide a restricted family of marginal independence models, but one that has computational, statistical and modeling advantages. Depending on the application, the extra constrai...

2 | Wishart distributions for covariance graph models - Khare, Rajaratnam - 2009 |
Citation Context ...ihood functions and their variations (e.g., mixture models and probit models) have been the most common families exploited in the literature (Richardson and Spirtes, 2002, Silva and Ghahramani, 2009, Khare and Rajaratnam, 2009, Rothman et al., 2009). More recently, important progress has been made in constructing binary ADMG models (Drton and Richardson, 2008, Richardson, 2009, Evans and Richardson, 2010), although it is n...

2 | Graphical Models. Oxford University Press - Lauritzen - 1996 |
Citation Context ... below. [Figure: example graphs with nodes X1, ..., X8 and X1, ..., X4] 1 CONTRIBUTION. Graphical models provide a powerful framework for encoding independence constraints in a multivariate distribution (Pearl, 1988, Lauritzen, 1996). Two of the most common families, the directed acyclic graph (DAG) and the undirected network, have complementary properties. For instance, DAGs are non-monotonic independence models, in the sense t...

2 | A factorization criterion for acyclic directed mixed graphs - Richardson - 2009 |
Citation Context ...s, 2002, Silva and Ghahramani, 2009, Khare and Rajaratnam, 2009, Rothman et al., 2009). More recently, important progress has been made in constructing binary ADMG models (Drton and Richardson, 2008, Richardson, 2009, Evans and Richardson, 2010), although it is not clear how to extend such models to infinite discrete spaces (such as treating Poisson random variables); also important, scalability issues arise, as...