## Nonparametric Tree Graphical Models via Kernel Embeddings

Citations: 3 (1 self)

### BibTeX

@MISC{Song_nonparametrictree,

author = {Le Song and Arthur Gretton and Carlos Guestrin},

title = {Nonparametric Tree Graphical Models via Kernel Embeddings},

year = {}

}

### Abstract

We introduce a nonparametric representation for graphical models on trees that expresses marginals as Hilbert space embeddings and conditionals as embedding operators. This formulation allows us to define a graphical model solely on the basis of the feature space representation of its variables. Thus, this nonparametric model can be applied to general domains where kernels are defined, handling challenging cases such as discrete variables with huge domains, or very complex, non-Gaussian continuous distributions. We also derive kernel belief propagation, a Hilbert-space algorithm for performing inference in our model. We show that our method outperforms state-of-the-art techniques in a cross-lingual document retrieval task and a camera rotation estimation problem.
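
To make the two building blocks named in the abstract concrete, the following is a minimal NumPy sketch of an empirical mean embedding (for a marginal) and of regularized conditional embedding weights (for a conditional). It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the function names, the `bandwidth` and `lam` parameters, and the `(K + lam * m * I)^{-1}` form of the regularizer are all choices made here for the example.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth=1.0):
    """Gram matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 * bandwidth^2)).

    The Gaussian kernel is an assumption of this sketch; any positive
    definite kernel on the domain could be substituted.
    """
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def mean_embedding_eval(samples, query, bandwidth=1.0):
    """Evaluate the empirical mean embedding at each query point:
    mu_hat(q) = (1/m) * sum_i k(x_i, q)."""
    return gaussian_gram(samples, query, bandwidth).mean(axis=0)


def conditional_embedding_weights(x_train, x_query, lam=1e-6, bandwidth=1.0):
    """Weights beta(x) so that the embedding of P(Y | X = x) is approximated
    by sum_i beta_i(x) * k(y_i, .), using beta = (K + lam * m * I)^{-1} k_x.

    The scaling of the regularizer by m is an assumption of this sketch.
    """
    m = x_train.shape[0]
    K = gaussian_gram(x_train, x_train, bandwidth)
    k_x = gaussian_gram(x_train, x_query, bandwidth)
    return np.linalg.solve(K + lam * m * np.eye(m), k_x)


# Hypothetical toy data: Y = sin(X) on a grid, with no noise.
x_train = np.linspace(-3.0, 3.0, 200)[:, None]
y_train = np.sin(x_train[:, 0])

# Weights for the conditional embedding at x = 1.0.
beta = conditional_embedding_weights(x_train, np.array([[1.0]]), bandwidth=0.5)

# Using the weights to average the training outputs gives a rough estimate
# of E[Y | X = 1.0]; this reduces to kernel ridge regression.
pred = float(beta[:, 0] @ y_train)
```

The same weights can be reused to estimate the conditional expectation of any function of Y by replacing `y_train` with that function's values at the training outputs, which is the property that lets messages be represented purely through kernel evaluations.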

### Citations

7311 | Probabilistic reasoning in intelligent systems: networks of plausible inference
- Pearl
- 1988
Citation Context ...tructured data. In addition to our kernel representation of the node marginals and conditionals, we also derive kernel belief propagation, a Hilbert space version of the belief propagation algorithm (Pearl, 1988) required for efficient inference. Unlike Ihler & McAllester (2009), for example, who assume that the node and edge potentials are given in advance and that they can sample from certain distributions...

459 | Max-margin markov networks
- Taskar, Guestrin, et al.
- 2003
Citation Context ... In the context of graphical models, kernels have thus far been used to represent complex structure in observed variables, but the relationship between unobserved variables has remained simple (c.f., Taskar et al., 2004). In contrast, we define a new representation for graphical model on trees, based on RKHS embeddings of both the node marginals and the conditionals associated with edges between variables. We make u...

427 | A program for aligning sentences in bilingual corpora
- Gale, Church
- 1993
Citation Context ...h against two baselines. The first was to use document length directly as a sole feature for retrieval, since length will be roughly retained across languages (see also the sentence alignment work of Gale & Church, 1991). Since some languages are less terse than others, we normalized the within-language document length to zero mean and unit variance. As o...

327 | Expectation Propagation for Approximate Bayesian Inference
- Minka
- 2001

292 | Europarl: A Parallel Corpus for Statistical Machine Translation
- Koehn
- 2005
Citation Context ...ons from other languages. retrieval experiments, the objective was to retrieve a document in one language given a source document in another. Our data were the proceedings of the European Parliament (Koehn, 2005). We chose 300 longest documents written in 9 languages, namely Danish (da), Dutch (nl), English (en), French (fr), German (de), Italian (it), Portuguese (pt), Spanish (es), and Swedish (sv). For thi...

222 | Nonparametric belief propagation
- Sudderth, Ihler, et al.
- 2003
Citation Context ...message passing algorithms on graphical models, however these rely on either a mixture model (Sudderth et al., 2003) or a set of samples (Ihler & McAllester, 2009) to represent the messages, and thus do not easily generalize to high dimensions or structured data. In addition to our kernel representation of the node marginals and conditionals, we also derive ke...

120 | Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces
- Fukumizu, Bach, et al.
- 2004
Citation Context ...This can be understood by analogy with the finite dimensional case: if $x \in \mathbb{R}^d$ and $y, z$ are vectors in $\mathbb{R}^{d'}$, then $(x y^\top) z = x (y^\top z)$. Following (Baker, 1973; Fukumizu et al., 2004), we define the uncentered covariance operator $C_{XX} : \mathcal{F} \to \mathcal{F}$ such that for all $f \in \mathcal{F}$, $\langle f, C_{XX} f \rangle_{\mathcal{F}} = \mathbb{E}_X \langle f, (\phi(X) \otimes \phi(X)) f \rangle_{\mathcal{F}} = \mathbb{E}_X[(f(X))^2]$; and the uncentered cross-covariance operator $C_{XY} : \mathcal{G} \to \mathcal{F}$, w...

99 | Reproducing Kernel Hilbert Spaces in Probability and Statistics
- Berlinet, Thomas-Agnan
- 2003
Citation Context ...n for graphical model on trees, based on RKHS embeddings of both the node marginals and the conditionals associated with edges between variables. We make use of recent work in formulating embeddings (Berlinet & Thomas-Agnan, 2003; Gretton et al., 2007; Smola et al., 2007; Sriperumbudur et al., 2008) and conditional embeddings (Song et al., 2009) of probabilities into reproducing kernel Hilbert spaces. These embeddings charact...

99 | Measuring statistical dependence with hilbert-schmidt norms
- Gretton, Bousquet, et al.
- 2005
Citation Context ...h document using multiple input documents in other languages. We repeated the experiment on a third tree, obtained from the Chow-Liu algorithm using the Hilbert-Schmidt Independence Criterion (HSIC) (Gretton et al., 2005) for the required statistical dependence measure. Again we retrieved English documents conditioned on documents from other languages. See the supplementary material for details of this third experime...

92 | An introduction to probabilistic graphical models
- Jordan
- 2003
Citation Context ... After pruning, this node will now be a leaf, and the result of inference will be unchanged. 2.2 Belief propagation The expression (2) can be computed efficiently by belief propagation (Pearl, 1988; Jordan, 2002, Ch. 3, 4). This is done by passing messages $m_{ts}$ from nodes $t$ to $s$, starting from the leaves and progressing up to the root. Messages to the target node $s$ are functions on the state space of $X_s$, and a...

68 | A kernel method for the two-sample-problem
- Gretton, Borgwardt, et al.
- 2007
Citation Context ... based on RKHS embeddings of both the node marginals and the conditionals associated with edges between variables. We make use of recent work in formulating embeddings (Berlinet & Thomas-Agnan, 2003; Gretton et al., 2007; Smola et al., 2007; Sriperumbudur et al., 2008) and conditional embeddings (Song et al., 2009) of probabilities into reproducing kernel Hilbert spaces. These embeddings characterize the probabilitie...

57 | A Hilbert space embedding for distributions
- Smola, Gretton, et al.
- 2007
Citation Context ...ngs of both the node marginals and the conditionals associated with edges between variables. We make use of recent work in formulating embeddings (Berlinet & Thomas-Agnan, 2003; Gretton et al., 2007; Smola et al., 2007; Sriperumbudur et al., 2008) and conditional embeddings (Song et al., 2009) of probabilities into reproducing kernel Hilbert spaces. These embeddings characterize the probabilities solely on the basi...

52 | Kernel measures of conditional dependence
- Fukumizu, Gretton, et al.
Citation Context ...ct space $\mathcal{F}^n$ and functions $a_i \in \mathcal{F}$ and $b_i \in \mathcal{F}$ for $i \in \{1, \ldots, n\}$, $\left\langle \bigotimes_{i=1}^{n} a_i, \bigotimes_{i=1}^{n} b_i \right\rangle_{\mathcal{F}^n} = \prod_{i=1}^{n} \langle a_i, b_i \rangle_{\mathcal{F}}$. (1) We use the result $A^{-1} - B^{-1} = A^{-1} (B - A) B^{-1}$. (2) Further, following [2], we may define the empirical regularized correlation operator $\hat{V}_{XY}$ such that $\hat{C}_{XY} := (\hat{C}_{XX} + \lambda_m I)^{1/2} \, \hat{V}_{XY} \, (\hat{C}_{YY} + \lambda_m I)^{1/2}$, (3) where we have $\|\hat{V}_{XY}\| \le 1$. 1.2 Proof of Theorem 1 We now prove th...

49 | Polylingual topic models
- Mimno, Wallach, et al.
- 2009
Citation Context ...ment length to zero mean and unit variance. As our second point of comparison, we employed a polylingual topic model for document tuples (Mimno et al., 2009). We learned a bilingual topic model for each edge based on the training document pairs (with stemming and stopword removal). We trained each bilingual topic model with 50 topics using Gibbs sampling...

44 | Joint measures and cross-covariance operators
- Baker
- 1973
Citation Context ...This can be understood by analogy with the finite dimensional case: if $x \in \mathbb{R}^d$ and $y, z$ are vectors in $\mathbb{R}^{d'}$, then $(x y^\top) z = x (y^\top z)$. Following (Baker, 1973; Fukumizu et al., 2004), we define the uncentered covariance operator $C_{XX} : \mathcal{F} \to \mathcal{F}$ such that for all $f \in \mathcal{F}$, $\langle f, C_{XX} f \rangle_{\mathcal{F}} = \mathbb{E}_X \langle f, (\phi(X) \otimes \phi(X)) f \rangle_{\mathcal{F}} = \mathbb{E}_X[(f(X))^2]$; and the uncentered cross-covariance ...

36 | Injective hilbert space embeddings of probability measures
- Sriperumbudur, Gretton, et al.
- 2008
Citation Context ... marginals and the conditionals associated with edges between variables. We make use of recent work in formulating embeddings (Berlinet & Thomas-Agnan, 2003; Gretton et al., 2007; Smola et al., 2007; Sriperumbudur et al., 2008) and conditional embeddings (Song et al., 2009) of probabilities into reproducing kernel Hilbert spaces. These embeddings characterize the probabilities solely on the basis of their feature space rep...

29 | Hilbert space embeddings of conditional distributions
- Song, Huang, et al.
- 2009
Citation Context ...tween variables. We make use of recent work in formulating embeddings (Berlinet & Thomas-Agnan, 2003; Gretton et al., 2007; Smola et al., 2007; Sriperumbudur et al., 2008) and conditional embeddings (Song et al., 2009) of probabilities into reproducing kernel Hilbert spaces. These embeddings characterize the probabilities solely on the basis of their feature space representations, thus allowing us to deal easily w...

17 | Statistical consistency of kernel canonical correlation analysis
- Fukumizu, Bach, et al.
Citation Context ...olumns of $\Phi$ contain the mappings $\varphi(y_i)$. Note that we have added an additional regularizing term $\lambda$, to avoid overfitting (in much the same way as is done for kernel canonical correlation analysis: see Fukumizu et al., 2007). A novel result of the present work is the consistency of this estimator, which is proved in the supplementary material. Theorem 1 Assume $C_{YX} C_{XX}^{-3/2}$ is Hilbert-Schmidt. Then $\|\hat{U}_{Y|X} - U_{Y|X}\|_{HS} =$ ...