## Bayesian Networks on Dirichlet Distributed Vectors

### BibTeX

```bibtex
@misc{Buntine_bayesiannetworks,
  author = {Wray Buntine and Lan Du and Petteri Nurmi},
  title  = {Bayesian Networks on Dirichlet Distributed Vectors},
  year   = {}
}
```

### Abstract

Exact Bayesian network inference exists for Gaussian and multinomial distributions. For other kinds of distributions, approximations or restrictions on the kind of inference done are needed. In this paper we present generalized networks of Dirichlet distributions, and show how, using the two-parameter Poisson-Dirichlet distribution and Gibbs sampling, one can do approximate inference over them. This involves integrating out the probability vectors but leaving auxiliary discrete count vectors in their place. We illustrate the technique by extending standard topic models to “structured” documents, where the document structure is given by a Bayesian network of Dirichlets.
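The abstract's central move — integrating out Dirichlet-distributed probability vectors and keeping only discrete count vectors — is the same idea behind collapsed Gibbs sampling for plain LDA. A minimal sketch of that simpler case (not the paper's generalized network sampler; names and hyperparameters here are illustrative):

```python
import numpy as np

def init_counts(docs, z, K, V):
    """Build doc-topic, topic-word and topic-total count matrices from assignments z."""
    n_dk = np.zeros((len(docs), K)); n_kw = np.zeros((K, V)); n_k = np.zeros(K)
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk, n_kw, n_k

def collapsed_gibbs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta, rng):
    """One sweep of collapsed Gibbs sampling for LDA. The Dirichlet
    probability vectors are integrated out analytically; only the
    discrete count matrices remain -- the counterpart, in the simplest
    model, of the auxiliary count vectors the abstract describes."""
    K, V = n_kw.shape
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # remove the current assignment from the counts
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
            # conditional over topics, with the probability vectors integrated out
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            k = int(rng.choice(K, p=p / p.sum()))
            z[d][i] = k
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return z
```

The paper's contribution is extending this collapse beyond a single Dirichlet layer to a Bayesian network of them, via the two-parameter Poisson-Dirichlet distribution.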

### Citations

593 | Hierarchical Dirichlet processes
- Teh, Jordan, et al.
Citation Context ... can thus roughly be thought of as the prior data count since variance is O(1/(b + 1)). We perform Gibbs sampling over b using auxiliary variables. First, consider the case where a = 0, discussed in (Teh et al., 2006). Consider the posterior for b, p(n_{1:J}, s_{1:J,1:K} | H, a = 0), proportional to ∏_{j: E_j ≠ ∅} b^{∑_{i,k} s_{i,j,k}} Γ(b) / Γ(b + N_j + ∑_{i,k} s_{j,i,k}). Introduce q_j ∼ Beta(b, N_j + ∑_{i,k} s_{j,i,k}) as auxiliary variab... |
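The auxiliary-variable scheme referenced here follows Teh et al. (2006): Beta and Bernoulli auxiliaries make the conditional for a Dirichlet-process concentration parameter an exact Gamma draw. A sketch of that standard scheme (assuming a Gamma(a, b) prior on the concentration; the paper's generalization to a ≠ 0 uses adaptive rejection sampling instead):

```python
import numpy as np

def sample_concentration(alpha, n_per_group, total_tables, a=1.0, b=1.0, rng=None):
    """Resample a DP concentration parameter via auxiliary variables
    (Teh et al., 2006), under a Gamma(a, b) prior on alpha.

    n_per_group: customer counts n_j per group (all > 0);
    total_tables: total table count m across groups.
    One Beta and one Bernoulli auxiliary per group reduce the
    conditional for alpha to a Gamma, which is drawn exactly.
    """
    rng = rng or np.random.default_rng()
    n = np.asarray(n_per_group, dtype=float)
    w = rng.beta(alpha + 1.0, n)                 # Beta auxiliaries
    s = rng.random(len(n)) < n / (n + alpha)     # Bernoulli auxiliaries
    shape = a + total_tables - s.sum()
    rate = b - np.log(w).sum()                   # -log w_j > 0, so rate > 0
    return float(rng.gamma(shape, 1.0 / rate))
```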

323 | Adaptive rejection sampling for Gibbs sampling
- Gilks, Wild
- 1992
Citation Context ...ields a joint posterior distribution for q_j and b that is easily shown to be log concave, so the second step in the previous case (a = 0) is now replaced by an adaptive rejection sampling step in b (Gilks and Wild, 1992). 4 Experiments: We extended standard LDA, shown in Figure 1, in two directions, which have been more fully developed and experimented with elsewhere (Du et al., 2010a; Du et al., 2010b). Here we cover... |

259 | The author-topic model for authors and documents
- Rosen-Zvi
Citation Context ...dicates the total number of words in i. A lower perplexity over unseen documents means better generalization capability. In our experiments, it is computed based on the held-out method introduced in (Rosen-Zvi et al., 2004) with 80% for training and 20% for testing. It is available at http://nips.djvuzone.org/txt. Figure 4: LDA topics in The Prince. Figure 5: SeqLDA topics in The Prince. 5 Conclusion: We have sho... |
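The held-out perplexity this context describes is a standard quantity: the exponentiated negative per-word log-likelihood on unseen text. A minimal sketch (not the paper's exact estimator, which follows Rosen-Zvi et al.'s document-completion protocol):

```python
import numpy as np

def perplexity(heldout_loglik, word_counts):
    """Held-out perplexity: exp(-(sum_d log p(w_d)) / (sum_d N_d)),
    where heldout_loglik[d] is the log-likelihood of the held-out
    words of document d and word_counts[d] is their number N_d.
    Lower values indicate better generalization."""
    return float(np.exp(-np.sum(heldout_loglik) / np.sum(word_counts)))
```

As a sanity check, a uniform model over a vocabulary of V words has perplexity exactly V.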

237 | Gibbs sampling methods for stick-breaking priors
- Ishwaran, James
- 2001
Citation Context ...obability vectors by introducing small-valued integer vectors instead. Bayesian hierarchical methods often use the two-parameter Poisson-Dirichlet process (PDP), also known as the Pitman-Yor process (Ishwaran and James, 2001). In Section 2, we discuss these models from our perspective and how they can be used in nested Dirichlet modelling. The basic theory comes from (Buntine and Hutter, 2010), some borrowed from (Teh, 2... |

81 | Interpolating between types and tokens by estimating power law generators
- Goldwater, Griffiths, et al.
- 2006
Citation Context ...dard theory requires some modifications. In language domains, PDPs and DPs are proving useful for full probability modelling of various phenomena including n-gram modelling and smoothing (Teh, 2006b; Goldwater et al., 2006; Mochihashi and Sumita, 2008), dependency models for grammar (Johnson et al., 2007; Wallach et al., 2008), and for data compression (Wood et al., 2009). The PDP-based n-gram models correspond well to... |

59 | Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models
- Johnson, Griffiths, et al.
- 2007
Citation Context ...g useful for full probability modelling of various phenomena including n-gram modelling and smoothing (Teh, 2006b; Goldwater et al., 2006; Mochihashi and Sumita, 2008), dependency models for grammar (Johnson et al., 2007; Wallach et al., 2008), and for data compression (Wood et al., 2009). The PDP-based n-gram models correspond well to versions of Kneser-Ney smoothing (Teh, 2006b), the state of the art method in appl... |

45 | Shared segmentation of natural scenes using dependent Pitman-Yor processes - Sudderth, Jordan - 2008 |

33 | Bayesian classification with correlation and inheritance
- Hanson, Stutz, et al.
- 1992
Citation Context ...r of documents, and L denotes the number of words in document i. K is the number of topics. hierarchical clustering, for instance, where previous methods have hierarchically partitioned the features (Hanson et al., 1991). They can also have a significant role in various places in extending standard topic models (Buntine and Jakulin, 2006; Blei et al., 2003), for instance, making documents, topics or components hiera... |

28 | Discrete component analysis
- Buntine, Jakulin
- 2005
Citation Context ...g, for instance, where previous methods have hierarchically partitioned the features (Hanson et al., 1991). They can also have a significant role in various places in extending standard topic models (Buntine and Jakulin, 2006; Blei et al., 2003), for instance, making documents, topics or components hierarchical. The standard model is in Figure 1. Both the links α → µ and γ → φ are Dirichlet but α and γ are really unknown ... |

15 | Graphical Association Models
- Lauritzen
- 1996
Citation Context ...placed on the direction of inference. For instance, mixing multinomial and Gaussian works as long as in all cases of inference the multinomials are strictly non-descendants of the Gaussian variables (Lauritzen, 1989). Extending inference to Monte Carlo or Gibbs sampling, and allowing general purpose samplers, dramatically broadens the range of distributions one can allow (Thomas et al., 1992). In this paper we sh... |

13 | A segmented topic model based on the two-parameter Poisson-Dirichlet process - Du, Buntine, et al. - 2010 |

9 | A hierarchical nonparametric Bayesian approach to statistical language model domain adaptation
- Wood, Teh
- 2009
Citation Context ...ry comes from (Buntine and Hutter, 2010), some borrowed from (Teh, 2006a). Using these tools, in Section 3 networks of probability vectors distributed as Dirichlet and using PDPs, first presented in (Wood and Teh, 2009), are shown along with techniques for their statistical analysis. Their analysis is the main contribution of this paper. Some examples of these networks are embedded in new versions of topic models. ... |

8 | The infinite Markov model
- Mochihashi, Sumita
- 2007
Citation Context ...e modifications. In language domains, PDPs and DPs are proving useful for full probability modelling of various phenomena including n-gram modelling and smoothing (Teh, 2006b; Goldwater et al., 2006; Mochihashi and Sumita, 2008), dependency models for grammar (Johnson et al., 2007; Wallach et al., 2008), and for data compression (Wood et al., 2009). The PDP-based n-gram models correspond well to versions of Kneser-Ney smoot... |

7 | Sequential Latent Dirichlet Allocation: Discover Underlying Topic Structures within a Document - Du, Buntine, et al. - 2010 |

6 | A hierarchical Bayesian language model based on Pitman-Yor processes - 2006b |

6 | Bayesian modeling of dependency trees using hierarchical Pitman-Yor priors - Wallach, Hann, et al. - 2008 |

5 | A Bayesian interpretation of interpolated Kneser-Ney - 2006a |

1 | A Bayesian interpretation of the Poisson-Dirichlet process. Available at: http://arxiv.org/abs/1007.0296v1
- Buntine, Hutter
- 2010
Citation Context ...n as the Pitman-Yor process (Ishwaran and James, 2001). In Section 2, we discuss these models from our perspective and how they can be used in nested Dirichlet modelling. The basic theory comes from (Buntine and Hutter, 2010), some borrowed from (Teh, 2006a). Using these tools, in Section 3 networks of probability vectors distributed as Dirichlet and using PDPs, first presented in (Wood and Teh, 2009), are shown along wi... |