## Global models of document structure using latent permutations (2009)

Venue: | In NAACL’09 |

Citations: | 18 - 4 self |

### BibTeX

@INPROCEEDINGS{Chen09globalmodels,

author = {Harr Chen and S. R. K. Branavan and Regina Barzilay and David R. Karger},

title = {Global models of document structure using latent permutations},

booktitle = {In NAACL’09},

year = {2009},

pages = {371--379}

}

### Abstract

We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be elegantly represented using a distribution over permutations called the generalized Mallows model. Our structure-aware approach substantially outperforms alternative approaches for cross-document comparison and single-document segmentation.
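To make the generalized Mallows model mentioned in the abstract more concrete, the following is a minimal sketch of drawing a topic ordering from it. It assumes the standard inversion-count parameterization, in which the count `v_j` (the number of items larger than `j` that precede `j`) is drawn independently with probability proportional to `exp(-rho_j * v_j)`, and the identity ordering plays the role of the canonical ordering. The function names and the dispersion values in `rho` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def sample_inversion_counts(rho, rng):
    """Draw v_1..v_{K-1}, where v_j lies in {0, ..., K-j} and
    P(v_j = r) is proportional to exp(-rho_j * r)."""
    K = len(rho) + 1
    counts = []
    for j, rho_j in enumerate(rho, start=1):
        support = range(K - j + 1)
        weights = [math.exp(-rho_j * r) for r in support]
        counts.append(rng.choices(support, weights=weights)[0])
    return counts


def counts_to_permutation(counts):
    """Rebuild the ordering of items 1..K from its inversion counts.

    Items are placed from largest to smallest, so inserting item j at
    index v_j puts exactly v_j larger items in front of it."""
    K = len(counts) + 1
    perm = [K]
    for j in range(K - 1, 0, -1):
        perm.insert(counts[j - 1], j)
    return perm


def inversion_counts(perm):
    """Recover v_1..v_{K-1} from an ordering of items 1..K."""
    K = len(perm)
    pos = {item: i for i, item in enumerate(perm)}
    return [
        sum(1 for k in range(j + 1, K + 1) if pos[k] < pos[j])
        for j in range(1, K)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    rho = [1.0, 0.5, 2.0]  # hypothetical dispersion parameters for K = 4 topics
    print(counts_to_permutation(sample_inversion_counts(rho, rng)))
```

Larger values of `rho_j` make nonzero counts less likely, so samples stay closer to the canonical ordering; all-zero counts reproduce it exactly.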

### Citations

703 | Finding scientific topics
- Griffiths, Steyvers
- 2004
Citation Context ...ded in topic modeling approaches, which posit that latent state variables control the generation of words. In earlier topic modeling work such as latent Dirichlet allocation (LDA) (Blei et al., 2003; Griffiths and Steyvers, 2004), documents are treated as bags of words, where each word receives a separate topic assignment; the topic assignments are auxiliary variables to the main task of language modeling. More recent work h... |

180 | Slice sampling
- Neal
- 2000
Citation Context ...ert the cumulative distribution function to sample from this distribution. However, the distribution itself is univariate and unimodal, so we can expect that an MCMC technique such as slice sampling (Neal, 2003) should perform well. In practice, the MATLAB black-box slice sampler provides a robust draw from this distribution. 6 Experimental Setup Data Sets We evaluate our model on two data sets drawn from t... |

136 | Integrating topics and syntax
- Griffiths, Steyvers, et al.
- 2005
Citation Context ...rk has attempted to adapt the concepts of topic modeling to more sophisticated representations than a bag of words; they use these representations to impose stronger constraints on topic assignments (Griffiths et al., 2005; Wallach, 2006; Purver et al., 2006; Gruber et al., 2007). These approaches, however, generally model Markovian topic or state transitions, which only capture local dependencies between adjacent word... |

126 | A critique and improvement of an evaluation metric for text segmentation
- Pevzner, Hearst
- 2002
Citation Context ... task, we take the boundaries at which topics change within a document to be a segmentation of that document. We evaluate using the standard penalty metrics Pk and WindowDiff (Beeferman et al., 1999; Pevzner and Hearst, 2002). Both pass a sliding window over the documents and compute the probability of the words at the ends of the windows being improperly segmented with respect to each other. WindowDiff requires that the... |

93 | Computer Intensive Methods for Testing Hypotheses. An Introduction
- Noreen
- 1989
Citation Context ...ic usually correspond to the same section heading. The harmonic mean of recall and precision is the summary F-score. Statistical significance in this setup is measured with approximate randomization (Noreen, 1989), a nonparametric test that can be directly applied to nonlinear metrics such as F-score. This test has been used in prior evaluations for information extraction and machine translation (Chinchor, 19... |

85 | Modeling online reviews with multi-grain topic models
- Titov, McDonald
- 2008
Citation Context ...plemented as HMMs, where the states correspond to topics of domain-specific information, and transitions reflect pairwise ordering preferences. Even approaches that break text into contiguous chunks (Titov and McDonald, 2008) assign topics based on local context. While these locally constrained models can implicitly reflect some discourse-level constraints, they cannot capture long-range dependencies without an explosion... |

63 | Unsupervised topic modeling for multi-party spoken discourse
- Purver, Kording, et al.
- 2006
Citation Context ...of topic modeling to more sophisticated representations than a bag of words; they use these representations to impose stronger constraints on topic assignments (Griffiths et al., 2005; Wallach, 2006; Purver et al., 2006; Gruber et al., 2007). These approaches, however, generally model Markovian topic or state transitions, which only capture local dependencies between adjacent words or blocks within a document. For i... |

53 | Hidden Topic Markov Models
- Gruber, Rosen-Zvi, et al.
- 2007
Citation Context ...more sophisticated representations than a bag of words; they use these representations to impose stronger constraints on topic assignments (Griffiths et al., 2005; Wallach, 2006; Purver et al., 2006; Gruber et al., 2007). These approaches, however, generally model Markovian topic or state transitions, which only capture local dependencies between adjacent words or blocks within a document. For instance, content mode... |

48 | Minimum cut model for spoken lecture segmentation
- Malioutov, Barzilay
- 2006
Citation Context ...g (Eisenstein and Barzilay, 2008), a Bayesian topic-based segmentation model that outperforms previous segmentation approaches (Utiyama and Isahara, 2001; Galley et al., 2003; Purver et al., 2006; Malioutov and Barzilay, 2006). BayesSeg enforces the topic contiguity constraint that motivated our model. We provide this baseline with the benefit of knowing the correct number of segments for each document, which is not provi... |

45 | Cranking: Combining rankings using conditional probability models on permutations
- Lebanon, Lafferty
- 2002
Citation Context ...del A central challenge of the approach we take is modeling the distribution over possible topic permutations. For this purpose we use the generalized Mallows model (GMM) (Fligner and Verducci, 1986; Lebanon and Lafferty, 2002; Meilă et al., 2007), which exhibits two appealing properties in the context of this task. First, the model concentrates probability mass on some “canonical” ordering and small perturbations of that ... |

43 | Fast collapsed Gibbs sampling for latent Dirichlet allocation
- Porteous, Newman, et al.
- 2008
Citation Context ...ables in the model, in effect reducing the state space of the Markov chain. Collapsed sampling has been previously demonstrated to be effective for LDA and its variants (Griffiths and Steyvers, 2004; Porteous et al., 2008; Titov and McDonald, 2008). Our sampler integrates over all but three sets 7 Multiple permutations can contribute to the probability of a single document’s topic assignments zd, if there are topics t... |

41 | Bayesian unsupervised topic segmentation
- Eisenstein, Barzilay
- 2008
Citation Context ...stic approach of clustering the paragraphs using the CLUTO toolkit, which uses repeated bisection to maximize a cosine similarity-based objective. For the segmentation task, we compare to BayesSeg (Eisenstein and Barzilay, 2008), a Bayesian topic-based segmentation model that outperforms previous segmentation approaches (Utiyama and Isahara, 2001; Galley et al., 2003; Purver et al., 2006; Malioutov and Barzilay, 2006). B... |

39 | Distance based ranking models
- Fligner, Verducci
- 1986
Citation Context ...1 The Generalized Mallows Model A central challenge of the approach we take is modeling the distribution over possible topic permutations. For this purpose we use the generalized Mallows model (GMM) (Fligner and Verducci, 1986; Lebanon and Lafferty, 2002; Meilă et al., 2007), which exhibits two appealing properties in the context of this task. First, the model concentrates probability mass on some “canonical” ordering and ... |

34 | Discourse segmentation of multi-party conversation
- Galley, McKeown, et al.
- 2003

34 | On some pitfalls in automatic evaluation and significance testing in MT
- Riezler, Maxwell
- 2005
Citation Context ... nonparametric test that can be directly applied to nonlinear metrics such as F-score. This test has been used in prior evaluations for information extraction and machine translation (Chinchor, 1995; Riezler and Maxwell, 2005). For the second task, we take the boundaries at which topics change within a document to be a segmentation of that document. We evaluate using the standard penalty metrics Pk and WindowDiff (Beeferm... |

28 | Consensus ranking under the exponential model
- Meila, Phadnis, et al.
- 2007
Citation Context ...he approach we take is modeling the distribution over possible topic permutations. For this purpose we use the generalized Mallows model (GMM) (Fligner and Verducci, 1986; Lebanon and Lafferty, 2002; Meilă et al., 2007), which exhibits two appealing properties in the context of this task. First, the model concentrates probability mass on some “canonical” ordering and small perturbations of that ordering. This chara... |

15 | Statistical significance of MUC-6 results
- Chinchor
- 1995
Citation Context ...Noreen, 1989), a nonparametric test that can be directly applied to nonlinear metrics such as F-score. This test has been used in prior evaluations for information extraction and machine translation (Chinchor, 1995; Riezler and Maxwell, 2005). For the second task, we take the boundaries at which topics change within a document to be a segmentation of that document. We evaluate using the standard penalty metrics... |

15 | Topic modeling: beyond bag of words
- Wallach
- 2006
Citation Context ...t the concepts of topic modeling to more sophisticated representations than a bag of words; they use these representations to impose stronger constraints on topic assignments (Griffiths et al., 2005; Wallach, 2006; Purver et al., 2006; Gruber et al., 2007). These approaches, however, generally model Markovian topic or state transitions, which only capture local dependencies between adjacent words or blocks wit... |

14 | Evaluating centering-based metrics of coherence for text structuring using a reliably annotated corpus
- Karamanis, Poesio, et al.
- 2004
Citation Context ...ints. Modeling Ordering Constraints Sentence ordering has been extensively studied in the context of probabilistic text modeling for summarization and generation (Barzilay et al., 2002; Lapata, 2003; Karamanis et al., 2004). The emphasis of that body of work is on learning ordering constraints from data, with the goal of reordering new text from the same domain. Our emphasis, however, is on applications where ordering ... |
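The Pk and WindowDiff penalty metrics named in the Pevzner and Hearst citation context can be illustrated with a short sketch. It assumes each document is given as a list of segment ids, one per text unit, with a distinct id for every contiguous segment, and that the window size `k` is supplied by the caller (a common choice is half the mean reference segment length). The function names are illustrative and do not come from the paper or from any particular toolkit.

```python
def pk(ref, hyp, k):
    """Fraction of windows whose two end units are judged 'same segment'
    by one segmentation and 'different segment' by the other."""
    n = len(ref)
    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k])
        for i in range(n - k)
    )
    return errors / (n - k)


def window_diff(ref, hyp, k):
    """Fraction of windows in which the number of segment boundaries
    differs between the reference and the hypothesis."""

    def boundaries(labels, start):
        # Count label changes between adjacent units inside the window.
        return sum(labels[t] != labels[t + 1] for t in range(start, start + k))

    n = len(ref)
    errors = sum(
        boundaries(ref, i) != boundaries(hyp, i) for i in range(n - k)
    )
    return errors / (n - k)


if __name__ == "__main__":
    reference = [0, 0, 0, 1, 1, 1]   # one boundary after the third unit
    hypothesis = [0, 0, 0, 0, 0, 0]  # no boundaries predicted
    print(pk(reference, hypothesis, k=2))
    print(window_diff(reference, hypothesis, k=2))
```

Both functions return 0 for identical segmentations and grow toward 1 as the hypothesis disagrees with the reference in more windows, so lower is better.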