## Content Modeling Using Latent Permutations

Citations: 8 (5 self)

### BibTeX

@MISC{Chen_contentmodeling,

author = {Harr Chen and S. R. K. Branavan and Regina Barzilay and David R. Karger},

title = {Content Modeling Using Latent Permutations},

year = {}

}

### Abstract

We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be effectively represented using a distribution over permutations called the Generalized Mallows Model. We apply our method to three complementary discourse-level tasks: cross-document alignment, document segmentation, and information ordering. Our experiments show that incorporating our permutation-based model in these applications yields substantial improvements in performance over previously proposed methods.

### Citations

3719 | Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images
- Geman, Geman
- 1984
Citation Context ...he joint marginal distributions of t and π given the document text, while integrating out all remaining hidden parameters: $P(t, \pi \mid w)$ (Eq. 6). We accomplish this inference task through Gibbs sampling (Geman & Geman, 1984; Bishop, 2006). A Gibbs sampler builds a Markov chain over the hidden variable state space whose stationary distribution is the actual posterior of the joint distribution. Each new sample is drawn fr...
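The passage above summarizes how a Gibbs sampler works: it visits one variable at a time and redraws it from its conditional distribution given all the others. The following is a minimal sketch of that idea on a toy target, a standard bivariate normal with correlation `r`, whose two conditionals are known exactly. This is not the paper's sampler over topic assignments and orderings. The function name and the burn-in length are illustrative choices.

```python
import math
import random


def gibbs_bivariate_normal(r, n_samples, burn_in=500, seed=0):
    """Toy Gibbs sampler for a standard bivariate normal with correlation r.

    Each sweep redraws one coordinate at a time from its exact conditional:
    x | y ~ N(r * y, 1 - r^2) and y | x ~ N(r * x, 1 - r^2).
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - r * r)  # conditional standard deviation
    x, y = 0.0, 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(r * y, sd)  # draw x from p(x | y)
        y = rng.gauss(r * x, sd)  # draw y from p(y | x), using the new x
        if step >= burn_in:       # discard early draws taken before the chain mixes
            samples.append((x, y))
    return samples


samples = gibbs_bivariate_normal(0.8, 20000)
# With unit variances, E[xy] equals the correlation r.
empirical_corr = sum(x * y for x, y in samples) / len(samples)
print(round(empirical_corr, 2))  # should land near 0.8
```

The chain's stationary distribution is the target joint distribution. Because the draws are autocorrelated, the estimate converges more slowly than it would with independent samples.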

2366 | Latent Dirichlet Allocation
- Blei, Ng, et al.
Citation Context ...each word. Their parameters are estimated using approximate inference techniques, such as Gibbs sampling and variational methods. In traditional topic models such as Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003; Griffiths & Steyvers, 2004), documents are treated as bags of words where each word receives a separate topic assignment, and words assigned to the same topic are drawn from a shared language model....

1027 | Attention, intentions, and the structure of discourse
- Grosz, Sidner
- 1986
Citation Context ...zilay & Lee, 2004). An example of a domain where the first constraint is violated is dialogue. Texts in such domains follow the stack structure, allowing topics to recur throughout a conversation (Grosz & Sidner, 1986). The success of the permutation-based model in these three complementary tasks demonstrates its flexibility and effectiveness, and attests to the versatility of the general document structure indu...

625 | Finding scientific topics
- Griffiths, Steyvers
- 2004
Citation Context ...s are estimated using approximate inference techniques, such as Gibbs sampling and variational methods. In traditional topic models such as Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003; Griffiths & Steyvers, 2004), documents are treated as bags of words where each word receives a separate topic assignment, and words assigned to the same topic are drawn from a shared language model. While the bag of words repr...

324 | Learning to order things
- Cohen, Schapire, et al.
- 1999
Citation Context ...erings. During the second stage, these local decisions are integrated into a global order which maximizes the number of consistent pairwise classifications. Since finding such an ordering is NP-hard (Cohen, Schapire, & Singer, 1999), various approximations are used in practice (Lapata, 2003; Althaus et al., 2004). While these two-step discriminative approaches can effectively leverage information about local transitions, they d...
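The excerpt above notes that combining pairwise decisions into one global order is NP-hard, so approximations are used instead. One simple approximation, in the spirit of the greedy procedure of Cohen, Schapire, and Singer (1999), works as follows: repeatedly place next the remaining item whose outgoing preference weight most exceeds its incoming weight. This is only a sketch. The `pref` dictionary and its scores are hypothetical stand-ins for the output of a local ordering classifier.

```python
def _potential(v, remaining, pref):
    """Outgoing minus incoming preference weight of v among the remaining items."""
    out_w = sum(pref.get((v, u), 0.0) for u in remaining if u != v)
    in_w = sum(pref.get((u, v), 0.0) for u in remaining if u != v)
    return out_w - in_w


def greedy_order(items, pref):
    """Greedily turn pairwise preferences into one total order.

    pref[(u, v)] is a score in [0, 1] for "u precedes v"; missing pairs count as 0.
    """
    remaining = list(items)
    order = []
    while remaining:
        best = max(remaining, key=lambda v: _potential(v, remaining, pref))
        order.append(best)
        remaining.remove(best)
    return order


# Hypothetical pairwise scores, e.g. produced by a local ordering classifier.
pref = {
    ("a", "b"): 0.9, ("b", "a"): 0.1,
    ("b", "c"): 0.8, ("c", "b"): 0.2,
    ("a", "c"): 0.7, ("c", "a"): 0.3,
}
print(greedy_order(["c", "b", "a"], pref))  # ['a', 'b', 'c']
```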

298 | Multi-paragraph segmentation of expository text
- Hearst
- 1994
Citation Context ...ve contiguous sections. Previous approaches have typically relied on lexical cohesion (that is, similarity in word choices within a document subspan) to guide the choice of segmentation boundaries (Hearst, 1994; van Mulbregt, Carp, Gillick, Lowe, & Yamron, 1998; Blei & Moreno, 2001; Utiyama & Isahara, 2001; Galley, McKeown, Fosler-Lussier, & Jing, 2003; Purver et al., 2006; Malioutov & Barzilay, 2006; Eisens...
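Lexical cohesion, as described above, can be made concrete with a simplified, TextTiling-like computation. At each gap between sentences, compare the word counts just before and just after the gap. A gap with unusually low similarity is a candidate segment boundary. This is a bare sketch under stated assumptions: whitespace tokenization, no stopword removal or smoothing, and hypothetical example sentences. It does not reproduce any of the cited systems.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_similarities(sentences, window=1):
    """sims[i] compares the `window` sentences before and after the gap preceding sentence i+1."""
    bags = [Counter(s.lower().split()) for s in sentences]
    sims = []
    for gap in range(1, len(sentences)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        sims.append(cosine(left, right))
    return sims


sentences = [
    "the city was founded in 1200",
    "the city grew during the medieval period",
    "trains and buses serve the airport",
    "the airport connects buses and trains",
]
sims = gap_similarities(sentences)
boundary = min(range(len(sims)), key=sims.__getitem__) + 1
print(boundary)  # 2: the weakest lexical link falls just before sentence index 2
```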

282 | Remembering - A Study in Experimental and Social Psychology
- Bartlett
- 1932
Citation Context ...ontiguous block within the document, rather than spread over disconnected sections. The second constraint states that documents from the same domain tend to present similar topics, in similar orders (Bartlett, 1932; Wray, 2002). This constraint guides toward selecting sequences with similar topic ordering, such as placing History before Transportation. While these constraints are not universal across all genres...

245 | Centroid-Based Summarization of Multiple Documents: Sentence Extraction, Utility-Based Evaluation, and User Studies
- Radev, Jing, et al.
- 2000
Citation Context ...s better than the more complex HTMM model. This observation is consistent with previous work on cross-document alignment and multidocument summarization, which use clustering as their main component (Radev, Jing, & Budzikowska, 2000; Barzilay, McKeown, & Elhadad, 1999). Despite the fact that HTMM captures some dependencies between adjacent paragraphs, it is not sufficiently constrained. Manual examination of the actual topic ass...

216 | Statistical models for text segmentation
- Beeferman, Berger, et al.
- 1999
Citation Context ...ments. Thus, we also compare against the segmentation specified by the CitiesEn clean section headings. Metrics. Segmentation quality is evaluated using the standard penalty metrics Pk and WindowDiff (Beeferman, Berger, & Lafferty, 1999; Pevzner & Hearst, 2002). Both pass a sliding window over the documents and compute the probability of the words at the end of the windows being improperly segmented with respect to each other. Windo...
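Both metrics described above slide a window of width k over the text. Pk counts the windows in which reference and hypothesis disagree about whether the two window ends lie in the same segment. WindowDiff counts the windows in which the two segmentations contain a different number of boundaries. The sketch below makes two assumptions. First, each segmentation is given as a list of per-unit segment labels, with a new label for each segment. Second, k defaults to half the mean reference segment length, a common convention. The helper names are illustrative.

```python
def _boundaries(labels):
    """1 at position t if a segment boundary falls between units t and t+1."""
    return [int(labels[t] != labels[t + 1]) for t in range(len(labels) - 1)]


def default_k(ref):
    """Half the mean reference segment length, at least 1."""
    return max(1, round(len(ref) / len(set(ref)) / 2))


def pk(ref, hyp, k=None):
    """Fraction of windows where ref and hyp disagree on 'same segment or not'."""
    k = k or default_k(ref)
    n = len(ref)
    errors = sum((ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k]) for i in range(n - k))
    return errors / (n - k)


def window_diff(ref, hyp, k=None):
    """Fraction of windows where ref and hyp contain different boundary counts."""
    k = k or default_k(ref)
    rb, hb = _boundaries(ref), _boundaries(hyp)
    n = len(ref)
    errors = sum(sum(rb[i:i + k]) != sum(hb[i:i + k]) for i in range(n - k))
    return errors / (n - k)


ref = [0, 0, 0, 0, 1, 1, 1, 1]  # true boundary after unit 3
hyp = [0, 0, 1, 1, 1, 1, 1, 1]  # predicted boundary after unit 1
print(round(pk(ref, hyp), 3), round(window_diff(ref, hyp), 3))  # 0.667 0.667
```

Lower is better for both metrics. A near-miss boundary, like the one in this example, is penalized only in the windows that straddle it.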

156 | Aggregating inconsistent information: Ranking and clustering
- Ailon, Charikar, et al.
- 2008
Citation Context ...eralized Mallows Model canonical ordering is in general NP-hard. However, recent advances in statistics have produced efficient approximate algorithms with theoretically guaranteed correctness bounds (Ailon, Charikar, & Newman, 2008), and exact methods that are tractable for typical cases (Meilă et al., 2007). More generally, the model presented in this paper assumes two specific global constraints on content structure. While do...
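Among the approximate algorithms mentioned above, Ailon, Charikar, and Newman (2008) analyze a pivot-based method, KwikSort, for aggregating several orderings into one consensus ordering. The following is a minimal sketch of that pivot idea for Kemeny-style consensus over a set of full orderings, a problem related to estimating a canonical ordering. It is not the estimator used in the paper. The function names and example orderings are hypothetical.

```python
import random


def precedence_weights(rankings):
    """w[(u, v)] = fraction of input rankings that place u before v."""
    w = {}
    for ranking in rankings:
        for i, u in enumerate(ranking):
            for v in ranking[i + 1:]:
                w[(u, v)] = w.get((u, v), 0.0) + 1.0 / len(rankings)
    return w


def kwik_sort(items, w, rng):
    """Pivot-based rank aggregation in the style of KwikSort."""
    if len(items) <= 1:
        return list(items)
    pivot = rng.choice(items)
    before, after = [], []
    for u in items:
        if u == pivot:
            continue
        # u goes before the pivot if more rankings put it there than after it
        if w.get((u, pivot), 0.0) > w.get((pivot, u), 0.0):
            before.append(u)
        else:
            after.append(u)
    return kwik_sort(before, w, rng) + [pivot] + kwik_sort(after, w, rng)


# Hypothetical topic orderings observed in three documents.
rankings = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["b", "a", "c", "d"]]
w = precedence_weights(rankings)
consensus = kwik_sort(["d", "c", "b", "a"], w, random.Random(0))
print(consensus)  # ['a', 'b', 'c', 'd']
```

In this example the majority preferences are transitive, so every choice of pivot yields the same result. With cyclic majorities, the output depends on the random pivots, which is why the guarantees are stated in expectation.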

148 | Slice sampling
- Neal
- 2000
Citation Context ...ert the cumulative distribution function to sample from this distribution. However, the distribution itself is univariate and unimodal, so we can expect that an MCMC technique such as slice sampling (Neal, 2003) should perform well. In practice, Matlab’s built-in slice sampler provides a robust draw from this distribution. 6 Computational Issues. During inference, directly computing document probabilities on...
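The excerpt above relies on slice sampling for a univariate, unimodal density. The sketch below implements the univariate stepping-out and shrinkage procedure described by Neal (2003) and shows it on a standard normal log density. It is not Matlab's implementation or the paper's code. The default `width` and `max_steps` values are assumptions.

```python
import math
import random


def slice_sample(log_density, x0, n_samples, width=1.0, max_steps=50, seed=0):
    """Univariate slice sampler with stepping out and shrinkage (after Neal, 2003).

    log_density only needs to be correct up to an additive constant.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Auxiliary height under the density: log y = log f(x) + log U, U ~ Uniform(0, 1].
        log_y = log_density(x) + math.log(1.0 - rng.random())
        # Stepping out: randomly position an interval around x, then widen it
        # until both ends fall outside the slice (or the step budget runs out).
        left = x - width * rng.random()
        right = left + width
        steps_left = int(max_steps * rng.random())
        steps_right = max_steps - 1 - steps_left
        while steps_left > 0 and log_density(left) > log_y:
            left -= width
            steps_left -= 1
        while steps_right > 0 and log_density(right) > log_y:
            right += width
            steps_right -= 1
        # Shrinkage: propose uniformly in [left, right]; on rejection, shrink toward x.
        while True:
            proposal = left + (right - left) * rng.random()
            if log_density(proposal) >= log_y:
                x = proposal
                break
            if proposal < x:
                left = proposal
            else:
                right = proposal
        samples.append(x)
    return samples


draws = slice_sample(lambda t: -0.5 * t * t, x0=0.0, n_samples=5000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

Only the log density is needed, and it need not be normalized. This suits posteriors such as the one over a dispersion parameter, whose normalizing constant has no convenient form.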

130 | Information fusion in the context of multi-document summarization
- Barzilay, McKeown, et al.
- 1999
Citation Context ...M model. This observation is consistent with previous work on cross-document alignment and multidocument summarization, which use clustering as their main component (Radev, Jing, & Budzikowska, 2000; Barzilay, McKeown, & Elhadad, 1999). Despite the fact that HTMM captures some dependencies between adjacent paragraphs, it is not sufficiently constrained. Manual examination of the actual topic assignments reveals that HTMM often ass...

123 | Integrating topics and syntax
- Griffiths, Steyvers, et al.
- 2005

114 | Modeling local coherence: An entity-based approach
- Barzilay, Lapata
- 2008
Citation Context ... our model, because it captures patterns at the level of topic distributions, rather than local discourse constraints. The ordering of the latter has been studied in the past (Karamanis et al., 2004; Barzilay & Lapata, 2008) and these two types of models can be effectively combined to induce a full ordering (Elsner et al., 2007). 6.4.1 Ordering Evaluation Setup. Training and Test Data Sets. We use the CitiesEn, CitiesFr a...

114 | A Critique and Improvement of an Evaluation Metric for Text Segmentation
- Pevzner, Hearst
- 2002
Citation Context ...the segmentation specified by the CitiesEn clean section headings. Metrics. Segmentation quality is evaluated using the standard penalty metrics Pk and WindowDiff (Beeferman, Berger, & Lafferty, 1999; Pevzner & Hearst, 2002). Both pass a sliding window over the documents and compute the probability of the words at the end of the windows being improperly segmented with respect to each other. WindowDiff is stricter, and r...

90 | Catching the Drift: Probabilistic Content Models with Applications to Generation and Summarization
- Barzilay, Lee
- 2004
Citation Context ... example, that articles about cities typically contain information about History, Economy, and Transportation, and that descriptions of History usually precede those of Transportation. Previous work (Barzilay & Lee, 2004; Elsner, Austerweil, & Charniak, 2007) has demonstrated that content models can be learned from raw unannotated text, and are useful in a variety of text processing tasks such as summarization and in...

88 | Inferring strategies for sentence ordering in multidocument news summarization
- Barzilay, Elhadad, et al.
- 2002
Citation Context ...tatistical Discourse Analysis. The global constraints encoded by our model are closely related to research in discourse on information ordering, with applications to text summarization and generation (Barzilay, Elhadad, & McKeown, 2002; Lapata, 2003; Karamanis, Poesio, Mellish, & Oberlander, 2004; Elsner et al., 2007). The emphasis of that body of work is on learning ordering constraints from data, with the goal of reordering new t...

84 | Discourse Segmentation of Multi-Party Conversation
- Galley, McKeown, et al.
- 2003

79 | Non-null ranking models. I
- Mallows
- 1957
Citation Context ...s parameter set scales linearly with the number of elements being ordered, making it sufficiently constrained and tractable for inference. We first describe the standard Mallows Model over orderings (Mallows, 1957). The Mallows Model takes two parameters, a canonical ordering σ and a dispersion parameter ρ. It then sets the probability of any other ordering π to be proportional to $e^{-\rho\, d(\pi, \sigma)}$, where d(π, σ) rep...
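The Mallows Model defined above can be checked by brute force for a handful of items. The sketch below assumes that d(π, σ) is the Kendall tau distance, meaning the number of item pairs the two orderings place differently; the excerpt is cut off before naming the distance. It enumerates all permutations to normalize the distribution exactly. It also compares the normalizer against the known product formula $Z(\rho) = \prod_{j=1}^{n} (1 - e^{-j\rho}) / (1 - e^{-\rho})$. Enumeration is only feasible for small n, and the topic names come from the document's own running example.

```python
import itertools
import math


def kendall_distance(pi, sigma):
    """Number of item pairs that pi and sigma order differently."""
    pos = {item: i for i, item in enumerate(sigma)}
    ranks = [pos[item] for item in pi]
    return sum(
        1 for i in range(len(ranks)) for j in range(i + 1, len(ranks)) if ranks[i] > ranks[j]
    )


def mallows_distribution(sigma, rho):
    """Exact Mallows probabilities over all permutations of sigma (small n only)."""
    weights = {
        pi: math.exp(-rho * kendall_distance(pi, sigma))
        for pi in itertools.permutations(sigma)
    }
    z = sum(weights.values())
    return {pi: w / z for pi, w in weights.items()}, z


sigma = ("History", "Economy", "Transportation")
probs, z = mallows_distribution(sigma, rho=1.0)
closed_form_z = math.prod((1 - math.exp(-j * 1.0)) / (1 - math.exp(-1.0)) for j in range(1, 4))
print(max(probs, key=probs.get))  # ('History', 'Economy', 'Transportation')
```

Probability mass concentrates on the canonical ordering σ and decays exponentially with distance from it. A larger ρ makes that decay sharper.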

78 | Computer Intensive Methods for Testing Hypothesis: An Introduction
- Noreen
- 1989
Citation Context ...t of recall. We also present one summary F-score in our results, which is the harmonic mean of recall and precision. Statistical significance in this setup is measured with approximate randomization (Noreen, 1989), a nonparametric test that can be directly applied to nonlinearly computed metrics such as F-score. This test has been used in prior evaluations for information extraction and machine translation (C...
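Approximate randomization, as described above, works directly on a nonlinear metric like F-score. It repeatedly swaps the two systems' outputs on randomly chosen items and counts how often the shuffled difference is at least as large as the observed one. The sketch below assumes that each system's output is scored as per-item (true positive, false positive, false negative) counts pooled into a single F-score. That representation, the function names, and the example counts are hypothetical.

```python
import random


def f_score(counts):
    """Micro-averaged F-score (harmonic mean of precision and recall) from (tp, fp, fn) triples."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def approximate_randomization(sys_a, sys_b, trials=10000, seed=0):
    """Two-sided paired approximate randomization test on the F-score difference."""
    rng = random.Random(seed)
    observed = abs(f_score(sys_a) - f_score(sys_b))
    as_extreme = 0
    for _ in range(trials):
        shuffled_a, shuffled_b = [], []
        for a, b in zip(sys_a, sys_b):
            if rng.random() < 0.5:  # swap the two systems' outputs on this item
                a, b = b, a
            shuffled_a.append(a)
            shuffled_b.append(b)
        if abs(f_score(shuffled_a) - f_score(shuffled_b)) >= observed:
            as_extreme += 1
    return (as_extreme + 1) / (trials + 1)  # add-one smoothing keeps p > 0


strong = [(9, 1, 1)] * 30  # hypothetical per-document counts for a better system
weak = [(2, 8, 8)] * 30
p = approximate_randomization(strong, weak, trials=2000)
```

Under the null hypothesis, the two systems' outputs are exchangeable on each item. The p-value is therefore the fraction of random swaps that produce a difference at least as extreme as the one observed.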

71 | Modeling online reviews with multi-grain topic models
- Titov, McDonald
- 2008
Citation Context ...d topic classification. This hypothesis motivated research on models where topic assignment is guided by structural considerations (Purver, Körding, Griffiths, & Tenenbaum, 2006; Gruber et al., 2007; Titov & McDonald, 2008), particularly relationships between the topics of adjacent textual units. Depending on the application, a textual unit may be a sentence, paragraph, or speaker utterance. A common property of these ...

57 | Unsupervised topic modelling for multi-party spoken discourse
- Purver, Körding, et al.
- 2006
Citation Context ...dels can be useful for discourse-level tasks such as segmentation and topic classification. This hypothesis motivated research on models where topic assignment is guided by structural considerations (Purver, Körding, Griffiths, & Tenenbaum, 2006; Gruber et al., 2007; Titov & McDonald, 2008), particularly relationships between the topics of adjacent textual units. Depending on the application, a textual unit may be a sentence, paragraph, or s...

52 | Topic segmentation with an aspect hidden Markov model
- Blei, Moreno
- 2001
Citation Context ... on lexical cohesion (that is, similarity in word choices within a document subspan) to guide the choice of segmentation boundaries (Hearst, 1994; van Mulbregt, Carp, Gillick, Lowe, & Yamron, 1998; Blei & Moreno, 2001; Utiyama & Isahara, 2001; Galley, McKeown, Fosler-Lussier, & Jing, 2003; Purver et al., 2006; Malioutov & Barzilay, 2006; Eisenstein & Barzilay, 2008). Our model relies on this same notion in determin...

52 | Probabilistic text structuring: Experiments with sentence ordering
- Lapata
- 2003
Citation Context ...obal constraints encoded by our model are closely related to research in discourse on information ordering, with applications to text summarization and generation (Barzilay, Elhadad, & McKeown, 2002; Lapata, 2003; Karamanis, Poesio, Mellish, & Oberlander, 2004; Elsner et al., 2007). The emphasis of that body of work is on learning ordering constraints from data, with the goal of reordering new text from the s...

49 | A Statistical Model for Domain Independent Text Segmentation
- Utiyama, Isahara
- 2001
Citation Context ...that is, similarity in word choices within a document subspan, to guide the choice of segmentation boundaries (Hearst, 1994; van Mulbregt, Carp, Gillick, Lowe, & Yamron, 1998; Blei & Moreno, 2001; Utiyama & Isahara, 2001; Galley, McKeown, Fosler-Lussier, & Jing, 2003; Purver et al., 2006; Malioutov & Barzilay, 2006; Eisenstein & Barzilay, 2008). Our model relies on this same notion in determining the language models o...

46 | The hidden topic Markov model
- Gruber, Rosen-Zvi, et al.
- 2007
Citation Context ..., in the alignment task, we aim to discover paragraphs across different documents that share the same topic. In our experiments, our permutation-based model outperforms the Hidden Topic Markov Model (Gruber, Rosen-Zvi, & Weiss, 2007) by a wide margin: the gap averaged 28 percentage points in F-score. Second, we consider the segmentation task, where the goal is to partition each document into a sequence of topically coherent se...

45 | Minimum cut model for spoken lecture segmentation
- Malioutov, Barzilay
- 2006
Citation Context ...mentation boundaries (Hearst, 1994; van Mulbregt, Carp, Gillick, Lowe, & Yamron, 1998; Blei & Moreno, 2001; Utiyama & Isahara, 2001; Galley, McKeown, Fosler-Lussier, & Jing, 2003; Purver et al., 2006; Malioutov & Barzilay, 2006; Eisenstein & Barzilay, 2008). Our model relies on this same notion in determining the language models of topics, but connecting topics across documents and constraining how those topics appear al...

44 | Cranking: Combining rankings using conditional probability models on permutations
- Lebanon, Lafferty
- 2002
Citation Context ...A central challenge of the approach we have presented is modeling the distribution over possible topic orderings. For this purpose we use the Generalized Mallows Model (GMM; Fligner & Verducci, 1986; Lebanon & Lafferty, 2002; Meilă, Phadnis, Patterson, & Bilmes, 2007; Klementiev, Roth, & Small, 2008), which exhibits two appealing properties in the context of this task. First, the model concentrates probability mass on so...
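The Generalized Mallows Model mentioned above is usually written over an inversion count vector rather than over the permutation itself. Each entry gets its own dispersion parameter, and the distribution factorizes as $\prod_j e^{-\rho_j v_j} / \psi_j(\rho_j)$. The sketch below assumes that convention, with items labeled 0..K-1 in canonical order and `v[j]` counting how many larger items precede item j. It converts between permutations and inversion vectors and samples each `v[j]` independently. The function names are illustrative, not taken from the paper.

```python
import math
import random


def inversions(perm):
    """v[j] = number of items larger than j that appear before j in perm.

    Items are 0..K-1 and the canonical ordering is assumed to be 0, 1, ..., K-1;
    v has K-1 entries and v[j] ranges over 0 .. K-1-j.
    """
    pos = {item: i for i, item in enumerate(perm)}
    k = len(perm)
    return [sum(1 for bigger in range(j + 1, k) if pos[bigger] < pos[j]) for j in range(k - 1)]


def perm_from_inversions(v):
    """Inverse of inversions(): insert items from the largest down to 0."""
    k = len(v) + 1
    perm = [k - 1]
    for j in range(k - 2, -1, -1):
        perm.insert(v[j], j)  # all items already placed are larger than j
    return perm


def sample_gmm(rhos, rng):
    """Draw a permutation with P(v) proportional to prod_j exp(-rhos[j] * v[j])."""
    k = len(rhos) + 1
    v = []
    for j, rho in enumerate(rhos):
        support = range(k - j)  # possible values 0 .. K-1-j
        weights = [math.exp(-rho * m) for m in support]
        v.append(rng.choices(support, weights=weights)[0])
    return perm_from_inversions(v)


print(inversions([2, 0, 1]))         # [1, 1]
print(perm_from_inversions([1, 1]))  # [2, 0, 1]
# Large rho values keep early items in place; small ones let them move freely.
print(sample_gmm([2.0, 2.0, 0.1], random.Random(0)))
```

Because the `v[j]` values are independent under this factorization, each one can be sampled or given its own prior separately.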

39 | Distance based ranking models
- Fligner, Verducci
- 1986

36 | Fast collapsed Gibbs sampling for latent Dirichlet allocation
- Porteous, Newman, et al.
- 2008
Citation Context ...riables in the model, in effect reducing the state space of the Markov chain. Collapsed sampling has been previously demonstrated to be effective for LDA and its variants (Griffiths & Steyvers, 2004; Porteous, Newman, Ihler, Asuncion, Smyth, & Welling, 2008; Titov & McDonald, 2008). It is typically preferred over the explicit Gibbs sampling of all the hidden variables, because of the smaller search space and generally shorter mixing time. Our sampler an...
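Collapsed Gibbs sampling, as used for LDA in the cited work, integrates out the topic-word and document-topic distributions. Only the per-token topic assignments are resampled, using count statistics. The sketch below implements the standard LDA update $p(z_i = k \mid z_{-i}, w) \propto (n_{dk} + \alpha)(n_{kw} + \beta)/(n_k + V\beta)$ on toy data. It illustrates the collapsed technique for plain LDA only, not the paper's sampler over topics and permutations. The hyperparameter defaults and the example corpus are assumptions.

```python
import random


def lda_collapsed_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA (standard count-based update).

    docs is a list of documents, each a list of integer word ids in [0, vocab_size).
    """
    rng = random.Random(seed)
    n_dt = [[0] * n_topics for _ in docs]               # tokens in doc d with topic t
    n_tw = [[0] * vocab_size for _ in range(n_topics)]  # tokens of word w with topic t
    n_t = [0] * n_topics                                 # tokens with topic t overall
    z = []
    for d, doc in enumerate(docs):                       # random initialization
        z_d = []
        for w in doc:
            t = rng.randrange(n_topics)
            z_d.append(t)
            n_dt[d][t] += 1
            n_tw[t][w] += 1
            n_t[t] += 1
        z.append(z_d)
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                n_dt[d][t] -= 1  # remove token i from the counts
                n_tw[t][w] -= 1
                n_t[t] -= 1
                # p(z_i = k | rest) is proportional to (n_dk + alpha)(n_kw + beta) / (n_k + V*beta)
                weights = [
                    (n_dt[d][k] + alpha) * (n_tw[k][w] + beta) / (n_t[k] + vocab_size * beta)
                    for k in topics
                ]
                t = rng.choices(topics, weights=weights)[0]
                z[d][i] = t
                n_dt[d][t] += 1  # add the token back under its new topic
                n_tw[t][w] += 1
                n_t[t] += 1
    return z, n_dt, n_tw


docs = [[0, 1, 2, 0, 1, 2], [3, 4, 5, 3, 4, 5], [0, 2, 1, 1], [5, 3, 4, 4]]
z, n_dt, n_tw = lda_collapsed_gibbs(docs, n_topics=2, vocab_size=6)
```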

33 | Bayesian unsupervised topic segmentation
- Eisenstein, Barzilay
- 2008
Citation Context ...rent segments. The model yields an average Pk measure of 0.231, a 7.9 percentage point improvement over a competitive Bayesian segmentation method that does not take global constraints into account (Eisenstein & Barzilay, 2008). Third, we apply our model to the ordering task, that is, sequencing a held-out set of textual units into a coherent document. As with the previous two applications, the difference between our model...

30 | Sentence alignment for monolingual comparable corpora
- Barzilay, Elhadad
- 2003
Citation Context ...r own separate per-document clusters. Previously developed methods for cross-document alignment have been primarily driven by similarity functions that quantify lexical overlap between textual units (Barzilay & Elhadad, 2003; Nelken & Shieber, 2006). These methods do not explicitly model document structure, but they specify some global constraints that guide the search for an optimal alignment. Pairs of textual units are...

29 | On Some Pitfalls in Automatic Evaluation and Significance Testing for MT
- Riezler, Maxwell
- 2005
Citation Context ...ric test that can be directly applied to nonlinearly computed metrics such as F-score. This test has been used in prior evaluations for information extraction and machine translation (Chinchor, 1995; Riezler & Maxwell, 2005). Baselines. For this task, we compare against two baselines: • Hidden Topic Markov Model (HTMM; Gruber et al., 2007): As explained in Section 2, this model represents topic c...

28 | Computing locally coherent discourses
- Althaus, Karamanis, et al.
- 2004
Citation Context ...oach is taken: first, probabilistic models are used to estimate pairwise sentence ordering preferences; next, these local decisions are combined to produce a consistent global ordering (Lapata, 2003; Althaus, Karamanis, & Koller, 2004). Training data for pairwise models is constructed by considering all pairs of sentences in a document, with supervision labels based on how they are actually ordered. Prior work has demonstrated t...

28 | Automatic evaluation of information ordering: Kendall's tau
- Lapata
- 2006
Citation Context ...his measure has been widely used for evaluating information ordering (Lapata, 2003; Barzilay & Lee, 2004; Elsner et al., 2007), and has been shown to correlate with human assessments of text quality (Lapata, 2006). Baselines and Model Variants. Our ordering method is compared against the original HMM-based content modeling approach of Barzilay and Lee (2004). This baseline delivers state-of-the-art performance...
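The ordering measure discussed above, Kendall's tau, is computed from the number of pairwise inversions between a proposed ordering and the reference ordering: $\tau = 1 - 2S/\binom{n}{2}$, where S is the inversion count. The sketch below assumes both orderings contain the same items with no repeats. It reuses the document's History, Economy, and Transportation example as hypothetical input.

```python
def kendall_tau(predicted, reference):
    """Kendall's tau between a proposed ordering and the reference ordering.

    tau = 1 - 2 * inversions / (n * (n - 1) / 2), where inversions counts the
    item pairs that the two orderings place in opposite order.
    """
    position = {item: i for i, item in enumerate(reference)}
    ranks = [position[item] for item in predicted]
    n = len(ranks)
    if n < 2:
        return 1.0
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if ranks[i] > ranks[j])
    return 1.0 - 2.0 * inversions / (n * (n - 1) / 2)


reference = ["History", "Economy", "Transportation"]
print(kendall_tau(reference, reference))                  # 1.0
print(kendall_tau(list(reversed(reference)), reference))  # -1.0
print(round(kendall_tau(["Economy", "History", "Transportation"], reference), 3))  # 0.333
```

The score ranges from 1 for a perfect match down to -1 for a fully reversed ordering. A single adjacent swap costs 2 divided by the number of pairs.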

26 | Consensus ranking under the exponential model
- Meilă, Phadnis, et al.
- 2007

26 | Text segmentation and topic tracking on broadcast news via a hidden Markov model approach
- van Mulbregt, Carp, et al.
- 1998

25 | The Handbook of Discourse Analysis
- Schiffrin, Tannen, et al.
- 2001
Citation Context ...del local constraints on topic organization. This shortcoming is substantial since many discourse constraints described in the literature are global in nature (Graesser, Gernsbacher, & Goldman, 2003; Schiffrin, Tannen, & Hamilton, 2001). In this paper, we introduce a model of content structure that explicitly represents two important global constraints on topic selection. The first constraint posits that each doc... (Footnote 1: Code, data set...

19 | A bottom-up approach to sentence ordering for multi-document summarization
- Bollegala, Okazaki, et al.
- 2010
Citation Context ...ased on how they are actually ordered. Prior work has demonstrated that a wide range of features are useful in these classification decisions (Lapata, 2003; Karamanis et al., 2004; Ji & Pulman, 2006; Bollegala, Okazaki, & Ishizuka, 2006). For instance, Lapata (2003) has demonstrated that lexical features, such as verb pairs from the input sentences, serve as a proxy for plausible sequences of actions, and thus are effective predicto...

19 | Towards robust context-sensitive sentence alignment for monolingual corpora
- Nelken, Shieber
- 2006
Citation Context ...t clusters. Previously developed methods for cross-document alignment have been primarily driven by similarity functions that quantify lexical overlap between textual units (Barzilay & Elhadad, 2003; Nelken & Shieber, 2006). These methods do not explicitly model document structure, but they specify some global constraints that guide the search for an optimal alignment. Pairs of textual units are considered in isolation...

18 | Global models of document structure using latent permutations
- Chen, Branavan, et al.
- 2009
Citation Context ... structure, and could be used for an even greater variety of applications than we have considered here. Bibliographic Note. Portions of this work were previously presented in a conference publication (Chen, Branavan, Barzilay, & Karger, 2009). This article significantly extends our previous work, most notably by introducing a new algorithm for applying our model’s output to the information ordering task (Section 5), and considering new d...

16 | Unsupervised rank aggregation with distance-based models
- Klementiev, Roth, et al.
- 2008
Citation Context ... distribution over possible topic orderings. For this purpose we use the Generalized Mallows Model (GMM; Fligner & Verducci, 1986; Lebanon & Lafferty, 2002; Meilă, Phadnis, Patterson, & Bilmes, 2007; Klementiev, Roth, & Small, 2008), which exhibits two appealing properties in the context of this task. First, the model concentrates probability mass on some canonical ordering and small perturbations (permutations) of that orderin...

15 | Statistical significance of MUC-6 results
- Chinchor
- 1995
Citation Context ...9), a nonparametric test that can be directly applied to nonlinearly computed metrics such as F-score. This test has been used in prior evaluations for information extraction and machine translation (Chinchor, 1995; Riezler & Maxwell, 2005). Baselines. For this task, we compare against two baselines: • Hidden Topic Markov Model (HTMM; Gruber et al., 2007): As explained in Section 2, this model represents topic c...

15 | Topic modeling: Beyond bag of words
- Wallach

14 | A unified local and global model for discourse coherence
- Elsner, Austerweil, et al.
- 2007
Citation Context ...s about cities typically contain information about History, Economy, and Transportation, and that descriptions of History usually precede those of Transportation. Previous work (Barzilay & Lee, 2004; Elsner, Austerweil, & Charniak, 2007) has demonstrated that content models can be learned from raw unannotated text, and are useful in a variety of text processing tasks such as summarization and information ordering. However, the expre...

13 | Evaluating centering-based metrics of coherence for text structuring using a reliably annotated corpus
- Karamanis, Poesio, et al.
- 2004
Citation Context ...ts encoded by our model are closely related to research in discourse on information ordering, with applications to text summarization and generation (Barzilay, Elhadad, & McKeown, 2002; Lapata, 2003; Karamanis, Poesio, Mellish, & Oberlander, 2004; Elsner et al., 2007). The emphasis of that body of work is on learning ordering constraints from data, with the goal of reordering new text from the same domain. These methods build on the assumptio...

6 | Sentence ordering with manifold-based classification in multidocument summarization
- Ji, Pulman
- 2006
Citation Context ...upervision labels based on how they are actually ordered. Prior work has demonstrated that a wide range of features are useful in these classification decisions (Lapata, 2003; Karamanis et al., 2004; Ji & Pulman, 2006; Bollegala, Okazaki, & Ishizuka, 2006). For instance, Lapata (2003) has demonstrated that lexical features, such as verb pairs from the input sentences, serve as a proxy for plausible sequences of ac...

3 | Posterior probability for a consensus ordering
- Fligner, Verducci
- 1990
Citation Context ...bership in the exponential family of distributions; this means that it is particularly amenable to a Bayesian representation, as it admits a natural independent conjugate prior for each parameter $\rho_j$ (Fligner & Verducci, 1990):
$$\mathrm{GMM}_0(\rho_j \mid v_{j,0}, \nu_0) \propto e^{(-\rho_j v_{j,0} - \log \psi_j(\rho_j))\,\nu_0} \tag{4}$$
This prior distribution takes two parameters $\nu_0$ and $v_{j,0}$. Intuitively, the prior states that over $\nu_0$ previous trials, the total number of invers...
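Equation (4) can be made concrete under the usual GMM factorization, in which each inversion count $v_j$ contributes a factor $e^{-\rho_j v_j}/\psi_j(\rho_j)$. Here $\psi_j(\rho_j) = \sum_{m=0}^{K-j} e^{-\rho_j m} = (1 - e^{-(K-j+1)\rho_j}) / (1 - e^{-\rho_j})$ for K topics and 1-based j. Multiplying the prior by N such factors gives a density of the same form. Its pseudo-count grows to $\nu_0 + N$, and its mean inversion count becomes the pooled average $(\sum_i v_{j,i} + \nu_0 v_{j,0})/(N + \nu_0)$. The sketch below encodes exactly that algebra. The function names and example numbers are hypothetical. The unnormalized log density it returns could be passed to a univariate sampler such as slice sampling.

```python
import math


def log_psi(rho, n_values):
    """log of sum_{m=0}^{n_values-1} exp(-rho * m), the normalizer of one GMM factor.

    For the j-th inversion count of K topics (1-based j), n_values = K - j + 1.
    Requires rho > 0 so the closed form is well defined.
    """
    return math.log((1.0 - math.exp(-n_values * rho)) / (1.0 - math.exp(-rho)))


def log_prior_unnormalized(rho, v0, nu0, n_values):
    """log GMM0(rho | v0, nu0) up to an additive constant, following equation (4)."""
    return (-rho * v0 - log_psi(rho, n_values)) * nu0


def posterior_params(v0, nu0, observed_counts):
    """Conjugate update after observing inversion counts v_j from N documents."""
    n = len(observed_counts)
    return (sum(observed_counts) + v0 * nu0) / (n + nu0), n + nu0


n_values = 4  # e.g. K = 5 topics and j = 2
v_post, nu_post = posterior_params(v0=1.0, nu0=2.0, observed_counts=[0, 0, 3])
print(v_post, nu_post)  # 1.0 5.0
```

Reading $\nu_0$ as a count of "previous trials" matches the update: each observed document adds one trial and contributes its inversion count to the running average.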

2 | Distance based ranking models
- Verducci, J.
- 1986
