## Shared Components Topic Models with Application to Selectional Preference

Citations: 2 (1 self)

### BibTeX

@MISC{Gormley_sharedcomponents,
  author = {Matthew R. Gormley and Mark Dredze and Benjamin Van Durme and Jason Eisner},
  title = {Shared Components Topic Models with Application to Selectional Preference},
  year = {}
}

### Abstract

Introduction. Predicate argument selectional preference is the notion that the roles, or argument positions, of a given predicate tend to prefer some arguments to others. Automatically inferring these preferences has been a topic of interest within the computational linguistics community since the early 1990s, with Resnik [3] giving examples such as: Mary drank some {wine, gasoline, pencils, sadness}, where the provided nouns in the syntactic object position of the verb drink are of various levels of semantic acceptability.

### Citations

2608 | Latent Dirichlet allocation
- Blei, Ng, et al.
- 2003
Citation Context ...butions, the classes are constructed as products (the soft variant of conjunctions) and preferences are encoded by mixtures (the soft variant of disjunctions). Model Latent Dirichlet Allocation (LDA) =-=[6]-=- has been used to learn selectional preferences as soft disjunctions over flat semantic classes [7, 8, 9]. Our model, the SCTM, also learns the structure of each class as a soft conjunction of high-le... |

1638 | Aspects of the Theory of Syntax
- Chomsky
- 1969
Citation Context ...odels each semantic class as a normalized product of a subset of C underlying semantic features (which we call components). 1 The term selectional preference has a variety of (near) synonyms. Chomsky =-=[1]-=- used the term selectional rules, giving the alternatives: selectional restrictions and restrictions of cooccurrence. Semanticists such as Thomason [2] referred to the problem as sortal (in)correctnes... |

540 | Training products of experts by minimizing contrastive divergence
- Hinton
- 2002
Citation Context ...rved data X. In the M-step, we find new components φ. Since these are the parameters of the PoEs, we replace the usual maximization of data log-likelihood with a contrastive divergence (CD) objective =-=[15]-=-, popular for PoE training. Normally, CD only estimates the parameters of the product distributions. However, in our model, which features are included in the product change based on the E-step. Since... |
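The snippet above mentions replacing maximum likelihood with a contrastive divergence (CD) objective for training the product distributions. As a minimal illustration of the general CD-1 idea (a toy binary RBM of my own construction, not the paper's objective or model), one gradient step contrasts the data-driven statistics with statistics from a one-step reconstruction:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy RBM: 6 binary visible units, 3 hidden units, weights only (no biases).
W = rng.normal(scale=0.1, size=(6, 3))

def cd1_gradient(v0, W):
    """One step of contrastive divergence: positive-phase statistics minus
    reconstruction-phase statistics, approximating the ML gradient."""
    h0 = sigmoid(v0 @ W)                           # P(h=1 | v0)
    h_sample = (rng.random(h0.shape) < h0) * 1.0   # sample hidden units
    v1 = sigmoid(h_sample @ W.T)                   # reconstruct visibles
    h1 = sigmoid(v1 @ W)
    return np.outer(v0, h0) - np.outer(v1, h1)     # <v h>_data - <v h>_recon

v0 = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])
grad = cd1_gradient(v0, W)
W += 0.1 * grad  # gradient ascent on the CD objective
```

The practical appeal, as in the snippet, is that CD avoids computing the intractable normalizer of the product distribution.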

189 | Infinite latent feature models and the Indian buffet process
- Griffiths, Ghahramani
- 2005
Citation Context ...10] model, where the subset of semantic features included in the product is determined by a binary vector generated by a beta-Bernoulli model, the finite counterpart of the Indian Buffet Process (IBP) =-=[11]-=-. $p(x \mid b_k, \phi) = \frac{\prod_{c=1}^{C} \phi_{cx}^{b_{kc}}}{\sum_{v=1}^{V} \prod_{c=1}^{C} \phi_{cv}^{b_{kc}}}$ Here, $\phi_c$ is the cth semantic feature, a distribution over words. $b_k$ is the binary vector defining the structure of this semantic class. The model ... |
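The normalized product in the snippet above is straightforward to sketch numerically. This is a minimal illustration with made-up φ and b_k (not the authors' code): the selected components are multiplied elementwise and renormalized over the vocabulary, done in log space for stability.

```python
import numpy as np

def poe_class_distribution(phi, b_k):
    """Normalized product-of-experts distribution over the vocabulary.

    phi: C x V matrix, each row a distribution over V words.
    b_k: length-C binary vector selecting which features enter the product.
    """
    log_unnorm = b_k @ np.log(phi)      # sum_c b_kc * log(phi_cv), shape (V,)
    log_unnorm -= log_unnorm.max()      # guard against underflow
    p = np.exp(log_unnorm)
    return p / p.sum()

rng = np.random.default_rng(0)
phi = rng.dirichlet(np.ones(5), size=3)  # C=3 semantic features over V=5 words
b_k = np.array([1, 0, 1])                # this class uses features 0 and 2
p = poe_class_distribution(phi, b_k)
```

A word gets high probability under the class only if every selected feature assigns it reasonable mass, which is the "soft conjunction" behavior the abstract describes.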

185 | A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms
- Wei, Tanner
- 1990
Citation Context ...antic class. The model is closely related to SAGE [12] and the IOMM [13]. Learning To perform parameter estimation, we use an algorithm that follows the outline of the Monte Carlo EM (MCEM) algorithm =-=[14]-=-. In the Monte Carlo E-step, we sample the class assignments zmn and the binary vectors bkc based on current parameters φ and observed data X. In the M-step, we find new components φ. Since these are ... |
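The MCEM outline in the snippet (sample latent assignments in the E-step, re-estimate parameters in the M-step) can be illustrated on a toy problem. This is my own minimal example on a two-component 1-D Gaussian mixture, not the SCTM sampler:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from two well-separated unit-variance Gaussians.
x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 1, 100)])
mu = np.array([-1.0, 1.0])  # initial component means

for _ in range(20):
    # Monte Carlo E-step: sample component assignments z from their posterior.
    logp = -0.5 * (x[:, None] - mu[None, :]) ** 2
    post = np.exp(logp - logp.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    z = (rng.random(len(x)) < post[:, 1]).astype(int)
    # M-step: re-estimate each mean from its sampled assignments.
    for k in (0, 1):
        if (z == k).any():
            mu[k] = x[z == k].mean()
```

The means converge near the true values (-2 and 3); in the SCTM the sampled quantities are instead the class assignments z_mn and binary vectors b_kc, and the M-step fits the components φ.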

150 | Products of experts
- Hinton
- 1999
Citation Context ...[numeric results table from Figure 1c, "(c) Selectional Preference", elided]... The kth semantic class in the SCTM is a Product of Experts (PoE) =-=[10]-=- model, where the subset of semantic features included in the product is determined by a binary vector generated by a beta-Bernoulli model, the finite counterpart of the Indian Buffet Process (IBP) [11... |

70 | Semantic and conceptual development: An ontological perspective
- Keil
- 1979
Citation Context ...have come to assume. A related intuition can be found in recent work on category learning of Griffiths and colleagues, such as [4], which is partially motivated by the notion of linguistic ontologies =-=[5]-=-. We introduce the Shared Components Topic Model (SCTM), which expresses selectional preferences as soft disjunctions of conjunctions of semantic features. The model assumes there exist underlying sem... |

52 | Semantic classes and syntactic ambiguity
- Resnik
- 1993
Citation Context ...cate tend to prefer some arguments to others. Automatically inferring these preferences has been a topic of interest within the computational linguistics community since the early 1990’s, with Resnik =-=[3]-=- giving examples such as: Mary drank some {wine, gasoline, pencils, sadness}, where the provided nouns in the syntactic object position of the verb drink are of various levels of semantic acceptabilit... |

50 | A latent Dirichlet allocation method for selectional preferences
- Ritter, Mausam, et al.
- 2010
Citation Context ... are encoded by mixtures (the soft variant of disjunctions). Model Latent Dirichlet Allocation (LDA) [6] has been used to learn selectional preferences as soft disjunctions over flat semantic classes =-=[7, 8, 9]-=-. Our model, the SCTM, also learns the structure of each class as a soft conjunction of high-level semantic features. Figure 1a provides the generative process for topic modeling and a mapping of term... |

50 | Evaluation methods for topic models
- Wallach, Murray, et al.
- 2009
Citation Context ...ntation of topics. We use 1,000 randomly selected articles from the 20 Newsgroups dataset. 2 We evaluate the average perplexity per word on held out test data using the left-to-right approximation of =-=[17]-=-. Figure 1b shows the results for LDA and the SCTM for the same number of components and varying K (SCTM). For LDA (K=C), this is a single (dashed) line. For SCTM, the x markers each correspond to a d... |
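The snippet above reports average perplexity per word on held-out data. As a minimal sketch of what that metric is (my own illustration of the definition, not the left-to-right estimator of [17], which approximates the per-token probabilities themselves):

```python
import math

def perplexity(log_probs):
    """Per-word perplexity: exp of the negative mean log-probability.

    log_probs: natural-log probabilities assigned to each held-out token.
    """
    return math.exp(-sum(log_probs) / len(log_probs))

# A model assigning each of 4 tokens probability 1/10 has perplexity 10.
print(perplexity([math.log(0.1)] * 4))  # prints approximately 10
```

Lower perplexity means the model assigns higher probability to the held-out text, which is why the comparison in Figure 1b varies K and C and reads off which configuration drives perplexity down.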

31 | Sparse additive generative models of text
- Eisenstein, Ahmed, et al.
Citation Context ...cx ∑ V v=1 ∏C c=1 φbkc cv Here, φ c is the cth semantic feature, a distribution over words. bk is the binary vector defining the structure of this semantic class. The model is closely related to SAGE =-=[12]-=- and the IOMM [13]. Learning To perform parameter estimation, we use an algorithm that follows the outline of the Monte Carlo EM (MCEM) algorithm [14]. In the Monte Carlo E-step, we sample the class a... |

29 | New tools for web-scale n-grams
- Lin, Church, et al.
- 2010
Citation Context ...MC) approach. Experiments We present results on two tasks: selectional preference and topic modeling. For selectional preference, our data comes from the part-of-speech (POS) tagged n-grams corpus of =-=[16]-=-. Using POS tag patterns we produce a corpus of selectional preference examples of the form (verbdependency type, noun). For example, the pattern VBD (PRP$|DT) NN would match sold the car. Figure 1c p... |
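The POS-pattern extraction described in the snippet (e.g. VBD (PRP$|DT) NN matching "sold the car") can be sketched as a simple window match over tagged tokens. This is a hypothetical illustration, not the authors' extraction code:

```python
def extract_pairs(tagged):
    """Extract (verb, noun) pairs matching the pattern VBD (PRP$|DT) NN.

    tagged: list of (word, POS) tuples using Penn Treebank tags.
    """
    pairs = []
    for i in range(len(tagged) - 2):
        (w1, t1), (w2, t2), (w3, t3) = tagged[i:i + 3]
        if t1 == "VBD" and t2 in ("PRP$", "DT") and t3 == "NN":
            pairs.append((w1, w3))  # keep the verb and its object noun
    return pairs

example = [("Mary", "NNP"), ("sold", "VBD"), ("the", "DT"), ("car", "NN")]
print(extract_pairs(example))  # -> [('sold', 'car')]
```

Each extracted pair becomes a (verb-dependency type, noun) selectional-preference example, with the intervening determiner or possessive pronoun discarded.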

20 | Latent variable models of selectional preference
- Ó Séaghdha
- 2010
Citation Context ... are encoded by mixtures (the soft variant of disjunctions). Model Latent Dirichlet Allocation (LDA) [6] has been used to learn selectional preferences as soft disjunctions over flat semantic classes =-=[7, 8, 9]-=-. Our model, the SCTM, also learns the structure of each class as a soft conjunction of high-level semantic features. Figure 1a provides the generative process for topic modeling and a mapping of term... |

13 | A Nonparametric Bayesian Approach to Modeling Overlapping Clusters
- Heller, Ghahramani
- 2007
Citation Context ...φbkc cv Here, φ c is the cth semantic feature, a distribution over words. bk is the binary vector defining the structure of this semantic class. The model is closely related to SAGE [12] and the IOMM =-=[13]-=-. Learning To perform parameter estimation, we use an algorithm that follows the outline of the Monte Carlo EM (MCEM) algorithm [14]. In the Monte Carlo E-step, we sample the class assignments zmn and... |

4 | A Semantic Theory of Sortal Incorrectness
- Thomason
- 1972
Citation Context ...erence has a variety of (near) synonyms. Chomsky [1] used the term selectional rules, giving the alternatives: selectional restrictions and restrictions of cooccurrence. Semanticists such as Thomason =-=[2]-=- referred to the problem as sortal (in)correctness, with Thomason providing variants including: category mistake, selectionally incorrect, type crossing, semi-grammatical, and semantically anomalous. ... |

2 | A nonparametric Bayesian model of multi-level category learning
- Canini, Griffiths
- 2011
Citation Context ...lso allowed for the gradedness that contemporary computational linguists have come to assume. A related intuition can be found in recent work on category learning of Griffiths and colleagues, such as =-=[4]-=-, which is partially motivated by the notion of linguistic ontologies [5]. We introduce the Shared Components Topic Model (SCTM), which expresses selectional preferences as soft disjunctions of conjun... |

2 | Topic models for corpus-centric knowledge generalization
- Van Durme, Gildea
- 2009
Citation Context ... are encoded by mixtures (the soft variant of disjunctions). Model Latent Dirichlet Allocation (LDA) [6] has been used to learn selectional preferences as soft disjunctions over flat semantic classes =-=[7, 8, 9]-=-. Our model, the SCTM, also learns the structure of each class as a soft conjunction of high-level semantic features. Figure 1a provides the generative process for topic modeling and a mapping of term... |