Results 1–3 of 3
Improved Lexical Acquisition through DPP-based Verb Clustering
Abstract

Cited by 6 (2 self)
Subcategorization frames (SCFs), selectional preferences (SPs) and verb classes capture related aspects of the predicate-argument structure. We present the first unified framework for unsupervised learning of these three types of information. We show how to utilize Determinantal Point Processes (DPPs), elegant probabilistic models that are defined over the possible subsets of a given dataset and give higher probability mass to high-quality and diverse subsets, for clustering. Our novel clustering algorithm constructs a joint SCF-SP DPP kernel matrix and utilizes the efficient sampling algorithms of DPPs to cluster together verbs with similar SCFs and SPs. We evaluate the induced clusters in the context of the three tasks and show results that are superior to strong baselines for each task.
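The abstract relies on "the efficient sampling algorithms of DPPs" as the workhorse of its clustering step. As a reference point only (not the paper's clustering algorithm), a minimal sketch of the standard exact spectral sampler for an L-ensemble DPP, assuming numpy; the function name `sample_dpp` is mine:

```python
import numpy as np

def sample_dpp(L, rng=None):
    """Draw one exact sample from a DPP with symmetric PSD L-ensemble kernel L.

    Standard spectral algorithm: (1) keep each eigenvector independently with
    probability lambda/(lambda+1); (2) pick items one at a time and project the
    remaining eigenvector space away from each chosen item.
    """
    rng = np.random.default_rng(rng)
    lam, V = np.linalg.eigh(L)                    # eigendecomposition of L
    keep = rng.random(len(lam)) < lam / (lam + 1.0)
    V = V[:, keep]                                # selected eigenvectors
    sample = []
    while V.shape[1] > 0:
        # choose item i with probability proportional to the squared row norms
        p = np.sum(V ** 2, axis=1)
        p /= p.sum()
        i = rng.choice(len(p), p=p)
        sample.append(i)
        # eliminate row i: subtract a multiple of one column, then drop it
        j = np.argmax(np.abs(V[i]))               # a column with V[i, j] != 0
        V = V - np.outer(V[:, j] / V[i, j], V[i])
        V = np.delete(V, j, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)                # re-orthonormalize columns
    return sorted(sample)
```

Because the kernel rewards diversity, items whose rows of `L` are similar rarely appear in the same sample; clustering schemes exploit exactly this repulsion.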
ADVANCES IN THE THEORY OF DETERMINANTAL POINT PROCESSES
2013
Abstract
The theory of determinantal point processes has its roots in work in mathematical physics in the 1960s, but it is only in recent years that it has been developed beyond several specific examples. While there is a rich probabilistic theory, there are still many open questions in this area, and its applications to statistics and machine learning are still largely unexplored. Our contributions are threefold. First, we develop the theory of determinantal point processes on a finite set. While there is a small body of literature on this topic, we offer a new perspective that allows us to unify and extend previous results. Second, we investigate several new kernels. We describe these processes explicitly, and investigate the new discrete distribution which arises from our computations. Finally, we show how the parameters of a determinantal point process over a finite ground set with a symmetric kernel may be computed if infinite samples are available. This algorithm is a vital step towards the …
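For the finite-set case this abstract develops, the standard L-ensemble construction (which both of the neighboring abstracts also assume) can be stated as:

```latex
% L-ensemble DPP on a finite ground set \mathcal{Y} = \{1, \dots, N\}:
% for a symmetric positive semidefinite matrix L, a random subset
% \mathbf{Y} \subseteq \mathcal{Y} has probability
P(\mathbf{Y} = Y) \;=\; \frac{\det(L_Y)}{\det(L + I)},
% where L_Y is the |Y| \times |Y| submatrix of L indexed by Y.
% The associated marginal kernel
K \;=\; L (L + I)^{-1}
% gives inclusion probabilities  P(A \subseteq \mathbf{Y}) = \det(K_A).
```

The normalizer $\det(L+I)$ works because $\sum_{Y \subseteq \mathcal{Y}} \det(L_Y) = \det(L + I)$, so the subset probabilities sum to one.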
Expectation-Maximization for Learning Determinantal Point Processes
Abstract
A determinantal point process (DPP) is a probabilistic model of set diversity compactly parameterized by a positive semidefinite kernel matrix. To fit a DPP to a given task, we would like to learn the entries of its kernel matrix by maximizing the log-likelihood of the available data. However, log-likelihood is non-convex in the entries of the kernel matrix, and this learning problem is conjectured to be NP-hard [1]. Thus, previous work has instead focused on more restricted convex learning settings: learning only a single weight for each row of the kernel matrix [2], or learning weights for a linear combination of DPPs with fixed kernel matrices [3]. In this work we propose a novel algorithm for learning the full kernel matrix. By changing the kernel parameterization from matrix entries to eigenvalues and eigenvectors, and then lower-bounding the likelihood in the manner of expectation-maximization algorithms, we obtain an effective optimization procedure. We test our method on a real-world product recommendation task, and achieve relative gains of up to 16.5% in test log-likelihood compared to the naive approach of maximizing likelihood by projected gradient ascent on the entries of the kernel matrix.
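The objective both the EM approach and the projected-gradient baseline maximize is the DPP log-likelihood of the observed subsets. A minimal sketch of that objective (not of the paper's EM algorithm), assuming numpy; the function name `dpp_log_likelihood` is mine:

```python
import numpy as np

def dpp_log_likelihood(L, subsets):
    """Log-likelihood of observed subsets under an L-ensemble DPP:
    log P(Y) = log det(L_Y) - log det(L + I)."""
    n = L.shape[0]
    _, log_norm = np.linalg.slogdet(L + np.eye(n))  # shared normalizer
    ll = 0.0
    for Y in subsets:
        Y = list(Y)
        if Y:
            _, ld = np.linalg.slogdet(L[np.ix_(Y, Y)])
        else:
            ld = 0.0                                # det of empty submatrix is 1
        ll += ld - log_norm
    return ll
```

Each observed subset contributes a log-determinant of a different principal submatrix of `L`, which is why the objective is non-convex in the entries of `L` and why reparameterizing by eigenvalues and eigenvectors changes the optimization landscape.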