R.: Towards Internet-scale multi-view stereo
In: Proceedings of IEEE CVPR, 2010
Cited by 54 (6 self)
This paper introduces an approach for enabling existing multi-view stereo methods to operate on extremely large unstructured photo collections. The main idea is to decompose the collection into a set of overlapping sets of photos that can be processed in parallel, and to merge the resulting reconstructions. This overlapping clustering problem is formulated as a constrained optimization and solved iteratively. The merging algorithm, designed to be parallel and out-of-core, incorporates robust filtering steps to eliminate low-quality reconstructions and enforce global visibility constraints. The approach has been tested on several large datasets downloaded from Flickr.com, including one with over ten thousand images, yielding a 3D reconstruction with nearly thirty million points.
Multi-way clustering on relation graphs
In Proc. of the 7th SIAM Intl. Conf. on Data Mining, 2006
Cited by 27 (3 self)
A number of real-world domains such as social networks and e-commerce involve heterogeneous data that describes relations between multiple classes of entities. Understanding the natural structure of this type of heterogeneous relational data is essential both for exploratory analysis and for performing various predictive modeling tasks. In this paper, we propose a principled multi-way clustering framework for relational data, wherein different types of entities are simultaneously clustered based not only on their intrinsic attribute values, but also on the multiple relations between the entities. To achieve this, we introduce a relation graph model that describes all the known relations between the different entity classes, in which each relation between a given set of entity classes is represented in the form of a multi-modal tensor over an appropriate domain. Our multi-way clustering formulation is driven by the objective of capturing the maximal “information” in the original relation graph, i.e., accurately approximating the set of tensors corresponding to the various relations. This formulation is applicable to all Bregman divergences (a broad family of loss functions that includes squared Euclidean distance and KL-divergence), and also permits analysis of mixed data types using convex combinations of appropriate Bregman loss functions. Furthermore, we present a large family of structurally different multi-way clustering schemes that preserve various linear summary statistics of the original data. We accomplish the above generalizations by extending a recently proposed key theoretical result, namely the minimum Bregman information principle [1], to the relation graph setting. We also describe an efficient multi-way clustering algorithm based on alternate minimization that generalizes a number of other recently proposed clustering methods.
Empirical results on datasets obtained from real-world domains (e.g., movie recommendations, newsgroup articles) demonstrate the generality and efficacy of our framework.
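The abstract above notes that Bregman divergences include squared Euclidean distance and KL-divergence. As a minimal sketch of that claim (not the paper's clustering algorithm), the standard definition D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> can be evaluated for two choices of the convex function phi. The function names and the toy vectors p and q are illustrative assumptions.

```python
import math

def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad_phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# phi(x) = ||x||^2 gives the squared Euclidean distance.
def sq_norm(x):
    return sum(v * v for v in x)

def sq_norm_grad(x):
    return [2.0 * v for v in x]

# phi(x) = sum_i x_i log x_i (negative entropy) gives the KL-divergence
# when x and y are probability vectors with strictly positive entries.
def neg_entropy(x):
    return sum(v * math.log(v) for v in x)

def neg_entropy_grad(x):
    return [math.log(v) + 1.0 for v in x]

# Hypothetical toy inputs: two distributions over three outcomes.
p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]

d_euclid = bregman(sq_norm, sq_norm_grad, p, q)
d_kl = bregman(neg_entropy, neg_entropy_grad, p, q)

# Direct formulas for comparison.
direct_euclid = sum((a - b) ** 2 for a, b in zip(p, q))
direct_kl = sum(a * math.log(a / b) for a, b in zip(p, q))
```

Both pairs of values agree up to floating-point error, which is why a single algorithm written against the generic divergence can cover both loss functions.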
Multiplicative mixture models for overlapping clustering
2008
Cited by 8 (2 self)
The problem of overlapping clustering, where a point is allowed to belong to multiple clusters, is becoming increasingly important in a variety of applications. In this paper, we present an overlapping clustering algorithm based on multiplicative mixture models. We analyze a general setting where each component of the multiplicative mixture is from an exponential family, and present an efficient alternating maximization algorithm to learn the model and infer overlapping clusters. We also show that when each component is assumed to be a Gaussian, we can apply the kernel trick leading to nonlinear cluster separators and obtain better clustering quality. The efficacy of the proposed algorithms is demonstrated using experiments on both UCI benchmark datasets and a microarray gene expression dataset.
Learnable Similarity Functions and Their Applications to Clustering and Record Linkage
2004
Cited by 8 (0 self)
rship (Xing et al. 2003), and relative comparisons (Schultz & Joachims 2004). These approaches have shown improvements over traditional similarity functions for different data types such as vectors in Euclidean space, strings, and database records composed of multiple text fields. While these initial results are encouraging, there still remains a large number of similarity functions that are currently unable to adapt to a particular domain. In our research, we attempt to bridge this gap by developing both new learnable similarity functions and methods for their application to particular problems in machine learning and data mining. In preliminary work, we proposed two learnable similarity functions for strings that adapt distance computations given training pairs of equivalent and non-equivalent strings (Bilenko & Mooney 2003a). The first function is based on a probabilistic model of edit distance with affine gaps (Gus
Copyright © 2004, American Association for Artificial Intelli
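The abstract above mentions edit distance with affine gaps. The sketch below shows only the classical fixed-cost version of that distance (a three-state dynamic program), not the learned probabilistic model the authors describe. The cost parameters `sub`, `gap_open`, and `gap_extend` and their default values are assumptions chosen for illustration.

```python
INF = float("inf")

def affine_gap_distance(s, t, sub=1.0, gap_open=1.0, gap_extend=0.5):
    """Edit distance where a gap of length k costs gap_open + (k - 1) * gap_extend."""
    n, m = len(s), len(t)
    # M: alignment ends with s[i-1] matched or substituted against t[j-1].
    # D: alignment ends with s[i-1] deleted (gap in t).
    # I: alignment ends with t[j-1] inserted (gap in s).
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    I = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        D[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        I[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if s[i - 1] == t[j - 1] else sub
            M[i][j] = cost + min(M[i - 1][j - 1], D[i - 1][j - 1], I[i - 1][j - 1])
            # Opening a gap is charged gap_open; continuing one is charged gap_extend.
            D[i][j] = min(M[i - 1][j] + gap_open,
                          I[i - 1][j] + gap_open,
                          D[i - 1][j] + gap_extend)
            I[i][j] = min(M[i][j - 1] + gap_open,
                          D[i][j - 1] + gap_open,
                          I[i][j - 1] + gap_extend)
    return min(M[n][m], D[n][m], I[n][m])
```

With these costs, one contiguous gap is cheaper than the same number of scattered single-character gaps, which suits strings that differ by a dropped or abbreviated token. A learnable variant would estimate the costs from labeled string pairs instead of fixing them by hand.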
A Game-Theoretic Approach to Hypergraph Clustering
2009
Cited by 7 (2 self)
Hypergraph clustering refers to the process of extracting maximally coherent groups from a set of objects using high-order (rather than pairwise) similarities. Traditional approaches to this problem are based on the idea of partitioning the input data into a user-defined number of classes, thereby obtaining the clusters as a by-product of the partitioning process. In this paper, we provide a radically different perspective to the problem. In contrast to the classical approach, we attempt to provide a meaningful formalization of the very notion of a cluster and we show that game theory offers an attractive and unexplored perspective that serves well our purpose. Specifically, we show that the hypergraph clustering problem can be naturally cast into a non-cooperative multi-player “clustering game”, whereby the notion of a cluster is equivalent to a classical game-theoretic equilibrium concept. From the computational viewpoint, we show that the problem of finding the equilibria of our clustering game is equivalent to locally optimizing a polynomial function over the standard simplex, and we provide a discrete-time dynamics to perform this optimization. Experiments are presented which show the superiority of our approach over state-of-the-art hypergraph clustering techniques.
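The abstract above describes optimizing a polynomial over the standard simplex with a discrete-time dynamics, but does not give the update rule. As a rough sketch of the idea in the simpler pairwise (ordinary graph) case only, the standard discrete-time replicator update can be used to locally maximize x^T A x over the simplex. The similarity matrix `A`, the iteration count, and the function names are assumptions for illustration, and this is not the paper's hypergraph dynamics.

```python
def replicator_step(A, x):
    """One discrete-time replicator update: x_i <- x_i * (A x)_i / (x^T A x)."""
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    total = sum(xi * v for xi, v in zip(x, Ax))
    return [xi * v / total for xi, v in zip(x, Ax)]

def objective(A, x):
    """Quadratic objective x^T A x, the pairwise analogue of the polynomial."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

# Hypothetical toy similarities: items 0-2 are mutually similar, item 3 is an outlier.
# A is symmetric, non-negative, with a zero diagonal.
A = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.9, 0.1],
    [0.8, 0.9, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]

x = [0.25, 0.25, 0.25, 0.25]  # start at the barycenter of the simplex
history = [objective(A, x)]
for _ in range(200):
    x = replicator_step(A, x)
    history.append(objective(A, x))

# Items whose weight stays clearly above zero form the extracted cluster.
cluster = [i for i, xi in enumerate(x) if xi > 1e-6]
```

In this toy run the weight of the outlier decays toward zero while the three similar items keep positive weight, so the cluster is read off the support of `x` instead of being produced by partitioning all the data.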
A segment-based approach to clustering multi-topic documents
In Text Mining Workshop, SIAM Data Mining Conference
Cited by 6 (1 self)
Document clustering has been recognized as a central problem in text data management, and it becomes particularly challenging when documents have multiple topics. In this paper we address the problem of multi-topic document clustering by leveraging the natural composition of documents in text segments, which bear one or more topics on their own. We propose a segment-based document clustering framework, which is designed to induce a classification of documents starting from the identification of cohesive groups of segment-based portions of the original documents. We empirically give evidence of the significance of our approach on different, large collections of multi-topic documents.
Cluster ranking with an application to mining mailbox networks
In ICDM ’06: Proceedings of the Sixth International Conference on Data Mining, 2006
Cited by 6 (1 self)
We initiate the study of a new clustering framework, called cluster ranking. Rather than simply partitioning a network into clusters, a cluster ranking algorithm also orders the clusters by their strength. To this end, we introduce a novel strength measure for clusters, the integrated cohesion, which is applicable to arbitrary weighted networks. We then present CRank: a new cluster ranking algorithm. Given a network with arbitrary pairwise similarity weights, CRank creates a list of overlapping clusters and ranks them by their integrated cohesion. We provide extensive theoretical and empirical analysis of CRank and show that it is likely to have high precision and recall. A main component of CRank is a heuristic algorithm for finding sparse vertex separators. At the core of this algorithm is a new connection between the well-known measure of vertex betweenness and multicommodity flow. Our experiments focus on mining mailbox networks. A mailbox network is an egocentric social network, consisting of contacts with whom an individual exchanges email. Ties among contacts are represented by the frequency of their co-occurrence on message headers. CRank is well suited to mine such networks, since they are abundant with overlapping communities of highly variable strengths. We demonstrate the effectiveness of CRank on the Enron data set, consisting of 130 mailbox networks.
Latent Dirichlet conditional naive-Bayes models
In ICDM, 2007
Cited by 5 (3 self)
In spite of the popularity of probabilistic mixture models for latent structure discovery from data, mixture models do not have a natural mechanism for handling sparsity, where each data point only has a few non-zero observations. In this paper, we introduce conditional naive-Bayes (CNB) models, which generalize naive-Bayes mixture models to naturally handle sparsity by conditioning the model on observed features. Further, we present latent Dirichlet conditional naive-Bayes (LD-CNB) models, which constitute a family of powerful hierarchical Bayesian models for latent structure discovery from sparse data. The proposed family of models are quite general and can work with arbitrary regular exponential family conditional distributions. We present a variational inference based EM algorithm for
What is a Cluster? Perspectives from Game Theory
Cited by 5 (0 self)
“Since no paradigm ever solves all the problems it defines and since no two paradigms leave all the same problems unsolved, paradigm debates always involve the question: Which problems is it more significant to have solved?”
Thomas S. Kuhn, The Structure of Scientific Revolutions (1962)
There is no shortage of clustering algorithms, and recently a new wave of excitement has spread across the machine learning community mainly because of the important development of spectral methods. At the same time, there is also growing interest around fundamental questions pertaining to the very nature of the clustering problem (see, e.g., [17, 1, 28]). Yet, despite the tremendous progress in the field, the clustering problem remains elusive and a satisfactory answer even to the most basic questions is still to come. Upon scrutinizing the relevant literature on the subject, it becomes apparent that the vast majority of the existing approaches deal with a very specific version of the problem, which asks for partitioning the input data into coherent classes. In fact, almost invariably, the problem of clustering is defined as a partitioning problem, and even the classical distinction between hierarchical and partitional algorithms
Modeling regular polysemy: A study on the semantic classification of Catalan adjectives
Computational Linguistics, 2012
Cited by 4 (1 self)
We present a study on the automatic acquisition of semantic classes for Catalan adjectives from distributional and morphological information, with particular emphasis on polysemous adjectives. The aim is to distinguish and characterize broad classes, such as qualitative (gran ‘big’) and relational (pulmonar ‘pulmonary’) adjectives, as well as to identify polysemous adjectives such as econòmic (‘economic | cheap’). We specifically aim at modeling regular polysemy, that is, types of sense alternations that are shared across lemmata. To date, both semantic classes for adjectives and regular polysemy have only been sparsely addressed in empirical computational linguistics. Two main specific questions are tackled in this article. First, what is an adequate broad semantic classification for adjectives? We provide empirical support for the qualitative and relational classes as defined in theoretical work, and uncover one type of adjective that has not received enough attention, namely, the event-related class. Second, how is regular polysemy best modeled in computational terms? We present two models, and argue that the second one, which models regular polysemy in terms of simultaneous membership to multiple basic classes, is both theoretically and empirically more adequate than the first one, which attempts to identify independent polysemous classes. Our best classifier achieves 69.1% accuracy, against a 51% baseline.