Results 1–10 of 41
R.: Towards Internet-scale multi-view stereo
In: Proceedings of IEEE CVPR, 2010
Abstract

Cited by 101 (6 self)
This paper introduces an approach for enabling existing multi-view stereo methods to operate on extremely large unstructured photo collections. The main idea is to decompose the collection into a set of overlapping sets of photos that can be processed in parallel, and to merge the resulting reconstructions. This overlapping clustering problem is formulated as a constrained optimization and solved iteratively. The merging algorithm, designed to be parallel and out-of-core, incorporates robust filtering steps to eliminate low-quality reconstructions and enforce global visibility constraints. The approach has been tested on several large datasets downloaded from Flickr.com, including one with over ten thousand images, yielding a 3D reconstruction with nearly thirty million points.
Multi-way clustering on relation graphs
In Proc. of the 7th SIAM Intl. Conf. on Data Mining, 2006
Abstract

Cited by 35 (3 self)
A number of real-world domains such as social networks and e-commerce involve heterogeneous data that describes relations between multiple classes of entities. Understanding the natural structure of this type of heterogeneous relational data is essential both for exploratory analysis and for performing various predictive modeling tasks. In this paper, we propose a principled multi-way clustering framework for relational data, wherein different types of entities are simultaneously clustered based not only on their intrinsic attribute values, but also on the multiple relations between the entities. To achieve this, we introduce a relation graph model that describes all the known relations between the different entity classes, in which each relation between a given set of entity classes is represented in the form of a multi-modal tensor over an appropriate domain. Our multi-way clustering formulation is driven by the objective of capturing the maximal “information” in the original relation graph, i.e., accurately approximating the set of tensors corresponding to the various relations. This formulation is applicable to all Bregman divergences (a broad family of loss functions that includes squared Euclidean distance and KL-divergence), and also permits analysis of mixed data types using convex combinations of appropriate Bregman loss functions. Furthermore, we present a large family of structurally different multi-way clustering schemes that preserve various linear summary statistics of the original data. We accomplish the above generalizations by extending a recently proposed key theoretical result, namely the minimum Bregman information principle [1], to the relation graph setting. We also describe an efficient multi-way clustering algorithm based on alternate minimization that generalizes a number of other recently proposed clustering methods. Empirical results on datasets obtained from real-world domains (e.g., movie recommendations, newsgroup articles) demonstrate the generality and efficacy of our framework.
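The abstract above names squared Euclidean distance and KL-divergence as members of the Bregman family. As a minimal sketch (not the paper's clustering algorithm), the standard definition D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> can be evaluated directly. The function names below are illustrative assumptions, and the KL case assumes strictly positive vectors that each sum to 1.

```python
import math

def bregman_divergence(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# Generator phi(v) = ||v||^2 yields the squared Euclidean distance.
def sq_norm(v):
    return sum(vi * vi for vi in v)

def sq_norm_grad(v):
    return [2.0 * vi for vi in v]

# Generator phi(p) = sum p_i log p_i (negative entropy) yields the
# KL-divergence when p and q are probability vectors with positive entries.
def neg_entropy(p):
    return sum(pi * math.log(pi) for pi in p)

def neg_entropy_grad(p):
    return [math.log(pi) + 1.0 for pi in p]

def kl_direct(p, q):
    """Reference KL-divergence, used only to check the Bregman form."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

For example, bregman_divergence(sq_norm, sq_norm_grad, [1.0, 2.0], [3.0, 5.0]) equals the squared distance between the two points, and the negative-entropy generator agrees with kl_direct on probability vectors.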
A Game-Theoretic Approach to Hypergraph Clustering
2009
Abstract

Cited by 26 (2 self)
Hypergraph clustering refers to the process of extracting maximally coherent groups from a set of objects using high-order (rather than pairwise) similarities. Traditional approaches to this problem are based on the idea of partitioning the input data into a user-defined number of classes, thereby obtaining the clusters as a by-product of the partitioning process. In this paper, we provide a radically different perspective on the problem. In contrast to the classical approach, we attempt to provide a meaningful formalization of the very notion of a cluster and we show that game theory offers an attractive and unexplored perspective that serves our purpose well. Specifically, we show that the hypergraph clustering problem can be naturally cast into a non-cooperative multi-player “clustering game”, whereby the notion of a cluster is equivalent to a classical game-theoretic equilibrium concept. From the computational viewpoint, we show that the problem of finding the equilibria of our clustering game is equivalent to locally optimizing a polynomial function over the standard simplex, and we provide a discrete-time dynamics to perform this optimization. Experiments are presented which show the superiority of our approach over state-of-the-art hypergraph clustering techniques.
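To illustrate what locally optimizing a polynomial over the standard simplex with a discrete-time dynamics can look like, here is a minimal sketch restricted to the pairwise (quadratic) case, f(x) = x^T A x, using the standard discrete-time replicator update x_i <- x_i (Ax)_i / (x^T A x). This is a simplification chosen for brevity and is not the paper's hypergraph procedure. The matrix A, the function name, and the stopping rule are assumptions; A is taken to be symmetric and non-negative with a zero diagonal.

```python
def replicator_dynamics(A, iters=200, tol=1e-9):
    """Discrete-time replicator dynamics on the standard simplex.

    A is a symmetric, non-negative similarity matrix (list of lists).
    Returns a point x on the simplex; the entries with non-negligible
    mass can be read as the members of one extracted cluster.
    """
    n = len(A)
    x = [1.0 / n] * n  # start at the barycenter of the simplex
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        f = sum(x[i] * Ax[i] for i in range(n))  # current objective x^T A x
        if f == 0.0:
            break  # no similarity mass left to redistribute
        new_x = [x[i] * Ax[i] / f for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(new_x, x)) < tol
        x = new_x
        if converged:
            break
    return x

# Three mutually similar objects plus one weakly attached object.
A = [
    [0.0, 1.0, 1.0, 0.1],
    [1.0, 0.0, 1.0, 0.1],
    [1.0, 1.0, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
x = replicator_dynamics(A)
cluster = [i for i, xi in enumerate(x) if xi > 1e-4]
```

In this toy input the mass concentrates on the first three objects, so no number of clusters has to be fixed in advance; the weakly attached object is simply left out.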
What is a Cluster? Perspectives from Game Theory
Abstract

Cited by 11 (0 self)
“Since no paradigm ever solves all the problems it defines and since no two paradigms leave all the same problems unsolved, paradigm debates always involve the question: Which problems is it more significant to have solved?” Thomas S. Kuhn, The Structure of Scientific Revolutions (1962) There is no shortage of clustering algorithms, and recently a new wave of excitement has spread across the machine learning community mainly because of the important development of spectral methods. At the same time, there is also growing interest around fundamental questions pertaining to the very nature of the clustering problem (see, e.g., [17, 1, 28]). Yet, despite the tremendous progress in the field, the clustering problem remains elusive and a satisfactory answer even to the most basic questions is still to come. Upon scrutinizing the relevant literature on the subject, it becomes apparent that the vast majority of the existing approaches deal with a very specific version of the problem, which asks for partitioning the input data into coherent classes. In fact, almost invariably, the problem of clustering is defined as a partitioning problem, and even the classical distinction between hierarchical and partitional algorithms
Multiplicative mixture models for overlapping clustering
2008
Abstract

Cited by 11 (2 self)
The problem of overlapping clustering, where a point is allowed to belong to multiple clusters, is becoming increasingly important in a variety of applications. In this paper, we present an overlapping clustering algorithm based on multiplicative mixture models. We analyze a general setting where each component of the multiplicative mixture is from an exponential family, and present an efficient alternating maximization algorithm to learn the model and infer overlapping clusters. We also show that when each component is assumed to be a Gaussian, we can apply the kernel trick leading to nonlinear cluster separators and obtain better clustering quality. The efficacy of the proposed algorithms is demonstrated using experiments on both UCI benchmark datasets and a microarray gene expression dataset.
Cluster ranking with an application to mining mailbox networks
In ICDM ’06: Proceedings of the Sixth International Conference on Data Mining, 2006
Abstract

Cited by 10 (1 self)
We initiate the study of a new clustering framework, called cluster ranking. Rather than simply partitioning a network into clusters, a cluster ranking algorithm also orders the clusters by their strength. To this end, we introduce a novel strength measure for clusters, the integrated cohesion, which is applicable to arbitrary weighted networks. We then present CRank: a new cluster ranking algorithm. Given a network with arbitrary pairwise similarity weights, CRank creates a list of overlapping clusters and ranks them by their integrated cohesion. We provide extensive theoretical and empirical analysis of CRank and show that it is likely to have high precision and recall. A main component of CRank is a heuristic algorithm for finding sparse vertex separators. At the core of this algorithm is a new connection between the well-known measure of vertex betweenness and multicommodity flow. Our experiments focus on mining mailbox networks. A mailbox network is an egocentric social network, consisting of contacts with whom an individual exchanges email. Ties among contacts are represented by the frequency of their co-occurrence on message headers. CRank is well suited to mining such networks, since they abound with overlapping communities of highly variable strengths. We demonstrate the effectiveness of CRank on the Enron data set, consisting of 130 mailbox networks.
Learnable Similarity Functions and Their Applications to Clustering and Record Linkage
2004
Abstract

Cited by 10 (0 self)
rship (Xing et al. 2003), and relative comparisons (Schultz & Joachims 2004). These approaches have shown improvements over traditional similarity functions for different data types such as vectors in Euclidean space, strings, and database records composed of multiple text fields. While these initial results are encouraging, there still remains a large number of similarity functions that are currently unable to adapt to a particular domain. In our research, we attempt to bridge this gap by developing both new learnable similarity functions and methods for their application to particular problems in machine learning and data mining. In preliminary work, we proposed two learnable similarity functions for strings that adapt distance computations given training pairs of equivalent and non-equivalent strings (Bilenko & Mooney 2003a). The first function is based on a probabilistic model of edit distance with affine gaps (Gus
Banded structure in binary matrices
In KDD ’08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, 2008
Abstract

Cited by 9 (0 self)
A 0–1 matrix has a banded structure if both rows and columns can be permuted so that the nonzero entries exhibit a staircase pattern of overlapping rows. The concept of banded matrices has its origins in numerical analysis, where entries can be viewed as descriptions between the problem variables; the bandedness corresponds to variables that are coupled over short distances. Banded data also occurs in other applications, for example in the physical mapping problem of the human genome, in paleontological data, in network data and in the discovery of overlapping communities without cycles. In this paper we study the banded structure of binary matrices, give a formal definition of the concept and discuss its theoretical properties. We consider the algorithmic problems of computing how far a matrix is from being banded, and of finding a good submatrix of the original data that exhibits approximate bandedness. Finally, we show by experiments on real data from ecology and other applications the usefulness of the concept. Our results reveal that bands exist in real datasets and that the final orderings of rows and columns have natural interpretations.
A segment-based approach to clustering multi-topic documents
In Text Mining Workshop, SIAM Datamining Conference
Abstract

Cited by 8 (2 self)
Document clustering has been recognized as a central problem in text data management, and it becomes particularly challenging when documents have multiple topics. In this paper we address the problem of multi-topic document clustering by leveraging the natural composition of documents in text segments, which bear one or more topics on their own. We propose a segment-based document clustering framework, which is designed to induce a classification of documents starting from the identification of cohesive groups of segment-based portions of the original documents. We empirically give evidence of the significance of our approach on different, large collections of multi-topic documents.
Overlapping correlation clustering
In ICDM, 2011
Abstract

Cited by 7 (1 self)
We introduce a new approach to the problem of overlapping clustering. The main idea is to formulate overlapping clustering as an optimization problem in which each data point is mapped to a small set of labels, representing membership to different clusters. The objective is to find a mapping so that the distances between data points agree as much as possible with distances taken over their label sets. To define distances between label sets, we consider two measures: a set-intersection indicator function and the Jaccard coefficient. To solve the main optimization problem we propose a local-search algorithm. The iterative step of our algorithm requires solving non-trivial optimization subproblems, which, for the measures of set-intersection and Jaccard, we solve using a greedy method and non-negative least squares, respectively. Since our framework uses pairwise similarities of objects as the input, it lends itself naturally to the task of clustering structured objects for which feature vectors can be difficult to obtain. As a proof of concept we show how easily our framework can be applied in two different complex application domains. Firstly, we develop overlapping clustering of animal trajectories, obtaining zoologically meaningful results. Secondly, we apply our framework for overlapping clustering of proteins based on pairwise similarities of amino-acid sequences, outperforming the state-of-the-art method in matching a ground truth taxonomy.
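The objective described above can be sketched as a cost that compares each pairwise input value with a measure computed on the two points' label sets. The following is a minimal illustration, not the paper's local-search algorithm. It assumes the input is a pairwise similarity in [0, 1] keyed by sorted pairs of point names and that disagreement is measured by an absolute difference; the abstract specifies neither detail, and all names are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard coefficient of two label sets (0.0 if both are empty, by assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def set_intersection_indicator(a, b):
    """1.0 if the two label sets share at least one label, else 0.0."""
    return 1.0 if a & b else 0.0

def labeling_cost(labels, sim, measure):
    """Sum over all pairs of |measure(labels[u], labels[v]) - sim[(u, v)]|.

    labels: dict mapping each point to its set of cluster labels.
    sim:    dict mapping each sorted pair (u, v) to a similarity in [0, 1].
    """
    return sum(
        abs(measure(labels[u], labels[v]) - sim[(u, v)])
        for u, v in combinations(sorted(labels), 2)
    )

# Point "b" belongs to both clusters; "a" and "c" belong to one each.
labels = {"a": {1}, "b": {1, 2}, "c": {2}}
sim = {("a", "b"): 0.5, ("a", "c"): 0.0, ("b", "c"): 0.5}

jaccard_cost = labeling_cost(labels, sim, jaccard)
indicator_cost = labeling_cost(labels, sim, set_intersection_indicator)
```

With these toy values the Jaccard measure reproduces the input similarities exactly, while the indicator measure can only express "shares a label or not" and therefore incurs a cost on the two partially similar pairs.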