Results 1-10 of 111
Correlation Clustering
Machine Learning, 2002
Cited by 331 (4 self)
We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or − depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters, plus the number of − edges between clusters (equivalently, minimizes the number of disagreements: the number of − edges inside clusters plus the number of + edges between clusters). This formulation is motivated from a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of "agnostic learning" problem. An interesting ...
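The objective described in this abstract can be made concrete with a small sketch. The code below only evaluates a given clustering by counting disagreements; it does not implement any algorithm from the paper. The function name `count_disagreements`, the edge-label dictionary, and the toy graph are all hypothetical choices made for illustration.

```python
from itertools import combinations


def count_disagreements(labels, cluster_of):
    """Count edges whose label conflicts with the clustering.

    labels:     dict mapping frozenset({u, v}) -> '+' or '-' for every pair
                (the graph is assumed to be complete).
    cluster_of: dict mapping each vertex to a cluster identifier.
    """
    disagreements = 0
    for u, v in combinations(sorted(cluster_of), 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        sign = labels[frozenset((u, v))]
        # A '-' edge inside a cluster or a '+' edge between clusters disagrees.
        if (same_cluster and sign == '-') or (not same_cluster and sign == '+'):
            disagreements += 1
    return disagreements


# Toy instance (illustrative data only): a-b and b-c are similar, all else different.
vertices = ['a', 'b', 'c', 'd']
labels = {frozenset(pair): '-' for pair in combinations(vertices, 2)}
labels[frozenset(('a', 'b'))] = '+'
labels[frozenset(('b', 'c'))] = '+'

one_big_cluster = {'a': 0, 'b': 0, 'c': 0, 'd': 1}
singletons = {'a': 0, 'b': 1, 'c': 2, 'd': 3}
```

On this toy instance, grouping a, b, c together leaves one disagreement (the − edge between a and c), whereas putting every vertex in its own cluster leaves two (the two + edges). Agreements are simply the total number of edges minus the disagreements, which is why maximizing one is equivalent to minimizing the other.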
Incremental Clustering and Dynamic Information Retrieval
1997
Cited by 188 (4 self)
Motivated by applications such as document and image classification in information retrieval, we consider the problem of clustering dynamic point sets in a metric space. We propose a model called incremental clustering which is based on a careful analysis of the requirements of the information retrieval application, and which should also be useful in other applications. The goal is to efficiently maintain clusters of small diameter as new points are inserted. We analyze several natural greedy algorithms and demonstrate that they perform poorly. We propose new deterministic and randomized incremental clustering algorithms which have a provably good performance. We complement our positive results with lower bounds on the performance of incremental algorithms. Finally, we consider the dual clustering problem where the clusters are of fixed diameter, and the goal is to minimize the number of clusters.
Efficient algorithms for geometric optimization
ACM Comput. Surv., 1998
Cited by 121 (12 self)
We review the recent progress in the design of efficient algorithms for various problems in geometric optimization. We present several techniques used to attack these problems, such as parametric searching, geometric alternatives to parametric searching, prune-and-search techniques for linear programming and related problems, and LP-type problems and their efficient solution. We then describe a variety of applications of these and other techniques to numerous problems in geometric optimization, including facility location, proximity problems, statistical estimators and metrology, placement and intersection of polygons and polyhedra, and ray shooting and other query-type problems.
A Fast Multi-Scale Method for Drawing Large Graphs
Journal of Graph Algorithms and Applications, 2002
Cited by 92 (10 self)
We present a multi-scale layout algorithm for the aesthetic drawing of undirected graphs with straight-line edges. The algorithm is extremely fast, and is capable of drawing graphs that are substantially larger than those we have encountered in prior work. For example, the paper contains a drawing of a graph with over 15,000 vertices. Also we achieve "nice" drawings of 1000-vertex graphs in about 1 second. The proposed algorithm embodies a new multi-scale scheme for drawing graphs, which was motivated by the earlier multi-scale algorithm of Hadany and Harel [HH99]. In principle, it could significantly improve the speed of essentially any force-directed method (regardless of that method's ability of drawing weighted graphs or the continuity of its cost-function).
Algorithms for Facility Location Problems with Outliers (Extended Abstract)
In Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms, 2000
Cited by 90 (9 self)
Moses Charikar, Samir Khuller, David M. Mount, Giri Narasimhan
Facility location problems are traditionally investigated with the assumption that all the clients are to be provided service. A significant shortcoming of this formulation is that a few very distant clients, called outliers, can exert a disproportionately strong influence over the final solution. In this paper we explore a generalization of various facility location problems (K-center, K-median, uncapacitated facility location etc) to the case when only a specified fraction of the customers are to be served. What makes the problems harder is that we have to also select the subset that should get service. We provide generalizations of various approximation algorithms to deal with this added constraint.
1 Introduction
The facility location problem and the related clustering problems, k-median and k-center, are widely studied in operations research and computer science [3, 7, 22, 24, 32]. Typically in...
Approximation Algorithms for Geometric Median Problems
1992
Cited by 81 (0 self)
In this paper we present approximation algorithms for median problems in metric spaces and fixed-dimensional Euclidean space. Our algorithms use a new method for transforming an optimal solution of the linear program relaxation of the s-median problem into a provably good integral solution. This transformation technique is fundamentally different from the methods of randomized and deterministic rounding [Rag, RaT] and the methods proposed in [LiV] in the following way: Previous techniques never set variables with zero values in the fractional solution to 1. This departure from previous methods is crucial for the success of our algorithms.
How to Allocate Network Centers
J. Algorithms, 1992
Cited by 77 (4 self)
This paper deals with the issue of allocating and utilizing centers in a distributed network, in its various forms. The paper discusses the significant parameters of center allocation, defines the resulting optimization problems, and proposes several approximation algorithms for selecting centers and for distributing the users among them. We concentrate mainly on balanced versions of the problem, i.e., in which it is required that the assignment of clients to centers be as balanced as possible. The main results are constant ratio approximation algorithms for the balanced centers and balanced weighted centers problems, and logarithmic ratio approximation algorithms for the ρ-dominating set and the k-tolerant set problems.
School of Library and Information, The Hebrew University, Jerusalem 9xxxx, Israel. This work was carried out while the author was with the Department of Applied Mathematics and Computer Science, The Weizmann Institute of Science.
Department of Applied M...
Efficient Checking of Polynomials and Proofs and the Hardness of Approximation Problems
1992
Cited by 67 (9 self)
The definition of the class NP [Coo71, Lev73] highlights the problem of verification of proofs as one of central interest to theoretical computer science. Recent efforts have shown that the efficiency of the verification can be greatly improved by allowing the verifier access to random bits and accepting probabilistic guarantees from the verifier [BFL91, BFLS91, FGL+91, AS92]. We improve upon the efficiency of the proof systems developed above and obtain proofs which can be verified probabilistically by examining only a constant number of (randomly chosen) bits of the proof. The efficiently verifiable proofs constructed here rely on the structural properties of low-degree polynomials. We explore the properties of these functions by examining some simple and basic questions about them. We consider questions of the form:
• (testing) Given an oracle for a function f, is f close to a low-degree polynomial?
• (correcting) Let f be close to a low-degree polynomial g, is it possible to efficiently reconstruct the value of g on any given input using an oracle for f?
The questions described above have been raised before in the context of coding theory as the problems of error-detecting and error-correcting of codes. More recently ...
Many birds with one stone: Multi-objective approximation algorithms
1992
Cited by 62 (14 self)
We study network-design problems with multiple design objectives. In particular, we look at two cost measures to be minimized simultaneously: the total cost of the network and the maximum degree of any node in the network. Our main result can be roughly stated as follows: given an integer b, we present approximation algorithms for a variety of network-design problems on an n-node graph in which the degree of the output network is O(b log(n/b)) and the cost of this network is O(log n) times that of the minimum-cost degree-b-bounded network. These algorithms can handle costs on nodes as well as edges. Moreover, we can construct such networks so as to satisfy a variety of connectivity specifications including spanning trees, Steiner trees and generalized Steiner forests. The performance guarantee on the cost of the output network is nearly best-possible unless NP = P̃. We also address the special case in which the costs obey the triangle inequality. In this case, we obtai...
Centrality estimation in large networks
Intl. Journal of Bifurcation and Chaos, Special Issue on Complex Networks’ Structure and Dynamics, 2007
Cited by 54 (0 self)
Centrality indices are an essential concept in network analysis. For those based on shortest-path distances the computation is at least quadratic in the number of nodes, since it usually involves solving the single-source shortest-paths (SSSP) problem from every node. Therefore, exact computation is infeasible for many large networks of interest today. Centrality scores can be estimated, however, from a limited number of SSSP computations. We present results from an experimental study of the quality of such estimates under various selection strategies for the source vertices.
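The idea of estimating a distance-based centrality from a limited number of SSSP computations can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure evaluated in the paper: the graph is assumed to be undirected, unweighted, and connected (so SSSP reduces to breadth-first search), sources are chosen uniformly at random (only one of many possible selection strategies), and the function names are hypothetical.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Single-source shortest-path distances in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def estimate_average_distance(adj, num_sources, seed=0):
    """Estimate each vertex's average distance to all vertices.

    Runs BFS from `num_sources` uniformly sampled sources instead of from
    every vertex. Assumes an undirected, connected graph, so the distance
    from a source to v equals the distance from v to that source.
    """
    rng = random.Random(seed)
    sources = rng.sample(sorted(adj), num_sources)
    totals = {v: 0 for v in adj}
    for s in sources:
        dist = bfs_distances(adj, s)
        for v in adj:
            totals[v] += dist[v]
    return {v: totals[v] / num_sources for v in adj}


# Illustrative data only: a path graph 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
exact = estimate_average_distance(path, num_sources=5)   # all vertices as sources
approx = estimate_average_distance(path, num_sources=2)  # two sampled sources
```

A smaller average distance corresponds to a higher closeness-style centrality. When `num_sources` equals the number of vertices the result is exact; with fewer sources the cost drops from one BFS per vertex to one BFS per sampled source, at the price of estimation error that depends on which sources are chosen.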