Results 1–10 of 48
A Tight Bound on Approximating Arbitrary Metrics by Tree Metrics
 In Proceedings of the 35th Annual ACM Symposium on Theory of Computing
, 2003
Abstract

Cited by 269 (7 self)
In this paper, we show that any n-point metric space can be embedded into a distribution over dominating tree metrics such that the expected stretch of any edge is O(log n). This improves upon the result of Bartal, who gave a bound of O(log n log log n). Moreover, our result is existentially tight: there exist metric spaces where any tree embedding must have distortion Ω(log n). This problem lies at the heart of numerous approximation and online algorithms, including ones for group Steiner tree, metric labeling, buy-at-bulk network design, and metrical task systems. Our result improves the performance guarantees for all of these problems.
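For readers new to the area, the two notions in this abstract, a dominating tree metric and the stretch of an edge, can be seen on a toy example. The sketch below is purely illustrative and is not the paper's randomized construction:

```python
import itertools

# Toy illustration of "dominating tree metric" and "stretch" (NOT the
# paper's randomized construction): the 4-point unit-weight cycle C4,
# embedded into the spanning path 0-1-2-3 obtained by deleting edge (3, 0).

n = 4

def cycle_dist(u, v):
    """Shortest-path distance on the unit-weight cycle C4."""
    k = abs(u - v)
    return min(k, n - k)

def path_dist(u, v):
    """Shortest-path distance on the spanning path 0-1-2-3."""
    return abs(u - v)

# The tree metric dominates: it never shrinks any distance.
for u, v in itertools.combinations(range(n), 2):
    assert path_dist(u, v) >= cycle_dist(u, v)

# Stretch of each original cycle edge; the deleted edge pays a factor n - 1.
stretch = {(u, v): path_dist(u, v) / cycle_dist(u, v)
           for u, v in [(0, 1), (1, 2), (2, 3), (0, 3)]}
print(stretch)  # (0, 3) is stretched to 3.0; all other edges keep 1.0
```

Deleting a uniformly random cycle edge instead gives every edge expected stretch (1/n)(n-1) + ((n-1)/n)(1) = 2(n-1)/n < 2, which is the classic intuition behind using a distribution over trees rather than a single tree.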
Incremental Clustering and Dynamic Information Retrieval
, 1997
Abstract

Cited by 153 (5 self)
Motivated by applications such as document and image classification in information retrieval, we consider the problem of clustering dynamic point sets in a metric space. We propose a model called incremental clustering, which is based on a careful analysis of the requirements of the information retrieval application and which should also be useful in other applications. The goal is to efficiently maintain clusters of small diameter as new points are inserted. We analyze several natural greedy algorithms and demonstrate that they perform poorly. We propose new deterministic and randomized incremental clustering algorithms with provably good performance. We complement our positive results with lower bounds on the performance of incremental algorithms. Finally, we consider the dual clustering problem, where the clusters are of fixed diameter and the goal is to minimize the number of clusters.
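One classical strategy for maintaining small-diameter clusters incrementally is a doubling scheme: keep at most k well-separated centers and grow the covering radius when a new point cannot be accommodated. The sketch below follows that general idea only; it is not the paper's exact algorithm and carries none of its guarantees:

```python
# A doubling-style sketch of incremental center maintenance (hedged:
# this follows the general doubling idea, NOT the paper's exact rules).

def incremental_centers(points, k, dist):
    """Maintain at most k centers covering the points seen so far."""
    centers, r = [], 0.0
    for p in points:
        if any(dist(p, c) <= r for c in centers):
            continue  # p already covered by an existing center
        if len(centers) < k:
            centers.append(p)
            continue
        # k centers and p is uncovered: grow the radius, then thin the
        # candidates so that survivors are pairwise more than r apart.
        r = max(2 * r,
                min(dist(a, b) for a in centers for b in centers if a != b))
        kept = []
        for c in centers + [p]:
            if all(dist(c, q) > r for q in kept):
                kept.append(c)
        centers = kept
    return centers, r

centers, r = incremental_centers([0, 10, 20, 1, 21], k=2,
                                 dist=lambda a, b: abs(a - b))
print(centers, r)  # [0, 20] 10
```

The radius only grows when the budget of k centers is exceeded, which is what analyses of this style of algorithm exploit to bound the maintained diameter.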
Approximate Clustering without the Approximation
Abstract

Cited by 35 (18 self)
Approximation algorithms for clustering points in metric spaces are a flourishing area of research, with much effort spent on getting a better understanding of the approximation guarantees possible for many objective functions such as k-median, k-means, and min-sum clustering. This quest for better approximation algorithms is further fueled by the implicit hope that these better approximations also give us more accurate clusterings. For example, for many problems such as clustering proteins by function, or clustering images by subject, there is some unknown “correct” target clustering, and the implicit hope is that approximately optimizing these objective functions will in fact produce a clustering that is close (in symmetric difference) to the truth. In this paper, we show that if we make this implicit assumption explicit (that is, if we assume that any c-approximation to the given clustering objective F is ǫ-close to the target), then we can produce clusterings that are O(ǫ)-close to the target, even for values of c for which obtaining a c-approximation is NP-hard. In particular, for the k-median and k-means objectives, we show that we can achieve this guarantee for any constant c > 1, and for the min-sum objective we can do this for any constant c > 2. Our results also highlight a somewhat surprising conceptual difference between assuming that the optimal solution to, say, the k-median objective is ǫ-close to the target, and assuming that any approximately optimal solution is ǫ-close to the target, even for an approximation factor of, say, c = 1.01. In the former case, the problem of finding a solution that is O(ǫ)-close to the target remains computationally hard, and yet for the latter we have an efficient algorithm.
Clustering to Minimize the Sum of Cluster Diameters
, 2001
Abstract

Cited by 34 (3 self)
We study the problem of clustering points in a metric space so as to minimize the sum of cluster diameters or the sum of cluster radii. Significantly improving on previous results, we present
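For concreteness, the sum-of-diameters objective studied here can be written down directly (illustrative code, not from the paper):

```python
import itertools

# Sum-of-diameters objective: given a clustering, the cost is the sum
# over clusters of the cluster diameter, i.e. the largest pairwise
# distance inside each cluster (0 for singletons).

def sum_of_diameters(clusters, dist):
    total = 0.0
    for cluster in clusters:
        total += max((dist(a, b)
                      for a, b in itertools.combinations(cluster, 2)),
                     default=0.0)
    return total

# 1-D example with dist(a, b) = |a - b|: diameters are 2 and 2.
clusters = [[0, 1, 2], [10, 12]]
print(sum_of_diameters(clusters, lambda a, b: abs(a - b)))  # 4.0
```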
Advances in metric embedding theory
 In STOC ’06: Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing
, 2006
Abstract

Cited by 26 (8 self)
Metric Embedding plays an important role in a vast range of application areas such as computer vision, computational biology, machine learning, networking, statistics, and mathematical psychology, to name a few. The theory of metric embedding has received much attention in recent years from mathematicians as well as computer scientists and has been applied in many algorithmic applications. A cornerstone of the field is a celebrated theorem of Bourgain, which states that every finite metric space on n points embeds in Euclidean space with O(log n) distortion. Bourgain’s result is best possible when considering the worst-case distortion over all pairs of points in the metric space. Yet it is possible that an embedding can do much better in terms of the average distortion. Indeed, in most practical applications of metric embedding, the main criterion for the quality of an embedding is its average distortion over all pairs. In this paper we provide an embedding with constant average distortion for arbitrary metric spaces, while maintaining the same worst-case bound provided by Bourgain’s theorem. In fact, our embedding possesses a much stronger property. We define the ℓq-distortion of a uniformly distributed pair of points. Our embedding achieves the best possible ℓq-distortion for all 1 ≤ q ≤ ∞ simultaneously. These results have several algorithmic implications, e.g., an O(1)-approximation for the unweighted uncapacitated quadratic assignment problem. The results are based on novel embedding methods which improve on previous methods in another important aspect: the dimension. The dimension of an embedding is of very high importance, in particular in applications, and much effort has been invested in analyzing it. However, no previous result im ...
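One common way to formalize the ℓq-distortion mentioned above (our reading; the paper's precise definition may differ in normalization): take, per pair, the maximum of expansion and contraction, then take the q-norm of that quantity over a uniformly random pair; q = ∞ recovers the classical worst-case distortion. A small sketch:

```python
import itertools
import math

# lq-distortion sketch (our formalization, an assumption): per-pair
# distortion is max(d_emb/d, d/d_emb); the lq-distortion is the q-norm
# of this over uniformly random pairs; q = infinity is worst case.

def lq_distortion(points, d, d_emb, q):
    pair = [max(d_emb(u, v) / d(u, v), d(u, v) / d_emb(u, v))
            for u, v in itertools.combinations(points, 2)]
    if q == math.inf:
        return max(pair)
    return (sum(x ** q for x in pair) / len(pair)) ** (1 / q)

# Toy example: the 4-cycle metric vs. the spanning-path metric 0-1-2-3.
pts = range(4)
d = lambda u, v: min(abs(u - v), 4 - abs(u - v))
d_emb = lambda u, v: abs(u - v)
print(lq_distortion(pts, d, d_emb, 1))         # average distortion (~1.33)
print(lq_distortion(pts, d, d_emb, math.inf))  # worst case: 3.0
```

The gap between the two printed values is the point of the abstract: the worst pair can be much worse than the average pair.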
Network-aware overlays with network coordinates
 In Proc. of International Workshop on Dynamic Distributed Systems
, 2006
Abstract

Cited by 22 (5 self)
Network coordinates, which embed network distance measurements in a coordinate system, were introduced as a method for determining the proximity of nodes for routing table updates in overlay networks. Their power has far broader reach: due to their low overhead and automatic adaptation to changes in the network, network coordinates provide a new paradigm for managing dynamic overlay networks. We compare network coordinates to other proposals for network-aware overlays and show how they permit the lucid expression of a range of distributed systems problems in well-understood geometric terms.
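Systems such as Vivaldi compute network coordinates by spring relaxation: each RTT measurement nudges a node's coordinate so that embedded distance tracks the measurement. The sketch below is a generic illustration of that idea, not code from the paper; the update rule, step size, and toy RTT values are our assumptions:

```python
import math
import random

# Generic spring-relaxation sketch of synthetic network coordinates, in
# the spirit of systems such as Vivaldi (an assumption of this sketch).

def update(coord, remote, rtt, step=0.05):
    """Nudge `coord` so its distance to `remote` tracks the measured rtt."""
    dx = [a - b for a, b in zip(coord, remote)]
    dist = math.hypot(*dx) or 1e-9
    error = dist - rtt  # > 0 means the embedding places them too far apart
    return [c - step * error * (d / dist) for c, d in zip(coord, dx)]

random.seed(0)
coords = {i: [random.random(), random.random()] for i in range(4)}
true_rtt = lambda i, j: 10.0 * abs(i - j)  # toy: nodes on a line, 10 ms apart

def avg_error():
    pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
    return sum(abs(math.dist(coords[i], coords[j]) - true_rtt(i, j))
               for i, j in pairs) / len(pairs)

before = avg_error()
for _ in range(2000):
    i, j = random.sample(range(4), 2)
    coords[i] = update(coords[i], coords[j], true_rtt(i, j))
after = avg_error()
print(before, after)  # the embedding error shrinks as coordinates adapt
```

This low per-measurement cost and automatic adaptation is what the abstract refers to as making coordinates attractive for managing dynamic overlays.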
A Uniqueness Theorem for Clustering
Abstract

Cited by 19 (6 self)
Despite the widespread use of clustering, there is distressingly little general theory of clustering available. Questions like “What distinguishes a clustering of data from other data partitionings?”, “Are there any principles governing all clustering paradigms?”, and “How should a user choose an appropriate clustering algorithm for a particular task?” are almost completely unanswered by the existing body of clustering literature. We consider an axiomatic approach to the theory of clustering, adopting the framework of Kleinberg [Kle03]. By relaxing one of Kleinberg’s clustering axioms, we sidestep his impossibility result and arrive at a consistent set of axioms. We propose extending these axioms, aiming to provide an axiomatic taxonomy of clustering paradigms. Such a taxonomy should give users some guidance concerning the choice of the appropriate clustering paradigm for a given task. The main result of this paper is a set of abstract properties that characterize the Single-Linkage clustering function. This characterization result provides new insight into the properties of desired data groupings that make Single-Linkage the appropriate choice. We conclude by considering a taxonomy of clustering functions based on the abstract properties that each satisfies.
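Single-Linkage itself, the clustering function being characterized, has a standard Kruskal-style implementation: sort pairs by distance and merge clusters until k remain. A minimal sketch (our implementation, not the paper's axiomatic formalism):

```python
import itertools

# Single-Linkage clustering, Kruskal-style: repeatedly merge the two
# clusters whose closest pair of points is smallest, until k remain.

def single_linkage(points, k, dist):
    parent = {p: p for p in points}

    def find(p):  # union-find with path halving
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    edges = sorted(itertools.combinations(points, 2), key=lambda e: dist(*e))
    clusters = len(points)
    for a, b in edges:
        if clusters == k:
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            clusters -= 1
    groups = {}
    for p in points:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())

pts = [0.0, 0.5, 1.0, 9.0, 9.4, 20.0]
print(single_linkage(pts, 3, lambda a, b: abs(a - b)))
# Three groups: {0.0, 0.5, 1.0}, {9.0, 9.4}, {20.0}
```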
A Ramsey-type Theorem for Metric Spaces and its Applications for Metrical Task Systems and Related Problems
 In 42nd Annual IEEE Symposium on Foundations of Computer Science
, 2001
Abstract

Cited by 15 (5 self)
This paper gives a nearly logarithmic lower bound on the randomized competitive ratio for the Metrical Task Systems model [BLS92]. This implies a similar lower bound for the extensively studied K-server problem. Our proof is based on proving a Ramsey-type theorem for metric spaces. In particular, we prove that every metric space contains a large subspace which is approximately a "hierarchically well-separated tree" (HST) [Bar96]. This theorem may be of independent interest.
A constant factor approximation algorithm for k-median clustering with outliers. Available at http://www.uiuc.edu/˜kechen/outliers.pdf
, 2007
Abstract

Cited by 15 (0 self)
We consider the k-median clustering with outliers problem: given a finite point set in a metric space and parameters k and m, we want to remove m points (called outliers) such that the cost of the optimal k-median clustering of the remaining points is minimized. We present the first polynomial-time constant-factor approximation algorithm for this problem.
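The objective being approximated can be stated as a simple evaluator: pick k centers, discard the m worst-served points as outliers, and sum the remaining distances. This evaluates a candidate solution only; it is not the paper's approximation algorithm:

```python
# k-median-with-outliers objective: serve each point by its nearest
# center, drop the m largest service distances, and sum the rest.

def kmedian_with_outliers_cost(points, centers, m, dist):
    service = sorted(min(dist(p, c) for c in centers) for p in points)
    return sum(service[:len(points) - m])  # drop the m worst-served points

pts = [0, 1, 2, 10, 11, 100]  # 100 is a natural outlier
print(kmedian_with_outliers_cost(pts, centers=[1, 10], m=1,
                                 dist=lambda a, b: abs(a - b)))  # 3
```

With m = 0 this reduces to the ordinary k-median cost, which is why outliers make the problem strictly harder: the algorithm must also decide which points to ignore.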
Facility location with service installation costs
 In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms
, 2004
Abstract

Cited by 14 (6 self)
Our main result is a primal-dual 6-approximation algorithm under the assumption that there is an ordering on the facilities such that if i comes before i' in this ordering, then for every service type l, f_{li} ≤ f_{li'}. This includes (as special cases) the settings where the service installation cost f_{li} depends only on the service type l, or depends only on the location i. With arbitrary service installation costs, the problem becomes as hard as the set-cover problem. Our algorithm extends the algorithm of Jain & Vazirani [9] in a novel way. If the service installation cost depends only on the service type and not on the location, we give an LP-rounding algorithm that attains an improved approximation ratio of 2.391. The algorithm combines both clustered randomized rounding [6] and the filtering-based technique of [10, 14]. We also consider the k-median version of the problem, where there is an additional requirement that at most k facilities may be opened. We use our primal-dual algorithm to give a constant-factor approximation for this problem when the service installation cost depends only on the service type. 1 Introduction Facility location problems have been widely studied in the Operations Research community (see, e.g., [12]). In its simplest version, uncapacitated facility location (UFL), we are given a set of facilities, F, and a set of demands or clients, D. Each facility i has a facility-opening cost
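The UFL objective defined at the end of the abstract can be made concrete with a small cost evaluator (the function name and toy instance are ours, for illustration only):

```python
# Uncapacitated facility location (UFL) cost of a candidate solution:
# pay the opening cost of each open facility plus each client's
# distance to its nearest open facility.

def ufl_cost(open_facilities, clients, opening_cost, dist):
    facility_cost = sum(opening_cost[i] for i in open_facilities)
    service_cost = sum(min(dist(j, i) for i in open_facilities)
                       for j in clients)
    return facility_cost + service_cost

# 1-D toy instance: facilities at 0 and 10 with opening costs 5 and 3.
cost = ufl_cost(open_facilities=[0, 10], clients=[1, 2, 9, 11],
                opening_cost={0: 5, 10: 3},
                dist=lambda j, i: abs(j - i))
print(cost)  # 5 + 3 + (1 + 2 + 1 + 1) = 13
```

The variants in this paper layer service installation costs f_{li} on top of this base objective, which is what makes the ordering assumption on facilities necessary.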