Approximate distance oracles
 J. ACM
Cited by 218 (11 self)
Let G = (V, E) be an undirected weighted graph with |V| = n and |E| = m. Let k ≥ 1 be an integer. We show that G = (V, E) can be preprocessed in O(kmn^{1/k}) expected time, constructing a data structure of size O(kn^{1+1/k}), such that any subsequent distance query can be answered, approximately, in O(k) time. The approximate distance returned is of stretch at most 2k − 1, i.e., the quotient obtained by dividing the estimated distance by the actual distance lies between 1 and 2k − 1. A 1963 girth conjecture of Erdős implies that Ω(n^{1+1/k}) space is needed in the worst case for any real stretch strictly smaller than 2k + 1. The space requirement of our algorithm is, therefore, essentially optimal. The most impressive feature of our data structure is its constant query time, hence the name “oracle”. Previously, data structures that used only O(n^{1+1/k}) space had a query time of Ω(n^{1/k}). Our algorithms are extremely simple and easy to implement efficiently. They also provide faster constructions of sparse spanners of weighted graphs, and improved tree covers and distance labelings of weighted or unweighted graphs.
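The trade-off stated in this abstract can be tabulated with a small sketch. It only evaluates the asymptotic expressions with constants dropped, so the numbers are indicative rather than actual sizes; the names `oracle_bounds` and `within_stretch` are illustrative and do not come from the paper.

```python
def oracle_bounds(n: int, k: int) -> dict:
    """Evaluate the bounds quoted in the abstract, ignoring hidden constants."""
    return {
        "stretch": 2 * k - 1,           # worst-case ratio estimate / actual
        "space": k * n ** (1 + 1 / k),  # O(k n^{1+1/k})
        "query": k,                     # O(k) per distance query
    }


def within_stretch(estimate: float, actual: float, k: int) -> bool:
    """Check that estimate / actual lies between 1 and 2k - 1."""
    ratio = estimate / actual
    return 1 <= ratio <= 2 * k - 1


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, oracle_bounds(10_000, k))
```

Larger k shrinks the space term toward linear while the stretch guarantee weakens, which is the trade-off the abstract describes.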
Improved Combinatorial Algorithms for the Facility Location and k-Median Problems
 In Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, 1999
Cited by 209 (13 self)
We present improved combinatorial approximation algorithms for the uncapacitated facility location and k-median problems. Two central ideas in most of our results are cost scaling and greedy improvement. We present a simple greedy local search algorithm which achieves an approximation ratio of 2.414 + ε in Õ(n^2/ε) time. This also yields a bicriteria approximation tradeoff of (1 + γ, 1 + 2/γ) for facility cost versus service cost which is better than previously known tradeoffs and close to the best possible. Combining greedy improvement and cost scaling with a recent primal-dual algorithm for facility location due to Jain and Vazirani, we get an approximation ratio of 1.853 in Õ(n^3) time. This is already very close to the approximation guarantee of the best known algorithm which is LP-based. Further, combined with the best known LP-based algorithm for facility location, we get a very slight improvement in the approximation factor for facility location, achieving 1.728....
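As a rough illustration of greedy local improvement for uncapacitated facility location, the sketch below repeatedly opens or closes a single facility while doing so lowers the total cost. It is a generic add/drop local search written for this listing, not the paper's procedure: it omits cost scaling and carries no approximation guarantee. The input layout (`dist[client][facility]`) and the function names are assumptions.

```python
def fl_cost(open_set, facility_cost, dist):
    """Opening costs plus each client's distance to its nearest open facility."""
    if not open_set:
        return float("inf")  # every client must be served by some facility
    service = sum(min(row[f] for f in open_set) for row in dist)
    return sum(facility_cost[f] for f in open_set) + service


def local_search(facility_cost, dist):
    """Toggle one facility at a time until no single toggle lowers the cost."""
    current = {0}  # arbitrary starting solution (assumption)
    best = fl_cost(current, facility_cost, dist)
    improved = True
    while improved:
        improved = False
        for f in range(len(facility_cost)):
            candidate = current ^ {f}  # open f if closed, close it if open
            cost = fl_cost(candidate, facility_cost, dist)
            if cost < best:
                current, best, improved = candidate, cost, True
    return current, best


if __name__ == "__main__":
    # Two facilities, two clients; each client sits on top of one facility.
    print(local_search([1, 1], [[0, 10], [10, 0]]))
```

The loop terminates because every accepted move strictly lowers the cost and there are finitely many facility subsets.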
Clustering data streams: Theory and practice
 IEEE TKDE, 2003
Cited by 111 (3 self)
The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm’s performance on synthetic and real data streams. Index Terms: Clustering, data streams, approximation algorithms.
Fast construction of nets in low-dimensional metrics and their applications
 SIAM Journal on Computing, 2006
Cited by 101 (12 self)
We present a near-linear-time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This data structure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, well-separated pair decomposition, spanner construction, compact representation scheme, doubling measure, and computation of the (approximate) Lipschitz constant of a function. In all cases, the running (preprocessing) time is near-linear and the space being used is linear.
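For readers unfamiliar with nets, the following sketch builds a single r-net greedily: a subset of the points that are pairwise at least r apart, such that every input point lies within distance r of some net point. It is the textbook quadratic-time construction over Euclidean points, shown only to illustrate the object; it is not the near-linear-time hierarchical algorithm of the paper, and `greedy_net` is an illustrative name.

```python
import math


def greedy_net(points, r):
    """Return an r-net: net points are >= r apart and cover all points within r."""
    net = []
    for p in points:
        # Keep p only if it is far from every net point chosen so far;
        # otherwise some earlier net point already covers it.
        if all(math.dist(p, q) >= r for q in net):
            net.append(p)
    return net


if __name__ == "__main__":
    pts = [(0, 0), (0.5, 0), (2, 0), (2.4, 0), (5, 5)]
    print(greedy_net(pts, 1.0))
```

A hierarchy can be obtained, under the same assumptions, by calling this for a sequence of scales such as r = 1, 2, 4, and so on.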
Better Streaming Algorithms for Clustering Problems
 2003
Cited by 77 (1 self)
We study clustering problems in the streaming model, where the goal is to cluster a set of points by making one pass (or a few passes) over the data using a small amount of storage space. Our main result is a randomized algorithm for the k–Median problem which produces a constant factor approximation in one pass using storage space O(k polylog n). This is a significant improvement over the previous best algorithm, which yielded a 2^{O(1/ε)} approximation using O(n^ε) space. Next we give a streaming algorithm for the k–Median problem with an arbitrary distance function. We also study algorithms for clustering problems with outliers in the streaming model. Here, we give bicriterion guarantees, producing constant factor approximations by increasing the allowed fraction of outliers slightly.
Maintaining Variance and k-Medians over Data Stream Windows
 In PODS, 2003
Cited by 76 (1 self)
The sliding window model is useful for discounting stale data in data stream applications. In this model, data elements arrive continually and only the most recent N elements are used when answering queries. We present a novel technique for solving two important and related problems in the sliding window model: maintaining variance and maintaining a k-median clustering. Our solution to the problem of maintaining variance provides a continually updated estimate of the variance of the last N values in a data stream with relative error of at most ε using O((1/ε^2) log N) memory. We present a constant-factor approximation algorithm which maintains an approximate k-median solution for the last N data points using O((k/τ^4) N^{2τ} log^2 N) memory, where τ < 1/2 is a parameter which trades off the space bound with the approximation factor of O(2^{O(1/τ)}).
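To make the sliding window setting concrete, here is an exact baseline that stores all of the last N values and keeps running sums. It uses memory linear in N, which is exactly what the paper's technique avoids, so treat it as a reference point and not as the method described above. The class name `WindowVariance` and the choice of population variance (dividing by the window length) are assumptions.

```python
from collections import deque


class WindowVariance:
    """Exact variance of the most recent n values, using O(n) memory."""

    def __init__(self, n: int):
        self.n = n
        self.window = deque()
        self.total = 0.0     # sum of values currently in the window
        self.total_sq = 0.0  # sum of squares of values in the window

    def add(self, x: float) -> None:
        self.window.append(x)
        self.total += x
        self.total_sq += x * x
        if len(self.window) > self.n:
            # Expire the oldest element once the window is full.
            old = self.window.popleft()
            self.total -= old
            self.total_sq -= old * old

    def variance(self) -> float:
        m = len(self.window)
        mean = self.total / m
        return self.total_sq / m - mean * mean


if __name__ == "__main__":
    wv = WindowVariance(3)
    for value in (1, 2, 3, 4):
        wv.add(value)
    print(wv.variance())  # variance of the window [2, 3, 4]
```

Running sums of squares can lose precision on long streams of large values; a production version would need a numerically safer update.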
The Online Median Problem
 In Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science, 2000
Cited by 75 (2 self)
We introduce a natural variant of the (metric uncapacitated) k-median problem that we call the online median problem. Whereas the k-median problem involves optimizing the simultaneous placement of k facilities, the online median problem imposes the following additional constraints: the facilities are placed one at a time; a facility cannot be moved once it is placed, and the total number of facilities to be placed, k, is not known in advance. The objective of an online median algorithm is to minimize the competitive ratio, that is, the worst-case ratio of the cost of an online placement to that of an optimal offline placement. Our main result is a linear-time constant-competitive algorithm for the online median problem. In addition, we present a related, though substantially simpler, linear-time constant-factor approximation algorithm for the (metric uncapacitated) facility location problem. The latter algorithm is similar in spirit to the recent primal-dual-based facility location algorithm of Jain and Vazirani, but our approach is more elementary and yields an improved running time.
Coresets for k-Means and k-Median Clustering and their Applications
 In Proc. 36th Annu. ACM Sympos. Theory Comput, 2003
Cited by 48 (13 self)
In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that given a point set P in R^d, one can compute a weighted set S ⊆ P, of size O(k ε^{-d} log n), such that one can compute the k-median/means clustering on S instead of on P, and get a (1 + ε)-approximation.
A Sublinear Time Approximation Scheme for Clustering in Metric Spaces
 Proc. 40th IEEE FOCS
Cited by 39 (1 self)
The metric 2-clustering problem is defined as follows: given a metric (X, d), partition X into two sets S_1 and S_2 in order to minimize the value of \sum_i \sum_{\{u,v\} \subset S_i} d(u, v). In this paper we show an approximation scheme for this problem.
1 Introduction
In this paper we consider the following k-clustering problem: given a weighted graph G = (X, d) on N vertices, where d(·, ·) is a weight function, partition X into k sets S_1, ..., S_k such that the value of \sum_i \sum_{\{u,v\} \subset S_i} d(u, v) is minimized. This problem was first formally posed by Sahni and Gonzalez [7]. They observed that the problem is NP-complete (for k ≥ 2) and by reduction from k-coloring showed that it cannot be approximated up to any constant (for k ≥ 3). Instead, they proposed a 1/k-approximation algorithm for the dual version of this problem where the goal is to maximize the weight of edges which do not belong to any cluster (i.e. k-max cut). Unfortunately, the latter result does not (and cannot) impl...
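The objective in this abstract, the sum of pairwise distances inside each cluster, can be evaluated directly, and on very small inputs the best 2-clustering can be found by enumeration. The sketch below does exactly that and is exponential in the number of points; it is meant only to pin down the objective, not to reflect the sublinear-time approximation scheme of the paper. The function names are illustrative, and at least one input point is assumed.

```python
from itertools import combinations


def clustering_cost(clusters, d):
    """Sum of d(u, v) over all unordered pairs {u, v} inside the same cluster."""
    return sum(d(u, v) for s in clusters for u, v in combinations(s, 2))


def best_2_clustering(points, d):
    """Try every split of `points` into (S_1, S_2) and keep the cheapest."""
    n = len(points)
    best = None
    # The last point always goes to S_2, which skips mirrored partitions.
    for mask in range(1 << (n - 1)):
        s1 = [p for i, p in enumerate(points) if (mask >> i) & 1]
        s2 = [p for i, p in enumerate(points) if not (mask >> i) & 1]
        cost = clustering_cost([s1, s2], d)
        if best is None or cost < best[0]:
            best = (cost, s1, s2)
    return best


if __name__ == "__main__":
    print(best_2_clustering([0, 1, 10, 11], lambda a, b: abs(a - b)))
```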