Sublinear Time Algorithms for Metric Space Problems
Cited by 68 (2 self)
In this paper we give approximation algorithms for the following problems on metric spaces: Furthest Pair, k-median, Minimum Routing Cost Spanning Tree, Multiple Sequence Alignment, Maximum Traveling Salesman Problem, Maximum Spanning Tree and Average Distance. The key property of our algorithms is that their running time is linear in the number of metric space points. As the full specification of an n-point metric space is of size Θ(n^2), the complexity of our algorithms is sublinear with respect to the input size. All previous algorithms (exact or approximate) for the problems we consider have running time Ω(n^2). We believe that our techniques can be applied to get similar bounds for other problems. 1 Introduction In recent years there has been a dramatic growth of interest in algorithms operating on massive data sets. This poses new challenges for algorithm design, as algorithms quite efficient on small inputs (for example, having quadratic running time) ...
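The abstract above does not say how its algorithms work, so the following is only a naive illustration of the sublinear idea for the Average Distance problem: read about n of the Θ(n^2) distances, chosen uniformly at random, and average them. It is not the paper's algorithm and carries no approximation guarantee; the names `estimate_average_distance`, `dist`, and `num_samples` are hypothetical.

```python
import random

def estimate_average_distance(points, dist, num_samples, seed=0):
    """Estimate the mean pairwise distance from num_samples random pairs.

    Assumption: plain uniform sampling of pairs, used here only to show
    that far fewer than n^2 distance evaluations can give an estimate.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Pick two distinct indices uniformly at random.
        i, j = rng.sample(range(n), 2)
        total += dist(points[i], points[j])
    return total / num_samples

# Usage: 100 points on a line, with only 100 distance evaluations
# instead of the 4950 needed to read every pair.
points = list(range(100))
estimate = estimate_average_distance(points, lambda a, b: abs(a - b),
                                     num_samples=len(points))
```

Setting `num_samples` proportional to the number of points keeps the number of distance evaluations linear in n, which is the regime the abstract describes.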
Robust PCPs of Proximity, Shorter PCPs and Applications to Coding
- in Proc. 36th ACM Symp. on Theory of Computing
, 2004
Cited by 68 (22 self)
We continue the study of the trade-off between the length of PCPs and their query complexity, establishing the following main results (which refer to proofs of satisfiability of circuits of size n): 1. We present PCPs of length exp(Õ(log log n)) · n that can be verified by making o(log log n) Boolean queries.
A New Rounding Procedure for the Assignment Problem with Applications to Dense Graph Arrangement Problems
, 2001
Cited by 64 (3 self)
We present a randomized procedure for rounding fractional perfect matchings to (integral) matchings. If the original fractional matching satisfies any linear inequality, then with high probability, the new matching satisfies that linear inequality in an approximate sense. This extends the well-known LP rounding procedure of Raghavan and Thompson, which is usually used to round fractional solutions of linear programs.
Three Theorems regarding Testing Graph Properties
, 2002
Cited by 64 (8 self)
Property testing is a relaxation of decision problems in which it is required to distinguish yes-instances (i.e., objects having a predetermined property) from instances that are far from any yes-instance. We present three theorems regarding testing graph properties in the adjacency matrix representation. More specifically, these theorems relate to the project of characterizing graph properties according to the complexity of testing them (in the adjacency matrix representation). The first
Testing that distributions are close
- In IEEE Symposium on Foundations of Computer Science
, 2000
Cited by 59 (12 self)
Given two distributions over an n element set, we wish to check whether these distributions are statistically close by only sampling. We give a sublinear algorithm which uses O(n^{2/3} ε^{-4} log n) independent samples from each distribution, runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small (less than max(ε^2/(32 n^{1/3}), ε/(4√n))) or large (more than ε) in L1-distance. We also give an Ω(n^{2/3} ε^{-2/3}) lower bound. Our algorithm has applications to the problem of checking whether a given Markov process is rapidly mixing. We develop sublinear algorithms for this problem as well.
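The abstract does not spell out the test itself. As a rough illustration of how closeness can be judged from samples alone, in time linear in the sample size, here is a sketch of a standard collision-counting estimator for the squared L2 distance between two distributions. Treating this as related to the paper's procedure is an assumption; the full test, its thresholds, and the L1 guarantee are not reproduced. All function names are hypothetical.

```python
from collections import Counter

def self_collision_rate(samples):
    """Fraction of unordered sample pairs that agree.

    For i.i.d. samples from p this is an unbiased estimate of sum_i p_i^2.
    """
    m = len(samples)
    counts = Counter(samples)
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return colliding_pairs / (m * (m - 1) / 2)

def cross_collision_rate(xs, ys):
    """Fraction of (x, y) pairs that agree; estimates sum_i p_i * q_i."""
    cx, cy = Counter(xs), Counter(ys)
    # Counter returns 0 for values that never appeared in ys.
    return sum(cx[v] * cy[v] for v in cx) / (len(xs) * len(ys))

def squared_l2_estimate(xs, ys):
    """Estimate ||p - q||_2^2 = sum p_i^2 + sum q_i^2 - 2 * sum p_i q_i."""
    return (self_collision_rate(xs) + self_collision_rate(ys)
            - 2 * cross_collision_rate(xs, ys))

# Usage: identical point masses are at distance 0, disjoint ones at 2.
same = squared_l2_estimate([0, 0, 0, 0], [0, 0, 0, 0])
disjoint = squared_l2_estimate([0, 0, 0, 0], [1, 1, 1, 1])
```

Each function makes one pass over a hash table of counts, so the work is linear in the number of samples. The estimate can come out slightly negative on small samples because it is unbiased rather than clipped.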
Locally Testable Codes and PCPs of Almost-Linear Length
, 2002
Cited by 55 (17 self)
Locally testable codes are error-correcting codes that admit very efficient codeword tests. Specifically, using
Clustering Large Graphs via the Singular Value Decomposition
- MACHINE LEARNING
, 2004
Cited by 53 (1 self)
We consider the problem of partitioning a set of m points in the n-dimensional Euclidean space into k clusters (usually m and n are variable, while k is fixed), so as to minimize the sum of squared distances between each point and its cluster center. This formulation is usually the objective of the k-means clustering algorithm (Kanungo et al. (2000)). We prove that this problem is NP-hard even for k = 2, and we consider a continuous relaxation of this discrete problem: find the k-dimensional subspace V that minimizes the sum of squared distances to V of the m points. This relaxation can be solved by computing the Singular Value Decomposition (SVD) of the m × n matrix A that represents the m points; this solution can be used to get a 2-approximation algorithm for the original problem. We then argue that in fact the relaxation provides a generalized clustering which is useful in its own right. Finally, we
Testing of Clustering
- In Proc. 41st Annu. IEEE Sympos. Found. Comput. Sci
, 2000
Cited by 51 (11 self)
A set X of points in ℝ^d is (k, b)-clusterable if X can be partitioned into k subsets (clusters) so that the diameter (alternatively, the radius) of each cluster is at most b. We present algorithms that, by sampling from a set X, distinguish between the case that X is (k, b)-clusterable and the case that X is ε-far from being (k, b')-clusterable for any given 0 < ε ≤ 1 and for b' ≥ b. By ε-far from being (k, b')-clusterable we mean that more than ε · |X| points should be removed from X so that it becomes (k, b')-clusterable. We give algorithms for a variety of cost measures that use a sample of size independent of |X|, and polynomial in k and 1/ε. Our algorithms can also be used to find approximately good clusterings. Namely, these are clusterings of all but an ε-fraction of the points in X that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independ...
A combinatorial characterization of the testable graph properties: it’s all about regularity
- Proc. of STOC 2006
, 2006
Cited by 51 (10 self)
A common thread in all the recent results concerning testing dense graphs is the use of Szemerédi’s regularity lemma. In this paper we show that in some sense this is not a coincidence. Our first result is that the property defined by having any given Szemerédi-partition is testable with a constant number of queries. Our second and main result is a purely combinatorial characterization of the graph properties that are testable with a constant number of queries. This characterization (roughly) says that a graph property P can be tested with a constant number of queries if and only if testing P can be reduced to testing the property of satisfying one of finitely many Szemerédi-partitions. This means that in some sense, testing for Szemerédi-partitions is as hard as testing any testable graph property. We thus resolve one of the main open problems in the area of property-testing, which was first raised in the 1996 paper of Goldreich, Goldwasser and Ron [24] that initiated the study of graph property-testing. This characterization also gives an intuitive explanation as to what makes a graph property testable.
Some 3CNF properties are hard to test
- In Proc. 35th ACM Symp. on Theory of Computing
, 2003
Cited by 48 (10 self)
For a Boolean formula ϕ on n variables, the associated property Pϕ is the collection of n-bit strings that satisfy ϕ. We study the query complexity of tests that distinguish (with high probability) between strings in Pϕ and strings that are far from Pϕ in Hamming distance. We prove that there are 3CNF formulae (with O(n) clauses) such that testing for the associated property requires Ω(n) queries, even with adaptive tests. This contrasts with 2CNF formulae, whose associated properties are always testable with O(√n) queries [E. Fischer et al., Monotonicity testing over general poset domains, in Proceedings of the 34th Annual ACM Symposium on Theory of Computing, ACM, New York, 2002, pp. 474–483]. Notice that for every negative instance (i.e., an assignment that does not satisfy ϕ) there are three bit queries that witness this fact. Nevertheless, finding such a short witness requires reading a constant fraction of the input, even when the input is very far from satisfying the formula that is associated with the property. A property is linear if its elements form a linear space. We provide sufficient conditions for linear properties to be hard to test, and in the course of the proof include the following observations which are of independent interest: 1. In the context of testing for linear properties, adaptive two-sided error tests have no more power than nonadaptive one-sided error tests. Moreover, without loss of generality, any test for a linear property is a linear test. A linear test verifies that a portion of the input satisfies a set of linear constraints, which define the property, and rejects if and only if it finds a falsified constraint. A linear test is by definition nonadaptive and, when applied to linear properties, has a one-sided error. 2. Random low density parity check codes (which are known to have linear distance and constant rate) are not locally testable. In fact, testing such a code of length n requires Ω(n) queries.
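The definition of a linear test in this abstract can be sketched in a few lines. The version below assumes the property is given as parity constraints over GF(2), each one a tuple of bit positions whose XOR must be 0, and that the tester picks constraints uniformly at random. That sampling rule, along with the names `linear_test`, `constraints`, and `num_checks`, is a hypothetical choice made for illustration and not taken from the paper.

```python
import random

def linear_test(word, constraints, num_checks, seed=0):
    """Accept (True) unless a sampled parity constraint is falsified.

    word: list of 0/1 bits.
    constraints: list of index tuples; each tuple's bits must XOR to 0.
    """
    rng = random.Random(seed)
    # Nonadaptive: every constraint is chosen before any bit is read.
    chosen = [rng.choice(constraints) for _ in range(num_checks)]
    # One-sided error: a word meeting all constraints is never rejected.
    return all(sum(word[i] for i in c) % 2 == 0 for c in chosen)

# Usage: the 4-bit repetition code, defined by "adjacent bits are equal".
repetition = [(0, 1), (1, 2), (2, 3)]
accepts_codeword = linear_test([1, 1, 1, 1], repetition, num_checks=2)
rejects_far_word = not linear_test([1, 0, 1, 0], repetition, num_checks=2)
```

The test reads only the bits named in the sampled constraints, so its query count is `num_checks` times the constraint width. The lower bound in the abstract says that for some properties no such small sample suffices.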

