Results 1-10 of 163
Correlation Clustering
 Machine Learning
, 2002
Cited by 223 (4 self)
We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or − depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters, plus the number of − edges between clusters (equivalently, minimizes the number of disagreements: the number of − edges inside clusters plus the number of + edges between clusters). This formulation is motivated from a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of "agnostic learning" problem. An interesting
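The disagreement objective described in this abstract can be illustrated with a minimal sketch. The function name `count_disagreements` and the dictionary-based input format are assumptions made for illustration; they are not taken from the paper, and the code only evaluates a given clustering rather than finding a good one.

```python
def count_disagreements(labels, cluster_of):
    """Count edges whose label disagrees with a clustering.

    labels: dict mapping a vertex pair (u, v) to "+" or "-" (hypothetical format).
    cluster_of: dict mapping each vertex to its cluster id.
    """
    disagreements = 0
    for (u, v), sign in labels.items():
        same_cluster = cluster_of[u] == cluster_of[v]
        # A "-" edge inside a cluster or a "+" edge between clusters is a disagreement.
        if (sign == "-" and same_cluster) or (sign == "+" and not same_cluster):
            disagreements += 1
    return disagreements


# A triangle with two "+" edges and one "-" edge admits no perfect clustering.
labels = {("a", "b"): "+", ("b", "c"): "+", ("a", "c"): "-"}
one_cluster = {"a": 0, "b": 0, "c": 0}
split = {"a": 0, "b": 0, "c": 1}
singletons = {"a": 0, "b": 1, "c": 2}
```

On this example, `one_cluster` and `split` each incur one disagreement and `singletons` incurs two, which shows why the problem is posed as minimizing disagreements rather than eliminating them.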
The art of uninformed decisions: A primer to property testing
 Science
, 2001
Cited by 131 (21 self)
Property testing is a new field in computational theory that deals with the information that can be deduced from the input when the number of allowable queries (reads from the input) is significantly smaller than its size.
A characterization of the (natural) graph properties testable with one-sided error
 Proc. of FOCS 2005
, 2005
Cited by 89 (17 self)
The problem of characterizing all the testable graph properties is considered by many to be the most important open problem in the area of property-testing. Our main result in this paper is a solution of an important special case of this general problem. Call a property tester oblivious if its decisions are independent of the size of the input graph. We show that a graph property P has an oblivious one-sided error tester if and only if P is (almost) hereditary. We stress that any “natural” property that can be tested (either with one-sided or with two-sided error) can be tested by an oblivious tester. In particular, all the testers studied thus far in the literature were oblivious. Our main result can thus be considered as a precise characterization of the “natural” graph properties which are testable with one-sided error. One of the main technical contributions of this paper is in showing that any hereditary graph property can be tested with one-sided error. This general result contains as a special case all the previous results about testing graph properties with one-sided error. These include the results of [20] and [5] about testing k-colorability, the characterization of [21] of the graph-partitioning problems that are testable with one-sided error, the induced vertex colorability properties of [3], the induced edge colorability properties of [14], a transformation from two-sided to one-sided error testing [21], as well as a recent result about testing monotone graph properties [10]. More importantly, as a special case of our main result, we infer that some of the most well studied graph properties, both in graph theory and computer science, are testable with one-sided error. Some of these properties are the well known graph properties of being Perfect, Chordal, Interval, Comparability, Permutation and more. None of these properties was previously known to be testable.
Regular Languages are Testable with a Constant Number of Queries
 SIAM Journal on Computing
, 1999
Cited by 80 (20 self)
We continue the study of combinatorial property testing, initiated by Goldreich, Goldwasser and Ron in [7]. The subject of this paper is testing regular languages. Our main result is as follows. For a regular language L ⊆ {0, 1}* and an integer n there exists a randomized algorithm which always accepts a word w of length n if w ∈ L, and rejects it with high probability if w has to be modified in at least εn positions to create a word in L. The algorithm queries Õ(1/ε) bits of w. This query complexity is shown to be optimal up to a factor polylogarithmic in 1/ε. We also discuss testability of more complex languages and show, in particular, that the query complexity required for testing context-free languages cannot be bounded by any function of ε. The problem of testing regular languages can be viewed as a part of a very general approach, seeking to probe testability of properties defined by logical means.
Testing that distributions are close
 In IEEE Symposium on Foundations of Computer Science
, 2000
Cited by 79 (16 self)
Given two distributions over an n element set, we wish to check whether these distributions are statistically close by only sampling. We give a sublinear algorithm which uses O(n^{2/3} ε^{-4} log n) independent samples from each distribution, runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small (less than max(ε²/(32·n^{1/3}), ε/(4√n))) or large (more than ε) in L1-distance. We also give an Ω(n^{2/3} ε^{-2/3}) lower bound. Our algorithm has applications to the problem of checking whether a given Markov process is rapidly mixing. We develop sublinear algorithms for this problem as well.
Property Testing
 Handbook of Randomized Computing, Vol. II
, 2000
Cited by 74 (11 self)
this technical aspect (as in the bounded-degree model the closest graph having the property must have at most dN edges and degree bound d as well).
Three Theorems regarding Testing Graph Properties
, 2001
Cited by 73 (10 self)
Property testing is a relaxation of decision problems in which it is required to distinguish yes-instances (i.e., objects having a predetermined property) from instances that are far from any yes-instance. We present three theorems regarding testing graph properties in the adjacency matrix representation. More specifically, these theorems relate to the project of characterizing graph properties according to the complexity of testing them (in the adjacency matrix representation). The first theorem is that there exist monotone graph properties in NP for which testing is very hard (i.e., requires examining a constant fraction of the entries in the matrix). The second theorem is that every graph property that can be tested making a number of queries that is independent of the size of the graph can be so tested by uniformly selecting a set of vertices and accepting iff the induced subgraph has some fixed graph property (which is not necessarily the same as the one being tested). The third theorem refers to the framework of graph partition problems, and is a characterization of the subclass of properties that can be tested using a one-sided error tester making a number of queries that is independent of the size of the graph.
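The tester shape named in the second theorem (sample vertices uniformly, then decide from the induced subgraph alone) can be sketched as follows. This is only an illustration under assumptions: the names `canonical_test`, `induced_subgraph`, and `is_triangle_free` are hypothetical, the graph is given as a dictionary from vertex to neighbor set, and triangle-freeness is chosen here merely as an example of a fixed property to check on the sample.

```python
import random
from itertools import combinations


def induced_subgraph(adj, vertices):
    """Restrict an adjacency dict (vertex -> set of neighbors) to the given vertices."""
    kept = set(vertices)
    return {v: adj[v] & kept for v in kept}


def is_triangle_free(adj):
    """Example fixed property evaluated on the sampled subgraph."""
    return not any(
        b in adj[a] and c in adj[a] and c in adj[b]
        for a, b, c in combinations(adj, 3)
    )


def canonical_test(adj, sample_size, accepts, rng=random):
    """Uniformly select vertices and accept iff the induced subgraph satisfies `accepts`."""
    sample = rng.sample(list(adj), min(sample_size, len(adj)))
    return accepts(induced_subgraph(adj, sample))


# Two extreme inputs on 10 vertices: no edges at all, and all edges present.
vertices = range(10)
empty_graph = {v: set() for v in vertices}
complete_graph = {v: set(vertices) - {v} for v in vertices}
```

The sample size needed for a meaningful guarantee depends on the property and the distance parameter; the sketch leaves it as a caller-supplied argument rather than guessing a bound.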
A combinatorial characterization of the testable graph properties: it’s all about regularity
 Proc. of STOC 2006
, 2006
Cited by 66 (14 self)
A common thread in all the recent results concerning testing dense graphs is the use of Szemerédi’s regularity lemma. In this paper we show that in some sense this is not a coincidence. Our first result is that the property defined by having any given Szemerédi-partition is testable with a constant number of queries. Our second and main result is a purely combinatorial characterization of the graph properties that are testable with a constant number of queries. This characterization (roughly) says that a graph property P can be tested with a constant number of queries if and only if testing P can be reduced to testing the property of satisfying one of finitely many Szemerédi-partitions. This means that in some sense, testing for Szemerédi-partitions is as hard as testing any testable graph property. We thus resolve one of the main open problems in the area of property-testing, which was first raised in the 1996 paper of Goldreich, Goldwasser and Ron [24] that initiated the study of graph property-testing. This characterization also gives an intuitive explanation as to what makes a graph property testable.
Some 3CNF properties are hard to test
 In Proc. 35th ACM Symp. on Theory of Computing
, 2003
Cited by 61 (10 self)
For a Boolean formula ϕ on n variables, the associated property Pϕ is the collection of n-bit strings that satisfy ϕ. We study the query complexity of tests that distinguish (with high probability) between strings in Pϕ and strings that are far from Pϕ in Hamming distance. We prove that there are 3CNF formulae (with O(n) clauses) such that testing for the associated property requires Ω(n) queries, even with adaptive tests. This contrasts with 2CNF formulae, whose associated properties are always testable with O(√n) queries [E. Fischer et al., Monotonicity testing over general poset domains, in Proceedings of the 34th Annual ACM Symposium on Theory of Computing, ACM, New York, 2002, pp. 474–483]. Notice that for every negative instance (i.e., an assignment that does not satisfy ϕ) there are three bit queries that witness this fact. Nevertheless, finding such a short witness requires reading a constant fraction of the input, even when the input is very far from satisfying the formula that is associated with the property. A property is linear if its elements form a linear space. We provide sufficient conditions for linear properties to be hard to test, and in the course of the proof include the following observations which are of independent interest:
1. In the context of testing for linear properties, adaptive two-sided error tests have no more power than nonadaptive one-sided error tests. Moreover, without loss of generality, any test for a linear property is a linear test. A linear test verifies that a portion of the input satisfies a set of linear constraints, which define the property, and rejects if and only if it finds a falsified constraint. A linear test is by definition nonadaptive and, when applied to linear properties, has a one-sided error.
2. Random low density parity check codes (which are known to have linear distance and constant rate) are not locally testable. In fact, testing such a code of length n requires Ω(n) queries.
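The notion of a linear test described above can be made concrete with a small sketch over GF(2). The function name `linear_test`, the representation of each constraint as a list of bit positions whose XOR must be zero, and the uniform choice of constraints are all assumptions for illustration, not details from the paper.

```python
import random


def linear_test(word, constraints, num_checks, rng=random):
    """Reject iff one of the sampled parity constraints is falsified.

    word: list of bits (0 or 1).
    constraints: list of index lists; each demands that the XOR of those bits is 0.
    """
    # Nonadaptive: the constraints to check are fixed before any bit is read.
    chosen = [rng.choice(constraints) for _ in range(num_checks)]
    for positions in chosen:
        if sum(word[i] for i in positions) % 2 != 0:
            return False  # found a falsified constraint
    return True  # one-sided error: a word satisfying every constraint is never rejected


# Constraints x0 = x1 and x1 = x2 define the linear space {000, 111}.
constraints = [[0, 1], [1, 2]]
```

With these constraints, `[1, 1, 1]` is always accepted, while `[0, 1, 0]` violates both constraints and is rejected by any single check.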
Testing of Clustering
 In Proc. 41st Annu. IEEE Sympos. Found. Comput. Sci.
, 2000
Cited by 58 (14 self)
A set X of points in ℝ^d is (k, b)-clusterable if X can be partitioned into k subsets (clusters) so that the diameter (alternatively, the radius) of each cluster is at most b. We present algorithms that, by sampling from a set X, distinguish between the case that X is (k, b)-clusterable and the case that X is ε-far from being (k, b′)-clusterable for any given 0 < ε ≤ 1 and for b′ ≥ b. By ε-far from being (k, b′)-clusterable we mean that more than ε·|X| points should be removed from X so that it becomes (k, b′)-clusterable. We give algorithms for a variety of cost measures that use a sample of size independent of |X|, and polynomial in k and 1/ε. Our algorithms can also be used to find approximately good clusterings. Namely, these are clusterings of all but an ε-fraction of the points in X that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independ...