Results 1 - 10 of 270
Correlation Clustering
- MACHINE LEARNING
, 2002
Cited by 158 (4 self)
We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or − depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters, plus the number of − edges between clusters (equivalently, minimizes the number of disagreements: the number of − edges inside clusters plus the number of + edges between clusters). This formulation is motivated by a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of "agnostic learning" problem. An interesting
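The disagreement objective in the abstract above can be made concrete with a small sketch. It only evaluates a given clustering; it is not the paper's approximation algorithm. The function name `count_disagreements` and the data layout (edge labels keyed by `frozenset` pairs, clusters as an id per vertex) are assumptions made for illustration.

```python
from itertools import combinations


def count_disagreements(labels, clustering):
    """Count edges whose label conflicts with the clustering.

    labels: dict mapping frozenset({u, v}) to '+' (similar) or '-' (different);
            assumed to cover every pair of vertices (complete graph).
    clustering: dict mapping each vertex to a cluster id (hypothetical layout).
    """
    disagreements = 0
    for u, v in combinations(sorted(clustering), 2):
        same_cluster = clustering[u] == clustering[v]
        label = labels[frozenset((u, v))]
        # A '-' edge inside a cluster or a '+' edge across clusters is a disagreement.
        if (label == '-' and same_cluster) or (label == '+' and not same_cluster):
            disagreements += 1
    return disagreements


# Toy instance: a~b and b~c are similar, but a and c are labeled different.
labels = {
    frozenset(('a', 'b')): '+',
    frozenset(('b', 'c')): '+',
    frozenset(('a', 'c')): '-',
}
one_cluster = {'a': 0, 'b': 0, 'c': 0}
singletons = {'a': 0, 'b': 1, 'c': 2}
print(count_disagreements(labels, one_cluster))
print(count_disagreements(labels, singletons))
```

On this toy instance no clustering reaches zero disagreements, because the labels are inconsistent: this is the situation the minimization version of the problem is meant to handle.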
Polynomial Time Approximation Schemes for Dense Instances of NP-Hard Problems
, 1995
Cited by 153 (25 self)
We present a unified framework for designing polynomial time approximation schemes (PTASs) for "dense" instances of many NP-hard optimization problems, including maximum cut, graph bisection, graph separation, minimum k-way cut with and without specified terminals, and maximum 3-satisfiability. By dense graphs we mean graphs with minimum degree Ω(n), although our algorithms solve most of these problems so long as the average degree is Ω(n). Denseness for non-graph problems is defined similarly. The unified framework begins with the idea of exhaustive sampling: picking a small random set of vertices, guessing where they go on the optimum solution, and then using their placement to determine the placement of everything else. The approach then develops into a PTAS for approximating certain smooth integer programs where the objective function and the constraints are "dense" polynomials of constant degree.
Efficient Testing of Large Graphs
- Combinatorica
Cited by 141 (40 self)
Let P be a property of graphs. An ε-test for P is a randomized algorithm which, given the ability to make queries whether a desired pair of vertices of an input graph G with n vertices are adjacent or not, distinguishes, with high probability, between the case of G satisfying P and the case that it has to be modified by adding and removing more than εn² edges to make it satisfy P. The property P is called testable if for every ε there exists an ε-test for P whose total number of queries is independent of the size of the input graph. Goldreich, Goldwasser and Ron [8] showed that certain graph properties admit an ε-test. In this paper we make a first step towards a logical characterization of all testable graph properties, and show that properties describable by a very general type of coloring problem are testable. We use this theorem to prove that first order graph properties not containing a quantifier alternation of type "∀∃" are always testable, while we show that some properties containing this alternation are not. Our results are proven using a combinatorial lemma, a special case of which, that may be of independent interest, is the following. A graph H is called ε-unavoidable in G if all graphs that differ from G in no more than ε|G|² places contain an induced copy of H. A graph H is called δ-abundant in G if G contains at least δ|G|^|H| induced copies of H. If H is ε-unavoidable in G then it is also δ(ε, |H|)-abundant.
The art of uninformed decisions: A primer to property testing
- Science
, 2001
Cited by 108 (17 self)
Property testing is a new field in computational theory that deals with the information that can be deduced from the input when the number of allowable queries (reads from the input) is significantly smaller than its size.
Property Testing in Bounded Degree Graphs
- Algorithmica
, 1997
Cited by 107 (32 self)
We further develop the study of testing graph properties as initiated by Goldreich, Goldwasser and Ron. Whereas they view graphs as represented by their adjacency matrix and measure distance between graphs as a fraction of all possible vertex pairs, we view graphs as represented by bounded-length incidence lists and measure distance between graphs as a fraction of the maximum possible number of edges. Thus, while the previous model is most appropriate for the study of dense graphs, our model is most appropriate for the study of bounded-degree graphs. In particular, we present randomized algorithms for testing whether an unknown bounded-degree graph is connected, k-connected (for k ≥ 1), planar, etc. Our algorithms work in time polynomial in 1/ε, always accept the graph when it has the tested property, and reject with high probability if the graph is ε-away from having the property. For example, the 2-Connectivity algorithm rejects (w.h.p.) any N-vertex d-degree graph for which more ...
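The one-sided behavior described in the abstract above (always accept a graph with the property, reject a far-from-connected graph with high probability) can be illustrated with a simplified connectivity check. This is a sketch under stated assumptions, not the paper's algorithm: the names `component_size_up_to` and `looks_connected`, and the parameters `samples` and `limit`, are hypothetical, and no claim is made here about how they should depend on ε and the degree bound.

```python
import random
from collections import deque


def component_size_up_to(adj, start, limit):
    """BFS from start; stop early once more than `limit` vertices are seen."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if len(seen) > limit:
                    return len(seen)  # component is larger than the limit
                queue.append(v)
    return len(seen)  # BFS exhausted: this is the full component size


def looks_connected(adj, samples, limit, rng=random):
    """Reject only when a whole component smaller than the graph is found.

    adj: dict mapping each vertex to a list of neighbours (incidence lists).
    samples, limit: hypothetical parameters chosen by the caller.
    """
    n = len(adj)
    vertices = list(adj)
    for _ in range(samples):
        size = component_size_up_to(adj, rng.choice(vertices), limit)
        if size <= limit and size < n:
            return False  # witness: a complete component that is not the whole graph
    return True  # no witness found; a connected graph always ends up here


# A path on 10 vertices (connected) and 50 disjoint edges (far from connected).
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
pairs = {i: [i ^ 1] for i in range(100)}
print(looks_connected(path, samples=5, limit=4, rng=random.Random(0)))
print(looks_connected(pairs, samples=5, limit=4, rng=random.Random(0)))
```

The check never rejects a connected graph, because a rejection requires exhibiting a fully explored component with fewer vertices than the graph. Each sample costs at most about `limit` times the maximum degree in neighbour lookups, independent of the graph size.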
Quick Approximation to Matrices and Applications
Cited by 96 (3 self)
We give algorithms to find the following simply described approximation to a given matrix. Given an m × n matrix A with entries between, say, −1 and 1, and an error parameter ε between 0 and 1, we find a matrix D (implicitly) which is the sum of O(1/ε²) simple rank 1 matrices so that the sum of entries of any submatrix (among the 2^(m+n)) of (A − D) is at most εmn in absolute value. Our algorithm takes time dependent only on ε and the allowed probability of failure (not on m, n). We draw on two lines of research to develop the algorithms: one is built around the fundamental Regularity Lemma of Szemerédi in Graph Theory and the constructive version of Alon, Duke, Lefmann, Rödl and Yuster. The second one is from the papers of Arora, Karger and Karpinski, Fernandez de la Vega and most directly Goldwasser, Goldreich and Ron who develop approximation algorithms for a set of graph problems, typical of which is the maximum cut problem. From our matrix approximation, the...
Using Output Codes to Boost Multiclass Learning Problems
- MACHINE LEARNING: PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL CONFERENCE, 1997 (ICML-97)
, 1997
Cited by 78 (9 self)
This paper describes a new technique for solving multiclass learning problems by combining Freund and Schapire's boosting algorithm with the main ideas of Dietterich and Bakiri's method of error-correcting output codes (ECOC). Boosting is a general method of improving the accuracy of a given base or "weak" learning algorithm. ECOC is a robust method of solving multiclass learning problems by reducing to a sequence of two-class problems. We show that our new hybrid method has advantages of both: Like ECOC, our method only requires that the base learning algorithm work on binary-labeled data. Like boosting, we prove that the method comes with strong theoretical guarantees on the training and generalization error of the final combined hypothesis assuming only that the base learning algorithm perform slightly better than random guessing. Although previous methods were known for boosting multiclass problems, the new method may be significantly faster and require less programming effort in creating the base learning algorithm. We also compare the new algorithm experimentally to other voting methods.
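The ECOC reduction mentioned in the abstract above, from one multiclass problem to a sequence of two-class problems, can be sketched as a decoding step. This shows plain nearest-codeword decoding only; it is not the boosting hybrid the paper proposes. The code matrix `CODE`, the class names, and the function names are hypothetical, and the outputs of the seven binary classifiers are simply given as a tuple of bits.

```python
def hamming(a, b):
    """Number of positions where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(code_matrix, predicted_bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(code_matrix, key=lambda cls: hamming(code_matrix[cls], predicted_bits))


# Hypothetical code matrix: one 7-bit codeword per class, one binary problem per column.
# Every pair of codewords differs in 4 positions, so any single wrong bit is corrected.
CODE = {
    "a": (1, 1, 1, 1, 1, 1, 1),
    "b": (0, 0, 0, 0, 1, 1, 1),
    "c": (0, 0, 1, 1, 0, 0, 1),
    "d": (0, 1, 0, 1, 0, 1, 0),
}

# Suppose the true class is "c" and the first binary classifier errs.
noisy = (1, 0, 1, 1, 0, 0, 1)
print(ecoc_decode(CODE, noisy))
```

Each column of the matrix defines one binary-labeled training set, which is why a base learner that handles only two classes suffices.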
A characterization of the (natural) graph properties testable with one-sided error
- Proc. of FOCS 2005
, 2005
Cited by 77 (14 self)
The problem of characterizing all the testable graph properties is considered by many to be the most important open problem in the area of property testing. Our main result in this paper is a solution of an important special case of this general problem. Call a property tester oblivious if its decisions are independent of the size of the input graph. We show that a graph property P has an oblivious one-sided error tester if and only if P is (almost) hereditary. We stress that any "natural" property that can be tested (either with one-sided or with two-sided error) can be tested by an oblivious tester. In particular, all the testers studied thus far in the literature were oblivious. Our main result can thus be considered as a precise characterization of the "natural" graph properties which are testable with one-sided error. One of the main technical contributions of this paper is in showing that any hereditary graph property can be tested with one-sided error. This general result contains as a special case all the previous results about testing graph properties with one-sided error. These include the results of [20] and [5] about testing k-colorability, the characterization of [21] of the graph-partitioning problems that are testable with one-sided error, the induced vertex colorability properties of [3], the induced edge colorability properties of [14], a transformation from two-sided to one-sided error testing [21], as well as a recent result about testing monotone graph properties [10]. More importantly, as a special case of our main result, we infer that some of the most well studied graph properties, both in graph theory and computer science, are testable with one-sided error. Some of these properties are the well known graph properties of being Perfect, Chordal, Interval, Comparability, Permutation and more. None of these properties was previously known to be testable.
Regular Languages are Testable with a Constant Number of Queries
- SIAM Journal on Computing
, 1999
Cited by 74 (19 self)
We continue the study of combinatorial property testing, initiated by Goldreich, Goldwasser and Ron in [7]. The subject of this paper is testing regular languages. Our main result is as follows. For a regular language L ⊆ {0, 1}* and an integer n there exists a randomized algorithm which always accepts a word w of length n if w ∈ L, and rejects it with high probability if w has to be modified in at least εn positions to create a word in L. The algorithm queries Õ(1/ε) bits of w. This query complexity is shown to be optimal up to a factor poly-logarithmic in 1/ε. We also discuss testability of more complex languages and show, in particular, that the query complexity required for testing context-free languages cannot be bounded by any function of ε. The problem of testing regular languages can be viewed as a part of a very general approach, seeking to probe testability of properties defined by logical means.
Property Testing
- Handbook of Randomized Computing, Vol. II
, 2000
Cited by 71 (10 self)
this technical aspect (as in the bounded-degree model the closest graph having the property must have at most dN edges and degree bound d as well).

