Results 1-10 of 323
Correlation Clustering
MACHINE LEARNING, 2002
Cited by 222 (4 self)
We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or − depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters, plus the number of − edges between clusters (equivalently, minimizes the number of disagreements: the number of − edges inside clusters plus the number of + edges between clusters). This formulation is motivated from a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of "agnostic learning" problem. An interesting ...
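The disagreement objective described in this abstract can be made concrete with a short sketch. The function name `count_disagreements` and the dictionary-based input format are assumptions made for illustration; this only evaluates the objective for a given clustering and is not the paper's approximation algorithm.

```python
from itertools import combinations


def count_disagreements(labels, cluster_of):
    """Count edges whose label disagrees with a clustering.

    labels:     dict mapping frozenset({u, v}) to '+' or '-' (complete graph)
    cluster_of: dict mapping each vertex to its cluster id
    (Both input formats are hypothetical, chosen for this sketch.)
    """
    disagreements = 0
    for u, v in combinations(sorted(cluster_of), 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        label = labels[frozenset((u, v))]
        if same_cluster and label == "-":
            disagreements += 1  # a '-' edge inside a cluster
        elif not same_cluster and label == "+":
            disagreements += 1  # a '+' edge between clusters
    return disagreements


# Three items: a~b and b~c are "similar", a and c are "different".
labels = {
    frozenset(("a", "b")): "+",
    frozenset(("a", "c")): "-",
    frozenset(("b", "c")): "+",
}
clustering = {"a": 0, "b": 0, "c": 1}
print(count_disagreements(labels, clustering))
```

In this small instance no clustering has zero disagreements, because the labels are inconsistent (a is similar to b, b is similar to c, yet a and c are different). That inconsistency is what makes the optimization problem non-trivial.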
Polynomial Time Approximation Schemes for Dense Instances of NP-Hard Problems
1995
Cited by 174 (28 self)
We present a unified framework for designing polynomial time approximation schemes (PTASs) for "dense" instances of many NP-hard optimization problems, including maximum cut, graph bisection, graph separation, minimum k-way cut with and without specified terminals, and maximum 3-satisfiability. By dense graphs we mean graphs with minimum degree Ω(n), although our algorithms solve most of these problems so long as the average degree is Ω(n). Denseness for non-graph problems is defined similarly. The unified framework begins with the idea of exhaustive sampling: picking a small random set of vertices, guessing where they go on the optimum solution, and then using their placement to determine the placement of everything else. The approach then develops into a PTAS for approximating certain smooth integer programs where the objective function and the constraints are "dense" polynomials of constant degree.
Efficient Testing of Large Graphs
Combinatorica
Cited by 157 (44 self)
Let P be a property of graphs. An ε-test for P is a randomized algorithm which, given the ability to make queries whether a desired pair of vertices of an input graph G with n vertices are adjacent or not, distinguishes, with high probability, between the case of G satisfying P and the case that it has to be modified by adding and removing more than εn² edges to make it satisfy P. The property P is called testable if for every ε there exists an ε-test for P whose total number of queries is independent of the size of the input graph. Goldreich, Goldwasser and Ron [8] showed that certain graph properties admit an ε-test. In this paper we make a first step towards a logical characterization of all testable graph properties, and show that properties describable by a very general type of coloring problem are testable. We use this theorem to prove that first order graph properties not containing a quantifier alternation of type "∀∃" are always testable, while we show that some properties containing this alternation are not. Our results are proven using a combinatorial lemma, a special case of which, that may be of independent interest, is the following. A graph H is called ε-unavoidable in G if all graphs that differ from G in no more than ε|G|² places contain an induced copy of H. A graph H is called δ-abundant in G if G contains at least δ|G|^|H| induced copies of H. If H is ε-unavoidable in G then it is also δ(ε, |H|)-abundant.
The art of uninformed decisions: A primer to property testing
Science, 2001
Cited by 128 (20 self)
Property testing is a new field in computational theory that deals with the information that can be deduced from the input when the number of allowable queries (reads from the input) is significantly smaller than its size.
Property Testing in Bounded Degree Graphs
Algorithmica, 1997
Cited by 119 (36 self)
We further develop the study of testing graph properties as initiated by Goldreich, Goldwasser and Ron. Whereas they view graphs as represented by their adjacency matrix and measure distance between graphs as a fraction of all possible vertex pairs, we view graphs as represented by bounded-length incidence lists and measure distance between graphs as a fraction of the maximum possible number of edges. Thus, while the previous model is most appropriate for the study of dense graphs, our model is most appropriate for the study of bounded-degree graphs. In particular, we present randomized algorithms for testing whether an unknown bounded-degree graph is connected, k-connected (for k > 1), planar, etc. Our algorithms work in time polynomial in 1/ε, always accept the graph when it has the tested property, and reject with high probability if the graph is ε-away from having the property. For example, the 2-Connectivity algorithm rejects (w.h.p.) any N-vertex d-degree graph for which more ...
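To illustrate the one-sided flavor of testing in the incidence-list model, here is a minimal sketch of a connectivity check that samples vertices and runs a size-bounded search from each. The names `small_component_size` and `looks_connected`, and the parameters `num_samples` and `limit`, are hypothetical; how they should depend on ε and the degree bound is not specified here, and this is not claimed to be the paper's exact algorithm.

```python
import random
from collections import deque


def small_component_size(adj, start, limit):
    """BFS from `start`, cut off once `limit` vertices have been seen.

    Returns the component size if the search exhausts the component
    before the cutoff, otherwise None. Assumes limit >= 2.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                if len(seen) >= limit:
                    return None  # component is not small; stop early
                queue.append(w)
    return len(seen)


def looks_connected(adj, num_samples, limit, rng=random):
    """Reject only on a witness: a complete component smaller than the graph.

    A connected graph is therefore always accepted (one-sided error).
    """
    n = len(adj)
    vertices = list(adj)
    for _ in range(num_samples):
        v = rng.choice(vertices)
        size = small_component_size(adj, v, limit)
        if size is not None and size < n:
            return False
    return True


# A 6-cycle (connected) versus two disjoint triangles (not connected).
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
```

Each search touches at most `limit` vertices of bounded degree, so the work per sample does not depend on the size of the graph. A graph that is far from connected must contain many small components, which is why sampling has a good chance of landing in one.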
Quick Approximation to Matrices and Applications
Cited by 114 (3 self)
We give algorithms to find the following simply described approximation to a given matrix. Given an m × n matrix A with entries between say −1 and 1, and an error parameter ε between 0 and 1, we find a matrix D (implicitly) which is the sum of O(1/ε²) simple rank 1 matrices so that the sum of entries of any submatrix (among the 2^(m+n)) of (A − D) is at most εmn in absolute value. Our algorithm takes time dependent only on ε and the allowed probability of failure (not on m, n). We draw on two lines of research to develop the algorithms: one is built around the fundamental Regularity Lemma of Szemerédi in Graph Theory and the constructive version of Alon, Duke, Lefmann, Rödl and Yuster. The second one is from the papers of Arora, Karger and Karpinski, Fernandez de la Vega and most directly Goldwasser, Goldreich and Ron who develop approximation algorithms for a set of graph problems, typical of which is the maximum cut problem. From our matrix approximation, the...
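The error measure in this abstract, the largest absolute sum of entries over any submatrix of A − D, can be checked by brute force on tiny matrices. The sketch below uses the hypothetical name `max_submatrix_sum`; it enumerates all row subsets, so it only serves to make the definition concrete and is unrelated to the paper's fast algorithm.

```python
from itertools import combinations


def max_submatrix_sum(M):
    """Largest |sum of M[i][j] over i in S, j in T| over row sets S, column sets T.

    Exponential in the number of rows; intended for tiny examples only.
    """
    m, n = len(M), len(M[0])
    best = 0
    for r in range(1, m + 1):
        for rows in combinations(range(m), r):
            # For a fixed row set, the best column set takes either all
            # columns with positive sum or all columns with negative sum.
            col_sums = [sum(M[i][j] for i in rows) for j in range(n)]
            positive = sum(c for c in col_sums if c > 0)
            negative = -sum(c for c in col_sums if c < 0)
            best = max(best, positive, negative)
    return best


A = [[1, -1], [-1, 1]]
D = [[0, 0], [0, 0]]  # a trivial candidate approximation
residual = [[A[i][j] - D[i][j] for j in range(2)] for i in range(2)]
print(max_submatrix_sum(residual))
```

For this A the zero matrix already has error 1, which equals εmn for ε = 0.25 with m = n = 2, so D = 0 satisfies the guarantee for any ε ≥ 0.25.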
A characterization of the (natural) graph properties testable with one-sided error
Proc. of FOCS 2005, 2005
Cited by 91 (16 self)
The problem of characterizing all the testable graph properties is considered by many to be the most important open problem in the area of property testing. Our main result in this paper is a solution of an important special case of this general problem. Call a property tester oblivious if its decisions are independent of the size of the input graph. We show that a graph property P has an oblivious one-sided error tester if and only if P is (almost) hereditary. We stress that any "natural" property that can be tested (either with one-sided or with two-sided error) can be tested by an oblivious tester. In particular, all the testers studied thus far in the literature were oblivious. Our main result can thus be considered as a precise characterization of the "natural" graph properties which are testable with one-sided error. One of the main technical contributions of this paper is in showing that any hereditary graph property can be tested with one-sided error. This general result contains as a special case all the previous results about testing graph properties with one-sided error. These include the results of [20] and [5] about testing k-colorability, the characterization of [21] of the graph-partitioning problems that are testable with one-sided error, the induced vertex colorability properties of [3], the induced edge colorability properties of [14], a transformation from two-sided to one-sided error testing [21], as well as a recent result about testing monotone graph properties [10]. More importantly, as a special case of our main result, we infer that some of the most well studied graph properties, both in graph theory and computer science, are testable with one-sided error. Some of these properties are the well known graph properties of being Perfect, Chordal, Interval, Comparability, Permutation and more. None of these properties was previously known to be testable.
Using Output Codes to Boost Multiclass Learning Problems
MACHINE LEARNING: PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL CONFERENCE, 1997 (ICML97), 1997
Cited by 90 (9 self)
This paper describes a new technique for solving multiclass learning problems by combining Freund and Schapire's boosting algorithm with the main ideas of Dietterich and Bakiri's method of error-correcting output codes (ECOC). Boosting is a general method of improving the accuracy of a given base or "weak" learning algorithm. ECOC is a robust method of solving multiclass learning problems by reducing to a sequence of two-class problems. We show that our new hybrid method has advantages of both: Like ECOC, our method only requires that the base learning algorithm work on binary-labeled data. Like boosting, we prove that the method comes with strong theoretical guarantees on the training and generalization error of the final combined hypothesis assuming only that the base learning algorithm perform slightly better than random guessing. Although previous methods were known for boosting multiclass problems, the new method may be significantly faster and require less programming effort in creating the base learning algorithm. We also compare the new algorithm experimentally to other voting methods.
Sublinear Time Algorithms for Metric Space Problems
Cited by 80 (2 self)
In this paper we give approximation algorithms for the following problems on metric spaces: Furthest Pair, k-Median, Minimum Routing Cost Spanning Tree, Multiple Sequence Alignment, Maximum Traveling Salesman Problem, Maximum Spanning Tree and Average Distance. The key property of our algorithms is that their running time is linear in the number of metric space points. As the full specification of an n-point metric space is of size Θ(n²), the complexity of our algorithms is sublinear with respect to the input size. All previous algorithms (exact or approximate) for the problems we consider have running time Ω(n²). We believe that our techniques can be applied to get similar bounds for other problems. 1 Introduction In recent years there has been a dramatic growth of interest in algorithms operating on massive data sets. This poses new challenges for algorithm design, as algorithms quite efficient on small inputs (for example, having quadratic running time) ...
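As a small illustration of how a metric problem can be approximated with a linear number of distance queries, the sketch below uses the standard triangle-inequality argument for Furthest Pair: the distance from any fixed point to its furthest point is at least half the diameter. This is a textbook argument given here for intuition; it is not claimed to be the algorithm or the approximation factor from the paper, and the name `approx_furthest_pair` is hypothetical.

```python
def approx_furthest_pair(points, dist):
    """Return (p, q, d) with d = dist(p, q) >= diameter / 2.

    Uses len(points) distance evaluations. If (a, b) is a true furthest
    pair, then dist(a, b) <= dist(a, p) + dist(p, b) <= 2 * dist(p, q),
    which holds whenever `dist` satisfies the triangle inequality.
    """
    p = points[0]  # any fixed starting point works
    q = max(points, key=lambda x: dist(p, x))
    return p, q, dist(p, q)


def line_metric(a, b):
    """Distance between two points on the real line."""
    return abs(a - b)


points = [5, 0, 10, 7]
p, q, d = approx_furthest_pair(points, line_metric)
```

In this example the true diameter is 10 (between 0 and 10) and the sketch returns a pair at distance 5, exactly the worst case the guarantee allows, while evaluating only n of the roughly n²/2 pairwise distances.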
Robust PCPs of Proximity, Shorter PCPs and Applications to Coding
in Proc. 36th ACM Symp. on Theory of Computing, 2004
Cited by 80 (25 self)
We continue the study of the tradeoff between the length of PCPs and their query complexity, establishing the following main results (which refer to proofs of satisfiability of circuits of size n): 1. We present PCPs of length exp(Õ(log log n)) · n that can be verified by making o(log log n) Boolean queries.