Results 1–10 of 22
Smooth sensitivity and sampling in private data analysis
In STOC, 2007
"... We introduce a new, generic framework for private data analysis. The goal of private data analysis is to release aggregate information about a data set while protecting the privacy of the individuals whose information the data set contains. Our framework allows one to release functions f of the data ..."
Abstract

Cited by 106 (15 self)
We introduce a new, generic framework for private data analysis. The goal of private data analysis is to release aggregate information about a data set while protecting the privacy of the individuals whose information the data set contains. Our framework allows one to release functions f of the data with instance-based additive noise. That is, the noise magnitude is determined not only by the function we want to release, but also by the database itself. One of the challenges is to ensure that the noise magnitude does not leak information about the database. To address that, we calibrate the noise magnitude to the smooth sensitivity of f on the database x, a measure of variability of f in the neighborhood of the instance x. The new framework greatly expands the applicability of output perturbation, a technique for protecting individuals' privacy by adding a small amount of random noise to the released statistics. To our knowledge, this is the first formal analysis of the effect of instance-based noise in the context of data privacy. Our framework raises many interesting algorithmic questions. Namely, to apply the framework one must compute or approximate the smooth sensitivity of f on x. We show how to do this efficiently for several different functions, including the median and the cost of the minimum spanning tree. We also give a generic procedure based on sampling that allows one to release f(x) accurately on many databases x. This procedure is applicable even when no efficient algorithm for approximating smooth sensitivity of f is known or when f is given as a black box. We illustrate the procedure by applying it to k-SED (k-means) clustering and learning mixtures of Gaussians.
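The median case the abstract mentions admits a short computational sketch. The Python below is a toy illustration only, not the paper's exact mechanism: the closed form for the distance-k local sensitivity of the median, the choice β = ε/6, and the Cauchy noise scaling are one illustrative set of constants; a faithful implementation must follow the paper's admissibility conditions.

```python
import math
import random

def smooth_sensitivity_median(xs, beta, lo=0.0, hi=1.0):
    """Smooth sensitivity of the median on values clipped to [lo, hi]:
    max over k of exp(-beta*k) times the local sensitivity at Hamming
    distance k (out-of-range order statistics default to the domain ends)."""
    xs = sorted(min(max(v, lo), hi) for v in xs)
    n = len(xs)
    m = (n - 1) // 2  # 0-based median index (odd n keeps this exact)

    def val(i):
        if i < 0:
            return lo
        if i >= n:
            return hi
        return xs[i]

    best = 0.0
    for k in range(n + 1):
        # widest gap the median can jump after changing k entries
        ls_k = max(val(m + t) - val(m + t - k - 1) for t in range(k + 2))
        best = max(best, math.exp(-beta * k) * ls_k)
    return best

def private_median(xs, eps, lo=0.0, hi=1.0):
    """Release the median with instance-based Cauchy noise whose magnitude
    is calibrated to the smooth sensitivity (illustrative constants)."""
    beta = eps / 6.0
    s = smooth_sensitivity_median(xs, beta, lo, hi)
    med = sorted(xs)[(len(xs) - 1) // 2]
    noise = math.tan(math.pi * (random.random() - 0.5))  # standard Cauchy draw
    return med + (6.0 * s / eps) * noise
```

Note how the noise scale depends on the data: a tightly clustered sample yields a small smooth sensitivity, while a spread-out sample yields a large one, which is exactly the instance-based behavior the abstract describes.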
Concentration Inequalities Using the Entropy Method
2002
"... We investigate a new methodology... The main purpose of this paper is to point out the simplicity and the generality of the approach. We show how the new method can recover many of Talagrand's revolutionary inequalities and provide new applications in a variety of problems including Rademacher avera ..."
Abstract

Cited by 33 (3 self)
We investigate a new methodology... The main purpose of this paper is to point out the simplicity and the generality of the approach. We show how the new method can recover many of Talagrand's revolutionary inequalities and provide new applications in a variety of problems including Rademacher averages, Rademacher chaos, the number of certain small subgraphs in a random graph, and the minimum of the empirical risk in some statistical estimation problems.
New bounds on nearly perfect matchings in hypergraphs: higher codegrees do help
In Random Struct. Alg., 2000
"... Let H be a (k + 1)uniform, Dregular hypergraph on n vertices and U(H) be the minimum number of vertices left uncovered by a matching in H. Cj(H), the jcodegree of H, is the maximum number of edges sharing a set of j vertices in common. We prove a general upper bound on U(H), based on the codegr ..."
Abstract

Cited by 17 (4 self)
Let H be a (k + 1)-uniform, D-regular hypergraph on n vertices and U(H) be the minimum number of vertices left uncovered by a matching in H. C_j(H), the j-codegree of H, is the maximum number of edges sharing a set of j vertices in common. We prove a general upper bound on U(H), based on the codegree sequence C_2(H), C_3(H), .... Our bound improves and generalizes many results on the topic, including those of Grable [Gra], Alon-Kim-Spencer [AKS], and Kostochka-Rödl [KR]. It also leads to a substantial improvement in several applications. The key ingredient of the proof is the so-called polynomial technique, which is a new and useful tool for proving concentration results for functions with large Lipschitz coefficients. This technique is of independent interest.
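The definitions in this abstract can be checked directly on toy instances. The sketch below (function names are made up for illustration) computes the j-codegree by tallying every j-subset of every edge, and bounds the number of uncovered vertices via a greedy matching; since U(H) minimizes over all matchings, the greedy value is only an upper bound on U(H).

```python
from collections import Counter
from itertools import combinations

def codegree(hyperedges, j):
    """C_j(H): the maximum number of hyperedges containing a common set
    of j vertices."""
    tally = Counter()
    for e in hyperedges:
        for s in combinations(sorted(e), j):
            tally[s] += 1
    return max(tally.values(), default=0)

def greedy_uncovered(hyperedges, n):
    """Vertices left uncovered by a greedy matching: take each edge that
    is disjoint from everything taken so far.  An upper bound on U(H)."""
    used = set()
    for e in hyperedges:
        if not used & set(e):
            used.update(e)
    return n - len(used)
```

For the complete 3-uniform hypergraph on 4 vertices, every pair lies in two edges (C_2 = 2), every vertex in three (C_1 = 3), and any matching consists of a single edge, leaving one vertex uncovered.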
The deletion method for upper tail estimates
2003
"... We present a new method to show concentration of the upper tail of random variables that can be written as sums of variables with plenty of independence. We compare our method with the martingale method by Kim and Vu, which often leads to similar results. Some applications are given to the number XG ..."
Abstract

Cited by 14 (3 self)
We present a new method to show concentration of the upper tail of random variables that can be written as sums of variables with plenty of independence. We compare our method with the martingale method of Kim and Vu, which often leads to similar results. Some applications are given to the number X_G of copies of a graph G in the random graph G(n, p). In particular, for G = K_4 and G = C_4 we improve the earlier known upper bounds on −ln P(X_{K4} ≥ 2 E X_{K4}) in some range of p = p(n).
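The flavor of such upper-tail statements is easy to see empirically. The sketch below (parameters chosen arbitrarily) counts 4-cycles in G(n, p) exactly via common-neighbour pairs, then estimates how often the count exceeds twice its mean; concentration makes that event rare.

```python
import random
from itertools import combinations

def c4_count(n, adj):
    """Exact number of 4-cycles: each C4 is seen once from each of its
    two diagonal pairs, hence the final division by 2."""
    total = 0
    for u, v in combinations(range(n), 2):
        c = len(adj[u] & adj[v])  # common neighbours of the diagonal {u, v}
        total += c * (c - 1) // 2
    return total // 2

def gnp(n, p, rng):
    """Sample a G(n, p) random graph as a list of adjacency sets."""
    adj = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

rng = random.Random(0)
n, p, trials = 30, 0.2, 200
counts = [c4_count(n, gnp(n, p, rng)) for _ in range(trials)]
mean = sum(counts) / trials
upper_tail = sum(c >= 2 * mean for c in counts) / trials  # est. P(X >= 2 E X)
```

The surveyed methods quantify how fast this empirical tail probability decays as a function of n and p.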
COLORFUL TRIANGLE COUNTING AND A MAPREDUCE IMPLEMENTATION
"... In this note we introduce a new randomized algorithm for counting triangles in graphs. We show that under mild conditions, the estimate of our algorithm is strongly concentrated around the true number of triangles. Specifically, let G be a graph with n vertices, t triangles and let ∆ be the maximum ..."
Abstract

Cited by 10 (2 self)
In this note we introduce a new randomized algorithm for counting triangles in graphs. We show that under mild conditions, the estimate of our algorithm is strongly concentrated around the true number of triangles. Specifically, let G be a graph with n vertices and t triangles, and let ∆ be the maximum number of triangles an edge of G is contained in. Also, let N = 1/p be the number of colors we use in our randomized algorithm. We show that if p ≥ max(∆ log n …, log n …
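The coloring scheme this abstract describes can be sketched as follows (a toy version with assumed details): color each vertex uniformly with one of N = 1/p colors, keep only edges whose endpoints share a color, count triangles in the surviving graph, and rescale by 1/p². A triangle survives exactly when all three endpoints agree on a color, which happens with probability p², so the estimator is unbiased.

```python
import random
from itertools import combinations

def triangle_count(adj):
    """Exact triangle count: each triangle is charged once, to its
    smallest edge, by scanning common neighbours above both endpoints."""
    cnt = 0
    for u in adj:
        for v in adj[u]:
            if v > u:
                cnt += sum(1 for w in adj[u] & adj[v] if w > v)
    return cnt

def colorful_estimate(n, edges, num_colors, rng):
    """One run of the colorful sampling estimator: count triangles in the
    monochromatic subgraph and rescale by 1/p^2, where p = 1/num_colors."""
    p = 1.0 / num_colors
    color = [rng.randrange(num_colors) for _ in range(n)]
    adj = {u: set() for u in range(n)}
    for u, v in edges:
        if color[u] == color[v]:  # keep only monochromatic edges
            adj[u].add(v)
            adj[v].add(u)
    return triangle_count(adj) / (p * p)
```

Averaged over runs on K_10 (which has 120 triangles), the estimates center on the true count; the abstract's condition on p is what guarantees concentration of a single run.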
Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning
"... The number of triangles is a computationally expensive graph statistic which is frequently used in complex network analysis (e.g., transitivity ratio), in various random graph models (e.g., exponential random graph model) and in important real world applications such as spam detection, uncovering t ..."
Abstract

Cited by 9 (2 self)
The number of triangles is a computationally expensive graph statistic which is frequently used in complex network analysis (e.g., the transitivity ratio), in various random graph models (e.g., the exponential random graph model) and in important real-world applications such as spam detection, uncovering hidden thematic structures in the Web, and link recommendation. Counting triangles in graphs with millions and billions of edges requires algorithms which run fast, use a small amount of space, provide accurate estimates of the number of triangles and, preferably, are parallelizable. In this paper we present an efficient triangle counting approximation algorithm which can be adapted to the semi-streaming model [23]. The key idea of our algorithm is to combine the sampling algorithm of [51,52] with a partitioning of the set of vertices into a high-degree and a low-degree subset, as in [5], treating each set appropriately. From a mathematical perspective, we show a simplified proof of [52] which uses the powerful Kim-Vu concentration inequality [31] based on the Hajnal-Szemerédi theorem [25]. Furthermore, we improve bounds of existing triple sampling techniques based on a theorem of Ahlswede and Katona [3]. We obtain a running time O(m + m^{3/2} log n / (tε²)) and a (1 ± ε) approximation.
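The high/low-degree split can be illustrated with a small exact counter (a simplified sketch; the paper's actual algorithm combines the partition with sampling). Low-degree vertices, those with degree at most √m, enumerate their neighbour pairs cheaply, while the all-high-degree triangles involve at most 2√m vertices and are handled by brute force.

```python
import math
from itertools import combinations

def triangles_by_partition(edges):
    """Exact triangle counting organised around a degree threshold of
    sqrt(m): a triangle is charged to its smallest low-degree vertex if
    it has one, otherwise found among the all-high-degree triples."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    thr = math.sqrt(len(edges))
    low = {v for v in adj if len(adj[v]) <= thr}
    high = sorted(v for v in adj if len(adj[v]) > thr)  # at most 2*sqrt(m)

    count = 0
    for v in low:
        for a, b in combinations(sorted(adj[v]), 2):
            if b in adj[a]:
                if (a in low and a < v) or (b in low and b < v):
                    continue  # already charged to a smaller low vertex
                count += 1
    for a, b, c in combinations(high, 3):  # all-high-degree triangles
        if b in adj[a] and c in adj[a] and c in adj[b]:
            count += 1
    return count
```

Each triangle is counted exactly once: by its smallest low-degree vertex if it has one, and in the brute-force pass otherwise.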
The Infamous Upper Tail
2001
"... Let \Gamma be a finite index set and k * 1 a given integer. Let further S ` [\Gamma]^k be an arbitrary family of k element subsets of \Gamma. Consider a (binomial) random subset \Gamma p of \Gamma, where p = (pi: i 2 \Gamma) and a random variable X counting the elements of S that are contained in th ..."
Abstract

Cited by 8 (2 self)
Let Γ be a finite index set and k ≥ 1 a given integer. Let further S ⊆ [Γ]^k be an arbitrary family of k-element subsets of Γ. Consider a (binomial) random subset Γ_p of Γ, where p = (p_i : i ∈ Γ), and a random variable X counting the elements of S that are contained in this random subset. In this paper we survey techniques for obtaining upper bounds on the upper tail probabilities P(X ≥ μ + t) for t > 0. Seven techniques, ranging from Azuma's inequality to the purely combinatorial deletion method, are described, illustrated and compared against each other for a couple of typical applications. As one application, we obtain essentially optimal bounds for the upper tails for the numbers of subgraphs isomorphic to K_4 or C_4 in a random graph G(n, p), for certain ranges of p.
Combinatorics with a geometric flavor: some examples
In Visions in Mathematics Toward 2000 (Geometric and Functional Analysis, Special Volume), 2000
"... In this paper I try to present my field, combinatorics, via five examples of combinatorial studies which have some geometric flavor. The first topic is Tverberg's theorem, a gem in combinatorial geometry, and various of its combinatorial and topological extensions. McMullen's upper bound theorem for ..."
Abstract

Cited by 8 (0 self)
In this paper I try to present my field, combinatorics, via five examples of combinatorial studies which have some geometric flavor. The first topic is Tverberg's theorem, a gem in combinatorial geometry, and various of its combinatorial and topological extensions. McMullen's upper bound theorem for the face numbers of convex polytopes and its many extensions is the second topic. Next are general properties of subsets of the vertices of the discrete n-dimensional cube and some relations with questions of extremal and probabilistic combinatorics. Our fourth topic is tree enumeration and random spanning trees, and finally, some combinatorial and geometrical aspects of the simplex method for linear programming are considered.
A general upper bound on the list chromatic number of locally sparse graphs
 Combinatorics, Probability and Computing
"... Abstract. Suppose that G is a graph with maximum degree ∆ and for every vertex v in G, the neighborhood of v contains at most ∆2 /f edges. We prove that the list chromatic number of G is at most K∆ / log f, for some positive constant K. This result strengthens several previous results [AKSu2, Kim, J ..."
Abstract

Cited by 8 (0 self)
Abstract. Suppose that G is a graph with maximum degree ∆ and for every vertex v in G, the neighborhood of v contains at most ∆²/f edges. We prove that the list chromatic number of G is at most K∆/log f, for some positive constant K. This result strengthens several previous results [AKSu2, Kim, Joh, Vu1] and is sharp up to the multiplicative constant K. As an application, we derive several upper bounds for the strong (list) chromatic index of a graph, under various assumptions. These bounds improve earlier results in [FGST, Mah] and determine, up to a constant factor, the strong (list) chromatic index of a random graph. One of the main ingredients of the proof of the main theorem is the so-called "polynomial method", a new method developed to prove strong concentration of functions with large Lipschitz coefficients. A combination of this method and the well-known semi-random method seems to provide a powerful technique for proving the existence of combinatorial objects. Given a graph G, the list chromatic number of G is defined as follows. Assign to each vertex v in G a list L_v of k colors (different vertices may have different lists); a list coloring is a coloring in which every vertex is colored by a color from its own list. The list chromatic …
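The definition of list coloring in the last sentences can be made concrete with a small backtracking check (a toy sketch; the function name is made up): a graph is colorable from the given lists if every vertex can receive a color from its own list with adjacent vertices colored differently. The classical example in the test shows K_{3,3} is colorable when all lists equal {1, 2}, but not from certain 2-element lists, so its list chromatic number exceeds its chromatic number.

```python
def list_colorable(adj, lists):
    """Backtracking test of list colorability: try, vertex by vertex, a
    color from that vertex's own list that no colored neighbour uses."""
    color = {}

    def solve(rest):
        if not rest:
            return True
        v = rest[0]
        for c in lists[v]:
            if all(color.get(u) != c for u in adj[v]):
                color[v] = c
                if solve(rest[1:]):
                    return True
                del color[v]  # undo and try the next color
        return False

    return solve(sorted(adj))
```

With all lists equal this reduces to ordinary k-colorability, which is why the list chromatic number is always at least the chromatic number.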