Results 11–20 of 32
Tight Hardness Results for Minimizing Discrepancy
Abstract

Cited by 5 (0 self)
In the Discrepancy problem, we are given M sets {S_1, ..., S_M} on N elements. Our goal is to find an assignment χ of {−1, +1} values to elements so as to minimize the maximum discrepancy max_j |∑_{i∈S_j} χ(i)|. Recently, Bansal gave an efficient algorithm achieving O(√N) discrepancy for any set system where M = O(N) [Ban10], giving a constructive version of Spencer's proof that the discrepancy of any set system is at most O(√N) for this range of M [Spe85]. We show that, from the perspective of computational efficiency, these results are tight for general set systems with M = O(N). Specifically, we show that it is NP-hard to distinguish between such set systems with discrepancy zero and those with discrepancy Ω(√N). This means that even if the optimal solution has discrepancy zero, we cannot hope to efficiently find a coloring with discrepancy o(√N). We also consider the hardness of the Discrepancy problem on sets with bounded shatter function, and show that the upper bounds due to Matoušek [Mat95] are tight for these set systems as well. The hardness results in both settings are obtained from a common framework: we compose a family of high-discrepancy set systems with set systems for which it is NP-hard to distinguish instances with discrepancy zero from instances in which a large number of the sets (i.e., a constant fraction) have nonzero discrepancy. Our composition amplifies this zero-versus-nonzero gap.
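As a small illustration of the objective this abstract concerns, the sketch below evaluates the discrepancy max_j |∑_{i∈S_j} χ(i)| of a coloring. The random set system and the uniformly random coloring are made-up stand-ins, not the paper's construction; a random coloring only guarantees O(√(N log M)) with high probability, a √(log N) factor short of Spencer's O(√N) bound.

```python
import random

random.seed(1)

N, M = 64, 64
# hypothetical random set system: each element joins each set with prob 1/2
sets = [[i for i in range(N) if random.random() < 0.5] for _ in range(M)]

def discrepancy(chi, sets):
    """Maximum |sum of chi over a set| across all sets."""
    return max(abs(sum(chi[i] for i in S)) for S in sets)

# uniformly random +/-1 coloring; Chernoff bounds give
# O(sqrt(N log M)) discrepancy with high probability
chi = [random.choice((-1, 1)) for _ in range(N)]
print(discrepancy(chi, sets))
```

Finding a coloring with discrepancy o(√N) is exactly what the paper shows to be NP-hard, even when a zero-discrepancy coloring exists.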
Mergeable Summaries
Abstract

Cited by 4 (0 self)
We study the mergeability of data summaries. Informally speaking, mergeability requires that, given two summaries on two data sets, there is a way to merge them into a single summary on the union of the two data sets while preserving the error and size guarantees. This property means that summaries can be merged like other algebraic operators such as sum and max, which is especially useful for computing summaries on massive distributed data. Several data summaries are trivially mergeable by construction, most notably all the sketches that are linear functions of the data sets. But some other fundamental ones, like those for heavy hitters and quantiles, are not (known to be) mergeable. In this paper, we demonstrate that these summaries are indeed mergeable or can be made mergeable after appropriate modifications. Specifically, we show that for ε-approximate heavy hitters, there is a deterministic mergeable summary of size O(1/ε); for ε-approximate quantiles, there is a deterministic summary of size O((1/ε) log(εn)) that has a restricted form of mergeability, and a randomized one of size O((1/ε) log^{3/2}(1/ε)) with full mergeability. We also extend our results to geometric summaries such as ε-approximations and ε-kernels. We also achieve two results of independent interest: (1) we provide the best known randomized streaming bound for ε-approximate quantiles that depends only on ε, of size O((1/ε) log^{3/2}(1/ε)), and (2) we demonstrate that the MG and the SpaceSaving summaries for heavy hitters are isomorphic. Supported by NSF under grants CNS-0540347 and IIS-07…
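The MG summary named in the abstract is the Misra-Gries heavy-hitters summary. The sketch below shows the standard MG algorithm and one natural reading of the paper's merge rule (add the counters, then subtract the (k+1)-th largest count from every counter and drop the nonpositive ones), keeping the undercount bound n/(k+1); this is a minimal sketch, not the authors' code.

```python
from collections import Counter

def mg_summary(stream, k):
    """Misra-Gries: at most k counters; each count undershoots the
    true frequency by at most n/(k+1)."""
    c = Counter()
    for x in stream:
        if x in c or len(c) < k:
            c[x] += 1
        else:
            # decrement every counter; drop the ones that hit zero
            for y in list(c):
                c[y] -= 1
                if c[y] == 0:
                    del c[y]
    return c

def mg_merge(c1, c2, k):
    """Merge two MG summaries: add counters, then subtract the
    (k+1)-th largest count so at most k counters survive."""
    c = c1 + c2
    if len(c) > k:
        cut = sorted(c.values(), reverse=True)[k]  # (k+1)-th largest
        c = Counter({x: v - cut for x, v in c.items() if v - cut > 0})
    return c
```

Merging two summaries built with the same k yields another size-k summary whose error is at most (n1 + n2)/(k+1), which is the mergeability property the paper establishes.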
Optimization-Based Design of Plant-Friendly Input Signals Using Geometric Discrepancy Criteria
 14th IFAC Symposium on System Identification (SYSID 2006)
, 2006
Abstract

Cited by 3 (1 self)
The design of constrained, “plant-friendly” multisine input signals that optimize a geometric discrepancy criterion arising from Weyl's Theorem is examined in this paper. Such signals are meaningful for data-centric estimation methods, where uniform coverage of the output state-space is critical. The usefulness of this problem formulation is demonstrated by applying it to a linear example and to the nonlinear, highly interactive distillation column model developed by Weischedel and McAvoy (1980). The optimization problem includes a search for both the Fourier coefficients and phases in the multisine signal, resulting in a uniformly distributed output signal displaying a desirable balance between high and low gain directions. The solution involves very little user intervention (which enhances its practical usefulness) and has significant benefits compared to multisine signals that minimize crest factor.
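The geometric discrepancy criterion invoked above rests on Weyl's equidistribution theorem. As a 1-D stand-in (not the paper's multisine optimization), the sketch below computes the exact star discrepancy of the Kronecker-Weyl sequence {nα} using the standard closed form over sorted points; the golden-ratio choice of α is known to give near-optimal O(log N / N) discrepancy.

```python
import math

def star_discrepancy_1d(pts):
    """Exact 1-D star discrepancy D*_N via the closed form
    max_i max{(i+1)/N - x_(i), x_(i) - i/N} over sorted points."""
    xs = sorted(pts)
    N = len(xs)
    return max(max((i + 1) / N - x, x - i / N) for i, x in enumerate(xs))

# Weyl: {n*alpha} is equidistributed for irrational alpha; the golden
# ratio is the classic low-discrepancy choice
alpha = (math.sqrt(5) - 1) / 2
N = 512
pts = [(n * alpha) % 1.0 for n in range(N)]
print(star_discrepancy_1d(pts))
```

The paper optimizes an analogous criterion over the output space of the plant rather than over [0, 1).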
Coding Theory And Uniform Distributions
, 1998
Abstract

Cited by 2 (0 self)
In the present paper we introduce and study finite point subsets of a special kind, called optimum distributions, in the n-dimensional unit cube. Such distributions are closely related to the known (#, s, n)-nets of low discrepancy. It turns out that optimum distributions have a rich combinatorial structure. Namely, we show that optimum distributions can be characterized completely as maximum distance separable codes with respect to a non-Hamming metric. Weight spectra of such codes can be evaluated precisely. We also consider linear codes and distributions and study their general properties, including the duality with respect to a suitable inner product. The corresponding generalized MacWilliams identities for weight enumerators are briefly discussed. Broad classes of linear maximum distance separable codes and linear optimum distributions are explicitly constructed in the paper via Hermite interpolations over finite fields. 1991 Mathematics Subject Classification: 11K38, 11T71, 94B60.
Multidimensional Declustering Schemes Using Golden Ratio and Kronecker Sequences
 In IEEE Trans. on Knowledge and Data Engineering
, 2003
ε-Samples for Kernels
 Proceedings 24th Annual ACM-SIAM Symposium on Discrete Algorithms
, 2013
Abstract

Cited by 2 (2 self)
We study the worst-case error of kernel density estimates via subset approximation. A kernel density estimate of a distribution is the convolution of that distribution with a fixed kernel (e.g., the Gaussian kernel). Given a subset (i.e., a point set) of the input distribution, we can compare the kernel density estimate of the input distribution with that of the subset and bound the worst-case error. If the maximum error is ε, then this subset can be thought of as an ε-sample (a.k.a. an ε-approximation) of the range space defined with the input distribution as the ground set and the fixed kernel representing the family of ranges. Interestingly, in this case the ranges are not binary but take values in a continuous range (for simplicity we focus on kernels with range [0, 1]); these allow for smoother notions of range spaces. It turns out that the use of this smoother family of range spaces has the added benefit of greatly decreasing the size required for ε-samples. For instance, in the plane the size is O((1/ε^{4/3}) log^{2/3}(1/ε)) for disks (based on VC-dimension arguments) but is only O((1/ε) √(log(1/ε))) for Gaussian kernels and for kernels with bounded slope that only affect a bounded domain. These bounds are accomplished by studying the discrepancy of these “kernel” range spaces, and here the improvement in bounds is even more pronounced. In the plane, we show the discrepancy is O(√(log n)) for these kernels, whereas for
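The error notion above can be sketched numerically: compare the Gaussian kernel density estimate of a full point set against that of a subset, and take the sup over queries. Everything below (the 1-D data, the random subsample, the query grid) is an illustrative assumption; the paper constructs much smaller subsets than random sampling does.

```python
import math
import random

random.seed(2)

def kde(pts, q, sigma=1.0):
    """Average Gaussian kernel value between query q and the point set."""
    return sum(math.exp(-((p - q) ** 2) / (2 * sigma ** 2)) for p in pts) / len(pts)

# hypothetical 1-D input "distribution" and a random subset of it
P = [random.gauss(0, 3) for _ in range(2000)]
Q = random.sample(P, 200)

# approximate sup_q |KDE_P(q) - KDE_Q(q)| over a query grid;
# a small value means Q is an eps-sample for the kernel range space
err = max(abs(kde(P, q) - kde(Q, q)) for q in [i / 10 - 10 for i in range(201)])
print(err)
```

For binary ranges such as disks, the analogous quantity is the classical ε-approximation error; the paper's point is that the smooth kernel version admits strictly smaller subsets.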
Bounded VC-dimension implies a fractional Helly theorem
, 2002
Abstract

Cited by 1 (1 self)
We prove that every set system of bounded VC-dimension has a fractional Helly property. More precisely, if the dual shatter function of a set system F is bounded by o(m^k), then F has fractional Helly number k. This means that for every α > 0 there exists a β > 0 such that if F_1, F_2, ..., F_n ∈ F are sets with ⋂_{i∈I} F_i ≠ ∅ for at least α·(n choose k) sets I ⊆ {1, 2, ..., n} of size k, then there exists a point common to at least βn of the F_i. This further implies a (p, k)-theorem: for every F as above and every p ≥ k there exists T such that if G ⊆ F is a finite subfamily where among every p sets, some k intersect, then G has a transversal of size T. The assumption of a bounded dual shatter function applies, for example, to families of sets in R^d definable by a bounded number of polynomial inequalities of bounded degree; in this case, we obtain fractional Helly number d+1.
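The fractional Helly statement can be made concrete in the simplest case: intervals in R^1 have dual shatter function O(m), so fractional Helly number k = 2 applies, meaning that if an α-fraction of pairs intersect, some point is common to βn intervals. The sketch below measures both quantities on synthetic intervals (all data made up; the deepest point can be found among left endpoints).

```python
import itertools
import random

random.seed(3)

n = 40
ivals = []
for _ in range(n):
    a = random.uniform(0, 10)
    ivals.append((a, a + random.uniform(0.5, 3.0)))

# alpha: fraction of the C(n, 2) pairs that intersect
alpha = sum(max(l1, l2) <= min(r1, r2)
            for (l1, r1), (l2, r2) in itertools.combinations(ivals, 2))
alpha /= n * (n - 1) / 2

# beta: max depth / n; for intervals, depth is maximized at a left endpoint
beta = max(sum(l <= x <= r for l, r in ivals) for x, _ in ivals) / n
print(alpha, beta)
```

The theorem guarantees β bounded away from 0 as a function of α alone, independently of n.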
Geometric Computations on Indecisive and Uncertain Points
Abstract

Cited by 1 (1 self)
We study geometric computations on uncertain points. An uncertain point is a point that does not have a fixed location but rather is described by a probability distribution. When these probability distributions are restricted to a finite number of locations, the points are called indecisive points. In particular, we focus on geometric shape-fitting problems and on building compact distributions to describe how the solutions to these problems vary with respect to the uncertainty in the points. Our main results are: (1) a simple and efficient randomized approximation algorithm for calculating the distribution of any statistic on uncertain data sets; (2) a polynomial, deterministic, and exact algorithm for computing the distribution of answers for any LP-type problem on an indecisive point set; and (3) the development of shape inclusion probability (SIP) functions, which capture the ambient distribution of shapes fit to uncertain or indecisive point sets and are admissible to the two algorithmic constructions.
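Result (1) admits a very short sketch: instantiate each indecisive point at one of its candidate locations at random, evaluate the statistic, and repeat to build an empirical distribution of outcomes. The 1-D setting, the uniform choice among locations, and the toy "spread" statistic below are all illustrative assumptions standing in for the paper's shape-fitting statistics.

```python
import random

random.seed(4)

# indecisive points: each has 3 candidate locations (1-D for simplicity)
pts = [[random.uniform(0, 10) for _ in range(3)] for _ in range(20)]

def sample_statistic(pts, stat, trials=1000):
    """Randomized scheme: realize each indecisive point at a random
    candidate location, evaluate the statistic, collect outcomes."""
    return sorted(stat([random.choice(locs) for locs in pts])
                  for _ in range(trials))

# toy statistic: the spread max - min (a 1-D stand-in for shape fitting)
dist = sample_statistic(pts, lambda xs: max(xs) - min(xs))
median = dist[len(dist) // 2]
print(median)
```

The sorted outcome list is an empirical CDF of the statistic, which is exactly the kind of compact distribution the abstract describes.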
The Test Suite Generation Problem: Optimal Instances and Their Implications
Abstract

Cited by 1 (0 self)
In the test suite generation problem (TSG) for software systems, I is a set of n input parameters where each I ∈ I has κ(I) data values, and O is a collection of subsets of I where the interactions of the parameters in each O ∈ O are thought to affect the outcome of the system. A test case for (I, O, κ) is an n-tuple (t_1, t_2, ..., t_n) that specifies the value of each input parameter in I. The goal is to generate a smallest-sized test suite (i.e., a set of test cases) that covers all value combinations of each O ∈ O. The decision version of TSG is known to be NP-complete. In this paper, we present new families of (I, O, κ) for which optimal test suites can be constructed efficiently. They differ from the ones already known by the way we characterize (I, O) and κ. We then use these instances to generate test suites for arbitrary software systems. When each O ∈ O has |O| = 2, the sizes of the test suites are guaranteed to be at most ⌈log₂ n⌉ × OPT, matching the current best bound for this problem. Our constructions utilize the structure of (I, O) and κ; consequently, the less “complex” (I, O) and κ are, the better the bounds on the sizes of the test suites.
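To make the |O| = 2 (pairwise coverage) case concrete, here is a naive randomized-greedy sketch, not the paper's construction: keep drawing random test cases and retain any that cover a not-yet-covered pair of parameter values, until every pair is covered.

```python
import itertools
import random

random.seed(5)

def greedy_pairwise(kappa):
    """Naive greedy for pairwise TSG: kappa[i] is the number of values
    of parameter i; returns a suite covering every value pair."""
    n = len(kappa)
    uncovered = {(i, j, a, b)
                 for i, j in itertools.combinations(range(n), 2)
                 for a in range(kappa[i]) for b in range(kappa[j])}
    suite = []
    while uncovered:
        t = tuple(random.randrange(kappa[i]) for i in range(n))
        new = {(i, j, t[i], t[j])
               for i, j in itertools.combinations(range(n), 2)} & uncovered
        if new:  # keep the test case only if it covers something new
            suite.append(t)
            uncovered -= new
    return suite

suite = greedy_pairwise([2, 2, 2, 2])  # four binary parameters
print(len(suite))
```

For four binary parameters the optimum is 5 test cases, so the ⌈log₂ n⌉ × OPT guarantee discussed above leaves little room; the greedy sketch typically lands within a small factor of that.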
Discrepancy of Point Sequences on Fractal Sets
Abstract

Cited by 1 (0 self)
We consider asymptotic bounds for the discrepancy of point sets on a class of fractal sets. By a method of R. Alexander, we prove that for a wide class of fractals, the L_2 discrepancy (and consequently also the worst-case discrepancy) of an N-point set with respect to half-spaces is at least of the order N^{1/2 − 1/(2s)}, where s is the Hausdorff dimension of the fractal. We also show that for many fractals this bound is tight for the L_2 discrepancy. Determining the correct order of magnitude of the worst-case discrepancy remains a challenging open problem.
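The quantity being bounded can be estimated numerically on the classic example: for the Sierpinski triangle (Hausdorff dimension s = log 3 / log 2 ≈ 1.585), the sketch below generates chaos-game points and Monte Carlo estimates the worst-case half-plane discrepancy, with a large second sample standing in for the fractal's natural measure. All parameters are illustrative; this estimates the quantity, it does not reproduce the paper's lower-bound argument.

```python
import random

random.seed(6)

def sierpinski(n):
    """Chaos game: n points distributed over the Sierpinski triangle."""
    verts = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.866)]
    x, y = 0.25, 0.25
    pts = []
    for _ in range(n + 100):  # includes 100 burn-in steps
        vx, vy = random.choice(verts)
        x, y = (x + vx) / 2, (y + vy) / 2
        pts.append((x, y))
    return pts[100:]

P = sierpinski(500)    # the N-point set under study
R = sierpinski(20000)  # reference sample standing in for the fractal measure

def halfplane_disc(P, R, trials=200):
    """Monte Carlo sup over random half-planes of ||P∩H|/|P| - mu(H)|."""
    worst = 0.0
    for _ in range(trials):
        a, b = random.gauss(0, 1), random.gauss(0, 1)
        c = random.uniform(-1, 1)
        fp = sum(a * x + b * y <= c for x, y in P) / len(P)
        fr = sum(a * x + b * y <= c for x, y in R) / len(R)
        worst = max(worst, abs(fp - fr))
    return worst

print(halfplane_disc(P, R))
```

The paper's lower bound says no placement of N points on such a fractal can drive this quantity below order N^{1/2 − 1/(2s)} / N in normalized form.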