Results 1–10 of 63
The art of uninformed decisions: A primer to property testing
Science, 2001
"... Property testing is a new field in computational theory, that deals with the information that can be deduced from the input where the number of allowable queries (reads from the input) is significally smaller than its size. ..."
Cited by 128 (20 self)
Abstract:
Property testing is a new field in computational theory that deals with the information that can be deduced from the input when the number of allowable queries (reads from the input) is significantly smaller than its size.
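A classic example often used to introduce the field is testing whether an array is sorted with far fewer reads than its length. The sketch below is the standard binary-search-based sortedness tester (not taken from this primer); it assumes distinct values, and the function name and trial count are illustrative choices.

```python
import random

def sorted_tester(xs, eps, trials=None):
    """One-sided property tester (sketch): always accepts a sorted list of
    distinct values; rejects lists that are eps-far from sorted (more than
    an eps fraction of entries must change) with constant probability.
    It queries O(log n / eps) positions instead of reading the whole input."""
    n = len(xs)
    if trials is None:
        trials = max(1, int(2 / eps))
    for _ in range(trials):
        i = random.randrange(n)
        # Binary-search for xs[i]; in a sorted distinct list the search ends at i.
        lo, hi = 0, n - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if xs[mid] == xs[i]:
                break
            elif xs[mid] < xs[i]:
                lo = mid + 1
            else:
                hi = mid - 1
        else:
            return False  # search path never met the value: evidence of disorder
        if mid != i:
            return False  # value found at the wrong place: evidence of disorder
    return True
```

Each trial touches only the O(log n) entries on one search path, which is what makes the decision "uninformed" in the sense of the title.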
Analysis of representations for domain adaptation
In NIPS, 2007
"... Domain is a distribution D on an instance set X Domain adaptation of a classifier A classification task Source domain (DS) ..."
Cited by 109 (11 self)
Abstract:
A domain is a distribution D on an instance set X. Domain adaptation studies how a classifier trained for a classification task on a source domain (DS) behaves on a different but related domain.
Detecting Change in Data Streams
2004
"... Detecting changes in a data stream is an important area of research with many applications. ..."
Cited by 93 (2 self)
Abstract:
Detecting changes in a data stream is an important area of research with many applications.
Streaming and sublinear approximation of entropy and information distances
In ACM-SIAM Symposium on Discrete Algorithms, 2006
"... In most algorithmic applications which compare two distributions, information theoretic distances are more natural than standard ℓp norms. In this paper we design streaming and sublinear time property testing algorithms for entropy and various information theoretic distances. Batu et al posed the pr ..."
Cited by 55 (13 self)
Abstract:
In most algorithmic applications that compare two distributions, information-theoretic distances are more natural than the standard ℓp norms. In this paper we design streaming and sublinear-time property testing algorithms for entropy and various information-theoretic distances. Batu et al. posed the problem of property testing with respect to the Jensen-Shannon distance. We present optimal algorithms for estimating bounded, symmetric f-divergences (including the Jensen-Shannon divergence and the Hellinger distance) between distributions in various property testing frameworks. Along the way, we close a (log n)/H gap between the upper and lower bounds for estimating entropy H, yielding an optimal algorithm over all values of the entropy. In a data-stream setting (sublinear space), we give the first algorithm for estimating the entropy of a distribution. Our algorithm runs in polylogarithmic space and yields an asymptotic constant-factor approximation scheme. An integral part of the algorithm is an interesting use of an F0 (the number of distinct elements in a set) estimation algorithm; we also provide other results along the space/time/approximation trade-off curve. Our results have interesting structural implications that connect sublinear-time and space-constrained algorithms. The mediating model is the random-order streaming model, which assumes the input is a random permutation of a multiset and was first considered by Munro and Paterson in 1980. We show that any property testing algorithm in the combined oracle model for calculating a permutation-invariant function can be simulated in the random-order model in a single pass. This addresses a question raised by Feigenbaum et al. regarding the relationship between property testing and streaming algorithms. Further, we give a polylog-space PTAS for estimating the entropy of a one-pass random-order stream. This bound cannot be achieved in the combined oracle (generalized property testing) model.
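The two f-divergences named in the abstract have short exact definitions; the sketch below computes both for explicitly given probability vectors (this is just the definitions, not the paper's sublinear estimators, and the function names are mine).

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions given as
    equal-length probability vectors: the average KL divergence of p and q
    to their midpoint m = (p + q)/2. Bounded by log 2 and symmetric."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        # Kullback-Leibler divergence, with the convention 0 * log(0/x) = 0.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def hellinger(p, q):
    """Hellinger distance: the l2 distance between the entrywise square
    roots of p and q, scaled by 1/sqrt(2) so it lies in [0, 1]."""
    return math.sqrt(
        sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)) / 2
    )
```

On disjoint distributions the Jensen-Shannon divergence attains its maximum log 2 and the Hellinger distance attains 1, which is the boundedness the paper's estimators exploit.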
Testing random variables for independence and identity
In Proceedings of the 41st Annual Symposium on Foundations of Computer Science, 2000
"... Given access to independent samples of a distribution �over�℄�℄, we show how to test whether the distributions formed by projecting�to each coordinate are independent, i.e., whether�isclose in the norm to the product distribution��for some distributions�over �℄and�over�℄. The sample complexity of o ..."
Cited by 54 (18 self)
Abstract:
Given access to independent samples of a distribution A over [n] × [m], we show how to test whether the distributions formed by projecting A to each coordinate are independent, i.e., whether A is ε-close in the ℓ1 norm to the product distribution A1 × A2 for some distributions A1 over [n] and A2 over [m]. The sample complexity of our test is Õ(n^{2/3} m^{1/3} poly(1/ε)), assuming without loss of generality that m ≤ n. We also give a matching lower bound, up to poly(log n, 1/ε) factors. Furthermore, given access to samples of a distribution X over [n], we show how to test whether X is ε-close in the ℓ1 norm to an explicitly specified distribution Y. Our test uses Õ(√n poly(1/ε)) samples, which nearly matches the known tight bounds for the case when Y is uniform.
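The quantity being tested can be made concrete with a naive plug-in computation: the ℓ1 distance between the empirical joint distribution and the product of its empirical marginals. This is only an illustration of the statistic (the paper's contribution is achieving the same decision with far fewer samples); the function name is mine.

```python
from collections import Counter

def empirical_independence_distance(samples):
    """Given samples (a, b) from a joint distribution over [n] x [m], return
    the l1 distance between the empirical joint distribution and the product
    of its empirical marginals. Zero for perfectly independent empirical
    data; up to 2 for strongly dependent data."""
    N = len(samples)
    joint = Counter(samples)
    ma = Counter(a for a, _ in samples)   # empirical marginal on the first coordinate
    mb = Counter(b for _, b in samples)   # empirical marginal on the second coordinate
    support = {(a, b) for a in ma for b in mb}
    return sum(
        abs(joint[ab] / N - ma[ab[0]] * mb[ab[1]] / N ** 2) for ab in support
    )
```

A plug-in estimate like this needs roughly as many samples as the joint support is large, whereas the paper's tester gets by with Õ(n^{2/3} m^{1/3}) samples.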
Sampling Algorithms: Lower Bounds and Applications (Extended Abstract)
2001
"... ] Ziv BarYossef y Computer Science Division U. C. Berkeley Berkeley, CA 94720 zivi@cs.berkeley.edu Ravi Kumar IBM Almaden 650 Harry Road San Jose, CA 95120 ravi@almaden.ibm.com D. Sivakumar IBM Almaden 650 Harry Road San Jose, CA 95120 siva@almaden.ibm.com ABSTRACT We develop a fr ..."
Cited by 52 (2 self)
Abstract:
Ziv Bar-Yossef (Computer Science Division, U.C. Berkeley), Ravi Kumar (IBM Almaden), D. Sivakumar (IBM Almaden). We develop a framework to study probabilistic sampling algorithms that approximate general functions of the form f : A^n → B, where A and B are arbitrary sets. Our goal is to obtain lower bounds on the query complexity of functions, namely the number of input variables x_i that any sampling algorithm needs to query to approximate f(x_1, ..., x_n). We define two quantitative properties of functions, the block sensitivity and the minimum Hellinger distance, that give us techniques to prove lower bounds on the query complexity. These techniques are quite general, easy to use, yet powerful enough to yield tight results. Our applications include the mean and higher statistical moments, the median and other selection functions, and the frequency moments, where we obtain lower bounds that are close to the corresponding upper bounds. We also point out some connections between sampling and streaming algorithms and lossy compression schemes.
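One of the selection functions mentioned above, the median, admits a simple sampling upper bound that these lower-bound techniques are measured against. The sketch below is a generic sampling estimator under standard Chernoff-style reasoning, not the paper's construction; names and constants are illustrative.

```python
import math
import random

def approx_median(xs, eps, delta=0.05, rng=random):
    """Approximate selection by sampling: the median of
    O(log(1/delta) / eps^2) uniformly random entries is, with probability
    >= 1 - delta, an element whose rank in xs lies within
    (1/2 +- eps) * len(xs). The query count is independent of len(xs)."""
    k = math.ceil(math.log(2 / delta) / (2 * eps ** 2)) | 1  # force an odd sample size
    sample = sorted(xs[rng.randrange(len(xs))] for _ in range(k))
    return sample[k // 2]  # exact median of the small sample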
On testing expansion in bounded-degree graphs
Electronic Colloquium on Computational Complexity (ECCC), 2000
"... Abstract. We consider testing graph expansion in the boundeddegree graph model. Specifically, we refer to algorithms for testing whether the graph has a second eigenvalue bounded above by a given threshold or is far from any graph with such (or related) property. We present a natural algorithm aime ..."
Cited by 41 (2 self)
Abstract:
We consider testing graph expansion in the bounded-degree graph model. Specifically, we refer to algorithms for testing whether the graph has a second eigenvalue bounded above by a given threshold or is far from any graph with such (or a related) property. We present a natural algorithm aimed at achieving the foregoing task. The algorithm is given a (normalized) eigenvalue bound λ < 1, oracle access to a bounded-degree N-vertex graph, and two additional parameters ε, α > 0. The algorithm runs in time N^{0.5+α} · poly(1/ε), and accepts any graph having (normalized) second eigenvalue at most λ. We believe that the algorithm rejects any graph that is ε-far from having second eigenvalue at most λ^{α/O(1)}, and prove the validity of this belief under an appealing combinatorial conjecture.
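The natural algorithm referred to above is based on counting collisions among endpoints of short random walks: a small second eigenvalue means walks mix rapidly, endpoints are near-uniform, and collisions are rare. The sketch below computes that collision statistic only; the thresholding and parameter choices of the actual tester are omitted, and all names here are mine.

```python
import random
from collections import Counter

def walk_collision_statistic(neighbors, start, walk_len, num_walks, rng=random):
    """Run num_walks lazy random walks of length walk_len from `start` on a
    graph given as adjacency lists (regular degree within the graph), and
    count pairwise collisions among the endpoints. Good expanders drive the
    endpoint distribution toward uniform, giving few collisions; graphs far
    from expanding concentrate endpoints and inflate the count."""
    endpoints = []
    for _ in range(num_walks):
        v = start
        for _ in range(walk_len):
            if rng.random() < 0.5:      # lazy step: stay put with probability 1/2
                continue
            v = rng.choice(neighbors[v])
        endpoints.append(v)
    counts = Counter(endpoints)
    return sum(c * (c - 1) // 2 for c in counts.values())
```

Comparing the statistic on a well-connected graph against one with a bottleneck (e.g., two disconnected cliques) shows the separation the tester relies on.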
The complexity of approximating the entropy
 SIAM Journal on Computing
"... We consider the problem of approximating the entropy of a discrete distribution under several different models of oracle access to the distribution. In the evaluation oracle model, the algorithm is given access to the explicit array of probabilities specifying the distribution. In this model, linear ..."
Cited by 38 (9 self)
Abstract:
We consider the problem of approximating the entropy of a discrete distribution under several different models of oracle access to the distribution. In the evaluation oracle model, the algorithm is given access to the explicit array of probabilities specifying the distribution. In this model, linear time in the size of the domain is both necessary and sufficient for approximating the entropy. In the generation oracle model, the algorithm has access only to independent samples from the distribution. In this case, we show that a γ-multiplicative approximation to the entropy can be obtained in O(n^{(1+η)/γ²} log n) time for distributions with entropy Ω(γ/η), where n is the size of the domain of the distribution and η is an arbitrarily small positive constant. We show that this model does not permit a multiplicative approximation to the entropy in general. For the class of distributions to which our upper bound applies, we obtain a lower bound of Ω(n^{1/(2γ²)}). We next consider a combined oracle model in which the algorithm has access to both the evaluation oracle and the generation oracle.
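For orientation, the naive generation-oracle baseline is the plug-in estimator: form the empirical distribution from the samples and evaluate the entropy formula on it. This is only the obvious baseline, not the paper's sublinear algorithm; the function name is mine.

```python
import math
from collections import Counter

def empirical_entropy(samples):
    """Plug-in estimate of Shannon entropy (in bits) from samples drawn via
    a generation oracle: H(p_hat) = -sum p_hat(x) * log2 p_hat(x) over the
    empirical distribution p_hat. Needs roughly as many samples as the
    support is large to be accurate, which is what the paper improves on."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

The plug-in estimator systematically underestimates entropy for small sample sizes; the paper's algorithm trades a γ-multiplicative guarantee for running time sublinear in the domain size.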
Algorithmic and Analysis Techniques in Property Testing
"... Property testing algorithms are “ultra”efficient algorithms that decide whether a given object (e.g., a graph) has a certain property (e.g., bipartiteness), or is significantly different from any object that has the property. To this end property testing algorithms are given the ability to perform ..."
Cited by 24 (3 self)
Abstract:
Property testing algorithms are “ultra”-efficient algorithms that decide whether a given object (e.g., a graph) has a certain property (e.g., bipartiteness), or is significantly different from any object that has the property. To this end, property testing algorithms are given the ability to perform (local) queries to the input, though the decisions they need to make usually concern properties of a global nature. In the last two decades, property testing algorithms have been designed for many types of objects and properties, among them graph properties, algebraic properties, geometric properties, and more. In this article we survey results in property testing, with an emphasis on common analysis and algorithmic techniques. Among the techniques surveyed are the following:
• The self-correcting approach, which was mainly applied in the study of property testing of algebraic properties;
• The enforce-and-test approach, which was applied quite extensively in the analysis of algorithms for testing graph properties (in the dense-graphs model), as well as in other contexts;
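The archetypal algebraic tester behind the self-correcting approach is the Blum-Luby-Rubinfeld (BLR) linearity test: query f at random x and y and check f(x) + f(y) = f(x + y). A minimal sketch over GF(2), with bit strings packed into integers (the packing convention and trial count are my choices):

```python
import random

def blr_linearity_test(f, n, trials=50, rng=random):
    """BLR linearity test for f: {0,1}^n -> {0,1}, with inputs encoded as
    n-bit integers and XOR playing the role of addition over GF(2)^n.
    Linear functions (parities) always pass; a function that is far from
    every parity fails each trial with probability related to its distance."""
    for _ in range(trials):
        x = rng.randrange(1 << n)
        y = rng.randrange(1 << n)
        if f(x) ^ f(y) != f(x ^ y):
            return False
    return True
```

The "self-correcting" view: when the test passes, f(x) can be recovered at any point by majority-voting f(x ^ r) ^ f(r) over random r, correcting sparse errors.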
Sublinear time algorithms
SIGACT News, 2003
"... Abstract Sublinear time algorithms represent a new paradigm in computing, where an algorithmmust give some sort of an answer after inspecting only a very small portion of the input. We discuss the sorts of answers that one might be able to achieve in this new setting. 1 Introduction The goal of algo ..."
Cited by 22 (2 self)
Abstract:
Sublinear-time algorithms represent a new paradigm in computing, where an algorithm must give some sort of an answer after inspecting only a very small portion of the input. We discuss the sorts of answers that one might be able to achieve in this new setting.

The goal of algorithmic research is to design efficient algorithms, where efficiency is typically measured as a function of the length of the input. For instance, the elementary-school algorithm for multiplying two n-digit integers takes roughly n² steps, while more sophisticated algorithms have been devised which run in less than n log² n steps. It is still not known whether a linear-time algorithm is achievable for integer multiplication. Obviously any algorithm for this task, as for any other non-trivial task, would need to take at least linear time in n, since this is what it would take to read the entire input and write the output. Thus, showing the existence of a linear-time algorithm for a problem was traditionally considered to be the gold standard of achievement. Nevertheless, due to the recent tremendous increase in computational power that is inundating us with a multitude of data, we are now encountering a paradigm shift away from traditional computational models. The scale of these data sets, coupled with the typical situation in which there is very little time to perform our computations, raises the issue of whether there is time to consider any more than a minuscule fraction of the data in our computations. Analogous to the reasoning that we used for multiplication, for most natural problems an algorithm which runs in sublinear time must necessarily use randomization and must give an answer which is in some sense imprecise. Nevertheless, there are many situations in which a fast approximate solution is more useful than a slower exact solution.
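The canonical "imprecise answer from a tiny sample" example fits in a few lines: estimating the fraction of 1s in a long bit string by reading O(1/ε²) random positions, regardless of the input length. A minimal sketch (the function name and the constant in the sample size are my choices):

```python
import random

def approx_ones_fraction(bits, eps, rng=random):
    """Estimate the fraction of 1s in a bit sequence to additive error eps,
    with constant success probability, by sampling O(1/eps^2) random
    positions. The number of reads does not grow with len(bits), which is
    exactly the sublinear-time phenomenon discussed above."""
    k = max(1, round(4 / eps ** 2))
    hits = sum(bits[rng.randrange(len(bits))] for _ in range(k))
    return hits / k
```

Both hedges from the survey appear here: the algorithm is randomized, and its answer is approximate; by a Chernoff bound, neither can be avoided while reading only a constant number of positions.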