Results 1-10 of 30
Sparser Johnson-Lindenstrauss Transforms
Abstract

Cited by 30 (8 self)
We give two different constructions for dimensionality reduction in $\ell_2$ via linear mappings that are sparse: only an $O(\varepsilon)$-fraction of entries in each column of our embedding matrices are nonzero to achieve distortion $1+\varepsilon$ with high probability, while still achieving the asymptotically optimal number of rows. These are the first constructions to provide subconstant sparsity for all values of parameters. Both constructions are also very simple: a vector can be embedded in two for loops. Such distributions can be used to speed up applications where $\ell_2$ dimensionality reduction is used.
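The "two for loops" remark can be illustrated with a minimal sketch. It assumes a block-style layout (the $m$ rows split into $s$ blocks, each column holding one nonzero $\pm 1/\sqrt{s}$ per block) and lets Python's seeded PRNG stand in for the hash functions; the function name and parameters are hypothetical, and the paper's choices of $m$ and $s$ are not reproduced here.

```python
import math
import random


def sparse_jl_embed(x, m, s, seed=0):
    """Embed x (a list of floats) into R^m with a block-style sparse sign matrix.

    Hypothetical sketch: the m rows are split into s blocks of m // s rows, and
    column j has exactly one nonzero per block, at a pseudo-random row with a
    random sign, scaled by 1/sqrt(s). The seeded PRNG stands in for the hash
    functions a real implementation would use; assumes s divides m.
    """
    if m % s != 0:
        raise ValueError("this sketch assumes s divides m")
    block = m // s
    rng = random.Random(seed)
    d = len(x)
    # Fix the matrix once per (seed, d): one row offset and one sign per (column, block).
    rows = [[rng.randrange(block) for _ in range(s)] for _ in range(d)]
    signs = [[rng.choice((-1.0, 1.0)) for _ in range(s)] for _ in range(d)]
    scale = 1.0 / math.sqrt(s)
    y = [0.0] * m
    for j, xj in enumerate(x):      # first loop: coordinates of x
        if xj == 0:
            continue                # zero coordinates cost nothing
        for b in range(s):          # second loop: the s nonzeros of column j
            y[b * block + rows[j][b]] += signs[j][b] * xj * scale
    return y
```

Each column has $s$ nonzeros out of $m$ rows, so the fraction of nonzeros per column is $s/m$, and embedding costs $O(s)$ work per nonzero coordinate of $x$. Because every column places its nonzeros in distinct blocks, a standard basis vector is mapped to a vector of exactly the same norm.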
A Derandomized Sparse Johnson-Lindenstrauss Transform
Abstract

Cited by 28 (5 self)
Recent work of [Dasgupta-Kumar-Sarlós, STOC 2010] gave a sparse Johnson-Lindenstrauss transform and left as a main open question whether their construction could be efficiently derandomized. We answer their question affirmatively by giving an alternative proof of their result requiring only bounded-independence hash functions. Furthermore, the sparsity bound obtained in our proof is improved. The main ingredient in our proof is a spectral moment bound for quadratic forms that was recently used in [Diakonikolas-Kane-Nelson, FOCS 2010].
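A standard source of bounded independence is the Carter-Wegman family: a uniformly random polynomial of degree $k-1$ over $\mathbb{Z}_p$ gives $k$-wise independent, uniform values on distinct inputs in $[0, p)$. The sketch below shows that family only; the value of $k$ the proof needs and how hashes map to rows and signs are not stated in the abstract, so the helper `bucket_and_sign` and the Mersenne prime modulus are hypothetical choices. Reducing modulo `block` and modulo 2 introduces a small bias of order `block / p`.

```python
import random


class PolyHash:
    """k-wise independent hash: a random polynomial of degree k - 1 over Z_p."""

    def __init__(self, k, p=2_147_483_647, seed=None):
        rng = random.Random(seed)
        self.p = p  # 2^31 - 1, a Mersenne prime
        self.coeffs = [rng.randrange(p) for _ in range(k)]

    def __call__(self, x):
        if not 0 <= x < self.p:
            raise ValueError("inputs must lie in [0, p)")
        # Horner's rule: evaluate the polynomial at x modulo p.
        acc = 0
        for c in self.coeffs:
            acc = (acc * x + c) % self.p
        return acc


def bucket_and_sign(h_row, h_sign, j, block):
    """Hypothetical helper: a row offset in [0, block) and a +-1 sign for column j."""
    row = h_row(j) % block
    sign = 1 if h_sign(j) % 2 == 0 else -1
    return row, sign
```

Storing a $k$-wise independent function takes only $k$ field elements, versus one random value per column for a fully random construction, which is the kind of saving a derandomization of this sort aims for.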
Tight bounds for Lp samplers, finding duplicates in streams, and related problems
In PODS, 2011
Abstract

Cited by 17 (0 self)
In this paper, we present near-optimal space bounds for Lp-samplers. Given a stream of updates (additions and subtractions) to the coordinates of an underlying vector $x \in \mathbb{R}^n$, a perfect Lp-sampler outputs the $i$-th coordinate with probability $|x_i|^p/\|x\|_p^p$. In SODA 2010, Monemizadeh and Woodruff showed polylogarithmic space upper bounds for approximate Lp-samplers and demonstrated various applications of them. Very recently, Andoni, Krauthgamer and Onak improved the upper bounds and gave an $O(\varepsilon^{-p}\log^3 n)$-space Lp-sampler with $\varepsilon$ relative error and constant failure rate for $p \in [1, 2]$. In this work, we give another such algorithm requiring only $O(\varepsilon^{-p}\log^2 n)$ space for $p \in (1, 2)$. For $p \in (0, 1)$, our space bound is $O(\varepsilon^{-1}\log^2 n)$, while for the $p = 1$ case we have an $O(\log(1/\varepsilon)\,\varepsilon^{-1}\log^2 n)$-space algorithm. We also give an $O(\log^2 n)$-bit zero-relative-error L0-sampler, improving the $O(\log^3 n)$-bit algorithm due to Frahling, Indyk and Sohler. As an application of our samplers, we give better upper bounds for the problem of finding duplicates in data streams. When the length of the stream is longer than the alphabet size, L1 sampling gives us an $O(\log^2 n)$-space algorithm, improving the previous $O(\log^3 n)$ bound due to Gopalan and Radhakrishnan. In the second part of our work, we prove an $\Omega(\log^2 n)$ lower bound for sampling from $\{0, \pm 1\}$ vectors (in this special case, the parameter $p$ is not relevant for Lp sampling). This matches the space of our sampling algorithms for constant $\varepsilon > 0$. We also prove tight space lower bounds for the finding duplicates and heavy hitters problems. We obtain these lower bounds using reductions from the augmented indexing problem in communication complexity.
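The target distribution $|x_i|^p/\|x\|_p^p$ can be made concrete with an offline reference sampler. This is not the paper's streaming algorithm: it stores $x$ explicitly, which is exactly what an Lp-sampler avoids. The duplicate finder illustrates one way to use L1 sampling, under an assumed encoding (every coordinate starts at $-1$ and each occurrence adds $1$, so only repeated symbols become positive); both function names are hypothetical.

```python
import random


def exact_lp_sample(x, p, rng=None):
    """Offline reference: return i with probability |x_i|^p / ||x||_p^p."""
    if rng is None:
        rng = random.Random()
    if p == 0:
        # L0 sampling: uniform over the support (avoid Python's 0 ** 0 == 1).
        weights = [1.0 if xi != 0 else 0.0 for xi in x]
    else:
        weights = [abs(xi) ** p for xi in x]
    if not any(w > 0 for w in weights):
        raise ValueError("x must have a nonzero coordinate")
    return rng.choices(range(len(x)), weights=weights, k=1)[0]


def find_duplicate(stream, n, rng=None):
    """Return a symbol occurring twice in a stream over {0, ..., n-1} of length > n."""
    if len(stream) <= n:
        raise ValueError("need more than n items to guarantee a duplicate")
    if rng is None:
        rng = random.Random()
    # x_i = (number of occurrences of i) - 1, so x_i > 0 exactly for duplicates.
    x = [-1] * n
    for a in stream:
        x[a] += 1
    # sum(x) = len(stream) - n >= 1, so positive coordinates carry more than
    # half of ||x||_1 and each L1 draw hits a duplicate with probability > 1/2.
    while True:
        i = exact_lp_sample(x, 1, rng)
        if x[i] > 0:
            return i
```

The streaming version replaces the explicit vector `x` with a small-space L1-sampler fed the same updates ($-1$ per symbol up front, $+1$ per occurrence).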
Almost optimal explicit Johnson-Lindenstrauss transformations
In Proceedings of the 15th International Workshop on Randomization and Computation (RANDOM), 2011
Abstract

Cited by 14 (5 self)
Abstract. The Johnson-Lindenstrauss lemma is a fundamental result in probability with several applications in the design and analysis of algorithms. Constructions of linear embeddings satisfying the Johnson-Lindenstrauss property necessarily involve randomness, and much attention has been given to obtaining explicit constructions that minimize the number of random bits used. In this work we give explicit constructions with an almost optimal use of randomness: for $0 < \varepsilon, \delta < 1/2$, we obtain explicit generators $G: \{0,1\}^r \to \mathbb{R}^{s \times d}$ for $s = O(\log(1/\delta)/\varepsilon^2)$ such that for all $d$-dimensional vectors $w$ of Euclidean norm 1, …
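The explicit generators are not described in this snippet, so the sketch below shows only the naive baseline they improve on: all $s \cdot d$ entries are independent random signs scaled by $1/\sqrt{s}$, which needs a seed of $r = s \cdot d$ truly random bits (Python's seeded PRNG simulates them here). The constant `C` in $s = \lceil C \log(1/\delta)/\varepsilon^2 \rceil$ is a hypothetical placeholder, and the failure test $|\|Gw\|_2^2 - 1| > \varepsilon$ is the standard distributional JL criterion, assumed because the abstract is cut off before stating it.

```python
import math
import random


def jl_rows(eps, delta, C=1.0):
    """Target dimension s = ceil(C * log(1/delta) / eps^2); C is a hypothetical constant."""
    return math.ceil(C * math.log(1.0 / delta) / eps ** 2)


def rademacher_matrix(s, d, seed):
    """Baseline G: s*d independent signs scaled by 1/sqrt(s), i.e. r = s*d random bits."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(s)
    return [[scale * rng.choice((-1, 1)) for _ in range(d)] for _ in range(s)]


def sq_norm_of_image(G, w):
    """||G w||_2^2 for a matrix G given as a list of rows."""
    return sum(sum(g * wj for g, wj in zip(row, w)) ** 2 for row in G)


def empirical_failure_rate(w, eps, delta, trials=200, C=1.0):
    """Fraction of seeds for which | ||G w||^2 - 1 | > eps, for a unit vector w."""
    s = jl_rows(eps, delta, C)
    bad = 0
    for seed in range(trials):
        G = rademacher_matrix(s, len(w), seed)
        if abs(sq_norm_of_image(G, w) - 1.0) > eps:
            bad += 1
    return bad / trials
```

The point of contrast is the seed length: this baseline's $r$ grows with $d$, whereas the abstract's generators aim for an almost optimal $r$.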
Sketched SVD: Recovering spectral features from compressive measurements. arXiv e-prints, 2012
Sketching and Streaming High-Dimensional Vectors
2011
Abstract

Cited by 3 (0 self)
A sketch of a dataset is a small-space data structure supporting some pre-specified set of queries (and possibly updates) while consuming space substantially sublinear in the space required to actually store all the data. Furthermore, it is often desirable, or required by the application, that the sketch itself be computable by a small-space algorithm given just one pass over the data, a so-called streaming algorithm. Sketching and streaming have found numerous applications in network traffic monitoring, data mining, trend detection, sensor networks, and databases. In this thesis, I describe several new contributions in the area of sketching and streaming algorithms.
• The first space-optimal streaming algorithm for the distinct elements problem. Our algorithm also achieves O(1) update and reporting times.
• A streaming algorithm for Hamming norm estimation in the turnstile model which achieves the best known space complexity.
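The thesis abstract does not describe its distinct-elements algorithm, so the following is not it. It is the classic k-minimum-values (KMV) estimator, swapped in only to show what a one-pass, small-space sketch for distinct elements looks like, and it is not space-optimal. Assumptions: items have a stable `repr`, keyed `hashlib.blake2b` stands in for a random hash function, and the class and method names are hypothetical.

```python
import hashlib
import heapq


def _unit_hash(item, salt=b"kmv"):
    """Map an item to a pseudo-random float in [0, 1) (assumes repr is a stable key)."""
    digest = hashlib.blake2b(repr(item).encode(), digest_size=8, key=salt).digest()
    return int.from_bytes(digest, "big") / 2**64


class KMVSketch:
    """Keep the k smallest distinct hash values seen; estimate (k - 1) / v_k."""

    def __init__(self, k=64):
        self.k = k
        self._heap = []        # max-heap via negation: the k smallest hashes so far
        self._members = set()  # the same hash values, for O(1) duplicate checks

    def add(self, item):
        h = _unit_hash(item)
        if h in self._members:
            return  # repeated item: the sketch is unchanged
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Evict the largest retained hash to make room for the smaller one.
            evicted = -heapq.heapreplace(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct hashes: exact count
        return (self.k - 1) / (-self._heap[0])
```

The sketch stores only `k` hash values no matter how long the stream is, and a single pass suffices, which are the two properties the paragraph above asks of a streaming sketch.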
Beating the Direct Sum Theorem in Communication Complexity with Implications for Sketching
Abstract

Cited by 3 (0 self)
A direct sum theorem for two parties and a function $f$ states that the communication cost of solving $k$ copies of $f$ simultaneously with error probability $1/3$ is at least $k \cdot R_{1/3}(f)$, where $R_{1/3}(f)$ is the communication required to solve a single copy of $f$ with error probability $1/3$. We improve this for a natural family of functions $f$, showing that the one-way communication required to solve $k$ copies of $f$ simultaneously with probability $2/3$ is $\Omega(k \cdot R_{1/k}(f))$. Since $R_{1/k}(f)$ may be as large as $\Omega(R_{1/3}(f) \cdot \log k)$, we asymptotically beat the direct sum bound for such functions, showing that the trivial upper bound of solving each of the $k$ copies of $f$ with probability $1 - O(1/k)$ and taking a union bound is optimal! In order to achieve this, our direct sum involves a novel measure of information cost which allows a protocol to abort with constant probability, and otherwise must be correct with very high probability. Moreover, for the functions considered, we show strong lower bounds on the communication cost of protocols with these relaxed guarantees; indeed, our lower bounds match those for protocols that are not allowed to abort. In the distributed and streaming models, where one wants to be correct not only on a single query, but simultaneously on a sequence of $n$ queries, we obtain optimal lower bounds on the communication or space complexity. Lower bounds obtained from our direct sum result show that a number of techniques in the sketching literature are optimal, including the following:
• (JL transform) Lower bound of $\Omega(\varepsilon^{-2}\log(n/\delta))$ on the dimension of (oblivious) Johnson-Lindenstrauss transforms.
• ($\ell_p$-estimation) Lower bound of $\Omega(n\varepsilon^{-2}\log(n/\delta)(\log d + \log M))$ for the size of encodings of $n$ vectors in $[\pm M]^d$ that allow $\ell_1$- or $\ell_2$-estimation.
• (Matrix sketching) Lower bound of $\Omega(\varepsilon^{-2}\log(n/\delta))$ on the dimension of a matrix sketch $S$ satisfying the entrywise guarantee $|(ASS^TB)_{i,j} - (AB)_{i,j}| \le \varepsilon\|A_i\|_2\|B^j\|_2$.
• (Database joins) Lower bound of $\Omega(n\varepsilon^{-2}\log(n/\delta)\log M)$ for sketching frequency vectors of $n$ tables in a database, each with $M$ records, in order to allow join size estimation.
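The matrix-sketching guarantee can be checked numerically. The sketch below uses a dense Rademacher matrix $S \in \mathbb{R}^{d \times m}$ with entries $\pm 1/\sqrt{m}$ (so $\mathbb{E}[SS^T] = I_d$), a standard choice assumed here rather than taken from the paper, and reports the worst entrywise error scaled by $\|A_i\|_2\|B^j\|_2$, where $A_i$ is a row of $A$ and $B^j$ a column of $B$. The lower bound above says the sketch dimension $m$ cannot be pushed below $\Omega(\varepsilon^{-2}\log(n/\delta))$ while keeping this quantity at most $\varepsilon$. Function names are hypothetical.

```python
import math
import random


def matmul(X, Y):
    """Plain list-of-lists matrix product."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def rademacher_sketch(d, m, seed=0):
    """d x m matrix with i.i.d. entries +-1/sqrt(m), so that E[S S^T] = I_d."""
    rng = random.Random(seed)
    c = 1.0 / math.sqrt(m)
    return [[c * rng.choice((-1, 1)) for _ in range(m)] for _ in range(d)]


def max_scaled_error(A, B, S):
    """max over i, j of |(A S S^T B)_ij - (AB)_ij| / (||A_i||_2 * ||B^j||_2)."""
    exact = matmul(A, B)
    approx = matmul(matmul(A, S), matmul(transpose(S), B))
    Bt = transpose(B)  # rows of Bt are the columns B^j
    worst = 0.0
    for i, row in enumerate(A):
        for j, col in enumerate(Bt):
            denom = math.hypot(*row) * math.hypot(*col)
            if denom > 0:
                worst = max(worst, abs(approx[i][j] - exact[i][j]) / denom)
    return worst
```

Computing $(AS)(S^TB)$ rather than $A(SS^T)B$ is the point of sketching: both factors have only $m$ inner columns, so the product is cheap when $m \ll d$.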
Approximating large frequency moments with pick-and-drop sampling
In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 16th International Workshop, APPROX 2013, and 17th International Workshop, RANDOM 2013
Budget Error-Correcting under Earth-Mover Distance. Research report, 2013
Abstract

Cited by 1 (1 self)
Abstract. We study the following budget error-correcting problem: Alice has a point set $x$ and Bob has a point set $y$ in the $d$-dimensional grid. Alice wants to send a short message to Bob so that Bob can use this information to adjust his point set $y$ towards $x$ to minimize the Earth-Mover Distance between the two point sets. A more intuitive way to understand this problem is: Alice tries to help Bob recall Eve's face by sending him a short message. Of course Bob will fail to recall if he does not know Eve, but if he knows something about Eve, the message could help a lot. Naturally, there is a tradeoff between the message size and the quality of such an adjustment. Given a quality constraint, we want to minimize the message size. This problem is well motivated by applications including image exchange/synchronization and video compression. In this paper, we give almost matching upper and lower bounds for this problem.
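For two point sets of equal size with unit mass per point, the Earth-Mover Distance is the cost of a minimum-cost perfect matching between them. The brute-force sketch below makes that definition concrete for tiny sets only. It assumes an $\ell_1$ ground distance on the grid (the abstract does not fix the ground metric), and the function names are hypothetical. For realistic sizes, an assignment solver such as `scipy.optimize.linear_sum_assignment` would replace the enumeration of all $n!$ matchings.

```python
from itertools import permutations


def l1(p, q):
    """Manhattan distance between two grid points (assumed ground metric)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def emd_equal_size(xs, ys):
    """EMD between equal-size point sets = min over matchings of total l1 cost."""
    if len(xs) != len(ys):
        raise ValueError("this sketch handles equal-size point sets only")
    # Try every bijection xs[i] -> ys[perm[i]]; exponential, for illustration only.
    return min(
        sum(l1(xs[i], ys[j]) for i, j in enumerate(perm))
        for perm in permutations(range(len(ys)))
    )
```

In the problem above, Bob's goal is to move $y$ so that this quantity, measured against Alice's $x$, becomes small, using only the information in Alice's message.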