Results 1  10
of
108
Data Streams: Algorithms and Applications
, 2005
"... In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has emerg ..."
Abstract

Cited by 528 (22 self)
 Add to MetaCart
In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has emerged for reasoning about algorithms that work within these constraints on space, time, and number of passes. Some of the methods rely on metric embeddings, pseudorandom computations, sparse approximation theory and communication complexity. The applications for this scenario include IP network traffic analysis, mining text message streams and processing massive data sets in general. Researchers in Theoretical Computer Science, Databases, IP Networking and Computer Systems are working on the data stream challenges. This article is an overview and survey of data stream algorithmics and is an updated version of [175].1
Stable Distributions, Pseudorandom Generators, Embeddings and Data Stream Computation
, 2000
"... In this paper we show several results obtained by combining the use of stable distributions with pseudorandom generators for bounded space. In particular: ffl we show how to maintain (using only O(log n=ffl 2 ) words of storage) a sketch C(p) of a point p 2 l n 1 under dynamic updates of its coo ..."
Abstract

Cited by 317 (13 self)
 Add to MetaCart
In this paper we show several results obtained by combining the use of stable distributions with pseudorandom generators for bounded space. In particular: ffl we show how to maintain (using only O(log n=ffl 2 ) words of storage) a sketch C(p) of a point p 2 l n 1 under dynamic updates of its coordinates, such that given sketches C(p) and C(q) one can estimate jp \Gamma qj 1 up to a factor of (1 + ffl) with large probability. This solves the main open problem of [10]. ffl we obtain another sketch function C 0 which maps l n 1 into a normed space l m 1 (as opposed to C), such that m = m(n) is much smaller than n; to our knowledge this is the first dimensionality reduction lemma for l 1 norm ffl we give an explicit embedding of l n 2 into l n O(log n) 1 with distortion (1 + 1=n \Theta(1) ) and a nonconstructive embedding of l n 2 into l O(n) 1 with distortion (1 + ffl) such that the embedding can be represented using only O(n log 2 n) bits (as opposed to at least...
Practical privacy: the sulq framework
 In PODS ’05: Proceedings of the twentyfourth ACM SIGMODSIGACTSIGART symposium on Principles of database systems
, 2005
"... We consider a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries. In such a database, a query consists of a pair (S, f) where S is a set of rows in the database and f is a function mapping ..."
Abstract

Cited by 219 (35 self)
 Add to MetaCart
(Show Context)
We consider a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries. In such a database, a query consists of a pair (S, f) where S is a set of rows in the database and f is a function mapping database rows to {0, 1}. The true answer is P i∈S f(di), and a noisy version is released as the response to the query. Results of Dinur, Dwork, and Nissim show that a strong form of privacy can be maintained using a surprisingly small amount of noise – much less than the sampling error – provided the total number of queries is sublinear in the number of database rows. We call this query and (slightly) noisy reply the SuLQ (SubLinear Queries) primitive. The assumption of sublinearity becomes reasonable as databases grow increasingly large. We extend this work in two ways. First, we modify the privacy analysis to realvalued functions f and arbitrary row types, as a consequence greatly improving the bounds on noise required for privacy. Second, we examine the computational power of the SuLQ primitive. We show that it is very powerful indeed, in that slightly noisy versions of the following computations can be carried out with very few invocations of the primitive: principal component analysis, k means clustering, the Perceptron Algorithm, the ID3 algorithm, and (apparently!) all algorithms that operate in the in the statistical query learning model [11].
Private queries in location based services: anonymizers are not necessary
 In SIGMOD
, 2008
"... Mobile devices equipped with positioning capabilities (e.g., GPS) can ask locationdependent queries to Location Based Services (LBS). To protect privacy, the user location must not be disclosed. Existing solutions utilize a trusted anonymizer between the users and the LBS. This approach has several ..."
Abstract

Cited by 127 (15 self)
 Add to MetaCart
(Show Context)
Mobile devices equipped with positioning capabilities (e.g., GPS) can ask locationdependent queries to Location Based Services (LBS). To protect privacy, the user location must not be disclosed. Existing solutions utilize a trusted anonymizer between the users and the LBS. This approach has several drawbacks: (i) All users must trust the third party anonymizer, which is a single point of attack. (ii) A large number of cooperating, trustworthy users is needed. (iii) Privacy is guaranteed only for a single snapshot of user locations; users are not protected against correlation attacks (e.g., history of user movement). We propose a novel framework to support private locationdependent queries, based on the theoretical work on Private Information Retrieval (PIR). Our framework does not require a trusted third party, since privacy is achieved via cryptographic techniques. Compared to existing work, our approach achieves stronger privacy for snapshots of user locations; moreover, it is the first to provide provable privacy guarantees against correlation attacks. We use our framework to implement approximate and exact algorithms for nearestneighbor search. We optimize query execution by employing data mining techniques, which identify redundant computations. Contrary to common belief, the experimental results suggest that PIR approaches incur reasonable overhead and are applicable in practice.
Communication Preserving Protocols for Secure Function Evaluation
 In Proc. of 33rd STOC
, 2001
"... A secure function evaluation protocol allows two parties to jointly compute a function f(x; y) of their inputs in a manner not leaking more information than necessary. A major result in this field is: "any function f that can be computed using polynomial resources can be computed securely usin ..."
Abstract

Cited by 64 (5 self)
 Add to MetaCart
(Show Context)
A secure function evaluation protocol allows two parties to jointly compute a function f(x; y) of their inputs in a manner not leaking more information than necessary. A major result in this field is: "any function f that can be computed using polynomial resources can be computed securely using polynomial resources" (where `resources' refers to communication and computation). This result follows by a general transformation from any circuit for f to a secure protocol that evaluates f . Although the resources used by protocols resulting from this transformation are polynomial in the circuit size, they are much higher (in general) than those required for an insecure computation of f . We propose a new methodology for designing secure protocols, utilizing the communication complexity tree (or branching program) representation of f . We start with an efficient (insecure) protocol for f and transform it into a secure protocol. In other words, "any function f that can be computed using communication complexity c can be can be computed securely using communication complexity that is polynomial in c and a security parameter". We show several simple applications of this new methodology resulting in protocols efficient either in communication or in computation. In particular, we exemplify a protocol for the "millionaires problem ", where two participants want to compare their values but reveal no other information. Our protocol is more efficient than previously known ones in either communication or computation. 1.
Secure Computation of the kthRanked Element
 In Avdances in Cryptology  Proc. of Eurocyrpt ’04
, 2004
"... Given two or more parties possessing large, confidential datasets, we consider the problem of securely computing the k of the datasets, e.g. the median of the values in the datasets. We investigate protocols with sublinear computation and communication costs. In the twoparty case, we show tha ..."
Abstract

Cited by 60 (7 self)
 Add to MetaCart
Given two or more parties possessing large, confidential datasets, we consider the problem of securely computing the k of the datasets, e.g. the median of the values in the datasets. We investigate protocols with sublinear computation and communication costs. In the twoparty case, we show that the k ranked element can be computed in log k rounds, where the computation and communication costs of each round are O(log M), where log M is the number of bits needed to describe each element of the input data.
Selective private function evaluation with applications to private statistics
 In Proceedings of Twentieth ACM Symposium on Principles of Distributed Computing (PODC
, 2001
"... Motivated by the application of private statistical analysis of large databases, we consider the problem of selective private function evaluation (SPFE). In this problem, a client interacts with one or more servers holding copies of a database z = zt,...,z, in order to compute f(z~t,...,z~,,,) , fo ..."
Abstract

Cited by 54 (9 self)
 Add to MetaCart
(Show Context)
Motivated by the application of private statistical analysis of large databases, we consider the problem of selective private function evaluation (SPFE). In this problem, a client interacts with one or more servers holding copies of a database z = zt,...,z, in order to compute f(z~t,...,z~,,,) , for some function f and indices i = it,...,i, ~ chosen by the client. Ideally, the client must learn nothing more about the database than f(zit,..., zi,,~), and the servers should learn nothing. Generic solutions for this problem, based on standard techniques for secure function evaluation, incur communication complexity that is at least linear in n, making them prohibitive for large databases even when f is relatively simple and m is small. We present various approaches for constructing sublinearcommunication $PFE protocols, both for the general problem and for special cases of interest. Our solutions not only offer sublinear communication complexity, but are also practical in many scenarios. 1.
Privacypreserving sharing and correlation of security alerts
 In USENIX Security Symposium
, 2004
"... Shmatikov z SRI International ..."
(Show Context)