Compressed Bloom Filters
, 2001
Cited by 193 (10 self)
A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Although Bloom filters allow false positives, for many applications the space savings outweigh this drawback when the probability of an error is sufficiently low. We introduce compressed Bloom filters, which improve performance when the Bloom filter is passed as a message, and its transmission size is a limiting factor. For example, Bloom filters have been suggested as a means for sharing Web cache information. In this setting, proxies do not share the exact contents of their caches, but instead periodically broadcast Bloom filters representing their cache. By using compressed Bloom filters, proxies can reduce the number of bits broadcast, the false positive rate, and/or the amount of computation per lookup. The cost is the processing time for compression and decompression, which can use simple arithmetic coding, and more memory use at the proxies, which utilize the larger uncompressed form of the Bloom filter.
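The idea in this abstract can be illustrated with a minimal sketch. It assumes a sparse filter (many bits, few hash functions) and uses `zlib` as a stand-in for the arithmetic coding the abstract mentions. The class name `BloomFilter`, the double-hashing scheme, the sizes, and the example URLs are all hypothetical choices made for illustration, not the paper's construction.

```python
import hashlib
import zlib


class BloomFilter:
    """Plain Bloom filter with m bits and k hash functions (illustrative)."""

    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        # Assumption: derive k bit positions by double hashing one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # True for every added item; may also be True for others (false positive).
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


# A proxy's cache contents (made-up URLs).
urls = [f"http://example.org/page{i}" for i in range(1000)]

# Sparse filter: many bits and few hashes, so most bits stay zero.
sparse = BloomFilter(m_bits=64_000, k_hashes=2)
for u in urls:
    sparse.add(u)

# The message that would be broadcast: the compressed bit array.
wire = zlib.compress(bytes(sparse.bits), 9)

# The receiver decompresses and queries the larger uncompressed form.
received = BloomFilter(m_bits=64_000, k_hashes=2)
received.bits = bytearray(zlib.decompress(wire))

print(len(sparse.bits), "bytes in memory,", len(wire), "bytes on the wire")
```

The sketch makes the trade-off visible: the sender and receiver hold the larger uncompressed array, while only the compressed form crosses the network.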
Min-Wise Independent Permutations
 Journal of Computer and System Sciences
, 1998
Cited by 191 (11 self)
We define and study the notion of min-wise independent families of permutations. We say that F ⊆ S_n is min-wise independent if for any set X ⊆ [n] and any x ∈ X, when π is chosen at random in F we have Pr(min{π(X)} = π(x)) = 1/|X|. In other words we require that all the elements of any fixed set X have an equal chance to become the minimum element of the image of X under π. Our research was motivated by the fact that such a family (under some relaxations) is essential to the algorithm used in practice by the AltaVista web index software to detect and filter near-duplicate documents. However, in the course of our investigation we have discovered interesting and challenging theoretical questions related to this concept – we present the solutions to some of them and we list the rest as open problems.
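The definition can be checked by brute force on a tiny example. The sketch below assumes n = 4 with elements numbered from 0, and the helper name `min_probabilities` is hypothetical. It compares the full symmetric group, which satisfies the condition, with the family of cyclic shifts, which turns out not to for the chosen set X.

```python
from fractions import Fraction
from itertools import permutations


def min_probabilities(family, X):
    """For each x in X, return the fraction of permutations in `family`
    under which x has the smallest image among the elements of X."""
    counts = {x: 0 for x in X}
    for pi in family:
        winner = min(X, key=lambda x: pi[x])
        counts[winner] += 1
    return {x: Fraction(c, len(family)) for x, c in counts.items()}


n = 4
X = [0, 2, 3]

# All n! permutations, each stored as a tuple with pi[i] = image of i.
full_group = list(permutations(range(n)))
full_probs = min_probabilities(full_group, X)

# A much smaller family: the n cyclic shifts i -> (i + s) mod n.
cyclic = [tuple((i + s) % n for i in range(n)) for s in range(n)]
cyclic_probs = min_probabilities(cyclic, X)

print("full group:   ", full_probs)
print("cyclic shifts:", cyclic_probs)
```

Exact fractions are used so that the comparison with 1/|X| does not depend on floating-point rounding.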
Visual Cryptography
, 1995
Cited by 175 (4 self)
In this paper we consider a new type of cryptographic scheme, which can decode concealed images without any cryptographic computations. The scheme is perfectly secure and very easy to implement. We extend it into a visual variant of the k out of n secret sharing problem, in which a dealer provides a transparency to each one of the n users; any k of them can see the image by stacking their transparencies, but any k − 1 of them gain no information about it.
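A minimal sketch of the simplest case, k = n = 2, may help make the stacking idea concrete. It assumes each secret pixel is expanded into two subpixels and that stacking transparencies acts as a bitwise OR. The function names, the two-subpixel encoding, and the example bit string are illustrative assumptions, not taken from the abstract.

```python
import secrets

# Each share pixel is one of two subpixel patterns: 1 = opaque, 0 = transparent.
PATTERNS = ((1, 0), (0, 1))


def make_shares(secret_bits):
    """Split a list of pixels (0 = white, 1 = black) into two shares."""
    share1, share2 = [], []
    for bit in secret_bits:
        p = PATTERNS[secrets.randbelow(2)]        # uniformly random pattern
        # White pixel: same pattern on both shares.
        # Black pixel: complementary pattern on the second share.
        q = p if bit == 0 else (1 - p[0], 1 - p[1])
        share1.append(p)
        share2.append(q)
    return share1, share2


def stack(share1, share2):
    """Physically stacking transparencies behaves like a subpixel-wise OR."""
    return [(a[0] | b[0], a[1] | b[1]) for a, b in zip(share1, share2)]


def decode(stacked):
    """A fully opaque pair reads as black; a half-opaque pair as white."""
    return [1 if sum(px) == 2 else 0 for px in stacked]


secret = [1, 0, 0, 1, 1, 0]
s1, s2 = make_shares(secret)
recovered = decode(stack(s1, s2))
print(recovered)
```

In this sketch every pixel of a single share is a uniformly random half-opaque pattern whatever the secret pixel is, which is why one transparency alone reveals nothing about the image.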
What’s hot and what’s not: Tracking most frequent items dynamically
 In Proceedings of ACM Principles of Database Systems
, 2003
Cited by 174 (14 self)
Most database management systems maintain statistics on the underlying relation. One of the important statistics is that of the “hot items” in the relation: those that appear many times (most frequently, or more than some threshold). For example, end-biased histograms keep the hot items as part of the histogram and are used in selectivity estimation. Hot items are used as simple outliers in data mining, and in anomaly detection in many applications. We present new methods for dynamically determining the hot items at any time in a relation which is undergoing deletion operations as well as inserts. Our methods maintain small space data structures that monitor the transactions on the relation, and when required, quickly output all hot items, without rescanning the relation in the database. With user-specified probability, all hot items are correctly reported. Our methods rely on ideas from “group testing”. They are simple to implement, and have provable quality, space and time guarantees. Previously known algorithms for this problem that make similar quality and performance guarantees cannot handle deletions, and those that handle deletions cannot make similar guarantees without rescanning the database. Our experiments with real and synthetic data show that our algorithms are accurate in dynamically tracking the hot items independent of the rate of insertions and deletions.
Single-Packet IP Traceback
, 2002
Cited by 152 (4 self)
The design of the IP protocol makes it difficult to reliably identify the originator of an IP packet. Even in the absence of any deliberate attempt to disguise a packet's origin, widespread packet forwarding techniques such as NAT and encapsulation may obscure the packet's true source. Techniques have been developed to determine the source of large packet flows, but, to date, no system has been presented to track individual packets in an efficient, scalable fashion. We present a hash-based technique for IP traceback that generates audit trails for traffic within the network, and can trace the origin of a single IP packet delivered by the network in the recent past. We demonstrate that the system is effective, space-efficient (requiring approximately 0.5% of the link capacity per unit time in storage), and implementable in current or next-generation routing hardware. We present both analytic and simulation results showing the system's effectiveness.
Counting Distinct Elements in a Data Stream
, 2002
Cited by 144 (4 self)
We present three algorithms to count the number of distinct elements in a data stream to within a factor of 1 ± epsilon. Our algorithms improve upon known algorithms for this problem, and offer a spectrum of time/space tradeoffs.
Universal Hash Proofs and a Paradigm for Adaptive Chosen Ciphertext Secure Public-Key Encryption
, 2001
Cited by 139 (7 self)
We present several new and fairly practical public-key encryption schemes and prove them secure against adaptive chosen ciphertext attack. One scheme is based on Paillier's Decision Composite Residuosity (DCR) assumption [7], while another is based on the classical Quadratic Residuosity (QR) assumption. The analysis is in the standard cryptographic model, i.e., the security of our schemes does not rely on the Random Oracle model. We also introduce the notion of a universal hash proof system. Essentially, this is a special kind of non-interactive zero-knowledge proof system for an NP language. We do not show that universal hash proof systems exist for all NP languages, but we do show how to construct very efficient universal hash proof systems for a general class of group-theoretic language membership problems. Given an efficient universal hash proof system for a language with certain natural cryptographic indistinguishability properties, we show how to construct an efficient public-key encryption scheme secure against adaptive chosen ciphertext attack in the standard model. Our construction only uses the universal hash proof system as a primitive: no other primitives are required, although even more efficient encryption schemes can be obtained by using hash functions with appropriate collision-resistance properties. We show how to construct efficient universal hash proof systems for languages related to the DCR and QR assumptions. From these we get corresponding public-key encryption schemes that are secure under these assumptions. We also show that the Cramer-Shoup encryption scheme (which up until now was the only practical encryption scheme that could be proved secure against adaptive chosen ciphertext attack under a reasonable assumption, namely, the Decision...
Building a Better NetFlow
, 2004
Cited by 133 (5 self)
Network operators need to determine the composition of the traffic mix on links when looking for dominant applications, users, or estimating traffic matrices. Cisco's NetFlow has evolved into a solution that satisfies this need by reporting flow records that summarize a sample of the traffic traversing the link. But sampled NetFlow has shortcomings that hinder the collection and analysis of traffic data. First, during flooding attacks router memory and network bandwidth consumed by flow records can increase beyond what is available; second, selecting the right static sampling rate is difficult because no single rate gives the right tradeoff of memory use versus accuracy for all traffic mixes; third, the heuristics routers use to decide when a flow is reported are a poor match to most applications that work with time bins; finally, it is impossible to estimate without bias the number of active flows for aggregates with non-TCP traffic. In this paper we propose...
Sketch-based Change Detection: Methods, Evaluation, and Applications
 In Internet Measurement Conference
, 2003
Cited by 129 (16 self)
Traffic anomalies such as failures and attacks are commonplace in today's network, and identifying them rapidly and accurately is critical for large network operators. The detection typically treats the traffic as a collection of flows that need to be examined for significant changes in traffic pattern (e.g., volume, number of connections). However, as link speeds and the number of flows increase, keeping per-flow state is either too expensive or too slow. We propose building compact summaries of the traffic data using the notion of sketches. We have designed a variant of the sketch data structure, k-ary sketch, which uses a constant, small amount of memory, and has constant per-record update and reconstruction cost. Its linearity property enables us to summarize traffic at various levels. We then implement a variety of time series forecast models (ARIMA, Holt-Winters, etc.) on top of such summaries and detect significant changes by looking for flows with large forecast errors. We also present heuristics for automatically configuring the model parameters. Using a
On Hiding Information from an Oracle
, 1989
Cited by 128 (15 self)
We consider the problem of computing with encrypted data. Player A wishes to know the value f(x) for some x but lacks the power to compute it. Player B has the power to compute f and is willing to send f(y) to A if she sends him y, for any y. Informally, an encryption scheme for the problem f is a method by which A, using her inferior resources, can transform the cleartext instance x into an encrypted instance y, obtain f(y) from B, and infer f(x) from f(y) in such a way that B cannot infer x from y. When such an encryption scheme exists, we say that f is encryptable. The framework defined in this paper enables us to prove precise statements about what an encrypted instance hides and what it leaks, in an information-theoretic sense. Our definitions are cast in the language of probability theory and do not involve assumptions such as the intractability of factoring or the existence of one-way functions. We use our framework to describe encryption schemes for some well-known function...