Less hashing, same performance: Building a better Bloom filter
- In Proc. of the 14th Annual European Symposium on Algorithms (ESA 2006), 2006
Cited by 20 (3 self)
ABSTRACT: A standard technique from the hashing literature is to use two hash functions h1(x) and h2(x) to simulate additional hash functions of the form gi(x) = h1(x) + ih2(x). We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for
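The double-hashing idea in this abstract can be illustrated with a minimal sketch. It assumes that two 64-bit halves of a single SHA-256 digest stand in for h1(x) and h2(x), and that each gi(x) is reduced modulo the filter size m; the class name and both choices are illustrative assumptions, not details taken from the paper.

```python
import hashlib


class DoubleHashBloomFilter:
    """Bloom filter whose k probe positions are derived from two base hashes."""

    def __init__(self, m, k):
        self.m = m                  # number of bit positions in the filter
        self.k = k                  # number of simulated hash functions
        self.bits = bytearray(m)    # one byte per bit, for readability

    def _base_hashes(self, item):
        # Assumption: two halves of one SHA-256 digest play the role of h1, h2.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return h1, h2

    def _positions(self, item):
        h1, h2 = self._base_hashes(item)
        # g_i(x) = h1(x) + i * h2(x), reduced modulo the table size m.
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))
```

Only one digest is computed per operation regardless of k, which is the computational saving the abstract refers to.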
An Improved Construction for Counting Bloom Filters
- 14th Annual European Symposium on Algorithms, LNCS 4168, 2006
Cited by 14 (3 self)
Abstract. A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow membership queries on a set that can be changing dynamically via insertions and deletions. As with a Bloom filter, a CBF obtains space savings by allowing false positives. We provide a simple hashing-based alternative based on d-left hashing called a d-left CBF (dlCBF). The dlCBF offers the same functionality as a CBF, but uses less space, generally saving a factor of two or more. We describe the construction of dlCBFs, provide an analysis, and demonstrate their effectiveness experimentally.
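For orientation, the baseline that this abstract improves on can be sketched as follows. This is a plain counting Bloom filter, not the d-left construction; the class name and the salted SHA-256 hashing are assumptions made only for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Baseline CBF: an array of counters instead of an array of bits."""

    def __init__(self, m, k):
        self.m = m                  # number of counters
        self.k = k                  # number of hash functions
        self.counters = [0] * m

    def _positions(self, item):
        # Assumption: k hash functions simulated by salting SHA-256 with i.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item):
        # Only valid for items that were previously added; removing a
        # false positive would corrupt the counters.
        if item not in self:
            raise KeyError(item)
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.counters[pos] > 0 for pos in self._positions(item))
```

Each counter costs several bits where a Bloom filter spends one, which is the space overhead the dlCBF is designed to reduce.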
Rank-indexed hashing: A compact construction of Bloom filters and variants
Cited by 7 (0 self)
Abstract—Bloom filter and its variants have found widespread use in many networking applications. For these applications, minimizing storage cost is paramount as these filters often need to be implemented using scarce and costly (on-chip) SRAM. Besides supporting membership queries, Bloom filters have been generalized to support deletions and the encoding of information. Although a standard Bloom filter construction has proven to be extremely space-efficient, it is unnecessarily costly when generalized. Alternative constructions based on storing fingerprints in hash tables have been proposed that offer the same functionality as some Bloom filter variants, but using less space. In this paper, we propose a new fingerprint hash table construction called Rank-Indexed Hashing that can achieve very compact representations. A rank-indexed hashing construction that offers the same functionality as a counting Bloom filter can be achieved with a factor of three or more in space savings even for a false positive probability of just 1%. Even for a basic Bloom filter function that only supports membership queries, a rank-indexed hashing construction requires less space for a false positive probability as high as 0.1%, which is significant since a standard Bloom filter construction is widely regarded as extremely space-efficient for approximate membership problems.
Hash-Based Techniques for High-Speed Packet Processing
Cited by 7 (1 self)
Abstract. Hashing is an extremely useful technique for a variety of high-speed packet-processing applications in routers. In this chapter, we survey much of the recent work in this area, paying particular attention to the interaction between theoretical and applied research. We assume very little background in either the theory or applications of hashing, reviewing the fundamentals as necessary.
Flexible Lookup Modules for Rapid Deployment of New Protocols in High-speed Routers
Cited by 6 (1 self)
New protocols for the data link and network layer are being proposed to address limitations of current protocols in terms of scalability, security, and manageability. High-speed routers and switches that would need to implement these protocols traditionally perform packet processing using ASICs, which offer high speed, low chip area, and low power. But with inflexible custom hardware, the deployment of new protocols could happen only through equipment upgrades. While newer routers use more flexible network processors for data plane processing, due to power and area constraints lookups in forwarding tables are done with custom lookup modules. Thus most of the proposed protocols can only be deployed with equipment upgrades. To speed up the deployment of new protocols, we propose a flexible lookup module, PLUG (Pipelined Lookup Grid). We can achieve generality without losing efficiency because various custom lookup modules have the same fundamental features we retain: area dominated by memories, simple processing, and strict access patterns defined by the data structure. We implemented IPv4, Ethernet, Ethane and SEATTLE in our dataflow-based programming model for the PLUG and mapped them to the PLUG hardware, which consists of a grid of tiles. The throughput, area, power and latency we achieve are close to those of specialized lookup modules.
Bloom Filters via d-left Hashing and Dynamic Bit Reassignment
- Proceedings of the Allerton Conference on Communication, Control and Computing, 2006
Cited by 5 (2 self)
Abstract — In recent work, the authors introduced a data structure with the same functionality as a counting Bloom filter (CBF) based on fingerprints and the d-left hashing technique. This paper describes dynamic bit reassignment, an approach that allows the size of the fingerprint to flexibly change with the load in each hash bucket, thereby reducing the probability of a false positive. This technique allows us to not only improve our d-left counting Bloom filter, but also to construct a data structure with the same functionality as a Bloom filter, including the ability to handle insertions online, that yields fewer false positives for sufficiently large filters. Our results show that our d-left Bloom filter data structure begins achieving smaller false positive rates than the standard construction at 16 bits per element. We explain the technique, describe why it is amenable to hardware implementation, and provide experimental results.
The Dynamic Bloom Filters
- In Proc. IEEE Infocom, 2006
Cited by 5 (0 self)
Abstract—A Bloom filter is an effective, space-efficient data structure for concisely representing a set and supporting approximate membership queries. Traditionally, the Bloom filter and its variants just focus on how to represent a static set and decrease the false positive probability to a sufficiently low level. By investigating mainstream applications based on the Bloom filter, we reveal that dynamic data sets are more common and important than static sets. However, existing variants of the Bloom filter cannot support dynamic data sets well. To address this issue, we propose dynamic Bloom filters to represent dynamic sets as well as static sets and design necessary item insertion, membership query, item deletion, and filter union algorithms. The dynamic Bloom filter can control the false positive probability at a low level by expanding its capacity as the set cardinality increases. Through comprehensive mathematical analysis, we show that the dynamic Bloom filter uses less expected memory than the Bloom filter when representing dynamic sets with an upper bound on set cardinality, and also that the dynamic Bloom filter is more stable than the Bloom filter due to infrequent reconstruction when addressing dynamic sets without an upper bound on set cardinality. Moreover, the analysis results hold in standalone applications as well as distributed applications. Index Terms—Bloom filters, dynamic Bloom filters, information representation.
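The capacity-expansion idea described in this abstract might be sketched roughly as below, assuming the dynamic filter is a list of fixed-size Bloom filters where a new one is appended once the active one reaches its nominal capacity. The class names, the `capacity` parameter, and the salted SHA-256 hashing are illustrative assumptions; deletion and union, which the abstract also mentions, are omitted.

```python
import hashlib


class _Bloom:
    """Fixed-size Bloom filter that also tracks how many items it holds."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)
        self.count = 0

    def _positions(self, item):
        # Assumption: k hash functions simulated by salting SHA-256 with i.
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        ]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1
        self.count += 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


class DynamicBloomFilter:
    """Grows by appending a fresh filter when the active one is full."""

    def __init__(self, m, k, capacity):
        self.m, self.k, self.capacity = m, k, capacity
        self.filters = [_Bloom(m, k)]

    def add(self, item):
        active = self.filters[-1]
        if active.count >= self.capacity:
            # Active filter has reached its nominal capacity: start a new one.
            active = _Bloom(self.m, self.k)
            self.filters.append(active)
        active.add(item)

    def __contains__(self, item):
        # An item may live in any of the constituent filters.
        return any(item in f for f in self.filters)
```

Because a query consults every constituent filter, the overall false positive probability grows with the number of filters, so the per-filter capacity has to be chosen with that in mind.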
DRAM is plenty fast for wirespeed statistics counting
- In ACM HotMetrics, 2008
Cited by 4 (1 self)
Per-flow network measurement at Internet backbone links requires the efficient maintenance of large arrays of statistics counters at very high speeds (e.g. 40 Gb/s). The prevailing view is that SRAM is too expensive for implementing large counter arrays, but DRAM is too slow for providing wirespeed updates. This view is the main premise of a number of hybrid SRAM/DRAM architectural proposals [2, 3, 4, 5] that still require substantial amounts of SRAM for large arrays. In this paper, we present a contrarian view that modern commodity DRAM architectures, driven by aggressive performance roadmaps for consumer applications (e.g. video games), have advanced architecture features that can be exploited to make DRAM solutions practical. We describe two such schemes that can harness the performance of these DRAM offerings by enabling the interleaving of counter updates to multiple memory banks. These counter schemes are the first to support arbitrary increments and decrements for either integer or floating point number representations at wirespeed. We believe our preliminary success with the use of DRAM schemes for wirespeed statistics counting opens the possibilities for broader research opportunities to generalize the proposed ideas for other network measurement functions.
The Variable-Increment Counting Bloom Filter
Cited by 3 (3 self)
Abstract—Counting Bloom Filters (CBFs) are widely used in networking device algorithms. They implement fast set representations to support membership queries with limited error, and support element deletions unlike Bloom Filters. However, they also consume significant amounts of memory. In this paper we introduce a new general method based on variable increments to improve the efficiency of CBFs and their variants. Unlike CBFs, at each packet arrival, the hashed counters increase by a hashed variable increment instead of a unit increment. Then, to query a packet, the exact value of a counter is considered and not just its positiveness. We present two simple schemes based on this method. We demonstrate that this method can always achieve a lower false positive rate and a lower overflow probability bound than CBF in large systems. We also show how it can be easily implemented in hardware, with limited added complexity and memory overhead. We also explain how this method can extend many variants of CBF that have been published in the literature. Last, using simulations, we show how it can improve the false positive rate of CBFs by up to an order of magnitude given the same amount of memory.
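The variable-increment idea can be sketched under one stated assumption: every increment is drawn from the range {L, ..., 2L-1}. Under that assumption, if an item with increment v is present, the counter minus v is either 0 or a sum of other increments and therefore at least L, so any other remainder rules the item out. The increment range, the class name, and the salted SHA-256 hashing are assumptions for illustration and may differ from the schemes in the paper.

```python
import hashlib


def _h(tag, item, mod):
    # Assumption: independent hash functions simulated by tagging SHA-256.
    digest = hashlib.sha256(f"{tag}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % mod


class VariableIncrementCBF:
    """CBF variant where each probe adds a hashed increment in [L, 2L-1]."""

    def __init__(self, m, k, L=4):
        self.m, self.k, self.L = m, k, L
        self.counters = [0] * m

    def _probes(self, item):
        # Each probe is a (counter index, increment) pair derived from the item.
        return [
            (_h(f"pos{i}", item, self.m), self.L + _h(f"inc{i}", item, self.L))
            for i in range(self.k)
        ]

    def add(self, item):
        for pos, inc in self._probes(item):
            self.counters[pos] += inc

    def remove(self, item):
        # Caller must only remove items that were actually added.
        for pos, inc in self._probes(item):
            self.counters[pos] -= inc

    def __contains__(self, item):
        for pos, inc in self._probes(item):
            rest = self.counters[pos] - inc
            # If the item is present, rest is 0 or a sum of increments >= L.
            if rest != 0 and rest < self.L:
                return False
        return True
```

A plain CBF would report a possible match whenever all probed counters are positive; the extra check on the exact counter value is what lets this variant reject some of those cases.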
Efficient and Robust TCP Stream Normalization
Cited by 2 (1 self)
Network intrusion detection and prevention systems are vulnerable to evasion by attackers who craft ambiguous traffic to breach the defense of such systems. A normalizer is an inline network element that thwarts evasion attempts by removing ambiguities in network traffic. A particularly challenging step in normalization is the sound detection of inconsistent TCP retransmissions, wherein an attacker sends TCP segments with different payloads for the same sequence number space to present a network monitor with ambiguous analysis. Normalizers that buffer all unacknowledged data to verify the consistency of subsequent retransmissions consume inordinate amounts of memory on high-speed links. On the other hand, normalizers that buffer only the hashes of unacknowledged segments cannot verify the consistency of 20–30% of retransmissions that, according to our traces, do not align with the original transmissions. This paper presents the design of RoboNorm, a normalizer that buffers only the hashes of unacknowledged segments, and yet can detect all inconsistent retransmissions in any TCP byte stream. RoboNorm consumes 1–2 orders of magnitude less memory than normalizers that buffer all unacknowledged data, and is amenable to a high-speed implementation. RoboNorm is also robust to attacks that attempt to compromise its operation or exhaust its resources.

