A Decentralized, Adaptive Replica Location Mechanism
- In Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), 2002
Cited by 18 (2 self)
We describe a decentralized, adaptive mechanism for replica location in wide-area distributed systems. Unlike traditional, hierarchical (e.g., DNS) and more recent (e.g., CAN, Chord, Gnutella) distributed search and indexing schemes, nodes in our location mechanism do not route queries; instead, they organize into an overlay network and distribute location information. We contend that this approach works well in environments where replica location queries are prevalent but the dynamic component of the system (e.g., node and network failures, replica add/delete operations) cannot be neglected. We argue that a replica location mechanism that combines probabilistic representations of replica location information with soft-state protocols and a flat overlay network of nodes brings important benefits: genuine decentralization, low query latency, and flexibility to introduce adaptive communication schedules. We support these claims in two ways. First, we provide a rough resource consumption evaluation: we show that, for environments similar to those encountered in large scientific data analysis projects, generated network traffic is limited and, more importantly, is comparable to the traffic generated by a request routing scheme. Second, we provide encouraging performance data from a prototype implementation.
Performance in Practice of String Hashing Functions
- Proc. Int. Conf. on Database Systems for Advanced Applications, 1997
Cited by 17 (7 self)
String hashing is a fundamental operation, used in countless applications where fast access to distinct strings is required. In this paper we describe a class of string hashing functions and explore its performance. In particular, using experiments with both small sets of keys and a large key set from a text database, we show that it is possible to achieve performance close to that theoretically predicted for hashing functions. We also consider criteria for choosing a hashing function and use them to compare our class of functions to other methods for string hashing. These results show that our class of hashing functions is reliable and efficient, and is therefore an appropriate choice for general-purpose hashing.
Bloom-Based Filters for Hierarchical Data
- 5th Workshop on Distributed Data Structures and Algorithms (WDAS ’03), Thessaloniki, 2003
Cited by 10 (4 self)
In this paper, we present two novel hash-based indexing structures, based on Bloom filters, called Breadth and Depth Bloom filters, which, in contrast to traditional hash-based indexes, are able to summarize hierarchical data and support regular path expression queries. We describe how these structures can be used for resource discovery in peer-to-peer networks. We have implemented both structures and our experiments show that they both outperform Simple Bloom filters in discovering the appropriate resources.
Simple Summaries for Hashing with Multiple Choices
Cited by 8 (3 self)
In a multiple-choice hashing scheme, each item is stored in one of d ≥ 2 possible hash table buckets. The availability of these multiple choices allows for a substantial reduction in the maximum load of the buckets. However, a lookup may now require examining each of the d locations. For applications where this cost is undesirable, Song et al. propose keeping a summary that allows one to determine which of the d locations is appropriate for each item, where the summary may allow false positives for items not in the hash table. We propose alternative, simple constructions of such summaries that use less space for both the summary and the underlying hash table. Moreover, our constructions are easily analyzable and tunable.
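As a rough illustration of the multiple-choice scheme this abstract describes, the sketch below stores each key in one of d candidate buckets and shows why a lookup without a summary may have to probe all d of them. The class name `MultiChoiceTable`, the salted SHA-256 hashing, and the least-loaded placement rule are assumptions made for the example; the summary structures of Song et al. and of the paper itself are not implemented here.

```python
import hashlib


class MultiChoiceTable:
    """Toy d-choice hash table (hypothetical names, illustration only)."""

    def __init__(self, num_buckets, d=2):
        self.d = d
        self.buckets = [[] for _ in range(num_buckets)]

    def _choices(self, key):
        # Assumption: derive d candidate buckets by salting one hash function.
        n = len(self.buckets)
        out = []
        for i in range(self.d):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            out.append(int.from_bytes(digest[:8], "big") % n)
        return out

    def insert(self, key):
        # Assumed placement rule: pick the least-loaded candidate bucket.
        best = min(self._choices(key), key=lambda b: len(self.buckets[b]))
        self.buckets[best].append(key)
        return best

    def lookup(self, key):
        # Without a summary, every one of the d candidates may be examined.
        for b in self._choices(key):
            if key in self.buckets[b]:
                return b
        return None
```

A summary in the sense of the abstract would let `lookup` go straight to the right candidate instead of looping over all d.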
Content-based overlay networks of XML peers based on multi-level bloom filters
- Proceedings of VLDB International Workshop on Databases, Information Systems and Peer-to-Peer Computing, 2003
Cited by 8 (2 self)
Peer-to-peer systems are gaining popularity as a means to effectively share huge, massively distributed data collections. In this paper, we consider XML peers, that is, peers that store XML documents. We show how an extension of traditional Bloom filters, called multi-level Bloom filters, can be used to route path queries in such a system. In addition, we propose building content-based overlay networks by linking together peers with similar content. The similarity of the content (i.e., the local documents) of two peers is defined based on the similarity of their filters. Our experimental results show that overlay networks built based on filter similarity are very effective in retrieving a large number of relevant documents, since peers with similar content tend to be clustered together.
Building a better Bloom filter
- In Proceedings of the 14th Annual European Symposium on Algorithms (ESA), 2005
Cited by 8 (4 self)
A technique from the hashing literature is to use two hash functions h_1(x) and h_2(x) to simulate additional hash functions of the form g_i(x) = h_1(x) + i h_2(x). We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for randomness in practice.
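The two-hash construction in this abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the class name `DoubleHashBloom`, the use of one SHA-256 digest split into two 64-bit values as the base hashes, and the reduction modulo the filter size m are all assumptions made for the example.

```python
import hashlib


class DoubleHashBloom:
    """Bloom filter whose k probe positions come from two base hashes."""

    def __init__(self, m, k):
        self.m = m                 # number of bit positions
        self.k = k                 # number of simulated hash functions
        self.bits = bytearray(m)   # one byte per bit, for readability

    def _base_hashes(self, item):
        # Assumption: two 64-bit halves of one digest stand in for the two hashes.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return h1, h2

    def _positions(self, item):
        h1, h2 = self._base_hashes(item)
        # i-th position = (first hash + i * second hash) mod m, for i = 0 .. k-1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))
```

Only one digest is computed per operation regardless of k, which is the computational saving the abstract refers to.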
On the false-positive rate of Bloom filters
- Report, School of Comp. Sci., Carleton Univ., 2007. http://cg.scs.carleton.ca/~morin/publications/ds/bloom-submitted.pdf
Cited by 7 (0 self)
Bloom filters are a randomized data structure for membership queries dating back to 1970. Bloom filters sometimes give erroneous answers to queries, called false positives. Bloom analyzed the probability of such erroneous answers, called the false-positive rate, and Bloom's analysis has appeared in many publications throughout the years. We show that Bloom's analysis is incorrect and give a correct analysis.
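For context, the estimate usually quoted in the literature for a filter with m bits, n inserted items, and k hash functions is (1 - (1 - 1/m)^(kn))^k, often approximated as (1 - e^(-kn/m))^k. The sketch below only evaluates these two commonly cited expressions; the function names are hypothetical, and the corrected analysis that the abstract announces is not reproduced here.

```python
import math


def classic_fp_estimate(m, n, k):
    """Commonly quoted estimate: (1 - (1 - 1/m)**(k*n))**k."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


def classic_fp_approx(m, n, k):
    """Exponential approximation of the same estimate: (1 - e**(-k*n/m))**k."""
    return (1.0 - math.exp(-k * n / m)) ** k
```

For large m the two expressions agree closely, which is why they are often used interchangeably when sizing a filter.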
Hash-Based Techniques for High-Speed Packet Processing
Cited by 7 (1 self)
Hashing is an extremely useful technique for a variety of high-speed packet-processing applications in routers. In this chapter, we survey much of the recent work in this area, paying particular attention to the interaction between theoretical and applied research. We assume very little background in either the theory or applications of hashing, reviewing the fundamentals as necessary.
Filters for XML-based Service Discovery in Pervasive Computing
- Computer Journal: Special Issue on Mobile and Pervasive Computing
Cited by 2 (1 self)
Pervasive computing refers to an emerging trend towards numerous casually accessible devices connected to an increasingly ubiquitous network infrastructure. An important challenge in this context is discovering the appropriate data and services. In this paper, we assume that services and data are described using hierarchically structured metadata. There is no centralized index for the services; instead, appropriately distributed filters are used to route queries to the appropriate nodes. We propose two new types of filters that extend Bloom filters for hierarchical documents.
Dynamic count filters
- 2005
Cited by 2 (1 self)
Bloom filters are not able to handle deletes and inserts on multisets over time. This is important in many situations when streamed data evolve rapidly and change patterns frequently. Counting Bloom Filters (CBF) have been proposed to overcome this limitation and allow for the dynamic evolution of Bloom filters. The only dynamic approach to a compact and efficient representation of CBF is the Spectral Bloom Filter (SBF). In this paper we propose the Dynamic Count Filters (DCF) as a new dynamic and space-time efficient representation of CBF. Although DCF does not make a compact use of memory, it proves to be faster and more space efficient than any previous proposal. Results show that the proposed data structure is more efficient independently of the incoming data characteristics.
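To make the limitation concrete, the sketch below shows a plain Counting Bloom Filter, the baseline that this abstract builds on: each position holds a counter rather than a bit, so deletions become possible. It is not the Dynamic Count Filter or the Spectral Bloom Filter; the class name, the unbounded Python integers used as counters, and the way positions are derived from one SHA-256 digest are all assumptions made for the example.

```python
import hashlib


class CountingBloomFilter:
    """Plain counting Bloom filter (illustrative baseline, not DCF or SBF)."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.counters = [0] * m   # assumption: unbounded integer counters

    def _positions(self, item):
        # Assumption: k positions derived from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.counters[p] += 1

    def remove(self, item):
        # Refuse to delete something the filter reports as absent.
        if item not in self:
            raise KeyError(item)
        for p in self._positions(item):
            self.counters[p] -= 1

    def __contains__(self, item):
        return all(self.counters[p] > 0 for p in self._positions(item))
```

Real implementations use small fixed-width counters, and how to size and resize those counters efficiently is the problem that dynamic representations such as the one proposed in this abstract address.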

