Results 1 - 10
of
106
Network Applications of Bloom Filters: A Survey
- Internet Mathematics
, 2002
"... Abstract. ABloomfilter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Bloom filters allow false positives but the space savings often outweigh this drawback when the probability of an error is controlled. Bloom filters have been u ..."
Abstract
-
Cited by 257 (12 self)
- Add to MetaCart
Abstract. ABloomfilter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Bloom filters allow false positives but the space savings often outweigh this drawback when the probability of an error is controlled. Bloom filters have been used in database applications since the 1970s, but only in recent years have they become popular in the networking literature. The aim of this paper is to survey the ways in which Bloom filters have been used and modified in a variety of network problems, with the aim of providing a unified mathematical and practical framework for understanding them and stimulating their use in future applications. 1.
Informed content delivery across adaptive overlay networks
- In Proceedings of ACM SIGCOMM
, 2002
"... Abstract—Overlay networks have emerged as a powerful and highly flexible method for delivering content. We study how to optimize throughput of large transfers across richly connected, adaptive overlay networks, focusing on the potential of collaborative transfers between peers to supplement ongoing ..."
Abstract
-
Cited by 179 (9 self)
- Add to MetaCart
Abstract—Overlay networks have emerged as a powerful and highly flexible method for delivering content. We study how to optimize throughput of large transfers across richly connected, adaptive overlay networks, focusing on the potential of collaborative transfers between peers to supplement ongoing downloads. First, we make the case for an erasure-resilient encoding of the content. Using the digital fountain encoding approach, end hosts can efficiently reconstruct the original content of size from a subset of any symbols drawn from a large universe of encoding symbols. Such an approach affords reliability and a substantial degree of application-level flexibility, as it seamlessly accommodates connection migration and parallel transfers while providing resilience to packet loss. However, since the sets of encoding symbols acquired by peers during downloads may overlap substantially, care must be taken to enable them to collaborate effectively. Our main contribution is a collection of useful algorithmic tools for efficient summarization and approximate reconciliation of sets of symbols between pairs of collaborating peers, all of which keep message complexity and computation to a minimum. Through simulations and experiments on a prototype implementation, we demonstrate the performance benefits of our informed content-delivery mechanisms and how they complement existing overlay network architectures. Index Terms—Bloom filter, content delivery, digital fountain, erasure code, min-wise sketch, overlay, peer-to-peer, reconciliation. I.
On the Feasibility of Peer-to-Peer Web Indexing and Search
- IN IPTPS’03
, 2003
"... This paper discusses the feasibility of peer-to-peer full-text keyword search of the Web. Two classes of keyword search techniques are in use or have been proposed: flooding of queries over an overlay network (as in Gnutella), and intersection of index lists stored in a distributed hash table. We pr ..."
Abstract
-
Cited by 121 (11 self)
- Add to MetaCart
This paper discusses the feasibility of peer-to-peer full-text keyword search of the Web. Two classes of keyword search techniques are in use or have been proposed: flooding of queries over an overlay network (as in Gnutella), and intersection of index lists stored in a distributed hash table. We present a simple feasibility analysis based on the resource constraints and search workload. Our study suggests that the peer-to-peer network does not have enough capacity to make naive use of either of search techniques attractive for Web search. The paper presents a number of existing and novel optimizations for P2P search based on distributed hash tables, estimates their effects on performance, and concludes that in combination these optimizations would bring the problem to within an order of magnitude of feasibility. The paper suggests a number of compromises that might achieve the last order of magnitude.
Enhancing P2P File-Sharing with an Internet-Scale Query Processor
, 2004
"... In this paper, we address the problem of designing a scalable, accurate query processor for peerto -peer filesharing and similar distributed keyword search systems. Using a globally-distributed monitoring infrastructure, we perform an extensive study of the Gnutella filesharing network, charact ..."
Abstract
-
Cited by 57 (6 self)
- Add to MetaCart
In this paper, we address the problem of designing a scalable, accurate query processor for peerto -peer filesharing and similar distributed keyword search systems. Using a globally-distributed monitoring infrastructure, we perform an extensive study of the Gnutella filesharing network, characterizing its topology, data and query workloads. We observe
Content-based routing of path queries in peer-to-peer systems
- In EDBT
, 2004
"... Peer-to-peer systems are gaining popularity as a means to effectively share huge, massively distributed data collections. An important challenge in this context is discovering the appropriate data and services. In this paper, we consider peers that store XML documents. We show how an extension of tr ..."
Abstract
-
Cited by 57 (6 self)
- Add to MetaCart
Peer-to-peer systems are gaining popularity as a means to effectively share huge, massively distributed data collections. An important challenge in this context is discovering the appropriate data and services. In this paper, we consider peers that store XML documents. We show how an extension of traditional Bloom filters, called multi-level Bloom filters, can be used to route path queries in such a system. Two alternative ways are considered for building overlay networks of peers: one based on network proximity and one based on content similarity. Content similarity is derived from the similarity among filters. Our experimental results show that networks based on content similarity outperform those formed based on network proximity for finding all matching documents. 1.
Efficient algorithms for large-scale topology discovery
- in Proc. ACM SIGMETRICS
, 2005
"... There is a growing interest in discovery of internet topology at the interface level. A new generation of highly distributed measurement systems is currently being deployed. Unfortunately, the research community has not examined the problem of how to perform such measurements efficiently and in a ne ..."
Abstract
-
Cited by 54 (15 self)
- Add to MetaCart
There is a growing interest in discovery of internet topology at the interface level. A new generation of highly distributed measurement systems is currently being deployed. Unfortunately, the research community has not examined the problem of how to perform such measurements efficiently and in a network-friendly manner. In this paper we make two contributions toward that end. First, we show that standard topology discovery methods (e.g., skitter) are quite inefficient, repeatedly probing the same interfaces. This is a concern, because when scaled up, such methods will generate so much traffic that they will begin to resemble DDoS attacks. We measure two kinds of redundancy in probing (intra- and inter-monitor) and show that both kinds are important. We show that straightforward approaches to addressing these two kinds of redundancy must take opposite tacks, and are thus fundamentally in conflict. Our second contribution is to propose and evaluate Doubletree, an algorithm that reduces both types of redundancy simultaneously on routers and end systems. The key ideas are to exploit the treelike structure of routes to and from a single point in order to guide when to stop probing, and to probe each path by starting near its midpoint. Our results show that Doubletree can reduce both types of measurement load on the network dramatically, while permitting discovery of nearly the same set of nodes and links. ∗ The authors are participants in the traceroute@home project.This work was supported by: the RNRT project
The Bloomier Filter: An Efficient Data Structure for Static Support Lookup Tables
- In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA
, 2004
"... We introduce the Bloomier filter, a data structure for compactly encoding a function with static support in order to support approximate evaluation queries. Our construction generalizes the classical Bloom filter, an ingenious hashing scheme heavily used in networks and databases, whose main attribu ..."
Abstract
-
Cited by 47 (0 self)
- Add to MetaCart
We introduce the Bloomier filter, a data structure for compactly encoding a function with static support in order to support approximate evaluation queries. Our construction generalizes the classical Bloom filter, an ingenious hashing scheme heavily used in networks and databases, whose main attribute -- space efficiency -- is achieved at the expense of a tiny false-positive rate. Whereas Bloom filters can handle only set membership queries, our Bloomier filters can deal with arbitrary functions. We give several designs varying in simplicity and optimality, and we provide lower bounds to prove the (near) optimality of our constructions.
Improving Collection Selection with Overlap Awareness in P2P Search Engines
- In SIGIR
, 2005
"... Collection selection has been a research issue for years. Typically, in related work, precomputed statistics are employed in order to estimate the expected result quality of each collection, and subsequently the collections are ranked accordingly. Our thesis is that this simple approach is insuffici ..."
Abstract
-
Cited by 45 (19 self)
- Add to MetaCart
Collection selection has been a research issue for years. Typically, in related work, precomputed statistics are employed in order to estimate the expected result quality of each collection, and subsequently the collections are ranked accordingly. Our thesis is that this simple approach is insufficient for several applications in which the collections typically overlap. This is the case, for example, for the collections built by autonomous peers crawling the web. We argue for the extension of existing quality measures using estimators of mutual overlap among collections and present experiments in which this combination outperforms CORI, a popular approach based on quality estimation. We outline our prototype implementation of a P2P web search engine, coined MINERVA 1, that allows handling large amounts of data in a distributed and self-organizing manner. We conduct experiments which show that taking overlap into account during collection selection can drastically decrease the number of collections that have to be contacted in order to reach a satisfactory level of recall, which is a great step toward the feasibility of distributed web search.
Locating Data in (Small-World?) Peer-to-Peer Scientific Collaborations
, 2002
"... this paper we advocate the benefits of exploiting emergent patterns in self-configuring networks specialized for scientific data-sharing collaborations. We speculate that a P2P scientific collaboration network will exhibit small-world topology, as do a large number of social networks for which the s ..."
Abstract
-
Cited by 34 (6 self)
- Add to MetaCart
this paper we advocate the benefits of exploiting emergent patterns in self-configuring networks specialized for scientific data-sharing collaborations. We speculate that a P2P scientific collaboration network will exhibit small-world topology, as do a large number of social networks for which the same pattern has been documented
Make it Fresh, Make it Quick - Searching a Network of Personal Webservers
- In Proc. 12th International World Wide Web Conference
, 2003
"... Personal webservers have proven to be a popular means of sharing files and peer collaboration. Unfortunately, the transient availability and rapidly evolving content on such hosts render centralized, crawl-based search indices stale and incomplete. To address this problem, we propose YouSearch, a di ..."
Abstract
-
Cited by 31 (3 self)
- Add to MetaCart
Personal webservers have proven to be a popular means of sharing files and peer collaboration. Unfortunately, the transient availability and rapidly evolving content on such hosts render centralized, crawl-based search indices stale and incomplete. To address this problem, we propose YouSearch, a distributed search application for personal webservers operating within a shared context (e.g., a corporate intranet). With YouSearch, search results are always fast, fresh and complete --- properties we show arise from an architecture that exploits both the extensive distributed resources available at the peer webservers in addition to a centralized repository of summarized network state. YouSearch extends the concept of a shared context within web communities by enabling peers to aggregate into groups and users to search over specific groups. In this paper, we describe the challenges, design, implementation and experiences with a successful intranet deployment of YouSearch.

