Results 1 - 10 of 195
Network Applications of Bloom Filters: A Survey
- Internet Mathematics, 2002
Cited by 522 (17 self)
A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Bloom filters allow false positives, but the space savings often outweigh this drawback when the probability of an error is controlled. Bloom filters have been used in database applications since the 1970s, but only in recent years have they become popular in the networking literature. The aim of this paper is to survey the ways in which Bloom filters have been used and modified in a variety of network problems, with the aim of providing a unified mathematical and practical framework for understanding them and stimulating their use in future applications.
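The data structure the survey covers can be sketched in a few lines. Below is a minimal Bloom filter in Python; the SHA-256 double-hashing scheme and the parameters m and k are illustrative choices, not prescribed by the survey:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k index positions derived per item over m bits."""

    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k indices via double hashing of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        # May report a false positive, but never a false negative.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("peer-42")
print("peer-42" in bf)   # True: inserted items are always found
print("peer-99" in bf)   # False unless a rare hash collision occurs
```

With one inserted item, at most k of the m bits are set, which is why the false-positive probability stays controllable as long as the filter is sized for the expected set.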
Trickle: A Self-Regulating Algorithm for Code Propagation and Maintenance in Wireless Sensor Networks
- In Proceedings of the First USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI), 2004
Cited by 376 (9 self)
We present Trickle, an algorithm for propagating and maintaining code updates in wireless sensor networks. Borrowing techniques from the epidemic/gossip, scalable multicast, and wireless broadcast literature, Trickle uses a "polite gossip" policy, where motes periodically broadcast a code summary to local neighbors but stay quiet if they have recently heard a summary identical to theirs. When a mote hears an older summary than its own, it broadcasts an update. Instead of flooding a network with packets, the algorithm controls the send rate so each mote hears a small trickle of packets, just enough to stay up to date. We show that with this simple mechanism, Trickle can scale to thousand-fold changes in network density, propagate new code in the order of seconds, and impose a maintenance cost on the order of a few sends an hour.
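Trickle's "polite gossip" rule can be pictured as a small per-mote state machine. The class below is an illustrative reduction of the suppression and back-off behavior the abstract describes; the names (tau_min, tau_max, redundancy_k) are placeholders approximating, not reproducing, the paper's parameters:

```python
import random

class TrickleTimer:
    """Sketch of Trickle-style suppression: broadcast a summary only if
    fewer than k identical summaries were heard this interval."""

    def __init__(self, tau_min=1.0, tau_max=600.0, redundancy_k=2):
        self.tau_min, self.tau_max = tau_min, tau_max
        self.k = redundancy_k
        self.tau = tau_min
        self.reset_interval()

    def reset_interval(self):
        self.heard_identical = 0
        # Pick a broadcast time in the second half of the interval.
        self.t = random.uniform(self.tau / 2, self.tau)

    def on_summary(self, identical):
        if identical:
            self.heard_identical += 1   # consistent gossip: stay polite
        else:
            self.tau = self.tau_min     # inconsistency detected: speed up
            self.reset_interval()

    def should_broadcast(self):
        # Stay quiet if enough identical summaries were already heard.
        return self.heard_identical < self.k

    def interval_expired(self):
        self.tau = min(self.tau * 2, self.tau_max)  # back off when quiet
        self.reset_interval()
```

The exponential back-off is what caps maintenance cost at a few sends an hour in a stable network, while resetting to tau_min on an inconsistency is what propagates new code in seconds.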
The Bloomier filter: An efficient data structure for static support lookup tables
- In Proc. Symposium on Discrete Algorithms, 2004
"... “Oh boy, here is another David Nelson” ..."
Open Problems in Data-Sharing Peer-to-Peer Systems
- In ICDT, 2003
Cited by 77 (1 self)
In a Peer-To-Peer (P2P) system, autonomous computers pool their resources (e.g., files, storage, compute cycles) in order to inexpensively handle tasks that would normally require large costly servers. The scale of these systems, their "open nature", and the lack of centralized control pose difficult performance and security challenges. Much research has recently focused on tackling some of these challenges.
Hybrid Global-Local Indexing for Efficient Peer-To-Peer Information Retrieval
2004
Cited by 67 (3 self)
Content-based full-text search remains a particularly challenging problem in peer-to-peer (P2P) systems. Traditionally, there have been two index partitioning structures: partitioning based on the document space or partitioning based on keywords. The former requires searching every node in the system to answer a query, whereas the latter transmits a large amount of data when processing multi-term queries. In this paper, we propose eSearch, a P2P keyword search system based on a novel hybrid indexing structure. In eSearch, each node is responsible for certain terms. Given a document, eSearch uses a modern information retrieval algorithm to select a small number of top (important) terms in the document and publishes the complete term list for the document to the nodes responsible for those top terms. This selective replication of term lists allows a multi-term query to proceed locally at the nodes responsible for the query terms. We also propose automatic query expansion to alleviate the degradation of search result quality due to the selective replication, overlay source multicast to reduce the cost of disseminating term lists, and techniques to balance term list distribution across nodes.
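The hybrid publish step can be illustrated with a toy index. The sketch below ranks a document's terms by a simple TF-IDF score and replicates the full term list to the partitions owning its top terms; the scoring formula, the term-keyed dictionary standing in for the DHT, and all names are illustrative, not eSearch's actual algorithm:

```python
from collections import Counter, defaultdict
import math

def top_terms(tokens, df, n_docs, limit=3):
    # Score each distinct term by a simple TF-IDF (illustrative ranking).
    tf = Counter(tokens)
    return sorted(tf,
                  key=lambda t: tf[t] * math.log(n_docs / (1 + df.get(t, 0))),
                  reverse=True)[:limit]

def publish(index, doc_id, tokens, df, n_docs):
    # Replicate the document's complete term list to each partition
    # responsible for one of its top terms (partitions keyed by term here).
    term_list = sorted(set(tokens))
    for term in top_terms(tokens, df, n_docs):
        index[term][doc_id] = term_list

index = defaultdict(dict)
publish(index, "d1", ["p2p", "p2p", "search", "index"],
        df={"p2p": 5, "search": 40}, n_docs=100)
```

Because the node owning any top term holds the document's complete term list, a multi-term query can be evaluated at that node without shipping per-term posting lists across the network.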
Improving Collection Selection with Overlap Awareness in P2P Search Engines
- In SIGIR, 2005
Cited by 66 (23 self)
Collection selection has been a research issue for years. Typically, in related work, precomputed statistics are employed in order to estimate the expected result quality of each collection, and the collections are then ranked accordingly. Our thesis is that this simple approach is insufficient for several applications in which the collections overlap. This is the case, for example, for the collections built by autonomous peers crawling the web. We argue for extending existing quality measures with estimators of mutual overlap among collections and present experiments in which this combination outperforms CORI, a popular approach based on quality estimation. We outline our prototype implementation of a P2P web search engine, coined MINERVA, that allows handling large amounts of data in a distributed and self-organizing manner. We conduct experiments which show that taking overlap into account during collection selection can drastically decrease the number of collections that have to be contacted in order to reach a satisfactory level of recall, which is a great step toward the feasibility of distributed web search.
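One way to picture overlap-aware collection selection is a greedy loop that discounts each candidate's quality estimate by its estimated overlap with collections already chosen. This is an illustrative sketch of the general idea, not MINERVA's actual estimator or ranking formula:

```python
def select_collections(quality, overlap, budget):
    """Greedy overlap-aware selection: repeatedly pick the collection
    whose quality, discounted by its maximum estimated overlap with
    anything already selected, is highest."""
    selected = []
    remaining = set(quality)
    while remaining and len(selected) < budget:
        def gain(c):
            ov = max((overlap.get(frozenset((c, s)), 0.0) for s in selected),
                     default=0.0)
            return quality[c] * (1.0 - ov)
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected
```

A lower-quality but disjoint collection can beat a high-quality near-duplicate, which is exactly why fewer collections need to be contacted to reach a given recall level.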
Progressive Distributed Top-k Retrieval in Peer-to-Peer Networks
2005
Cited by 58 (10 self)
Query processing in traditional information management systems has moved from an exact-match model to more flexible paradigms that allow cooperative retrieval by aggregating the database objects' degree of match for each query predicate and returning only the best matching objects. In peer-to-peer systems such strategies are even more important, given the potentially large number of peers that may contribute to the results. Yet current peer-to-peer research has barely started to investigate such approaches. In this paper we discuss the benefits of best-match/top-k queries in the context of distributed peer-to-peer information infrastructures and show how to extend the limited query processing in current peer-to-peer networks by allowing distributed processing of top-k queries while maintaining minimal data traffic. Relying on a super-peer backbone organized in the HyperCuP topology, we show how to use local indexes to optimize the necessary query routing, and how to process intermediate results in inner network nodes at the earliest possible point in time, cutting down the data traffic within the network. Our algorithm is based only on dynamically collected query statistics; no continuous index update processes are necessary, allowing it to scale easily to large numbers of peers as well as dynamic additions and deletions of peers. We show that our approach always delivers correct result sets and is optimal in terms of necessary object accesses and data traffic. Finally, we present simulation results for both static and dynamic network environments.
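The basic traffic-saving idea, where each peer ships only its local top-k and inner nodes merge candidate lists, can be sketched as a toy reduction. This assumes each object's full score resides at a single peer; the actual algorithm additionally prunes routing via the HyperCuP local indexes:

```python
import heapq

def distributed_top_k(peer_scores, k):
    """Merge each peer's local top-k into a global top-k at a super-peer.
    peer_scores maps peer id -> {object id: score}; only k candidates
    per peer ever travel toward the root, bounding data traffic."""
    candidates = []
    for peer, scores in peer_scores.items():
        # Each peer ships only its local top-k, not its full result set.
        candidates.extend(heapq.nlargest(k, scores.items(),
                                         key=lambda kv: kv[1]))
    return heapq.nlargest(k, candidates, key=lambda kv: kv[1])
```

Under the single-peer-per-object assumption this merge is exact: no object outside some peer's local top-k can appear in the global top-k.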
NeuroGrid: Semantically Routing Queries in Peer-to-Peer Networks
- In Proc. Intl. Workshop on Peer-to-Peer Computing, 2002
Cited by 58 (1 self)
NeuroGrid is an adaptive decentralized search system. NeuroGrid nodes support distributed search through semantic routing (forwarding of queries based on content), and a learning mechanism that dynamically adjusts metadata describing the contents of nodes and the files that make up those contents. NeuroGrid is an open-source project, and prototype software has been made available at http://www.neurogrid.net/. NeuroGrid presents users with an alternative to hierarchical, folder-based file organization, and in the process offers an alternative approach to distributed search.
Theory and network applications of dynamic bloom filters
- In Proceedings of the 25th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), 2006
Cited by 38 (6 self)
A Bloom filter is a simple, space-efficient, randomized data structure for concisely representing a static data set, in order to support approximate membership queries. It has great potential for distributed applications where systems need to share information about what resources they have. The space efficiency is achieved at the cost of a small probability of false positives in membership queries. However, for many applications the space savings and short locating time consistently outweigh this drawback. In this paper, we introduce dynamic Bloom filters (DBF) to support concise representation and approximate membership queries of dynamic sets, and study the false positive probability and union algebra operations. We prove that DBF can keep the false positive probability at a low level by adjusting the number of standard Bloom filters used according to the actual size of the current dynamic set. The space complexity is also acceptable if the actual size of the dynamic set does not deviate too much from the predefined threshold. Furthermore, we present multi-dimension dynamic Bloom filters (MDDBF) to support concise representation and approximate membership queries of dynamic sets in multiple attribute dimensions, and study the false positive probability and union algebra operations through mathematical analysis and experimentation. We also explore the optimization approach and three network applications of Bloom filters, namely Bloom joins, informed search, and global index implementation. Our simulations show that informed search based on Bloom filters can obtain higher recall and query success rates than the blind search protocol.
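The growth rule behind DBF, which appends a fresh standard Bloom filter once the active one reaches its design capacity and queries all of them on lookup, can be sketched as follows. The hashing scheme and all parameters are illustrative, not the paper's:

```python
import hashlib

def positions(item, m, k):
    # k bit positions from double hashing of a single SHA-256 digest.
    digest = hashlib.sha256(item.encode()).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big")
    return [(h1 + i * h2) % m for i in range(k)]

class DynamicBloomFilter:
    """Grow a list of fixed-capacity standard Bloom filters as the
    dynamic set grows, keeping each filter's load (and thus its false
    positive rate) bounded."""

    def __init__(self, m=256, k=3, capacity=50):
        self.m, self.k, self.capacity = m, k, capacity
        self.filters = [[False] * m]   # start with one standard filter
        self.count = 0                 # items in the active (last) filter

    def add(self, item):
        if self.count >= self.capacity:          # active filter is "full":
            self.filters.append([False] * self.m)  # append a fresh one
            self.count = 0
        active = self.filters[-1]
        for p in positions(item, self.m, self.k):
            active[p] = True
        self.count += 1

    def __contains__(self, item):
        # Query every constituent filter; any hit counts as membership.
        pos = positions(item, self.m, self.k)
        return any(all(f[p] for p in pos) for f in self.filters)
```

The trade-off the abstract analyzes is visible here: each extra constituent filter adds its own chance of a false positive, so the overall error grows with the number of filters rather than with the load of any single one.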
Data Discovery and Dissemination with DIP
- In Proceedings of the International Conference on Information Processing in Sensor Networks (IPSN '08), 2008
Cited by 37 (3 self)
We present DIP, a data discovery and dissemination protocol for wireless networks. Prior approaches, such as Trickle or SPIN, have overheads that scale linearly with the number of data items. For T items, DIP can identify new items with O(log(T)) packets while maintaining a O(1) detection latency. To achieve this performance in a wide spectrum of network configurations, DIP uses a hybrid approach of randomized scanning and tree-based directed searches. By dynamically selecting which of the two algorithms to use, DIP outperforms both in terms of transmissions and speed. Simulation and testbed experiments show that DIP sends 20-60% fewer packets than existing protocols and can be 200% faster, while only requiring O(log(log(T))) additional state per data item.
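The tree-based directed search can be illustrated by a binary search over range hashes of per-item version numbers: when two nodes' digests for a range match, the whole subtree is pruned, so a differing item is located in O(log T) comparisons. This is a sketch of the idea only, not the DIP protocol itself:

```python
import hashlib

def digest(versions, lo, hi):
    # Hash the version numbers of items in [lo, hi) as a range summary.
    data = ",".join(str(v) for v in versions[lo:hi]).encode()
    return hashlib.sha256(data).hexdigest()

def find_new_item(mine, theirs, lo=0, hi=None):
    """Locate one index where the two version vectors differ, comparing
    range digests and recursing only into mismatching halves."""
    if hi is None:
        hi = len(mine)
    if digest(mine, lo, hi) == digest(theirs, lo, hi):
        return None                    # ranges agree: prune this subtree
    if hi - lo == 1:
        return lo                      # narrowed to a single stale item
    mid = (lo + hi) // 2
    left = find_new_item(mine, theirs, lo, mid)
    return left if left is not None else find_new_item(mine, theirs, mid, hi)
```

In the protocol setting each digest comparison corresponds to an exchanged summary packet, which is where the O(log T) packet cost per new item comes from.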