Results 1 - 6 of 6
DHTs over Peer Clusters for Distributed Information Retrieval
Cited by 3 (2 self)
Distributed Hash Tables (DHTs) are very efficient for querying based on key lookups, if only a small number of keys has to be registered by each individual peer. However,
A combination of DHTs and peer clustering for distributed information retrieval
, 2007
Cited by 2 (2 self)
Distributed Hash Tables (DHTs) are very efficient for querying based on key lookups, if only a small number of keys has to be registered by each individual peer. However,
PCIR: Combining DHTs and Peer Clusters for Efficient Full-text P2P Indexing
Distributed hash tables (DHTs) are very efficient for querying based on key lookups. However, building huge term indexes, as required for IR-style keyword search, poses a scalability challenge for plain DHTs. Because document term vocabularies are large, each peer joining the network triggers a huge number of key inserts and, consequently, a large number of index maintenance messages. Thus, the key to exploiting DHTs for distributed information retrieval is to reduce index maintenance costs. Various approaches in this direction have been pursued, including the use of hybrid infrastructures and changing the granularity of the inverted index to peer level. We show that indexing costs can be reduced significantly further by letting peers form groups in a self-organized fashion. Instead of each individual peer submitting index information separately, all peers of a group cooperate to publish the index updates to the DHT in batches. Our evaluation shows that this approach reduces index maintenance cost by an order of magnitude, while still keeping a complete and correct term index for query processing.
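The batching idea in this abstract can be illustrated with a toy message count. This is only a sketch under stated assumptions, not the PCIR protocol: it assumes a peer-level index (each term maps to the peers holding it), one DHT message per (peer, term) insert in the unbatched case, and one message per distinct term when a group publishes together. The function names and the sample group are hypothetical.

```python
from collections import defaultdict


def individual_message_count(peer_terms):
    """Messages if every peer inserts each of its terms on its own.

    Assumption: one DHT insert message per (peer, term) pair.
    """
    return sum(len(terms) for terms in peer_terms.values())


def group_batches(peer_terms):
    """Merge the postings of all peers in one group, keyed by term.

    Assumption: the group sends one message per distinct term,
    carrying the set of member peers that hold the term.
    """
    batches = defaultdict(set)
    for peer, terms in peer_terms.items():
        for term in terms:
            batches[term].add(peer)
    return dict(batches)


# Hypothetical group of three peers with overlapping vocabularies.
group = {
    "p1": {"dht", "index"},
    "p2": {"dht", "peer"},
    "p3": {"dht", "index", "peer"},
}

batches = group_batches(group)
print(individual_message_count(group), "messages unbatched")
print(len(batches), "messages batched")
```

The saving in this model comes entirely from vocabulary overlap inside a group, so it should not be read as reproducing the order-of-magnitude figure reported in the abstract.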
Nonnumerical Algorithms and Problems—computations
A Bloom filter is a probabilistic bit-array-based set representation that has recently been applied to address-set disambiguation in systems that ease the burden of parallel programming. However, many of these systems intersect the Bloom filter bit-arrays to approximate address-set intersection and decide set disjointness. This is in contrast with the conventional and well-studied approach of making individual membership queries into the Bloom filter. In this paper we present much-needed probabilistic models for the unconventional application of testing set disjointness using Bloom filters. Using these models, we demonstrate that intersecting Bloom filters requires substantially larger bit-arrays to provide the same probability of false set-overlap as querying into the bit-array. For cases where intersection is unavoidable, we prove that partitioned Bloom filters require less space than unpartitioned ones. Finally, we show that for Bloom filters with a single hash function, surprisingly, intersection and querying share the same probability of false set-overlap.
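The two disjointness tests contrasted in this abstract can be sketched as follows. This is a minimal illustration, not the paper's model: the salted SHA-256 hashing, the class and function names, and the rule "any shared set bit means possible overlap" for an unpartitioned filter are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Unpartitioned Bloom filter; a Python int serves as the bit-array."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Assumption: k positions derived from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def disjoint_by_intersection(a, b):
    """Report disjoint only if the bitwise AND of the two arrays is empty."""
    return (a.bits & b.bits) == 0


def disjoint_by_query(items, b):
    """Report disjoint only if no item of the first set hits filter b."""
    return not any(b.might_contain(x) for x in items)
```

Both tests may report a false overlap but never miss a real one: a truly shared element sets the same bits in both filters, so the AND is non-zero and the membership query succeeds.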
Designing a Parallel Query . . .
, 2010
Map/Reduce is a parallel programming model introduced by Google Inc., which enables the easy parallelization of tasks while hiding the details and complexity of parallel computation. This report presents the design of a parallel query engine over Map/Reduce. This is achieved in two parts. First, we examine algorithms for performing equi-joins between datasets over Map/Reduce and we provide a comparative analysis. Second, we design a cost model for estimating the performance of each algorithm. This is considered one of the keystones for building an optimizer capable of choosing the appropriate algorithm for each case. Our results indicate that all join algorithms are significantly affected by certain properties of the input datasets (size, selectivity factor, etc.) and that each algorithm performs better under certain circumstances. Our cost model manages to capture these factors and estimates the performance of each algorithm fairly accurately.
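The abstract does not say which equi-join algorithms the report compares. As one commonly described possibility, a reduce-side (repartition) join can be sketched in plain Python by simulating the map, shuffle, and reduce phases in memory. The function names, the "L"/"R" tags, and the sample tables are assumptions made for illustration, and no Map/Reduce framework is involved.

```python
from collections import defaultdict
from itertools import product


def map_phase(records, tag, key_index):
    """Emit (join_key, (tag, record)) so the reducer can tell the inputs apart."""
    for rec in records:
        yield rec[key_index], (tag, rec)


def shuffle(pairs):
    """Group mapper output by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each key, pair every left record with every right record."""
    for key, tagged in groups.items():
        left = [rec for tag, rec in tagged if tag == "L"]
        right = [rec for tag, rec in tagged if tag == "R"]
        for l_rec, r_rec in product(left, right):
            yield key, l_rec, r_rec


def equi_join(left, right, left_key=0, right_key=0):
    pairs = list(map_phase(left, "L", left_key))
    pairs += list(map_phase(right, "R", right_key))
    return list(reduce_phase(shuffle(pairs)))


# Hypothetical inputs: users(id, name) and orders(user_id, item).
users = [(1, "ann"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "cup")]
joined = sorted(equi_join(users, orders))
```

In this scheme every record of both inputs crosses the shuffle, which is one reason input size and selectivity matter for join cost, in line with the properties the abstract names.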

