P-Grid: A self-organizing access structure for P2P information systems
In CoopIS, 2001
Cited by 225 (43 self)
Peer-To-Peer systems are driving a major paradigm shift in the era of genuinely distributed computing. Gnutella is a good example of a Peer-To-Peer success story: rather simple software enables Internet users to freely exchange files, such as MP3 music files. But it also exposes some of the limitations of current P2P information systems with respect to their ability to manage data efficiently. In this paper we introduce P-Grid, a scalable access structure that is specifically designed for Peer-To-Peer information systems. P-Grids are constructed and maintained by using randomized algorithms strictly based on local interactions, provide reliable data access even with unreliable peers, and scale gracefully both in storage and communication cost.
RP*: A Family of Order Preserving Scalable Distributed Data Structures
In VLDB, 1994
Cited by 82 (14 self)
Hash-based scalable distributed data structures (SDDSs), like LH* and DDH, for networks of interconnected computers (multicomputers) were shown to open new perspectives for file management. We propose a family of ordered SDDSs, called RP*, providing for ordered and dynamic files on multicomputers, and thus for more efficient processing of range queries and of ordered traversals of files. The basic algorithm, termed RP*N, builds the file with the same key space partitioning as a B-tree, but avoids indexes through the use of multicast. The algorithms RP*C and RP*S enhance throughput for faster networks, adding the indexes on clients, or on clients and servers, while either decreasing or avoiding multicast. RP* files are shown highly efficient, with access performance exceeding traditional files by an order of magnitude or two and, for non-range queries, very close to LH*.
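To illustrate why order-preserving partitioning helps range queries, here is a minimal single-process sketch. It is not the RP* algorithm itself: there is no multicast, no bucket splitting, and no client or server index. The class name `RangePartitionedFile` and the fixed bucket boundaries are assumptions made for the example.

```python
from bisect import bisect_left, insort

class RangePartitionedFile:
    """Hypothetical sketch: keys are split into buckets by sorted upper bounds."""

    def __init__(self, upper_bounds):
        # upper_bounds[i] is the largest key bucket i may hold;
        # one extra bucket at the end holds all larger keys.
        self.upper_bounds = sorted(upper_bounds)
        self.buckets = [[] for _ in range(len(self.upper_bounds) + 1)]

    def bucket_for(self, key):
        # Binary search over the boundaries picks the bucket index.
        return bisect_left(self.upper_bounds, key)

    def insert(self, key):
        # Keep each bucket sorted so traversal yields keys in order.
        insort(self.buckets[self.bucket_for(key)], key)

    def range_query(self, lo, hi):
        # Only the contiguous run of buckets covering [lo, hi] is visited.
        first, last = self.bucket_for(lo), self.bucket_for(hi)
        return [k for b in self.buckets[first:last + 1] for k in b if lo <= k <= hi]

f = RangePartitionedFile([10, 20])
for key in (21, 3, 15, 8, 12, 10):
    f.insert(key)
result = f.range_query(8, 15)
```

Because the partitioning preserves key order, a range query visits only adjacent buckets, whereas a hash-based file would have to probe every bucket.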
Replication under Scalable Hashing: A Family of Algorithms for Scalable Decentralized Data Distribution
In Proceedings of the 18th International Parallel & Distributed Processing Symposium (IPDPS 2004), Santa Fe, NM, 2004
Cited by 43 (13 self)
Typical algorithms for decentralized data distribution work best in a system that is fully built before it is first used; adding or removing components results in either extensive reorganization of data or load imbalance in the system. We have developed a family of decentralized algorithms, RUSH (Replication Under Scalable Hashing), that map replicated objects to a scalable collection of storage servers or disks. RUSH algorithms distribute objects to servers according to user-specified server weighting. While all RUSH variants support addition of servers to the system, different variants have different characteristics with respect to lookup time in petabyte-scale systems, performance with mirroring (as opposed to redundancy codes), and storage server removal. All RUSH variants redistribute as few objects as possible when new servers are added or existing servers are removed, and all variants guarantee that no two replicas of a particular object are ever placed on the same server. Because there is no central directory, clients can compute data locations in parallel, allowing thousands of clients to access objects on thousands of servers simultaneously.
An Efficient Dynamic and Distributed Cryptographic Accumulator
Tech. Rep., Johns Hopkins Information Security Institute, 2002
Cited by 34 (13 self)
We show how to use the RSA one-way accumulator to realize an efficient and dynamic authenticated dictionary, where untrusted directories provide cryptographically verifiable answers to membership queries on a set maintained by a trusted source. Our accumulator-based scheme for authenticated dictionaries supports efficient incremental updates of the underlying set by insertions and deletions of elements. Also, the user can optimally verify in constant time the authenticity of the answer provided by a directory with a simple and practical algorithm. This work has applications to certificate management in public key infrastructure and end-to-end integrity of data collections published by third parties on the Internet.
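The membership check behind an RSA one-way accumulator can be sketched as follows. This is a toy illustration of the general accumulator idea, not the paper's dynamic scheme: the modulus `N`, base `G`, and the helper names are assumptions, the modulus is far too small to be secure, and incremental updates are not shown.

```python
from math import prod

# Toy parameters chosen for illustration only (hypothetical values);
# a real deployment needs a large RSA modulus whose factors stay secret.
N = 3233  # 61 * 53
G = 2     # public base, coprime to N

def accumulate(elements):
    """Accumulator value: G raised to the product of all elements, mod N."""
    return pow(G, prod(elements), N)

def witness(elements, x):
    """Witness for x: the accumulation of every element except x."""
    others = list(elements)
    others.remove(x)
    return pow(G, prod(others), N)

def verify(acc, x, w):
    """Membership check: one modular exponentiation and one comparison."""
    return pow(w, x, N) == acc

members = [3, 5, 7, 11]  # elements are assumed to be primes
acc = accumulate(members)
w = witness(members, 7)
is_member = verify(acc, 7, w)
is_forged = verify(acc, 13, w)
```

The trusted source publishes `acc`, an untrusted directory hands out `w`, and the user only needs `verify`, whose cost does not depend on the size of the set.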
Replicated Indexes for Distributed Data
In PDIS, 1996
Cited by 31 (0 self)
We describe a distributed index structure, in which data is distributed among multiple sites and indexes to the data are replicated over multiple sites. This permits good scalability as storage and accessing load are distributed over the sites and each site with an index replica has fast local access to the index structure, making remote requests at most for data at the leaves of the index tree. We call our method the dPi-tree because it is based on the Pi-tree. We replicate the index without the need for coherence messages. This works whether the index replica is persistent or a transient cached copy. We generalize a technique first used to provide recovery for Pi-tree indexes to independently and lazily maintain the index replicas. A further result is that each index replica is fully recoverable, an area not treated previously in replication schemes. We also show how the data in the leaves of the index can be distributed and re-distributed at very low cost.
A fast algorithm for online placement and reorganization of replicated data
In Proceedings of the 17th International Parallel & Distributed Processing Symposium (IPDPS 2003), 2003
Cited by 28 (7 self)
As storage systems scale to thousands of disks, data distribution and load balancing become increasingly important. We present an algorithm for allocating data objects to disks as a system grows from a few disks to hundreds or thousands. A client using our algorithm can locate a data object in microseconds without consulting a central server or maintaining a full mapping of objects or buckets to disks. Despite requiring little global configuration data, our algorithm is probabilistically optimal in both distributing data evenly and minimizing data movement when new storage is added to the system. Moreover, our algorithm supports weighted allocation and variable levels of object replication, both of which are needed to permit systems to efficiently grow while accommodating new technology.
The Quest for Balancing Peer Load in Structured Peer-To-Peer Systems
2003
Cited by 21 (8 self)
Structured peer-to-peer (P2P) systems are considered the next-generation application backbone on the Internet. An important problem of these systems is load balancing in the presence of non-uniform data distributions. In this paper we propose a completely decentralized mechanism that addresses a local and a global load balancing problem in parallel: (1) balancing the storage load uniformly among peers participating in the network and (2) uniformly replicating different data items in the network while optimally exploiting existing storage capacity. Our approach is based on the P-Grid P2P system, which is our variant of a structured P2P network. Problem (1) is solved by directly adapting the search structure to the data distribution. This may result in an unbalanced search structure, but we will show that the expected search cost in P-Grid in number of messages remains logarithmic under all circumstances.
Balanced Distributed Search Trees Do Not Exist
1995
Cited by 19 (1 self)
This paper is a first step towards an understanding of the inherent limitations of distributed data structures. We propose a model of distributed search trees that is based on few natural assumptions. We prove that any class of trees within our model satisfies a lower bound of $\Omega(\sqrt{m})$ on the worst case height of distributed search trees for m keys. That is, unlike in the single site case, balance in the sense that the tree height satisfies a logarithmic upper bound cannot be achieved. This is true although each node is allowed to have arbitrary degree (note that in this case, the height of a single site search tree is trivially bounded by one). By proposing a method that generates trees of height $O(\sqrt{m})$, we show the bound to be tight. 1 Introduction Distributed data structures have attracted considerable attention in the past few years. From a practical viewpoint, this is due to the increasing availability of networks of workstations. These networks offer an enormous c...
LH*lh: A Scalable High Performance Data Structure for Switched Multicomputers
1995
Cited by 19 (6 self)
LH*lh is a new data structure for scalable high-performance hash files on the increasingly popular switched multicomputers, i.e., MIMD multiprocessor machines with distributed RAM memory and without shared memory. An LH*lh file scales up gracefully over available processors and the distributed memory, easily reaching Gbytes. Address calculus does not require any centralized component that could lead to a hot-spot. Access times to the file can be under a millisecond and the file can be used in parallel by several client processors. We show the LH*lh design, and report on the performance analysis. This includes experiments on the Parsytec GC/PowerPlus multicomputer with up to 128 Power PCs and 32 MB of distributed RAM per node. We prove the efficiency of the method and justify various algorithmic choices that were made. LH*lh opens a new perspective for high-performance applications, especially for the database management of new types of data and in real-time environments.
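The address calculus mentioned above builds on linear hashing. The sketch below shows the classic single-site linear hashing address computation only; it is not the LH*lh design, and it omits the distributed parts (clients working from a possibly outdated view of the file state, and servers forwarding misdirected requests). The function and parameter names are assumptions made for the example.

```python
def lh_address(key, level, split_pointer, initial_buckets=1):
    """Classic linear hashing: map an integer key to a bucket number.

    level          number of completed doubling rounds
    split_pointer  next bucket to split in the current round
    """
    # Hash with the current round's function.
    address = key % (initial_buckets * 2 ** level)
    # Buckets before the split pointer were already split in this round,
    # so their keys are spread with the next round's function instead.
    if address < split_pointer:
        address = key % (initial_buckets * 2 ** (level + 1))
    return address

# File state: 4 buckets from round 2, bucket 0 already split into 0 and 4.
addresses = [lh_address(k, level=2, split_pointer=1) for k in (4, 5, 7, 8)]
```

The computation needs only two small integers describing the file state, which is why no central directory has to be consulted for each access.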

