Boxwood: Abstractions as the Foundation for Storage Infrastructure, 2004
Cited by 80 (8 self)
Writers of complex storage applications such as distributed file systems and databases are faced with the challenges of building complex abstractions over simple storage devices like disks. These challenges are exacerbated by the additional requirements for fault tolerance and scaling. This paper explores the premise that high-level, fault-tolerant abstractions supported directly by the storage infrastructure can ameliorate these problems. We have built a system called Boxwood to explore the feasibility and utility of providing high-level abstractions or data structures as the fundamental storage infrastructure. Boxwood currently runs on a small cluster of eight machines. The Boxwood abstractions perform very close to the limits imposed by the processor, disk, and the native networking subsystem. Using these abstractions directly, we have implemented an NFSv2 file service that demonstrates the promise of our approach.
Computation Migration: Enhancing Locality for Distributed-Memory Parallel Systems
Cited by 47 (4 self)
We describe computation migration, a new technique that is based on compile-time program transformations, for accessing remote data in a distributed-memory parallel system. In contrast with RPC-style access, where the access is performed remotely, and with data migration, where the data is moved so that it is local, computation migration moves part of the current thread to the processor where the data resides. The access is performed at the remote processor, and the migrated thread portion continues to run on that same processor; this makes subsequent accesses in the thread portion local. We describe an implementation of computation migration that consists of two parts: an implementation that migrates single activation frames, and a high-level language annotation that allows a programmer to express when migration is desired. We performed experiments using two applications; these experiments demonstrate that computation migration is a valuable alternative to RPC and data migration.
Lazy Updates for Distributed Search Structures
In SIGMOD, 1993
Cited by 33 (1 self)
Very large database systems require distributed storage, which means that they need distributed search structures for fast and efficient access to the data. In this paper, we present an approach to maintaining distributed data structures that uses lazy updates, which take advantage of the semantics of the search structure operations to allow for scalable and low-overhead replication. Lazy updates can be used to design distributed search structures that support very high levels of concurrency. The alternatives to lazy update algorithms (vigorous updates) use synchronization to ensure consistency. Hence, lazy update algorithms are a distributed analogue of shared-memory lock-free search structure algorithms. Since lazy updates avoid the use of synchronization, they are much easier to implement than vigorous update algorithms. We demonstrate the application of lazy updates to the dB-tree, which is a distributed B+ tree that replicates its interior nodes for highly parallel access. We d...
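The core idea in this abstract, that operation semantics let replicas skip synchronization, can be illustrated with a small sketch. It is not the dB-tree algorithm from the paper: it only assumes that inserting a separator key into a replicated interior node is idempotent and commutes with other inserts, so each replica may apply the same updates in a different order and still converge. The names `NodeReplica` and `deliver_lazily` are hypothetical.

```python
import bisect
import random


class NodeReplica:
    """One copy of a replicated interior node (hypothetical illustration)."""

    def __init__(self):
        self.keys = []  # sorted separator keys

    def apply_insert(self, key):
        # Inserting a key is idempotent and commutes with other inserts,
        # so replicas may apply these updates in any order.
        i = bisect.bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            self.keys.insert(i, key)


def deliver_lazily(replicas, updates, seed=0):
    """Deliver every update to every replica, each in its own random order."""
    rng = random.Random(seed)
    for replica in replicas:
        order = list(updates)
        rng.shuffle(order)
        for key in order:
            replica.apply_insert(key)


replicas = [NodeReplica() for _ in range(3)]
deliver_lazily(replicas, [40, 10, 30, 20, 10])
converged = all(r.keys == replicas[0].keys for r in replicas)
```

Operations that do not commute, such as node splits, would need extra care that this sketch does not attempt.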
P-ring: An index structure for peer-to-peer systems
In Cornell Technical Report, 2004
Cited by 13 (6 self)
Current peer-to-peer (P2P) index structures only support a subset of the desired functionality for P2P database systems. For instance, some P2P index structures support equality queries but not range queries, while others support range queries, but do not support multiple data items per peer or provide guaranteed search performance. In this paper, we devise a novel index structure called P-Ring that supports both equality and range queries, is fault-tolerant, provides guaranteed search performance, and efficiently supports large sets of data items per peer. We are not aware of any other existing index structure that supports all of the above functionality in a dynamic P2P environment. In a thorough experimental study we evaluate the performance of P-Ring and quantify the performance trade-offs of the different system components. We also compare P-Ring with two other P2P index structures, Skip Graphs and Chord.
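To make the distinction between equality and range queries concrete, here is a minimal sketch of an order-preserving ring in which each peer owns a contiguous key range and holds many items. It is an assumption-laden illustration, not the P-Ring design: the class `RangeRing` and its methods are hypothetical, everything lives in one process, and the search-performance and fault-tolerance mechanisms the abstract mentions are omitted.

```python
import bisect


class RangeRing:
    """Peers arranged in a ring, each owning keys in [low, next peer's low)."""

    def __init__(self, lows):
        self.lows = sorted(lows)               # lower bound owned by each peer
        self.items = [[] for _ in self.lows]   # sorted items stored per peer

    def owner(self, key):
        # Last peer whose lower bound is <= key; keys below the smallest
        # bound wrap around to the last peer on the ring.
        return (bisect.bisect_right(self.lows, key) - 1) % len(self.lows)

    def insert(self, key):
        bisect.insort(self.items[self.owner(key)], key)

    def lookup(self, key):
        # Equality query: only the owning peer is consulted.
        return key in self.items[self.owner(key)]

    def range_query(self, lo, hi):
        """Return keys in [lo, hi], assuming lo <= hi and lo >= smallest bound."""
        result = []
        peer = self.owner(lo)
        while True:
            result.extend(k for k in self.items[peer] if lo <= k <= hi)
            nxt = peer + 1
            if nxt == len(self.lows) or self.lows[nxt] > hi:
                break  # the successor's range starts past hi
            peer = nxt
        return result


ring = RangeRing([0, 100, 200])
for k in [5, 95, 120, 205, 250]:
    ring.insert(k)
found = ring.range_query(90, 210)
```

Because peers are ordered by key, a range query starts at the owner of the lower bound and walks successors, which a hash-partitioned ring cannot do without visiting every peer.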
A Practical Scalable Distributed B-Tree
Cited by 6 (2 self)
Internet applications increasingly rely on scalable data structures that must support high throughput and store huge amounts of data. These data structures can be hard to implement efficiently. Recent proposals have overcome this problem by giving up on generality and implementing specialized interfaces and functionality (e.g., Dynamo [4]). We present the design of a more general and flexible solution: a fault-tolerant and scalable distributed B-tree. In addition to the usual B-tree operations, our B-tree provides some important practical features: transactions for atomically executing several operations in one or more B-trees, online migration of B-tree nodes between servers for load-balancing, and dynamic addition and removal of servers for supporting incremental growth of the system. Our design is conceptually simple. Rather than using complex concurrency and locking protocols, we use distributed transactions to make changes to B-tree nodes. We show how to extend the B-tree and keep additional information so that these transactions execute quickly and efficiently. Our design relies on an underlying distributed data sharing service, Sinfonia [1], which provides fault tolerance and a light-weight distributed atomic primitive. We use this primitive to commit our transactions. We implemented our B-tree and show that it performs comparably to an existing open-source B-tree and that it scales to hundreds of machines. We believe that our approach is general and can be used to implement other distributed data structures easily.
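The idea of replacing locking protocols with transactions on B-tree nodes can be sketched with optimistic versioning. This is a hedged illustration only: `VersionedStore` is a hypothetical in-memory stand-in, not Sinfonia's interface, and the per-node version number is an assumption about what "additional information" might look like. A transaction reads a node, prepares a new copy, and commits only if the version it read is still current.

```python
class VersionedStore:
    """In-memory stand-in for a shared store with an atomic compare-and-write."""

    def __init__(self):
        self.data = {}  # node_id -> (version, tuple of sorted keys)

    def read(self, node_id):
        return self.data.get(node_id, (0, ()))

    def commit(self, expected, writes):
        """Apply all writes only if every node still has its expected version."""
        for node_id, version in expected.items():
            if self.read(node_id)[0] != version:
                return False  # another transaction changed a node: abort
        for node_id, keys in writes.items():
            version = self.read(node_id)[0]
            self.data[node_id] = (version + 1, tuple(keys))
        return True


def insert_key(store, node_id, key, max_retries=10):
    """Optimistically insert a key into one node, retrying on conflict."""
    for _ in range(max_retries):
        version, keys = store.read(node_id)
        new_keys = sorted(set(keys) | {key})
        if store.commit({node_id: version}, {node_id: new_keys}):
            return True
    return False


store = VersionedStore()
insert_key(store, "leaf", 7)
insert_key(store, "leaf", 3)
stale_commit = store.commit({"leaf": 1}, {"leaf": [1]})  # version is now 2
```

In this single-process sketch `commit` is trivially atomic; a distributed store would have to provide that atomicity across servers.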
Implementing Distributed Search Structures, 1992
Cited by 4 (3 self)
Distributed search structures are useful for parallel databases and in maintaining distributed storage systems. Although a considerable amount of research has been done on developing parallel search structures on shared-memory multiprocessors, little has been done on the development of search structures for distributed-memory systems. In this paper we discuss some issues in the design and implementation of distributed B-trees, such as methods for low-overhead synchronization of tree restructuring and node mobility. One goal of this work is to implement a data-balanced dictionary which allows for balanced processor and space utilization. We present an algorithm for dynamic data-load balancing which uses node mobility mechanisms. We also study the effects that balancing and not balancing data have on the structure of a distributed B-tree. Finally, we demonstrate that our load-balancing algorithm distributes the nodes of a B-tree very well. Keywords: Data Structures, Distributed...
Relaxed Index Consistency for a Client-Server Database
Intl. Conf. on Data Engineering, 1996
Cited by 4 (2 self)
Client-Server systems cache data in client buffers to deliver good performance. Several efficient protocols have been proposed to maintain the coherence of the cached data. However, none of the protocols distinguish between index pages and data pages. We propose a new coherence protocol, called Relaxed Index Consistency, that exploits the inherent differences in the coherence and concurrency-control (C&CC) requirements for index and data pages. The key idea is to incur a small increase in computation time at the clients to gain a significant reduction in the number of messages exchanged between the clients and the servers. The protocol uses the concurrency control on data pages to maintain coherence of index pages. A performance-conscious implementation of the protocol that makes judicious use of version numbers is proposed. We show, through both qualitative and quantitative analysis, the performance benefits of making the distinction between index pages and data pages for the purpose...
Highly Scalable Data Balanced Distributed B-trees, 1995
Cited by 3 (1 self)
Scalable distributed search structures are needed to maintain large volumes of data and for parallel databases. In this paper, we analyze the performance of two large scale data-balanced distributed search structures, the dB-tree and the dE-tree. The dB-tree is a distributed B-tree that replicates its interior nodes. The dE-tree is a dB-tree in which leaf nodes represent key ranges, and thus requires far fewer nodes to represent a distributed index. The performance of both algorithms depends on the method by which tree nodes are assigned to processors (i.e., the algorithm for performing data balancing). We present a simulation study of data balancing algorithms for the dB-tree and the dE-tree. We find that a simple distributed data balancing algorithm works well for the dB-tree, requiring only a small space and message passing overhead. We compare three algorithms for data balancing in a dE-tree, and find that the most aggressive of the algorithms makes the dE-tree scalable. Using the ...
Shepherdable indexes and persistent search services for mobile users
In: 8th International Symposium on Distributed Objects and Applications (DOA 2006), 2006
Cited by 1 (1 self)
We describe a range of designs for supporting rich search queries in a peer-to-peer network. Our implementation is based upon universally identified data objects which are replicated upon request by agents called Shepherds. Several abstract data structures are built upon this framework, supporting dataset management, lexical search, and distributed GIS interfaces in an application called the Geobrowser. Our results demonstrate that it is possible to layer higher-level data structures upon a basic peer-to-peer transport and replication layer. When users perform a given query, parts of the index as well as the query results themselves are shepherded to the user’s local venue. A natural benefit of this approach is that mobile users can repeat previous searches if they become disconnected from the rest of the network. Some of the data structures that prove to be successful are peer-to-peer adaptations of traditional indexing structures. We review some of the properties that lead to successful designs in this domain, giving examples of deployed
Supporting Insertions and Deletions in Striped Parallel Filesystems
The dramatic improvements in the processing rates of parallel computers are turning many compute-bound jobs into IO-bound jobs. Parallel file systems have been proposed to better match IO throughput to processing power. Many parallel file systems stripe files across numerous disks; each disk has its own controller. A striped file can be appended (or prepended) to and maintain its structure. However, a block can't be inserted into or deleted from the middle of the file, since doing so would destroy the regular striping structure of the file. In this paper, we present a distributed file structure that maintains files in indexed striped extents on a message passing multiprocessor. This approach allows highly parallel random and sequential reads, and also allows insertion and deletion into the middle of the file. 1 Introduction Researchers have observed that the performance of I/O subsystems has not kept pace with the increasing performance of the processors, especially in parall...
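The abstract's point about indexed striped extents can be sketched as follows: if a file is an ordered index of extents rather than one rigid stripe, inserting a block in the middle only splits a single extent and leaves the rest untouched. This is a single-process illustration under stated assumptions, not the paper's file structure: the class `ExtentFile`, the round-robin placement of blocks within an extent, and the linear index scan are all hypothetical.

```python
class ExtentFile:
    """A file stored as an ordered index of extents, each a list of blocks."""

    def __init__(self, blocks, num_disks, extent_size):
        self.num_disks = num_disks
        self.extents = [list(blocks[i:i + extent_size])
                        for i in range(0, len(blocks), extent_size)]

    def locate(self, block_no):
        """Map a logical block number to (extent index, offset, disk)."""
        for idx, extent in enumerate(self.extents):
            if block_no < len(extent):
                # Assumption: within an extent, blocks are striped
                # round-robin over the disks.
                return idx, block_no, block_no % self.num_disks
            block_no -= len(extent)
        raise IndexError("block number past end of file")

    def read(self, block_no):
        idx, offset, _ = self.locate(block_no)
        return self.extents[idx][offset]

    def insert(self, block_no, block):
        """Insert before an existing block by splitting one extent."""
        idx, offset, _ = self.locate(block_no)
        extent = self.extents[idx]
        self.extents[idx:idx + 1] = [extent[:offset], [block], extent[offset:]]
        self.extents = [e for e in self.extents if e]  # drop empty pieces

    def blocks(self):
        return [b for extent in self.extents for b in extent]


f = ExtentFile(list("abcdefgh"), num_disks=4, extent_size=4)
f.insert(2, "X")
```

After the insert, only the first extent has been split; the second extent keeps its original striping. Appends and deletions are left out of this sketch.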

