Efficient Locking for Concurrent Operations on B-Trees
ACM Transactions on Database Systems, 1981
Cited by 138 (0 self)
The B-tree and its variants have been found to be highly useful (both theoretically and in practice) for storing large amounts of information, especially on secondary storage devices. We examine the problem of overcoming the inherent difficulty of concurrent operations on such structures, using a practical storage model. A single additional “link” pointer in each node allows a process to easily recover from tree modifications performed by other concurrent processes. Our solution compares favorably with earlier solutions in that the locking scheme is simpler (no read-locks are used) and only a (small) constant number of nodes are locked by any update process at any given time. An informal correctness proof for our system is given.
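The "link" pointer idea in the abstract above can be illustrated with a minimal sketch. The `Node` layout, the `high_key` field, and the helper `build_half_split_tree` are hypothetical names chosen for illustration, not the paper's own code, and the sketch shows only a reader's "move right" recovery; it omits the locking protocol for updaters entirely.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf
    high_key: Optional[int] = None  # assumed upper bound on keys in this node; None = unbounded
    right: Optional["Node"] = None  # the extra "link" pointer to the right sibling


def search(root: Node, key: int) -> bool:
    """Look up `key`, following link pointers when a node was split underneath us."""
    node = root
    while True:
        # If the key is beyond this node's range, a split has moved it right.
        while node.high_key is not None and key > node.high_key and node.right is not None:
            node = node.right
        if not node.children:  # leaf: check for the key itself
            i = bisect_left(node.keys, key)
            return i < len(node.keys) and node.keys[i] == key
        # Internal node: children[i] holds keys <= keys[i]; the last child holds the rest.
        node = node.children[bisect_left(node.keys, key)]


def build_half_split_tree() -> Node:
    """A leaf [1, 5, 8, 10] was split into two leaves, but the parent is not yet updated."""
    leaf_c = Node(keys=[20, 30])
    leaf_b = Node(keys=[8, 10], high_key=10, right=leaf_c)
    leaf_a = Node(keys=[1, 5], high_key=5, right=leaf_b)
    # The parent still routes every key <= 10 to leaf_a.
    return Node(keys=[10], children=[leaf_a, leaf_c])
```

In this sketch, a search for 8 is routed by the stale parent to the left half of the split leaf, notices that 8 exceeds that leaf's assumed high key, and follows the link to the new right sibling, so the reader needs no lock to see a consistent result.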
Boxwood: Abstractions as the Foundation for Storage Infrastructure
2004
Cited by 80 (8 self)
Writers of complex storage applications such as distributed file systems and databases are faced with the challenges of building complex abstractions over simple storage devices like disks. These challenges are exacerbated by the additional requirements for fault tolerance and scaling. This paper explores the premise that high-level, fault-tolerant abstractions supported directly by the storage infrastructure can ameliorate these problems. We have built a system called Boxwood to explore the feasibility and utility of providing high-level abstractions or data structures as the fundamental storage infrastructure. Boxwood currently runs on a small cluster of eight machines. The Boxwood abstractions perform very close to the limits imposed by the processor, disk, and the native networking subsystem. Using these abstractions directly, we have implemented an NFSv2 file service that demonstrates the promise of our approach.
B-trees with Inserts and Deletes: Why Free-at-empty is Better Than Merge-at-half
Journal of Computer and System Sciences, 1992
Cited by 15 (0 self)
The space utilization of B-tree nodes determines the number of levels in the B-tree and hence its performance. Until now, the only analytical aid to the determination of a B-tree's utilization has been the analysis by Yao and related work. Yao showed that the utilization of B-tree nodes under pure inserts is 69%. We derive analytically and verify by simulation the utilization of B-tree nodes constructed from a mixture of insert and delete operations. Assuming that nodes only merge (i.e., are freed) when they are empty, we show that the utilization is 39% when the number of inserts is the same as the number of deletes. However, if there are just 5% more inserts than deletes, then the utilization is over 62%. We also calculate the probability of splitting and merging. We derive a simple rule of thumb that accurately calculates the probability of splitting. We also model B-trees that merge half-empty nodes. The utilization of merge-at-half B-trees is slightly larger than the utilization of ...
The Performance of Concurrent Data Structure Algorithms
Transactions on Database Systems, 1994
Cited by 13 (9 self)
This thesis develops a validated model of concurrent data structure algorithm performance, concentrating on concurrent B-trees. The thesis first develops two analytical tools, which are explained in the next two paragraphs, for the analysis. Yao showed that the space utilization of a B-tree built from random inserts is 69%. Assuming that nodes merge only when empty, we show that the utilization is 39% when the number of insert and delete operations is the same. However, if there are just 5% more inserts than deletes, then the utilization is at least 62%. In addition to the utilization, we calculate the probabilities of splitting and merging, important parameters for calculating concurrent B-tree algorithm performance. We compare merge-at-empty B-trees with merge-at-half B-trees. We conclude that merge-at-empty B-trees have a slightly lower space utilization but a much lower restructuring rate than merge-at-half B-trees, making merge-at-empty B-trees preferable for concurrent B-tree algo...
Implementing a generalized access path structure for a relational database system
ACM Trans. Database Systems, 1978
Cited by 6 (0 self)
A new kind of implementation technique for access paths connecting sets of tuples qualified by attribute values is described. It combines the advantages of pointer chain and multilevel index implementation techniques. Compared to these structures the generalized access path structure is at least competitive in performing retrieval and update operations, while a considerable storage space saving is gained. Some additional features of this structure support m-way joins and the evaluation of multirelation queries, and allow efficient checks of integrity assertions and simple reorganization schemes.
Amortization Results for Chromatic Search Trees, with an Application to Priority Queues
1997
Cited by 6 (0 self)
In this paper, we prove that only an amortized constant amount of rebalancing is necessary after an update in a chromatic search tree. We also prove that the amount of rebalancing done at any particular level decreases exponentially, going from the leaves toward the root. These results imply that, in principle, a linear number of processes can access the tree simultaneously. We have included one interesting application of chromatic trees. Based on these trees, a priority queue with possibilities for a greater degree of parallelism than previous proposals can be implemented. © 1997 Academic Press.
On-line Reorganization of Sparsely-populated B+-trees
In Proceedings of ACM/SIGMOD Annual Conference on Management of Data, 1996
Cited by 5 (3 self)
In this paper, we present an efficient method to do online reorganization of sparsely-populated B+-trees. It reorganizes the leaves first, compacting, in short operations, groups of leaves with the same parent. After compacting, optionally, the new leaves may swap locations or be moved into empty pages so that they are in key order on the disk. After the leaves are reorganized, the method shrinks the tree by making a copy of the upper part of the tree while leaving the leaves in place. A new concurrency method is introduced so that only a minimum number of pages are locked during reorganization. During leaf reorganization, Forward Recovery is used to save all work already done while maintaining consistency after system crashes. A heuristic algorithm is developed to reduce the number of swaps needed during leaf reorganization, so that better concurrency and easier recovery can be achieved. A detailed description of switching from the old B+-tree to the new B+-tree is describe...
Dynamic Hierarchical Data Clustering And Efficient On-Line Database Reorganization
1996
Cited by 2 (1 self)
In recent years, as more applications start using massive databases as their main source of information, more emphasis is placed on the performance of the database system. These applications require not only that the database system have good performance, but also that it be continually available. The research in this thesis makes strides in meeting these requirements: dynamically clustering data improves the database performance, and efficient on-line reorganization methods enable the database systems to be continually available. A new algorithm, Enc, for dynamically clustering hierarchical data is presented in this thesis. It uses a primary B+-tree as the main storage structure; all relations in the hierarchy are stored in the B+-tree. The hierarchical relationship is encoded into the keys of the B+-tree. The Enc algorithm maintains good clustering in the presence of insertions and deletions. Experimental results show that using the Enc algorithm, hierarchical queries can be process...
A Design Methodology for Data Warehouses
Cited by 2 (0 self)
The objective of this work is to develop a design methodology for data warehouses. It is based on the three-level modeling approach, with emphasis on conceptual modeling. Logical design for the relational model and physical tuning in this environment will also be treated.
1 Research Question
In recent years, data warehouses (DWs) [Inm92], as the backbone of decision support systems, have attracted lively interest in research and practice. Typically, the DW is a database held separately from operational systems. Its data are integrated from the operational systems of an organization and often supplemented by data from external sources. The increasing popularity of DWs reflects the rising requirement to make strategic use of data integrated from heterogeneous sources. Some business examples of using the data stored in a DW are database marketing, controlling, and (long-term) customer retention. But there are also application scenarios outside the economic context, e.g. in medical regis...
INDEXING OF MULTIDIMENSIONAL DISCRETE DATA SPACES AND HYBRID EXTENSIONS
In this thesis, various indexing techniques are developed and evaluated to support efficient queries in different vector data spaces. Various indexing techniques have been introduced for the (ordered) Continuous Data Space (CDS) and the Non-ordered Discrete Data Space (NDDS). All these techniques rely on special properties of the CDS or the NDDS to optimize data accesses and storage in their corresponding structures. Besides conventional exact match queries, similarity queries and box queries are two types of fundamental operations widely supported by modern indexing techniques. A box query differs from a similarity query in that a box query in a multidimensional space looks up indexed data that meet the query conditions on each and every dimension. The difference between similarity queries and box queries suggests that indexing techniques which work well for similarity queries may not necessarily support efficient box queries. In this thesis, we propose the BoND-tree, a new indexing technique designed for supporting box queries in an NDDS. Both our theoretical analysis and experimental results demonstrate that the new heuristics proposed for the BoND-tree improve the performance of box queries in an NDDS significantly.
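The distinction between the two query types described in the abstract above can be made concrete with a small sketch. It uses a plain linear scan rather than the BoND-tree, assumes Hamming distance as the similarity measure (the abstract does not name one), and all function names and sample vectors are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Vector = Tuple[str, ...]


def hamming(a: Vector, b: Vector) -> int:
    """Number of dimensions in which two discrete vectors differ (assumed distance)."""
    return sum(x != y for x, y in zip(a, b))


def similarity_query(data: List[Vector], query: Vector, radius: int) -> List[Vector]:
    """Return vectors within `radius` mismatches of `query`, in any dimensions."""
    return [v for v in data if hamming(v, query) <= radius]


def box_query(data: List[Vector], box: Sequence[Set[str]]) -> List[Vector]:
    """Return vectors whose value in each and every dimension lies in that dimension's set."""
    return [v for v in data if all(x in allowed for x, allowed in zip(v, box))]


DATA: List[Vector] = [
    ("a", "g", "t"),
    ("a", "c", "t"),
    ("g", "g", "t"),
    ("t", "c", "a"),
]

# Box query: dimension 0 may be "a" or "g", dimension 1 must be "g", dimension 2 must be "t".
BOX_HITS = box_query(DATA, [{"a", "g"}, {"g"}, {"t"}])

# Similarity query: at most one mismatch against ("a", "g", "t"), wherever it occurs.
SIM_HITS = similarity_query(DATA, ("a", "g", "t"), radius=1)
```

Here the similarity query also returns `("a", "c", "t")`, which the box query rejects because its second dimension falls outside the allowed set; the constraint is per dimension rather than on the total number of mismatches.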

