On-line Reorganization of Sparsely-populated B+-trees
In Proceedings of ACM/SIGMOD Annual Conference on Management of Data, 1996
Cited by 5 (3 self)
In this paper, we present an efficient method to do online reorganization of sparsely-populated B+-trees. It reorganizes the leaves first, compacting in short operations groups of leaves with the same parent. After compacting, optionally, the new leaves may swap locations or be moved into empty pages so that they are in key order on the disk. After the leaves are reorganized, the method shrinks the tree by making a copy of the upper part of the tree while leaving the leaves in place. A new concurrency method is introduced so that only a minimum number of pages are locked during reorganization. During leaf reorganization, Forward Recovery is used to save all work already done while maintaining consistency after system crashes. A heuristic algorithm is developed to reduce the number of swaps needed during leaf reorganization, so that better concurrency and easier recovery can be achieved. A detailed description of switching from the old B+-tree to the new B+-tree is describe...
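The leaf-compaction step mentioned in this abstract can be pictured with a minimal sketch. It is not the paper's algorithm: it assumes leaves are plain sorted lists of integer keys under one parent, and it ignores locking, logging, and page placement. The names `compact_leaves`, `capacity`, and `fill` are hypothetical.

```python
from typing import List


def compact_leaves(leaves: List[List[int]], capacity: int, fill: float = 1.0) -> List[List[int]]:
    """Repack the keys of sibling leaves (already in key order) into fewer leaves.

    Assumption: every leaf holds at most `capacity` keys, and `fill` is the
    target fraction of a leaf to use after compaction.
    """
    target = max(1, int(capacity * fill))
    compacted: List[List[int]] = []
    current: List[int] = []
    for leaf in leaves:
        for key in leaf:
            # Start a new leaf once the current one has reached the target size.
            if len(current) == target:
                compacted.append(current)
                current = []
            current.append(key)
    if current:
        compacted.append(current)
    return compacted


# Four sparse leaves under one parent, each able to hold four keys.
sparse = [[1, 2], [5], [7, 9], [12]]
print(compact_leaves(sparse, capacity=4))  # [[1, 2, 5, 7], [9, 12]]
```

A `fill` below 1.0 leaves free space in each new leaf for later insertions, at the cost of freeing fewer pages.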
Efficient Bulk Deletes in Relational Databases
In Proc. IEEE Conf. on Data Engineering, 2001
Cited by 5 (2 self)
Many applications require that large amounts of data are deleted from the database -- typically, such bulk deletes are carried out periodically and involve old or out-of-date data. If the data is not partitioned in such a way that bulk deletes can be carried out by simply deleting whole partitions, then most current database products execute such bulk delete operations very poorly. The reason is that every record is deleted from each index individually. This paper proposes and evaluates a new class of techniques to support bulk delete operations more efficiently. These techniques outperform the "record-at-a-time" approach implemented in many database products by about one order of magnitude. 1. Introduction Sometimes we are confronted with more data than we can really use, and it might be wisest to forget and to destroy most of it. (Donald Knuth, The Art of Computer Programming) For many companies, the cost to administrate their databases has increased dramatically in the last coupl...
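To make the contrast in this abstract concrete, here is a minimal sketch. The abstract does not spell out the proposed techniques, so the bulk variant below is only one plausible illustration (sort the victims, then sweep the index once), not the paper's method. It assumes an index modeled as a sorted Python list of unique integer keys; both function names are hypothetical.

```python
from bisect import bisect_left
from typing import List


def delete_record_at_a_time(index: List[int], victims: List[int]) -> List[int]:
    """Delete each victim with its own index probe (one search per record)."""
    index = list(index)
    for key in victims:
        pos = bisect_left(index, key)
        if pos < len(index) and index[pos] == key:
            index.pop(pos)
    return index


def delete_in_bulk(index: List[int], victims: List[int]) -> List[int]:
    """Sort the victims once, then drop them in a single ordered sweep of the index."""
    victims_sorted = sorted(victims)
    kept: List[int] = []
    v = 0
    for key in index:
        # Advance past victims that are smaller than the current index key.
        while v < len(victims_sorted) and victims_sorted[v] < key:
            v += 1
        if v < len(victims_sorted) and victims_sorted[v] == key:
            continue  # this key is marked for deletion
        kept.append(key)
    return kept


index = list(range(0, 20, 2))   # assumed unique keys 0, 2, ..., 18
victims = [16, 4, 11, 10]       # 11 is not in the index and is ignored
print(delete_in_bulk(index, victims))  # [0, 2, 6, 8, 12, 14, 18]
```

Both functions return the same result for unique keys; the difference is the access pattern, which is what matters for a disk-resident index.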
Efficiently Updating References During OnLine Reorganization
VLDB'98, Proceedings of 24th International Conference on Very Large Data Bases, 1998
Cited by 5 (1 self)
With today’s demands for continuous availability of mission-critical databases, on-line reorganization is a necessity. In this paper we present a new on-line reorganization algorithm which defers secondary index updates and piggybacks them with user transactions. In addition to the significant reduction of the total I/O cost, the algorithm also assures that almost all the database is available all of the time and that the reorganization is interruptible and restartable. We believe that the technique presented in this paper could be used for improving normal database update performance as well.
On-Line Processing In Large-Scale Transaction Systems
1992
Cited by 3 (2 self)
In this thesis, we provide techniques to adapt current database technology to account for the following trends that can be observed in database management system (DBMS) usage: 1. DBMSs are being increasingly used in applications, like computerized stock trading, that have very high transaction rates. 2. Database sizes are growing rapidly, and future databases are expected to be several orders of magnitude larger than the largest databases in operation today. 3. Next generation DBMSs are expected to gravitate more and more towards what is referred to as 24 (hour) × 7 (day) operation. In order to handle high transaction rates, future DBMSs have to use highly concurrent algorithms for managing often-used auxiliary data structures like indices. To better understand the performance of concurrency control algorithms for index access, we first compare the performance of B-tree concurrency control algorithms using a simulation model of a centralized DBMS. In our performance study, we look a...
Asynchronous View Maintenance for VLSD Databases
Cited by 3 (0 self)
The query models of the recent generation of very large scale distributed (VLSD) shared-nothing data storage systems, including our own PNUTS and others (e.g. BigTable, Dynamo, Cassandra, etc.) are intentionally simple, focusing on simple lookups and scans and trading query expressiveness for massive scale. Indexes and views can expand the query expressiveness of such systems by materializing more complex access paths and query results. In this paper, we examine mechanisms to implement indexes and views in a massive scale distributed database. For web applications, minimizing update latencies is critical, so we advocate deferring the work of maintaining views and indexes as much as possible. We examine the design space, and conclude that two types of view implementations, called remote view tables (RVTs) and local view tables (LVTs), provide a good tradeoff between system throughput and minimizing view staleness. We describe how to construct and maintain such view tables, and how they can be used to implement indexes, group-by-aggregate views, equijoin views and selection views. We also introduce and analyze a consistency model that makes it easier for application developers to cope with the impact of deferred view maintenance. An empirical evaluation quantifies the maintenance costs of our views, and shows that they can significantly improve the cost of evaluating complex queries.
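The idea of deferring view maintenance can be illustrated with a small single-process sketch. It does not model the paper's remote or local view tables, distribution, or its consistency model. It only shows a group-by-count view that lags behind the base table until queued maintenance work is applied. The class `DeferredCountView` and its methods are hypothetical.

```python
from collections import Counter, deque
from typing import Deque, Dict, Optional, Tuple


class DeferredCountView:
    """Base table plus a COUNT(*) GROUP BY view that is maintained lazily."""

    def __init__(self) -> None:
        self.base: Dict[str, str] = {}   # primary key -> group value
        self.view: Counter = Counter()   # group value -> row count (may be stale)
        self.pending: Deque[Tuple[Optional[str], str]] = deque()

    def write(self, pk: str, group: str) -> None:
        """Update the base table now; queue the view change for later."""
        old = self.base.get(pk)
        self.base[pk] = group
        self.pending.append((old, group))

    def maintain(self) -> int:
        """Apply all queued changes to the view and return how many were applied."""
        applied = 0
        while self.pending:
            old, new = self.pending.popleft()
            if old is not None:
                self.view[old] -= 1
                if self.view[old] == 0:
                    del self.view[old]
            self.view[new] += 1
            applied += 1
        return applied


table = DeferredCountView()
table.write("u1", "red")
table.write("u2", "red")
table.write("u1", "blue")
print(dict(table.view))   # {} (the view is stale until maintenance runs)
table.maintain()
print(dict(table.view))   # {'red': 1, 'blue': 1}
```

Writes return without touching the view, which keeps update latency low; readers of the view see stale counts until `maintain` runs.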
Dynamic Hierarchical Data Clustering And Efficient On-Line Database Reorganization
1996
Cited by 2 (1 self)
In recent years, as more applications start using massive databases as their main source of information, more emphasis is placed on the performance of the database system. These applications require not only that the database system have good performance, but also that it be continually available. The research in this thesis makes strides in meeting these requirements: dynamically clustering data improves the database performance, and efficient on-line reorganization methods enable the database systems to be continually available. A new algorithm, Enc, for dynamically clustering hierarchical data is presented in this thesis. It uses a primary B+-tree as the main storage structure; all relations in the hierarchy are stored in the B+-tree. The hierarchical relationship is encoded into the keys of the B+-tree. The Enc algorithm maintains good clustering in the presence of insertions and deletions. Experimental results show that using the Enc algorithm, hierarchical queries can be process...
Concurrency control for b-trees with differential indices
In Proceedings of the International Database Engineering and Applications Symposium, 2000
Cited by 2 (0 self)
We present an indexing system where a database index is divided into two parts: the main index located on disk and the differential index in the main memory. Both indices are implemented as B-trees. All updates performed by active transactions are written in the differential index. Periodically, writes of committed transactions are transferred from the differential index to the main index as a batch-update operation. Thus, updates falling into the same leaf of the tree can be performed simultaneously. In addition, the system offers a simple recovery scheme. After a system crash, no undo operations are needed and redo operations need only write to the main memory.
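The two-part index described in this abstract can be sketched as follows. This is a simplification under stated assumptions: plain dictionaries stand in for both B-trees, every buffered write is treated as already committed, and concurrency control and recovery are left out. The class `DifferentialIndex` and its methods are hypothetical.

```python
from typing import Dict, Optional

_DELETED = object()  # tombstone marking a key deleted in the differential index


class DifferentialIndex:
    def __init__(self) -> None:
        self.main: Dict[int, str] = {}      # stands in for the on-disk main index
        self.diff: Dict[int, object] = {}   # stands in for the in-memory differential index

    def put(self, key: int, value: str) -> None:
        """Updates go to the differential index only."""
        self.diff[key] = value

    def delete(self, key: int) -> None:
        self.diff[key] = _DELETED

    def get(self, key: int) -> Optional[str]:
        """Look in the differential index first, then fall back to the main index."""
        if key in self.diff:
            value = self.diff[key]
            return None if value is _DELETED else value  # type: ignore[return-value]
        return self.main.get(key)

    def merge(self) -> int:
        """Transfer buffered writes to the main index as one key-ordered batch."""
        applied = 0
        for key in sorted(self.diff):
            value = self.diff[key]
            if value is _DELETED:
                self.main.pop(key, None)
            else:
                self.main[key] = value  # type: ignore[assignment]
            applied += 1
        self.diff.clear()
        return applied


idx = DifferentialIndex()
idx.put(2, "b")
idx.put(1, "a")
print(idx.get(1), idx.main)   # a {}
idx.merge()
print(idx.main)               # {1: 'a', 2: 'b'}
```

Applying the batch in key order is what lets neighbouring updates land on the same leaf together in a real B-tree; the dictionary here only mimics the ordering.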
Efficient Resumption of Interrupted Warehouse Loads
2000
Data warehouses collect large quantities of data from distributed sources into a single repository. A typical load to create or maintain a warehouse processes GBs of data, takes hours or even days to execute, and involves many complex and user-defined transformations of the data (e.g., find duplicates, resolve data inconsistencies, and add unique keys). If the load fails, a possible approach is to "redo" the entire load. A better approach is to resume the incomplete load from where it was interrupted. Unfortunately, traditional algorithms for resuming the load either impose unacceptable overhead during normal operation, or rely on the specifics of simple transformations. We develop a resumption algorithm called DR that imposes no overhead and relies only on the basic properties of the transformations. We show that DR can lead to almost a ten-fold reduction in resumption time by performing experiments using commercial software to load TPC-D tables and materialized views.
On-Line Reorganization: A Position Paper
On-line reorganization is and will be a major problem for transaction systems of the 1990s and the 2000s. Mainframes are rapidly being replaced by workstation farms. Most large software systems must eventually be ported to these new cheaper hardware architectures. The new hardware architectures lend themselves to a client-server software architecture. Presentation services are off-loaded to client workstations. Even servers are no longer on mainframes, but instead on collections of workstations, each perhaps responsible for only a part of the database. Application software will have to be rewritten to conform to the new hardware and software architectures. In addition, some companies may want to take advantage of newer object-oriented software and object-relational systems and new indexing and presentation options. They must then reorganize their data to make the best use of these new software systems. But at the same time, large companies such as airlines and ba

