Write-Optimized B-Trees, 2004
Cited by 16 (5 self)
Large writes are beneficial both on individual disks and on disk arrays, e.g., RAID-5. The presented design enables large writes of internal B-tree nodes and leaves. It supports both in-place updates and large append-only (“log-structured”) write operations within the same storage volume, within the same B-tree, and even at the same time. The essence of the proposal is to make page migration inexpensive, to migrate pages while writing them, and to make such migration optional rather than mandatory as in log-structured file systems. The inexpensive page migration also aids traditional defragmentation as well as consolidation of free space needed for future large writes. These advantages are achieved with a very limited modification to conventional B-trees that also simplifies other B-tree operations, e.g., key range locking and compression. Prior proposals and prototypes implemented transacted B-trees on top of log-structured file systems and added transaction support to log-structured file systems. Instead, the presented design adds techniques and performance characteristics of log-structured file systems to traditional B-trees and their standard transaction support, notably without adding a layer of indirection for locating B-tree nodes on disk. The result retains fine-granularity locking, full transactional ACID guarantees, fast search performance, etc. expected of a modern B-tree implementation, yet adds efficient transacted page relocation and large, high-bandwidth writes.
Multi-Tenant Databases for Software as a Service: Schema-Mapping Techniques
Cited by 16 (2 self)
In the implementation of hosted business services, multiple tenants are often consolidated into the same database to reduce total cost of ownership. Common practice is to map multiple single-tenant logical schemas in the application to one multi-tenant physical schema in the database. Such mappings are challenging to create because enterprise applications allow tenants to extend the base schema, e.g., for vertical industries or geographic regions. Assuming the workload stays within bounds, the fundamental limitation on scalability for this approach is the number of tables the database can handle. To get good consolidation, certain tables must be shared among tenants and certain tables must be mapped into fixed generic structures such as Universal and Pivot Tables, which can degrade performance. This paper describes a new schema-mapping technique for multi-tenancy called Chunk Folding, where the logical tables are vertically partitioned into chunks that are folded together into different physical multi-tenant tables and joined as needed. The database’s “meta-data budget” is divided between application-specific conventional tables and a large fixed set of generic structures called Chunk Tables. Good performance is obtained by mapping the most heavily-utilized parts of the logical schemas into the conventional tables and the remaining parts into Chunk Tables that match their structure as closely as possible. We present the results of several experiments designed to measure the efficacy of Chunk Folding and describe the multi-tenant database testbed in which these experiments were performed.
Implementing Sorting in Database Systems - ACM Comput. Surv., 2006
Cited by 12 (3 self)
Most commercial database systems do (or should) exploit many sorting techniques that are publicly known, but not readily available in the research literature. These techniques improve both sort performance on modern computer systems and the ability to adapt gracefully to resource fluctuations in multiuser operations. This survey collects many of these techniques for easy reference by students, researchers, and product developers. It covers in-memory sorting, disk-based external sorting, and considerations that apply specifically to sorting in database systems.
A SQL:1999 Code Generator for the Pathfinder XQuery Compiler (Demo Paper) - In Proc. SIGMOD, 2007
Cited by 10 (7 self)
The Pathfinder XQuery compiler has been enhanced by a new code generator that can target any SQL:1999-compliant relational database system (RDBMS). This code generator marks an important next step towards truly relational XQuery processing, a branch of database technology that aims to turn RDBMSs into highly efficient XML and XQuery processors without the need to invade the relational database kernel. Pathfinder, a retargetable front-end compiler, translates input XQuery expressions into DAG-shaped relational algebra plans. The code generator then turns these plans into sequences of either SQL:1999 statements or view definitions which jointly implement the (sometimes intricate) XQuery semantics. In a sense, this demonstration thus lets relational algebra and SQL swap their traditional roles in database query processing. The result is a code generator that (1) supports an almost complete dialect of XQuery, (2) can target any RDBMS with a SQL:1999 language interface, and (3) exhibits quite promising performance characteristics when run against high-volume XML data as well as complex XQuery expressions.
Cracking the Database Store - In CIDR, 2005
Cited by 8 (3 self)
Query performance strongly depends on finding an execution plan that touches as few superfluous tuples as possible. The access structures deployed for this purpose, however, are non-discriminative. They assume every subset of the domain being indexed is equally important, and their structures cause a high maintenance overhead during updates. This approach often fails in decision support or scientific environments where index selection represents a weak compromise amongst many plausible plans. An alternative route, explored here, is to continuously adapt the database organization by making reorganization an integral part of the query evaluation process. Every query is first analyzed for its contribution to breaking the database into multiple pieces, such that both the required subset is easily retrieved and subsequent queries may benefit from the new partitioning structure. To study the potential of this approach, we developed a small representative multi-query benchmark and ran experiments against several open-source DBMSs. The results obtained indicate a significant reduction in system complexity with clear performance benefits.
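The idea of reorganizing the store as a side effect of each query can be sketched as follows. This is only a minimal illustration under stated assumptions, not the paper's implementation: the names `crack_in_two` and `CrackedColumn`, the use of a plain Python list as the column, and the dictionary of split positions are all hypothetical choices made for the example.

```python
def crack_in_two(column, lo, hi, pivot):
    """Partition column[lo:hi] in place so values < pivot come first.

    Returns the split position: the index of the first value >= pivot.
    """
    i, j = lo, hi - 1
    while i <= j:
        if column[i] < pivot:
            i += 1
        else:
            column[i], column[j] = column[j], column[i]
            j -= 1
    return i


class CrackedColumn:
    """Hypothetical cracked column: values plus remembered split positions."""

    def __init__(self, values):
        self.values = list(values)
        self.cracks = {}  # pivot -> index of the first value >= pivot

    def _crack(self, pivot):
        if pivot in self.cracks:
            return self.cracks[pivot]
        # Narrow the work to the single piece that can contain the pivot.
        lo, hi = 0, len(self.values)
        for p, pos in self.cracks.items():
            if p < pivot:
                lo = max(lo, pos)
            else:
                hi = min(hi, pos)
        pos = crack_in_two(self.values, lo, hi, pivot)
        self.cracks[pivot] = pos
        return pos

    def range_query(self, low, high):
        """Return all values v with low <= v < high, cracking on both bounds."""
        start = self._crack(low)
        end = self._crack(high)
        return self.values[start:end]
```

In this sketch the first query on a given range pays for partitioning one piece of the column; a later query with the same bounds finds both split positions already recorded and only slices the list.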
Why Off-the-Shelf RDBMSs are Better at XPath Than You Might Expect - In Proc. SIGMOD, 2007
Cited by 6 (3 self)
To compensate for the inherent impedance mismatch between the relational data model (tables of tuples) and XML (ordered, unranked trees), tree join algorithms have become the prevalent means to process XML data in relational databases, most notably the TwigStack [6], structural join [1], and staircase join [13] algorithms. However, the addition of these algorithms to existing systems depends on a significant invasion of the underlying database kernel, an option intolerable for most database vendors. Here, we demonstrate that we can achieve comparable XPath performance without touching the heart of the system. We carefully exploit existing database functionality and accelerate XPath navigation by purely relational means: partitioned B-trees bring access costs to secondary storage to a minimum, while aggregation functions avoid an expensive computation and removal of duplicate result nodes to comply with the XPath semantics. Experiments carried out on IBM DB2 confirm that our approach can turn off-the-shelf database systems into efficient XPath processors.
Rose: Compressed, log-structured replication
Cited by 4 (2 self)
Rose is a database storage engine for high-throughput replication. It targets seek-limited, write-intensive transaction processing workloads that perform near real-time decision support and analytical processing queries. Rose uses log structured merge (LSM) trees to create full database replicas using purely sequential I/O, allowing it to provide orders of magnitude more write throughput than B-tree based replicas. Also, LSM-trees cannot become fragmented and provide fast, predictable index scans. Rose’s write performance relies on replicas’ ability to perform writes without looking up old values. LSM-tree lookups have performance comparable to B-tree lookups. If Rose read each value that it updated, then its write throughput would also be comparable to a B-tree. Although we target replication, Rose provides high write throughput to any application
Partitioned B-trees -- a user's guide, 2003
Cited by 2 (0 self)
A recent article introduced partitioned B-trees, in which partitions are defined not in the catalogs but by distinct values in an artificial leading key column.
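To make the artificial leading key column concrete, here is a small sketch. It assumes a sorted Python list of (partition, key) pairs as a stand-in for the B-tree; the class name `PartitionedIndex` and its methods are hypothetical, and the search and merge operations are illustrative guesses at how such partitions could be used, since the abstract does not spell them out.

```python
import bisect


class PartitionedIndex:
    """A sorted list of (partition, key) pairs stands in for a single B-tree."""

    def __init__(self):
        self.entries = []

    def insert(self, partition, key):
        # The partition number is simply the leading component of the sort key,
        # so a new partition appears as soon as a new value is inserted.
        bisect.insort(self.entries, (partition, key))

    def partitions(self):
        # Partitions are discovered from the data, not from a catalog.
        return sorted({p for p, _ in self.entries})

    def contains(self, key):
        # One probe per partition, because the key may live in any of them.
        for p in self.partitions():
            i = bisect.bisect_left(self.entries, (p, key))
            if i < len(self.entries) and self.entries[i] == (p, key):
                return True
        return False

    def merge_into(self, target):
        """Rewrite every entry into partition `target`, leaving one partition."""
        self.entries = sorted((target, k) for _, k in self.entries)
```

A lookup in this sketch costs one probe per existing partition, which is why merging partitions down to one restores ordinary single-probe search.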
MaSM: Efficient online updates in data warehouses - In SIGMOD, 2011
Cited by 1 (0 self)
Data warehouses have been traditionally optimized for read-only query performance, allowing only offline updates at night, essentially trading off data freshness for performance. The need for 24x7 operations in global markets and the rise of online and other quickly-reacting businesses make concurrent online updates increasingly desirable. Unfortunately, state-of-the-art approaches fall short of supporting fast analysis queries over fresh data. The conventional approach of performing updates in place can dramatically slow down query performance, while prior proposals using differential updates either require large in-memory buffers or may incur significant update migration cost. This paper presents a novel approach for supporting online updates in data warehouses that overcomes the limitations of prior approaches, by making judicious use of available SSDs to cache incoming updates. We model the problem of query processing with differential updates as a type of outer join between the data residing on disks and the updates residing on SSDs. We present MaSM algorithms for performing such joins and periodic migrations, with small memory footprints, low query overhead, low SSD writes, efficient in-place migration of updates, and correct ACID support. Our experiments show that MaSM incurs only up to 7% overhead both on synthetic range scans (varying range size from 100GB to 4KB) and in a TPC-H query replay study, while also increasing the update throughput by orders of magnitude.
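The view of query processing as an outer join between stored data and cached updates can be illustrated with a simple merge of two key-sorted streams. This is a generic sketch, not the MaSM algorithms themselves: the function name `merge_updates`, the (key, value) record shape, the use of None to mark a deletion, and the limit of one pending update per key are all assumptions made for the example.

```python
def merge_updates(base, updates):
    """Yield up-to-date (key, value) pairs from two key-sorted inputs.

    base:    iterable of (key, value) sorted by key (stands in for data on disk)
    updates: iterable of (key, value) sorted by key, at most one per key;
             a value of None marks a deletion (an assumption of this sketch)
    """
    base_it, upd_it = iter(base), iter(updates)
    b = next(base_it, None)
    u = next(upd_it, None)
    while b is not None or u is not None:
        if u is None or (b is not None and b[0] < u[0]):
            yield b                      # base row with no pending update
            b = next(base_it, None)
        elif b is None or u[0] < b[0]:
            if u[1] is not None:
                yield u                  # insertion of a key absent from base
            u = next(upd_it, None)
        else:
            if u[1] is not None:
                yield u                  # pending update overrides the base row
            b = next(base_it, None)
            u = next(upd_it, None)
```

Because both inputs are consumed in key order, the merge holds only one record from each side in memory at a time, which is the property that makes a scan over fresh data possible without applying the updates in place first.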
A Batch of PNUTS: Experiences Connecting Cloud Batch and Serving Systems - In Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD), 2011
Cited by 1 (1 self)
Cloud data management systems are growing in prominence, particularly at large Internet companies like Google, Yahoo!, and Amazon, which prize them for their scalability and elasticity. Each of these systems trades off between low-latency serving performance and batch processing throughput. In this paper, we discuss our experience running batch-oriented Hadoop on top of Yahoo!’s serving-oriented PNUTS system instead of the standard HDFS file system. Though PNUTS is optimized for and primarily used for serving, a number of applications at Yahoo! must run batch-oriented jobs that read or write data that is stored in PNUTS. Combining these systems reveals several key areas where the fundamental properties of each system are mismatched. We discuss our approaches to accommodating these mismatches, by either bending the batch and serving abstractions, or inventing new ones. Batch systems like Hadoop provide coarse task-level recovery, while serving systems like PNUTS provide finer record or transaction-level recovery. We combine both types to log record-level errors, while detecting and recovering from large-scale errors. Batch systems optimize for read and write throughput of large requests, while serving systems use indexing to provide low latency access to individual records. To improve latency-insensitive write throughput to PNUTS, we introduce a batch write path. The systems provide conflicting consistency models, and we discuss techniques to isolate them from one another.

