Query evaluation techniques for large databases
- ACM COMPUTING SURVEYS
, 1993
Cited by 592 (7 self)
Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate it: In order to manipulate large sets of complex objects as efficiently as today’s database systems manipulate simple records, query processing algorithms and software will become more complex, and a solid understanding of algorithm and architectural issues is essential for the designer of database management software. This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and post-relational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
Weaving Relations for Cache Performance
, 2001
Cited by 83 (14 self)
Relational database systems have traditionally optimized for I/O performance and organized records sequentially on disk pages using the N-ary Storage Model (NSM) (a.k.a., slotted pages). Recent research, however, indicates that cache utilization and performance are becoming increasingly important on modern platforms. In this paper, we first demonstrate that in-page data placement is the key to high cache performance and that NSM exhibits low cache utilization on modern platforms. Next, we propose a new data organization model called PAX (Partition Attributes Across), that significantly improves cache performance by grouping together all values of each attribute within each page. Because PAX only affects layout inside the pages, it incurs no storage penalty and does not affect I/O behavior. According to our experimental results, when compared to NSM (a) PAX exhibits superior cache and memory bandwidth utilization, saving at least 75% of NSM's stall time due to data cache accesses, (b) range selection queries and updates on memory-resident relations execute 17-25% faster, and (c) TPC-H queries involving I/O execute 11-48% faster.
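The in-page layout difference described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the three-field schema, the helper names `nsm_page`, `pax_page` and `reconstruct`, and the use of Python lists as stand-ins for page bytes are all hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, float]        # hypothetical schema: (id, name, balance)
FIELDS = ("id", "name", "balance")     # hypothetical attribute names


def nsm_page(records: List[Record]) -> List[Record]:
    """NSM-style page: whole records stored one after another."""
    return list(records)


def pax_page(records: List[Record]) -> Dict[str, list]:
    """PAX-style page: the same records, but all values of each
    attribute are grouped together inside the page."""
    return {name: [rec[i] for rec in records] for i, name in enumerate(FIELDS)}


def reconstruct(page: Dict[str, list], slot: int) -> Record:
    """Rebuild one record from a PAX-style page; every piece of the
    record lives in the same page, so no other page is consulted."""
    return tuple(page[name][slot] for name in FIELDS)


records = [(1, "ann", 10.0), (2, "bob", 25.5), (3, "cy", 7.25)]
nsm = nsm_page(records)
pax = pax_page(records)

# A scan over one attribute walks every full record under NSM,
# but only one contiguous group of values under PAX.
nsm_total = sum(rec[2] for rec in nsm)
pax_total = sum(pax["balance"])
```

Both pages hold exactly the same records, which is consistent with the abstract's point that only the layout inside the page changes.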
Integrating vertical and horizontal partitioning into automated physical database design
- In Proceedings of ACM SIGMOD
, 2004
Cited by 48 (6 self)
In addition to indexes and materialized views, horizontal and vertical partitioning are important aspects of physical design in a relational database system that significantly impact performance. Horizontal partitioning also provides manageability; database administrators often require indexes and their underlying tables partitioned identically so as to make common operations such as backup/restore easier. While partitioning is important, incorporating partitioning makes the problem of automating physical design much harder since: (a) The choices of partitioning can strongly interact with choices of indexes and materialized views. (b) A large new space of physical design alternatives must be considered. (c) Manageability requirements impose a new constraint on the problem. In this paper, we present novel techniques for designing a scalable solution to this integrated physical design problem that takes both performance and manageability into account. We have implemented our techniques and evaluated them on Microsoft SQL Server. Our experiments highlight: (a) the importance of taking an integrated approach to automated physical design and (b) the scalability of our techniques.
Performance tradeoffs in read-optimized databases
- In VLDB 2006: Proceedings of the 32nd international conference on Very large data bases
, 2006
Cited by 31 (11 self)
Database systems have traditionally optimized performance for write-intensive workloads. Recently, there has been renewed interest in architectures that optimize read performance by using column-oriented data representation and light-weight compression. This previous work has shown that under certain broad classes of workloads, column-based systems can outperform row-based systems. Previous work, however, has not characterized the precise conditions under which a particular query workload can be expected to perform better on a column-oriented database. In this paper we first identify the distinctive components of a read-optimized DBMS and describe our implementation of a high-performance query engine that can operate on both row and column-oriented data. We then use our prototype to perform an in-depth analysis of the tradeoffs between column and row-oriented architectures. We explore these tradeoffs in terms of disk bandwidth, CPU cache latency, and CPU cycles. We show that for most database workloads, a carefully designed column system can outperform a carefully designed row system, sometimes by an order of magnitude. We also present an analytical model to predict whether a given workload on a particular hardware configuration is likely to perform better on a row or column-based system.
Data Page Layouts for Relational Databases on Deep Memory Hierarchies
, 2002
Cited by 24 (2 self)
Relational database systems have traditionally optimized for I/O performance and organized records sequentially on disk pages using the N-ary Storage Model (NSM) (a.k.a., slotted pages).
A Multi-resolution Block Storage Model for Database Design
- Proceedings of the 2003 IDEAS Conference
, 2003
Cited by 8 (0 self)
We propose a new storage model called MBSM (Multiresolution Block Storage Model) for laying out tables on disks. MBSM is intended to speed up operations such as scans that are typical of data warehouse workloads. Disk blocks are grouped into “super-blocks,” with a single record stored in a partitioned fashion among the blocks in a super-block. The intention is that a scan operation that needs to consult only a small number of attributes can access just those blocks of each super-block that contain the desired attributes. To achieve good performance given the physical characteristics of modern disks, we organize super-blocks on the disk into fixed-size “mega-blocks.” Within a mega-block, blocks of the same type (from various super-blocks) are stored contiguously. We describe the changes needed in a conventional database system to manage tables using such a disk organization. We demonstrate experimentally that MBSM outperforms competing approaches such as NSM (N-ary Storage Model), DSM (Decomposition Storage Model) and PAX (Partition Attributes Across), for I/O bound decision-support workloads consisting of scans in which not all attributes are required. This improved performance comes at the expense of single-record insert and delete performance; we quantify the trade-offs involved. Unlike DSM, the cost of reconstructing a record from its partitions is small. MBSM stores attributes in a vertically partitioned manner similar to PAX, and thus shares PAX’s good CPU cache behavior. We describe methods for mapping attributes to blocks within super-blocks in order to optimize overall performance, and show how to tune the super-block and mega-block sizes.
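The super-block and mega-block organization described in this abstract can be illustrated with a small sketch. Everything concrete here is an assumption made for the example: the attribute names, the attribute-to-block mapping `ATTR_TO_BLOCK`, the block counts, and the helper names. The abstract does not specify any of them, and the paper's actual mapping methods are not reproduced.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping of attributes to block types within a super-block.
ATTR_TO_BLOCK: Dict[str, int] = {"id": 0, "price": 0, "qty": 1, "comment": 2}
BLOCKS_PER_SUPER_BLOCK = 3  # assumed size, chosen for illustration


def blocks_needed(attrs: List[str]) -> List[int]:
    """Block types of each super-block that a scan over `attrs` must read."""
    return sorted({ATTR_TO_BLOCK[a] for a in attrs})


def mega_block_layout(n_super_blocks: int) -> List[Tuple[int, int]]:
    """Order of blocks inside one mega-block as (block type, super-block).
    Blocks of the same type from different super-blocks are adjacent."""
    return [(b, s)
            for b in range(BLOCKS_PER_SUPER_BLOCK)
            for s in range(n_super_blocks)]


def scan_positions(attrs: List[str], n_super_blocks: int) -> List[int]:
    """Positions within the mega-block that a scan over `attrs` touches."""
    needed = set(blocks_needed(attrs))
    layout = mega_block_layout(n_super_blocks)
    return [pos for pos, (b, _) in enumerate(layout) if b in needed]


# A scan that only needs `qty` reads one contiguous run of blocks.
qty_positions = scan_positions(["qty"], n_super_blocks=4)
```

With four super-blocks per mega-block, the `qty` scan touches positions 4 through 7 only, which shows why storing same-type blocks contiguously suits scans over a few attributes.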
Here are my Data Files. Here are my Queries. Where are my Results?
- In CIDR
, 2011
Cited by 3 (3 self)
Database management systems (DBMS) provide incredible flexibility and performance when it comes to query processing, scalability and accuracy. To fully exploit DBMS features, however, the user must define a schema, load the data, tune the system for the expected workload, and answer several questions. Should the database use a column-store, a row-store or some hybrid format? What indices should be created? All these questions make for a formidable and time-consuming hurdle, often deterring new applications or imposing high costs on existing ones. A characteristic example is that of scientific databases with huge data sets. The prohibitive initialization cost and complexity still force scientists to rely on “ancient” tools for their data management tasks, delaying scientific understanding and progress. Users and applications collect their data in flat files, which have traditionally been considered to be “outside” a DBMS. A DBMS wants control: always bring all data “inside”, replicate it and format it in its own “secret” way. The problem has been recognized and current efforts extend existing systems with abilities such as reading information from flat files and gracefully incorporating it into the processing engine. This paper proposes a new generation of systems where the only requirement from the user is a link to the raw data files. Queries can then immediately be fired without preparation steps in between. Internally and in an abstract way, the system takes care of selectively, adaptively and incrementally providing the proper environment given the queries at hand. Only part of the data is loaded at any given time and it is being stored and accessed in the format suitable for the current workload.
A Genetic Algorithm for Vertical Fragmentation and Access Path Selection
- The Computer Journal
A Framework for Automating Physical Database Design
We propose a two-phase algorithm for physical database design. In phase one the algorithm, for each logical query, uses rules to determine characteristics of a physical design (such as indexes) that would be beneficial to the query, and selects a physical design that yields a low cost estimate for that query. In phase two we use a notion of compromise between physical database designs. Starting from the physical designs selected in phase one, the algorithm looks for a compromise physical design that minimizes the queries’ aggregate frequency-weighted cost. This method is envisioned as a cornerstone in the efficient implementation of a Turing-complete, very-high-level programming language for database applications, but it is also suitable for more conventional relational and non-1NF database management systems.
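The two phases described in this abstract can be sketched with a toy example. It is a simplification under loudly stated assumptions, not the paper's algorithm: the workload `QUERIES`, the cost model in `cost`, the single "index the query's column" rule, and the use of pairwise unions as the only compromise candidates are all invented for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Design = FrozenSet[str]  # a physical design, modelled as a set of indexed columns

# Hypothetical workload: query name -> (column it filters on, frequency).
QUERIES: Dict[str, Tuple[str, int]] = {"q1": ("a", 5), "q2": ("b", 1)}


def cost(query: str, design: Design) -> int:
    """Assumed toy cost model: cheap lookup if the column is indexed,
    plus a maintenance charge of 2 per index in the design."""
    column, _ = QUERIES[query]
    lookup = 1 if column in design else 10
    return lookup + 2 * len(design)


def phase_one(query: str) -> Design:
    """Phase one: a rule proposes an index on the query's column, and the
    cheapest candidate design for this query alone is selected."""
    column, _ = QUERIES[query]
    candidates: List[Design] = [frozenset(), frozenset({column})]
    return min(candidates, key=lambda d: cost(query, d))


def aggregate_cost(design: Design) -> int:
    """Frequency-weighted cost of the whole workload under one design."""
    return sum(freq * cost(q, design) for q, (_, freq) in QUERIES.items())


def phase_two(starting: List[Design]) -> Design:
    """Phase two (simplified): consider the phase-one designs and their
    pairwise unions, and keep the one with the lowest aggregate cost."""
    candidates = set(starting)
    for left, right in combinations(starting, 2):
        candidates.add(left | right)
    return min(sorted(candidates, key=sorted), key=aggregate_cost)


per_query = [phase_one(q) for q in QUERIES]
best = phase_two(per_query)
```

In this toy workload the compromise keeps only the index on `a`: the frequent query `q1` dominates the weighted cost, so paying maintenance for a second index on `b` is not worthwhile.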

