Results 1 - 10
of
61
Query evaluation techniques for large databases
- ACM COMPUTING SURVEYS
, 1993
"... Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On ..."
Abstract
-
Cited by 592 (7 self)
- Add to MetaCart
Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate it: In order to manipulate large sets of complex objects as efficiently as today’s database systems manipulate simple records, query processing algorithms and software will become more complex, and a solid understanding of algorithm and architectural issues is essential for the designer of database management software. This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and post-relational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
Random Sampling for Histogram Construction: How much is enough?
, 1998
"... Random sampling is a standard technique for constructing (approximate) histograms for query optimization. However, any real implementation in commercial products requires solving the hard problem of determining "How much sampling is enough?" We address this critical question in the context of equi-h ..."
Abstract
-
Cited by 91 (11 self)
- Add to MetaCart
Random sampling is a standard technique for constructing (approximate) histograms for query optimization. However, any real implementation in commercial products requires solving the hard problem of determining "How much sampling is enough?" We address this critical question in the context of equi-height histograms used in many commercial products, including Microsoft SQL Server. We introduce a conservative error metric capturing the intuition that for an approximate histogram to have low error, the error must be small in all regions of the histogram. We then present a result establishing an optimal bound on the amount of sampling required for pre-specified error bounds. We also describe an adaptive page sampling algorithm which achieves greater efficiency by using all values in a sampled page but adjusts the amount of sampling depending on clustering of values in pages. Next, we establish that the problem of estimating the number of distinct values is provably difficult, but propose ...
AutoAdmin ’what-if’ index analysis utility
, 1998
"... surajitca microsoftcorn viveknar @ microsoft.com As databases get widely deployed, it becomes increasingly important to reduce the overhead of database administration. An important aspect of data administration that critically influences performance is the ability to select indexes for a database. I ..."
Abstract
-
Cited by 75 (9 self)
- Add to MetaCart
surajitca microsoftcorn viveknar @ microsoft.com As databases get widely deployed, it becomes increasingly important to reduce the overhead of database administration. An important aspect of data administration that critically influences performance is the ability to select indexes for a database. In order to decide the right indexes for a database, it is crucial for the database administrator (DBA) to be able to perform a quantitative analysis of the existing indexes. Furthermore, the DBA should have the ability to propose hypothetical (“what-if’) indexes and quantitatively analyze their impact on performance of the system. Such impact analysis may consist of analyzing workloads over the database, estimating changes in the cost of a workload, and studying index usage while taking into account projected changes in the sizes of the database tables. In this paper we describe a novel index analysis utility that we have prototyped for Microsoft SQL Server 7.0. We describe the interfaces exposed by this utility that can be leveraged by a variety of front-end tools and sketch important aspects of the user interfaces enabled by the utility. We also discuss the implementation techniques for efficiently supporting “what-if ’ indexes. Our framework can be extended to incorporate analysis of other aspects of physical database design. 1.
DB2 Advisor: An optimizer smart enough to recommend its own indexes
- In ICDE
, 2000
"... This paper introduces the concept of letting an RDBMS Optimizer optimize its own environment. In our project, we have used the DB2 Optimizer to tackle the index selection problem, a variation of the knapack problem. This paper will discuss our implementation of index recommendation, the user interfa ..."
Abstract
-
Cited by 73 (4 self)
- Add to MetaCart
This paper introduces the concept of letting an RDBMS Optimizer optimize its own environment. In our project, we have used the DB2 Optimizer to tackle the index selection problem, a variation of the knapack problem. This paper will discuss our implementation of index recommendation, the user interface, and provide measurements on the quality of the recommended indexes. 1.
Physical Database Design for Data Warehouses
- in Proceedings of the International Conference on Database Engineering
, 1997
"... Data warehouses collect copies of information from remote sources into a single database. Since the remote data is cached at the warehouse, it appears as local relations to the users of the warehouse. To improve query response time, the warehouse administrator will often materialize views defined on ..."
Abstract
-
Cited by 33 (1 self)
- Add to MetaCart
Data warehouses collect copies of information from remote sources into a single database. Since the remote data is cached at the warehouse, it appears as local relations to the users of the warehouse. To improve query response time, the warehouse administrator will often materialize views defined on the local relations to support common or complicated queries. Unfortunately, the requirement to keep the views consistent with the local relations creates additional overhead when the remote sources change. The warehouse is often kept only loosely consistent with the sources: it is periodically refreshed with changes sent from the source. When this happens, the warehouse is taken off-line until the local relations and materialized views can be updated. Clearly, the users would prefer as little down time as possible. Often the down time can be reduced by adding carefully selected materialized views or indexes to the physical schema. This paper studies how to select the sets of supporting vie...
DB2 Design Advisor: Integrated Automatic Physical Database
- In VLDB
, 2004
"... The DB2 Design Advisor in IBM DB2 Universal Database^TM (DB2 UDB) Version 8.2 for Linux, UNIX and Windows is a tool that, for a given workload, automatically recommends physical design features that are any subset of indexes, materialized query tables (also called materialized views), shared-n ..."
Abstract
-
Cited by 30 (1 self)
- Add to MetaCart
The DB2 Design Advisor in IBM DB2 Universal Database^TM (DB2 UDB) Version 8.2 for Linux, UNIX and Windows is a tool that, for a given workload, automatically recommends physical design features that are any subset of indexes, materialized query tables (also called materialized views), shared-nothing database partitionings, and multidimensional clustering of tables. Our work is the very first industrial-strength tool that covers the design of as many as four different features, a significant advance to existing tools, which support no more than just indexes and materialized views.
Generalized Partial Indexes
- In ICDE
, 1995
"... This paper demonstrates the use of generalized partial indexes for efficient query processing. We propose that partial indexes be built on those portions of the database that are statistically likely to be the most useful for query processing. We identify three classes of statistical information, an ..."
Abstract
-
Cited by 22 (0 self)
- Add to MetaCart
This paper demonstrates the use of generalized partial indexes for efficient query processing. We propose that partial indexes be built on those portions of the database that are statistically likely to be the most useful for query processing. We identify three classes of statistical information, and two levels at which it may be available. We describe indexing strategies that use this information to significantly improve average query performance. Results from simulation experiments demonstrate that the proposed generalized partial indexing strategies perform well compared to the traditional approach to indexing. 1 Introduction In traditional database systems, indexes are used primarily for the efficient associative retrieval of data. The database administrator (DBA) chooses to build indexes on selected columns of tuples in an RDBMS (or members of objects in an OODBMS) to speed up queries that apply predicates to those columns 1 . The choice of indexes is influenced by the number ...
Autopart: Automating schema design for large scientific databases using data partitioning
- In Proceedings of the 16th International Conference on Scientific and Statistical Database Management
, 2004
"... Database applications that use multi-terabyte datasets are becoming increasingly important for scientific fields such as astronomy and biology. Scientific databases are particularly suited for the application of automated physical design techniques, because of their data volume and the complexity of ..."
Abstract
-
Cited by 21 (4 self)
- Add to MetaCart
Database applications that use multi-terabyte datasets are becoming increasingly important for scientific fields such as astronomy and biology. Scientific databases are particularly suited for the application of automated physical design techniques, because of their data volume and the complexity of the scientific workloads. Current automated physical design tools focus on the selection of indexes and materialized views. In large-scale scientific databases, however, the data volume and the continuous insertion of new data allows for only limited indexes and materialized views. By contrast, data partitioning does not replicate data, thereby reducing space requirements and minimizing update overhead. In this paper we propose AutoPart, an algorithm that automatically partitions database tables to optimize sequential access assuming prior knowledge of a representative workload. The resulting schema is indexed using a fraction of the space required for indexing the original schema. To evaluate AutoPart, we build an automated schema design tool that interfaces to commercial database systems. We experiment with AutoPart in the context of the Sloan Digital Sky Survey database, a real-world astronomical database, running on SQL Server 2000. Our experiments corroborate the benefits of partitioning for large-scale systems: Partitioning alone improves query execution performance by a factor of two on average. Combined with indexes, the new schema also outperforms the indexed original schema by 20 % (for queries) and a factor of five (for updates), while using only half the original index space.
Self-tuning database systems: A decade of progress
- in VLDB, 2007
"... In this paper we discuss advances in self-tuning database systems over the past decade, based on our experience in the AutoAdmin project at Microsoft Research. This paper primarily focuses on the problem of automated physical database design. We also highlight other areas where research on self-tuni ..."
Abstract
-
Cited by 18 (0 self)
- Add to MetaCart
In this paper we discuss advances in self-tuning database systems over the past decade, based on our experience in the AutoAdmin project at Microsoft Research. This paper primarily focuses on the problem of automated physical database design. We also highlight other areas where research on self-tuning database technology has made significant progress. We conclude with our thoughts on opportunities and open issues. 1. HISTORY OF AUTOADMIN PROJECT Our VLDB 1997 paper [26] reported our first technical results from the AutoAdmin project that was started in Microsoft Research in the summer of 1996. The SQL Server product group at that time had taken on the ambitious task of redesigning the SQL Server code for their next release (SQL Server 7.0). Ease of use and elimination of knobs was a driving force for their design
Building Knowledge Base Management Systems
, 1995
"... . Advanced applications in fields such as CAD, software engineering, real-time process control, corporate repositories and digital libraries require the construction, efficient access and management of large, shared knowledge bases. Such knowledge bases cannot be built using existing too ..."
Abstract
-
Cited by 15 (2 self)
- Add to MetaCart
.<F3.733e+05> Advanced applications in fields such as CAD, software engineering, real-time process control, corporate repositories and digital libraries require the construction, efficient access and management of large, shared knowledge bases. Such knowledge bases cannot be built using existing tools such as expert system shells, because these do not scale up, nor can they be built in terms of existing database technology, because such technology does not support the rich representational structure and inference mechanisms required for knowledge-based systems. This paper proposes a generic architecture for a knowledge base management system intended for such applications. The architecture assumes an object-oriented knowledge representation language with an assertional sublanguage used to express constraints and rules. It also provides for general-purpose deductive inference and special-purpose temporal reasoning. Results reported in the paper address several knowledge base management ...

