Results 1-10 of 12
Query optimization in database systems
 ACM Computing Surveys
, 1984
Abstract

Cited by 214 (0 self)
Efficient methods of processing unanticipated queries are a crucial prerequisite for the success of generalized database management systems. A wide variety of approaches to improve the performance of query evaluation algorithms have been proposed: logic-based and semantic transformations, fast implementations of basic operations, and combinatorial or heuristic algorithms for generating alternative access plans and choosing among them. These methods are presented in the framework of a general query evaluation procedure using the relational calculus representation of queries. In addition, nonstandard query optimization issues such as higher-level query evaluation, query optimization in distributed databases, and use of database machines are addressed. The focus, however, is on query optimization in centralized database systems.
A lineartime probabilistic counting algorithm for database applications
 ACM Transactions on Database Systems
, 1990
Abstract

Cited by 95 (5 self)
We present a probabilistic algorithm for counting the number of unique values in the presence of duplicates. This algorithm has O(q) time complexity, where q is the number of values including duplicates, and produces an estimate with an arbitrary accuracy prespecified by the user, using only a small amount of space. Traditionally, accurate counts of unique values were obtained by sorting, which has O(q log q) time complexity. Our technique, called linear counting, is based on hashing. We present a comprehensive theoretical and experimental analysis of linear counting. The analysis reveals an interesting result: a load factor (number of unique values / hash table size) much larger than 1.0 (e.g., 12) can be used for accurate estimation (e.g., 1% error). We present this technique with two important applications to database problems: namely, (1) obtaining the column cardinality (the number of unique values in a column of a relation) and (2) obtaining the join selectivity (the number of unique values in the join column resulting from an unconditional join divided by the number of unique join column values in the relation to be joined). These two parameters are important statistics that are used in relational query optimization and physical database design.
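The core idea the abstract describes can be sketched in a few lines: hash every value into an m-bit map, then estimate the distinct count from the fraction of bits left unset. This is a minimal illustration, not the paper's implementation; the bitmap size and the use of MD5 as the hash function are arbitrary choices made here for the example.

```python
import hashlib
import math

def linear_count(values, m=1 << 14):
    """Estimate the number of distinct values via linear counting:
    hash each value into an m-bit map, then use the zero-bit fraction."""
    bitmap = bytearray(m // 8)
    for v in values:
        h = int.from_bytes(hashlib.md5(str(v).encode()).digest()[:8], "big") % m
        bitmap[h >> 3] |= 1 << (h & 7)
    # Count unset bits; estimator is n ~ -m * ln(zeros / m) = m * ln(m / zeros).
    zeros = sum(bin(b ^ 0xFF).count("1") for b in bitmap)
    return m * math.log(m / zeros)
```

Note that, unlike sorting, one pass and a bitmap far smaller than the input suffice, which is the O(q) time / small space trade-off the abstract claims.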
Buffer management in relational database systems
 ACM Transactions on Database Systems
, 1986
Abstract

Cited by 52 (1 self)
The hot-set model, characterizing the buffer requirements of relational queries, is presented. This model allows the system to determine the optimal buffer space to be allocated to a query; it can also be used by the query optimizer to derive efficient execution plans accounting for the available buffer space, and by a query scheduler to prevent thrashing. The hot-set model is compared with the working-set model. A simulation study is presented.
Reading a Set of Disk Pages
, 1993
Abstract

Cited by 23 (1 self)
The problem studied in this paper is as follows. Consider a file stored in contiguous space on disk. Given a list of pages to be retrieved from the file, what is the fastest way of retrieving them? It is assumed that adjacent pages on disk can be read with a single read request. The straightforward solution is to read the desired pages one by one. However, if two or more pages are located close to each other it may be faster to read them with a single read request, possibly even reading some intervening "empty" pages. It is shown that finding an optimal read schedule is equivalent to finding the shortest path in a certain graph. A very simple approximate algorithm is then introduced and (experimentally) shown to produce schedules that are close to optimal. The expected cost of schedules produced by this algorithm is derived. It is found that significant speedup can be achieved by the simple mechanism of using additional buffer space and issuing "large reads" whenever it is advantageous to do so.
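The "large reads" mechanism the abstract ends with can be sketched as a greedy merge: under an assumed cost model (a fixed seek cost per read request plus a per-page transfer cost, both hypothetical parameters for this example), two neighboring requests are combined whenever reading the intervening "empty" pages is cheaper than issuing a second seek. The paper's optimal shortest-path formulation is not reproduced here; this is only the simple approximation in its spirit.

```python
def plan_reads(pages, seek_cost=10.0, page_cost=1.0):
    """Greedily group requested page numbers into read requests,
    merging neighbors when reading the gap beats paying a new seek."""
    pages = sorted(set(pages))
    runs = [[pages[0], pages[0]]]  # inclusive [first, last] page of each request
    for p in pages[1:]:
        gap = p - runs[-1][1] - 1  # "empty" pages between this and the last run
        if gap * page_cost < seek_cost:
            runs[-1][1] = p        # extend the run, reading the gap pages too
        else:
            runs.append([p, p])    # gap too wide: start a new read request
    return [(a, b) for a, b in runs]
```

With the illustrative costs above, pages 1, 2, and 5 are fetched in one request (the two empty pages cost less than a seek), while page 30 gets its own request.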
On the Selection of Secondary Indices in Relational Databases
, 1993
Abstract

Cited by 11 (1 self)
An important problem in the physical design of databases is the selection of secondary indices. In general, this problem cannot be solved in an optimal way due to the complexity of the selection process. Heuristics such as the well-known ADD and DROP algorithms are therefore often used. In this paper it will be shown that frequently used cost functions can be classified as super- or submodular functions. For these functions several mathematical properties have been derived which reduce the complexity of the index selection problem. These properties are used to develop a tool for physical database design and also give a mathematical foundation for the success of the aforementioned ADD and DROP algorithms. Keywords: Physical database design, Secondary index selection, ADD and DROP algorithms, Supermodular functions, Submodular functions.
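The ADD heuristic the abstract refers to can be sketched as a greedy loop: starting from no indices, repeatedly add the candidate index that lowers the cost function the most, and stop when no single addition helps. The cost function below (a benefit per index minus a per-index maintenance charge) is a made-up toy for illustration; the real cost functions are the super-/submodular ones the paper analyzes.

```python
def add_heuristic(candidates, cost):
    """Greedy ADD heuristic for index selection: repeatedly add the
    single candidate index that reduces total cost the most."""
    chosen = set()
    while True:
        best, best_cost = None, cost(chosen)
        for idx in candidates - chosen:
            c = cost(chosen | {idx})
            if c < best_cost:
                best, best_cost = idx, c
        if best is None:          # no single index improves the cost: stop
            return chosen
        chosen.add(best)

# Toy cost model (hypothetical numbers): each index saves some query cost
# but adds a flat maintenance overhead of 5.
benefit = {"a": 30, "b": 20, "c": 2}
toy_cost = lambda s: 100 - sum(benefit[i] for i in s) + 5 * len(s)
```

DROP is the mirror image: start from all candidates and greedily remove the index whose removal lowers cost most. The paper's point is that super-/submodularity of the cost function explains why such myopic steps work well.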
An analytical study of object identifier indexing
 In Proceedings of the 9th International Conference on Database and Expert Systems Applications, DEXA’98
, 1998
Abstract

Cited by 6 (2 self)
The object identifier index of an object-oriented database system is typically 20% of the size of the database itself, and for large databases, only a small part of the index fits in main memory. To avoid index retrievals becoming a bottleneck, efficient buffering strategies are needed to minimize the number of disk accesses. In this report, we develop analytical cost models which we use to find optimal sizes of index page buffer and index entry cache, for different memory sizes, index sizes, and access patterns. Because existing buffer hit estimation models are not applicable to index page buffering in the case of tree-based indexes, we have also developed an analytical model for index page buffer performance. The cost gain from using the results in this report is typically in the order of 200-300%. Thus, the results should be of value in optimizers and tools for configuration and tuning of object-oriented database systems.
A new method for estimating the number of objects satisfying an object-oriented query involving partial participation of classes
 Information Systems
, 1996
Abstract

Cited by 5 (2 self)
The intermediate result cardinality (the number of objects satisfying a condition given in a query) is an important factor for estimating the cost of the query in query optimization. In this paper we show that an object-oriented query often involves partial participation of classes in a relationship. We then present a new technique for estimating the intermediate result cardinality in such a query. Partial participation has not been considered seriously in existing techniques. Since the proposed technique uses detailed statistics to accommodate partial participation, it estimates the intermediate result cardinality more accurately than existing ones. We also show that these statistics are easily obtained by using inherent properties of object-oriented databases.
Towards Declarative Querying for Biological Sequences
Abstract

Cited by 1 (0 self)
The ongoing revolution in life sciences research is producing vast amounts of genetic and proteomic sequence data. Scientists want to pose increasingly complex queries on this data, but current methods for querying biological sequences are primitive and largely procedural. This limits the ease with which complex queries can be posed, and often results in very inefficient query plans. There is a growing and urgent need for declarative and efficient methods for querying biological sequence data. In this paper we introduce a system called Periscope/SQ which addresses this need. Queries in our system are based on a well-defined extension of relational algebra. We introduce new physical operators and support for novel indexes in the database. As part of the optimization framework, we describe a new technique for selectivity estimation of string pattern matching predicates that is more accurate than previous methods. We also describe a simple, yet highly effective algorithm to optimize sequence queries. Using a real-world application in eye genetics, we show how Periscope/SQ can be used to achieve a speedup of two orders of magnitude over existing procedural methods!
Buffering of Index Structures
 in: Proc. SPIE
, 2000
Abstract

Cited by 1 (0 self)
Buffering of index structures is an important problem, because disk I/O dominates the cost of queries. In this paper, we compare existing algorithms for uniform, non-uniform static, and non-uniform dynamic access patterns. We experimentally show that the LRU-2 method is better than the other methods. We also propose an efficient implementation of the LRU-2 algorithm. In the second part of the paper, we propose a new buffering algorithm for a distributed system where each machine has its own buffer. We show experimentally that this method performs better than other buffering techniques. Keywords: buffering, index structure
Estimating Block Accesses
, 1997
Abstract
 Add to MetaCart
For batched searching of a B+-tree file, previous formulas which estimate the number of block accesses fail when the blocking factor at the leaf level is less than one. We overcome this weakness by developing both an exact formula and an approximate formula to estimate the number of block accesses in a batched search of a B+-tree file where the blocking factor at the leaf level may be less than one.
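For context, the kind of "previous formula" the abstract alludes to is the classical block-access estimate (Yao's formula): when k of n records, packed d = n/m per block into m blocks, are retrieved at random, the expected number of blocks touched is m(1 - C(n-d, k)/C(n, k)). This sketch is the classical estimate only; as the abstract notes, it presupposes a blocking factor of at least one, which is exactly the case the paper's new formulas relax.

```python
from math import comb

def yao_blocks(n, m, k):
    """Classical expected-block-access estimate: k records drawn without
    replacement from n records stored d = n/m per block in m blocks."""
    d = n // m  # blocking factor; this formula assumes d >= 1
    return m * (1 - comb(n - d, k) / comb(n, k))
```

For example, fetching one record out of 100 stored ten per block touches exactly one block on average, and fetching all 100 touches all ten blocks.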