Cost-based Selection of Path Expression Processing Algorithms in Object-Oriented Databases
- In Proceedings of the Twenty-Second International Conference on Very Large Data Bases
, 1996
Cited by 23 (1 self)
An object query can include a path expression to traverse a number of related collections. The order of collection traversals given by the path expression may not be the most efficient to process the query. This generates a critical problem for an object query optimizer to select the best execution plan. This paper studies the different algorithms to process path expressions with predicates, including depth first navigation, forward and reverse joins. Using a cost model, it then compares their performance in different cases, according to memory size, selectivity of predicates, fan out between collections, etc. It also presents a heuristic-based algorithm to find profitable n-ary operators for traversing collections, thus reducing the search space of query plans to process a query with a qualified path expression. An implementation based on the O2 system demonstrates the validity of the results.
Cost-Based Optimization for Magic: Algebra and Implementation
- In Proc. of ACM SIGMOD
, 1996
Cited by 21 (1 self)
Magic sets rewriting is a well-known optimization heuristic for complex decision-support queries. There can be many variants of this rewriting even for a single query, which differ greatly in execution performance. We propose cost-based techniques for selecting an efficient variant from the many choices. Our first contribution is a practical scheme that models magic sets rewriting as a special join method that can be added to any cost-based query optimizer. We derive cost formulas that allow an optimizer to choose the best variant of the rewriting and to decide whether it is beneficial. The order of complexity of the optimization process is preserved by limiting the search space in a reasonable manner. We have implemented this technique in IBM's DB2 C/S V2 database system. Our performance measurements demonstrate that the cost-based magic optimization technique performs well, and that without it, several poor decisions could be made. Our second contribution is a formal algebraic model of ...
Random Sampling from Databases - A Survey
- Statistics and Computing
, 1994
Cited by 20 (0 self)
This paper reviews recent literature on techniques for obtaining random samples from databases. We begin with a discussion of why one would want to include sampling facilities in database management systems. We then review basic sampling techniques used in constructing DBMS sampling algorithms, e.g., acceptance/rejection and reservoir sampling. A discussion of sampling from various data structures follows: B+ trees, hash files, spatial data structures (including R-trees and quadtrees). Algorithms for sampling from simple relational queries, e.g., single relational operators such as selection, intersection, union, set difference, projection, and join are then described. We then describe sampling for estimation of aggregates (e.g., the size of query results). Here we discuss both clustered sampling and sequential sampling approaches. Decision theoretic approaches to sampling for query optimization are reviewed. DRAFT of March 22, 1994. 1 Introduction In this paper we sur...
Query-Driven Horizontal Class Partitioning for Object-Oriented Databases
- in Lecture Notes in Computer Science
, 1998
Cited by 19 (3 self)
Horizontal partitioning (HP) is a logical database design technique first developed for distributed relational (record structured) databases. Though the main objective of HP is to reduce irrelevant data access, there has been very little work done in quantifying this by means of a cost model, and developing algorithms which derive the horizontal partitioning schemes. In this paper, we develop a cost model for query processing in object-oriented databases (OODBs) for both unpartitioned and horizontally partitioned object-oriented databases. The problem of coming up with an optimal HP scheme is NP-complete. Therefore, we present and evaluate two algorithms for coming up with optimal (based on exhaustive search) and/or near optimal (based on heuristics) HP schemes. 1 Introduction OODB technology has matured enough to support modern applications such as CAM/CAD, software engineering and multimedia databases. HP is often used as a means to achieve better performance out of database sys...
Object/Relational Query Optimization with Chase and Backchase
, 2000
Cited by 12 (0 self)
Traditionally, query optimizers assume a direct mapping from the logical entities modeling the data (e.g. relations) and the physical entities storing the data (e.g. indexes), each physical entity corresponding precisely to one logical entity. This assumption is no longer true in non-traditional applications (object-oriented and semi-structured databases, data integration), which often exhibit a mismatch between the logical view and the actual storage of data. In addition, there is an increased amount of redundancy, even at the logical level, that can greatly enhance optimization opportunities, if exploited. To deal with all this, we propose a novel architecture for query optimization, in which physical optimization is leveraged at the level of query rewriting. As a consequence, the other important aspect of query optimization, semantic optimization (that takes advantage of the redundancy at the logical level), can be naturally incorporated. The optimizer can then make global decisions based on both semantic and physical knowledge, leading to plans of higher quality than those obtainable by a traditional two-level approach. The main idea
An analytical study of object identifier indexing
- In Proceedings of the 9th International Conference on Database and Expert Systems Applications, DEXA’98
, 1998
Cited by 6 (2 self)
The object identifier index of an object-oriented database system is typically 20% of the size of the database itself, and for large databases, only a small part of the index fits in main memory. To avoid index retrievals becoming a bottleneck, efficient buffering strategies are needed to minimize the number of disk accesses. In this report, we develop analytical cost models which we use to find optimal sizes of index page buffer and index entry cache, for different memory sizes, index sizes, and access patterns. Because existing buffer hit estimation models are not applicable for index page buffering in the case of tree-based indexes, we have also developed an analytical model for index page buffer performance. The cost gain from using the results in this report is typically on the order of 200-300%. Thus, the results should be valuable in optimizers and tools for configuration and tuning of object-oriented database systems.
Analysis and Validation of Information Access through Mono, Multidimensional and Dynamic Taxonomies
- FQAS 2006, 7th International Conference on Flexible Query Answering Systems
, 2002
Cited by 2 (2 self)
Access to complex information bases through multidimensional, dynamic taxonomies (also improperly known as faceted classifications) is becoming a hot concept both in research and in industry. In this paper, the major shortcomings of conventional, monodimensional taxonomic approaches, such as the independence of different branches of the taxonomy and insufficient scalability, are discussed. The dynamic taxonomy approach, the first and most complete model for multidimensional taxonomic access to date, is reviewed and compared to conventional taxonomies. We analyze the reducing power of dynamic taxonomies and conventional taxonomies and report experimental results on real data, which confirm that monodimensional taxonomies are not useful for browsing/retrieval on large databases, whereas dynamic taxonomies can effectively manage very large databases and exhibit a very fast convergence.
An Evaluation of Vertical Class Partitioning for Query Processing in Object-Oriented Databases
- IEEE
, 2007
Cited by 1 (0 self)
Vertical partitioning is a design technique for reducing the number of disk accesses to execute a given set of queries by minimizing the number of irrelevant instance variables accessed. This is accomplished by grouping the frequently accessed instance variables as vertical class fragments. The complexity of object-oriented database models due to subclass hierarchy and class composition hierarchy complicates the definition and representation of vertical partitioning of the classes, which makes the problem of vertical partitioning in OODBs very challenging. In this paper, we develop a comprehensive analytical cost model for processing of queries on vertically partitioned OODB classes. A set of analytical evaluation results is presented to show the effect of vertical partitioning, and to study the trade-off between projection ratio vs. selectivity factor vis-a-vis sequential vs. index access. Furthermore, an empirical experimental prototype supporting vertical class partitioning has been implemented on a commercial OODB tool kit to validate our analytical cost model.
Index Terms: Class partitioning, vertical partitioning, query processing, object-oriented databases, analytical cost model.
(1) Chi-wai Fung is with Department of Computing, Hong Kong Institute of Vocational Education (Tuen Mun), Vocational Training Council, Tsing Wun Road, Tuen Mun, Hong Kong, China. E-mail: cwfung@vtc.edu.hk.
(2) Kamalakar Karlapalem is with Indian Institute of Information Technology, Gachhibowli, Hyderabad 500019, India. E-mail: kamal@iiit.net.
(3) Qing Li is with Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China. E-mail: csqli@cityu.edu.hk.
Cost-Controlled OFL Rewriting Rules for Multiple Collection Traversals
, 1995
Cited by 1 (0 self)
The integration of path expression evaluations in an object-oriented cost-driven query optimizer is the subject of this paper. We present rewriting rules that transform path expressions into different object target programs (i.e., query execution plans). An analytical cost model is presented and used to compare the quality of various programs. We have used the OFL (Object Functional Language) traversal formalism to describe the rules. OFL is a target object language for OQL-like query compilers. By using the OFL formalism, the most efficient program corresponds to the best collection traversal. Experimental results for traversing multiple collections show that ternary hash joins outperform classical binary hash join traversals in some cases. Binary and ternary joins are more responsive to the join selectivity than to the memory size. The results show that path index traversals are usually the most efficient. Pipelined (i.e., one object at a time or ...

