Results 1 - 10
of
54
The state of the art in distributed query processing
- ACM Computing Surveys
, 2000
"... Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of ..."
Abstract
-
Cited by 182 (2 self)
- Add to MetaCart
Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of issues which still make distributed data processing a complex undertaking: (1) distributed systems can become very large involving thousands of heterogeneous sites including PCs and mainframe server machines � (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system� (3) legacy systems need to be integrated|such legacy systems usually have not been designed for distributed data processing and now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the \textbook " architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intra-query parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses di erent kinds of distributed systems such as client-server, middleware (multi-tier), and heterogeneous database systems and shows how query processing works in these systems. Categories and subject descriptors: E.5 [Data]:Files � H.2.4 [Database Management Systems]: distributed databases, query processing � H.2.5 [Heterogeneous Databases]: data translation General terms: algorithms � performance Additional key words and phrases: query optimization � query execution � client-server databases � middleware � multi-tier architectures � database application systems � wrappers� replication � caching � economic models for query processing � dissemination-based information systems 1
An overview of query optimization in relational systems
- In PODS
, 1998
"... There has been extensive work in query optimization since the early ‘70s. It is hard to capture the breadth and depth of this large body of work in a short article. Therefore, I have decided to focus primarily on the optimization of SQL queries in relational database systems and present my biased an ..."
Abstract
-
Cited by 99 (1 self)
- Add to MetaCart
There has been extensive work in query optimization since the early ‘70s. It is hard to capture the breadth and depth of this large body of work in a short article. Therefore, I have decided to focus primarily on the optimization of SQL queries in relational database systems and present my biased and incomplete view of this field. The goal of this article is not to be comprehensive, but rather to explain the foundations and present samplings of significant work in this area. I would like to apologize to the many contributors in this area whose work I have failed to explicitly acknowledge due to oversight or lack of space. I take the liberty of trading technical precision for ease of presentation. 2.
RankSQL: Query algebra and optimization for relational top-k queries
- In SIGMOD
, 2005
"... This paper introduces RankSQL, a system that provides a systematic and principled framework to support efficient evaluations of ranking (top-k) queries in relational database systems (RDBMS), by extending relational algebra and query optimization. Previously, top-k query processing is studied in the ..."
Abstract
-
Cited by 71 (15 self)
- Add to MetaCart
This paper introduces RankSQL, a system that provides a systematic and principled framework to support efficient evaluations of ranking (top-k) queries in relational database systems (RDBMS), by extending relational algebra and query optimization. Previously, top-k query processing is studied in the middleware scenario or in RDBMS in a “piecemeal ” fashion, i.e., focusing on specific operator or sitting outside the core of query engines. In contrast, we aim to support ranking as a first-class database construct. As a key insight, the new ranking relationship can be viewed as another logical property of data, parallel to the “membership ” property of relational data model. While membership is essentially supported in RDBMS, the same support for ranking is clearly lacking. We address the fundamental integration of ranking in RDBMS in a way similar to how membership, i.e., Boolean filtering, is supported. We extend relational algebra by proposing a rank-relational model to capture the ranking property, and introducing new and extended operators to support ranking as a first-class construct. Enabled by the extended algebra, we present a pipelined and incremental execution model of ranking query plans (that cannot be expressed traditionally) based on a fundamental ranking principle. To optimize top-k queries, we propose a dimensional enumeration algorithm to explore the extended plan space by enumerating plans along two dual dimensions: ranking and membership. We also propose a sampling-based method to estimate the cardinality of rank-aware operators, for costing plans. Our experiments show the validity of our framework and the accuracy of the proposed estimation model. 1.
Optimizing queries using materialized views: A practical, scalable solution
, 2001
"... Materialized views can provide massive improvements in query processing time, especially for aggregation queries over large tables. To realize this potential, the query optimizer must know how and when to exploit materialized views. This paper presents a fast and scalable algorithm for determining w ..."
Abstract
-
Cited by 70 (5 self)
- Add to MetaCart
Materialized views can provide massive improvements in query processing time, especially for aggregation queries over large tables. To realize this potential, the query optimizer must know how and when to exploit materialized views. This paper presents a fast and scalable algorithm for determining whether part or all of a query can be computed from materialized views and describes how it can be incorporated in transformation-based optimizers. The current version handles views composed of selections, joins and a final group-by. Optimization remains fully cost based, that is, a single “best ” rewrite is not selected by heuristic rules but multiple rewrites are generated and the optimizer chooses the best alternative in the normal way. Experimental results based on an implementation in Microsoft SQL Server show outstanding performance and scalability. Optimization time increases slowly with the number of views but remains low even up to a thousand.
SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets
"... Companies providing cloud-scale services have an increasing need to store and analyze massive data sets such as search logs and click streams. For cost and performance reasons, processing is typically done on large clusters of shared-nothing commodity machines. It is imperative to develop a programm ..."
Abstract
-
Cited by 65 (2 self)
- Add to MetaCart
Companies providing cloud-scale services have an increasing need to store and analyze massive data sets such as search logs and click streams. For cost and performance reasons, processing is typically done on large clusters of shared-nothing commodity machines. It is imperative to develop a programming model that hides the complexity of the underlying system but provides flexibility by allowing users to extend functionality to meet a variety of requirements. In this paper, we present a new declarative and extensible scripting language, SCOPE (Structured Computations Optimized for Parallel Execution), targeted for this type of massive data analysis. The language is designed for ease of use with no explicit parallelism, while being amenable to efficient parallel execution on large clusters. SCOPE borrows several features from SQL. Data is modeled as sets of rows composed of typed columns. The select statement is retained with inner joins, outer joins, and aggregation allowed. Users can easily define their own functions and implement their own versions of operators: extractors (parsing and constructing rows from a file), processors (row-wise processing), reducers (group-wise processing), and combiners (combining rows from two inputs). SCOPE supports nesting of expressions but also allows a computation to be specified as a series of steps, in a manner often preferred by programmers. We also describe how scripts are compiled into efficient, parallel execution plans and executed on large clusters. 1.
Robust cardinality and cost estimation for skyline operator
- In ICDE
, 2006
"... Incorporating the skyline operator inside the relational engine requires solving the cardinality estimation and the cost estimation problem, hitherto unaddressed. We propose robust techniques to estimate the cardinality and the computational cost of Skyline, and through an empirical comparison, show ..."
Abstract
-
Cited by 26 (0 self)
- Add to MetaCart
Incorporating the skyline operator inside the relational engine requires solving the cardinality estimation and the cost estimation problem, hitherto unaddressed. We propose robust techniques to estimate the cardinality and the computational cost of Skyline, and through an empirical comparison, show that our technique is substantially more effective than traditional approaches. Finally, we show through an implementation in Microsoft SQL Server that skyline queries can substantially benefit from our techniques. 1
OPT++: An Object-Oriented Implementation for Extensible Database Query Optimization
- VLDB Journal
, 1999
"... Paper Number 323 In this paper we describe the design and implementation of OPT++, a tool for Extensible Database Query Optimization that uses an object-oriented design to simplify the task of implementing, extending, and modifying an optimizer. Building an optimizer using OPT++ makes it easy to ext ..."
Abstract
-
Cited by 22 (1 self)
- Add to MetaCart
Paper Number 323 In this paper we describe the design and implementation of OPT++, a tool for Extensible Database Query Optimization that uses an object-oriented design to simplify the task of implementing, extending, and modifying an optimizer. Building an optimizer using OPT++ makes it easy to extend the query algebra (to add new query algebra operators and physical implementation algorithms to the system), easy to change the search space explored, and also easy to change the search strategy used. Furthermore, OPT++ comes equipped with a number of search strategies that are available for use by an Optimizer{ Implementor. OPT++ considerably simpli es both, the task of implementing an optimizer for a new database system, and the task of experimenting with various optimization techniques and strategies to decide what techniques are best suited for that database system. We present the results of performance studies which validate our design and show that, in spite of its exibility, OPT++ can be used to build e cient optimizers. 1
Flexible and Scalable Cost-Based Query Planning in Mediators: A Transformational Approach
- Artificial Intelligence Journal
, 2000
"... The Internet provides access to a wealth of information. For any given topic or application domain there are a variety of available information sources. However, current systems, such as search engines or topic directories in the World Wide Web, offer only very limited capabilities for locating, com ..."
Abstract
-
Cited by 22 (11 self)
- Add to MetaCart
The Internet provides access to a wealth of information. For any given topic or application domain there are a variety of available information sources. However, current systems, such as search engines or topic directories in the World Wide Web, offer only very limited capabilities for locating, combining, and organizing information. Mediators, systems that provide integrated access and database-like query capabilities to information distributed over heterogeneous sources, are critical to realize the full potential of meaningful access to networked information. Query planning, the task of generating a cost-efficient plan that computes a user query from the relevant information sources, is central to mediator systems. However, query planning is a computationally hard problem due to the large number of possible sources and possible orderings on the operations to process the data. Moreover, the choice of sources, data processing operations, and their ordering, strongly affects the plan c...
Changing the Rules: Transformations for Rule-Based Optimizers
- In Proceedings of the ACM SIGMOD International Conference on Management of Data
, 1998
"... Rule-based optimizers are extensible because they consist of modifiable sets of rules. For modification to be straightforward, rules must be easily reasonedabout (i.e., understood and verified). At the same time, rules must be expressive and efficient (to fire) for rulebased optimizers to be practic ..."
Abstract
-
Cited by 18 (2 self)
- Add to MetaCart
Rule-based optimizers are extensible because they consist of modifiable sets of rules. For modification to be straightforward, rules must be easily reasonedabout (i.e., understood and verified). At the same time, rules must be expressive and efficient (to fire) for rulebased optimizers to be practical. Production-style rules (as in [15]) are expressed with code and are hard to reason about. Pure rewrite rules (as in [1]) lack code, but cannot atomically express complex transformations (e.g., normalizations). Some systems allow rules to be grouped, but sacrifice efficiency by providing limited control over their firing. Therefore, none of these approaches succeeds in making rules expressive, efficient and understandable. We propose a language (COKO) for expressing an alternative form of input to a rule-based optimizer. A COKO transformation consists of a set of declarative (KOLA) rewrite rules and a (firing) algorithm that specifies their firing. It is straightforward to reason about C...
Self-tuning database systems: A decade of progress
- in VLDB, 2007
"... In this paper we discuss advances in self-tuning database systems over the past decade, based on our experience in the AutoAdmin project at Microsoft Research. This paper primarily focuses on the problem of automated physical database design. We also highlight other areas where research on self-tuni ..."
Abstract
-
Cited by 18 (0 self)
- Add to MetaCart
In this paper we discuss advances in self-tuning database systems over the past decade, based on our experience in the AutoAdmin project at Microsoft Research. This paper primarily focuses on the problem of automated physical database design. We also highlight other areas where research on self-tuning database technology has made significant progress. We conclude with our thoughts on opportunities and open issues. 1. HISTORY OF AUTOADMIN PROJECT Our VLDB 1997 paper [26] reported our first technical results from the AutoAdmin project that was started in Microsoft Research in the summer of 1996. The SQL Server product group at that time had taken on the ambitious task of redesigning the SQL Server code for their next release (SQL Server 7.0). Ease of use and elimination of knobs was a driving force for their design

