Results 1 - 10
of
44
The state of the art in distributed query processing
- ACM Computing Surveys
, 2000
"... Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of ..."
Abstract
-
Cited by 181 (2 self)
- Add to MetaCart
Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of issues which still make distributed data processing a complex undertaking: (1) distributed systems can become very large involving thousands of heterogeneous sites including PCs and mainframe server machines � (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system� (3) legacy systems need to be integrated|such legacy systems usually have not been designed for distributed data processing and now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the \textbook " architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intra-query parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses di erent kinds of distributed systems such as client-server, middleware (multi-tier), and heterogeneous database systems and shows how query processing works in these systems. Categories and subject descriptors: E.5 [Data]:Files � H.2.4 [Database Management Systems]: distributed databases, query processing � H.2.5 [Heterogeneous Databases]: data translation General terms: algorithms � performance Additional key words and phrases: query optimization � query execution � client-server databases � middleware � multi-tier architectures � database application systems � wrappers� replication � caching � economic models for query processing � dissemination-based information systems 1
Optimizing Large Join Queries in Mediation Systems
- International Conference on Database Theory (ICDT
, 1999
"... . In data integration systems, queries posed to a mediator need to be translated into a sequence of queries to the underlying data sources. In a heterogeneous environment, with sources of diverse and limited query capabilities, not all the translations are feasible. In this paper, we study the probl ..."
Abstract
-
Cited by 37 (11 self)
- Add to MetaCart
. In data integration systems, queries posed to a mediator need to be translated into a sequence of queries to the underlying data sources. In a heterogeneous environment, with sources of diverse and limited query capabilities, not all the translations are feasible. In this paper, we study the problem of finding feasible and efficient query plans for mediator systems. We consider conjunctive queries on mediators and model the source capabilities through attribute-binding adornments. We use a simple cost model that focuses on the major costs in mediation systems, those involved with sending queries to sources and getting answers back. Under this metric, we develop two algorithms for source query sequencing -- one based on a simple greedy strategy and another based on a partitioning scheme. The first algorithm produces optimal plans in some scenarios, and we show a linear bound on its worst case performance when it misses optimal plans. The second algorithm generates optimal plans in mor...
Physical Data Independence, Constraints, and Optimization with Universal Plans
, 1999
"... We present an optimization method and algorithm designed for three objectives: physical data independence, semantic optimization, and generalized tableau minimization. The method relies on generalized forms of chase and "backchase" with constraints (dependencies). By using dictionaries (finite funct ..."
Abstract
-
Cited by 36 (10 self)
- Add to MetaCart
We present an optimization method and algorithm designed for three objectives: physical data independence, semantic optimization, and generalized tableau minimization. The method relies on generalized forms of chase and "backchase" with constraints (dependencies). By using dictionaries (finite functions) in physical schemas we can capture with constraints useful access structures such as indexes, materialized views, source capabilities, access support relations, gmaps, etc. The search space for query plans is de ned and enumerated in a novel manner: the chase phase rewrites the original query into a "universal" plan that integrates all the access structures and alternative pathways that are allowed by applicable constraints. Then, the backchase phase produces optimal plans by eliminating various combinations of redundancies, again according to constraints. This method is applicable (sound) to a large class of queries, physical access structures, and semantic constraints. We prove that it is in fact complete for "path-conjunctive" queries and views with complex objects, classes and dictionaries, going beyond previous theoretical work on processing queries using materialized views.
Iterative Dynamic Programming: A New Class of Query Optimization Algorithms
- ACM Trans. on Database Systems
, 1998
"... The query optimizer is one of the most important components of a database system. Most commercial query optimizers today are based on a dynamic-programming algorithm, as proposed in [SAC+79]. While this algorithm produces good optimization results (i.e., good plans), its high complexity can be prohi ..."
Abstract
-
Cited by 36 (5 self)
- Add to MetaCart
The query optimizer is one of the most important components of a database system. Most commercial query optimizers today are based on a dynamic-programming algorithm, as proposed in [SAC+79]. While this algorithm produces good optimization results (i.e., good plans), its high complexity can be prohibitive if complex queries need to be processed, new query execution techniques need to be integrated, or in certain programming environments (e.g., distributed database systems). In this paper, we present and thoroughly evaluate a new class of query optimization algorithms that are based on a principle that we call iterative dynamic programming, or IDP for short. IDP has several important advantages: First, IDP-algorithms produce the best plans of all known algorithms in situations in which dynamic programming is not viable because of its high complexity. Second, some IDP variants are adaptive and produce as good plans as dynamic programming if dynamic programming is viable an...
Generating Efficient Plans for Queries Using Views
- IN PROC. OF THE ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD ’01
, 2001
"... We study the problem of generating efficient, equivalent rewritings using views to compute the answer to a query. We take the closed-world assumption, in which views are materialized from base relations, rather than views describing sources in terms of abstract predicates, as is common when the open ..."
Abstract
-
Cited by 23 (5 self)
- Add to MetaCart
We study the problem of generating efficient, equivalent rewritings using views to compute the answer to a query. We take the closed-world assumption, in which views are materialized from base relations, rather than views describing sources in terms of abstract predicates, as is common when the open-world assumption is used. In the closedworld model, there can be an infinite number of different rewritings that compute the same answer, yet have quite different performance. Query optimizers take a logical plan (a rewriting of the query) as an input, and generate efficient physical plans to compute the answer. Thus our goal is to generate a small subset of the possible logical plans without missing an optimal physical plan.
Index Structures and Algorithms for Querying Distributed RDF Repositories
- WWW2004, MAY 17–22; THE NETHERLANDS
, 2004
"... A technical infrastructure for storing, querying and managing RDF data is a key element in the current semantic web development. Systems like Jena, Sesame or the ICS-FORTH RDF Suite are widely used for building semantic web applications. Currently, none of these systems supports the integrated quer ..."
Abstract
-
Cited by 23 (0 self)
- Add to MetaCart
A technical infrastructure for storing, querying and managing RDF data is a key element in the current semantic web development. Systems like Jena, Sesame or the ICS-FORTH RDF Suite are widely used for building semantic web applications. Currently, none of these systems supports the integrated querying of distributed RDF repositories. We consider this a major shortcoming since the semantic web is distributed by nature. In this paper we present an architecture for querying distributed RDF repositories by extending the existing Sesame system. We discuss the implications of our architecture and propose an index structure as well as algorithms for query processing and optimization in such a distributed context.
Distributed Queries and Query Optimization in Schema-Based P2P-Systems
- In Proceedings of the International Workshop on Databases, Information Systems and Peer-to-Peer Computing (DBISP2P
, 2003
"... Databases have employed a schema-based approach to store and retrieve structured data for decades. For peer-to-peer (P2P) networks, similar approaches are just beginning to emerge, also motivated by the fact, that sending (atomic) queries to the appropriate peers clearly fails for queries which n ..."
Abstract
-
Cited by 20 (5 self)
- Add to MetaCart
Databases have employed a schema-based approach to store and retrieve structured data for decades. For peer-to-peer (P2P) networks, similar approaches are just beginning to emerge, also motivated by the fact, that sending (atomic) queries to the appropriate peers clearly fails for queries which need data from more than one peer to be executed. While quite a few database techniques can be re-used in this new context, a P2P data management infrastructure poses additional challenges which have to be solved before schema-based P2P networks become as common as schema-based databases. Because of the dynamic nature of P2P networks, we can neither assume global knowledge about data distribution, nor are static topologies and static query plans suitable for these networks. Unlike in traditional distributed database systems, we cannot assume a complete schema instance but rather work with a distributed schema which directs query processing tasks from one node to one or more neighboring nodes.
Optimized Seamless Integration of Biomolecular Data
- IEEE symposium on Bio-Informatics and Biomedical Engineering (BIBE’2001), Washington DC
, 2001
"... Today, scientific data is inevitably digitized, stored in a variety of heterogeneous formats, and is accessible over the Internet. Scientists need to access an integrated view of multiple remote or local heterogeneous data sources. They then integrate the results of complex queries and apply further ..."
Abstract
-
Cited by 13 (2 self)
- Add to MetaCart
Today, scientific data is inevitably digitized, stored in a variety of heterogeneous formats, and is accessible over the Internet. Scientists need to access an integrated view of multiple remote or local heterogeneous data sources. They then integrate the results of complex queries and apply further analysis and visualization to support the task of scientific discovery. Building a digital library for scientific discovery requires accessing and manipulating data extracted from flat files or databases, documents retrieved from the Web, as well as data that is locally materialized in warehouses or is generated by software. We consider several tasks to provide optimized and seamless integration of biomolecular data. Challenges to be addressed include capturing and representing source capabilities; developing a methodology to acquire and represent metadata about source contents and access costs; and decision support to select sources and capabilities using cost based and semantic knowledge, and generating low cost query evaluation plans.
A New Heuristic for Optimizing Large Queries
- In 9th International Conference, DEXA'98
, 1998
"... There is a number of OODB optimization techniques proposed recently, such as the translation of path expressions into joins and query unnesting, that may generate a large number of implicit joins even for simple queries. Unfortunately, most current commercial query optimizers are still based on the ..."
Abstract
-
Cited by 12 (4 self)
- Add to MetaCart
There is a number of OODB optimization techniques proposed recently, such as the translation of path expressions into joins and query unnesting, that may generate a large number of implicit joins even for simple queries. Unfortunately, most current commercial query optimizers are still based on the dynamic programming approach of System R, and cannot handle queries of more than ten tables. There is a number of recent proposals that advocate the use of combinatorial optimization techniques, such as iterative improvement and simulated annealing, to deal with the complexity of this problem. These techniques, though, fail to take advantage of the rich semantic information inherent in the query specification, such as the information available in query graphs, which gives a good handle to choose which relations to join each time. This paper presents a polynomial-time algorithm that generates a good quality order of relational joins. It can also be used with minor modifications to sort OODB a...
Object/Relational Query Optimization with Chase and Backchase
, 2000
"... Traditionally, query optimizers assume a direct mapping from the logical entities modeling the data (e.g. relations) and the physical entities storing the data (e.g. indexes), each physical entity corresponding precisely to one logical entity. This assumption is no longer true in non-traditional app ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
Traditionally, query optimizers assume a direct mapping from the logical entities modeling the data (e.g. relations) and the physical entities storing the data (e.g. indexes), each physical entity corresponding precisely to one logical entity. This assumption is no longer true in non-traditional applications (object-oriented and semi-structured databases, data integration), which often exhibit a mismatch between the logical view and the actual storage of data. In addition, there is an increased amount of redundancy, even at the logical level, that can greatly enhance optimization opportunities, if exploited. To deal with all this, we propose a novel architecture for query optimization, in which physical optimization is leveraged at the level of query rewriting. As a consequence, the other important aspect of query optimization, semantic optimization (that takes advantage of the redundancy at the logical level), can be naturally incorporated. The optimizer can then make global decisions based on both semantic and physical knowledge, leading to plans of higher quality than those obtainable by a traditional two-level approach. The main idea

