Results 1 - 10
of
59
The TSIMMIS Approach to Mediation: Data Models and Languages
- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
, 1997
"... TSIMMIS -- The Stanford-IBM Manager of Multiple Information Sources -- is a system for integrating information. It o ers a data model and a common query language that are designed to support the combining of information from many different sources. It also o ers tools for generating automatically th ..."
Abstract
-
Cited by 344 (8 self)
- Add to MetaCart
TSIMMIS -- The Stanford-IBM Manager of Multiple Information Sources -- is a system for integrating information. It o ers a data model and a common query language that are designed to support the combining of information from many different sources. It also o ers tools for generating automatically the components that are needed to build systems for integrating information. In this paper we shall discuss the principal architectural features and their rationale.
Optimizing Queries across Diverse Data Sources
- In Proc. of VLDB
, 1997
"... Businesses today need to interrelate data stored in diverse systems with differing capabilities, ideally via a single high-level query interface. We present the design of a query optimizer for Gar- lic [C+95], a middleware system designed to integrate data from a broad range of data sources with ver ..."
Abstract
-
Cited by 241 (15 self)
- Add to MetaCart
Businesses today need to interrelate data stored in diverse systems with differing capabilities, ideally via a single high-level query interface. We present the design of a query optimizer for Gar- lic [C+95], a middleware system designed to integrate data from a broad range of data sources with very different query capabilities. Garlic's optimizer extends the rule-based approach of [Loh88 ] to work in a heterogeneous environment, by defining generic rules for the middleware and using wrapper-provided rules to encapsulate the capabilities of each data source. This approach offers great advantages in terms of plan quality, extensibility to new sources, incremental implementation of rules for new sources, and the ability to express the capabilities of a diverse set of sources. We describe the design and implementation of this optimizer, and illustrate its actions through an example.
The state of the art in distributed query processing
- ACM Computing Surveys
, 2000
"... Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of ..."
Abstract
-
Cited by 182 (2 self)
- Add to MetaCart
Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of issues which still make distributed data processing a complex undertaking: (1) distributed systems can become very large involving thousands of heterogeneous sites including PCs and mainframe server machines � (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system� (3) legacy systems need to be integrated|such legacy systems usually have not been designed for distributed data processing and now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the \textbook " architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intra-query parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses di erent kinds of distributed systems such as client-server, middleware (multi-tier), and heterogeneous database systems and shows how query processing works in these systems. Categories and subject descriptors: E.5 [Data]:Files � H.2.4 [Database Management Systems]: distributed databases, query processing � H.2.5 [Heterogeneous Databases]: data translation General terms: algorithms � performance Additional key words and phrases: query optimization � query execution � client-server databases � middleware � multi-tier architectures � database application systems � wrappers� replication � caching � economic models for query processing � dissemination-based information systems 1
MedMaker: A Mediation System Based on Declarative Specifications
- INTERNATIONAL CONFERENCE ON DATA ENGINEERING
, 1996
"... Mediators are used for integration of heterogeneous information sources. We present a system for declaratively specifying mediators. It is targeted for integration of sources with unstructured or semi-structured data and/or sources with changing schemas. We illustrate the main features of the Mediat ..."
Abstract
-
Cited by 120 (17 self)
- Add to MetaCart
Mediators are used for integration of heterogeneous information sources. We present a system for declaratively specifying mediators. It is targeted for integration of sources with unstructured or semi-structured data and/or sources with changing schemas. We illustrate the main features of the Mediator Specification Language (MSL), show how they facilitate integration, and describe the implementation of the system that interprets the MSL specifications.
Describing and using query capabilities of heterogeneous sources
, 1997
"... Information integration systems have to cope with the different and limited query interfaces of the underlying information sources. First, the integration systems need descriptions of the query capabilities of each source, i.e., the set of queries supported by each source. Second, the integration sy ..."
Abstract
-
Cited by 75 (7 self)
- Add to MetaCart
Information integration systems have to cope with the different and limited query interfaces of the underlying information sources. First, the integration systems need descriptions of the query capabilities of each source, i.e., the set of queries supported by each source. Second, the integration systems need algo-rithms for deciding how a query can be an-swered given the capabilities of the sources. Third, they need to translate a query into the format that the source understands. We present two languages suitable for descrip-tions of query capabilities of sources and com-pare their expressive power. We also de-scribe algorithms for deciding whether a query “matches ” the description and show their ap-plication to the problem of translating user queries into source-specific queries and com-mands. Finally, we propose new improved al-gorithms for the problem of answering queries using these descriptions. 1
Query Rewriting for Semistructured Data
"... We address the problem of query rewriting for TSL, a language for querying semistructured data. We develop and present an algorithm that, given a semistructured query q and a set of semistructured views V, finds rewriting queries, i.e., queries that access the views and produce the same result as q ..."
Abstract
-
Cited by 60 (9 self)
- Add to MetaCart
We address the problem of query rewriting for TSL, a language for querying semistructured data. We develop and present an algorithm that, given a semistructured query q and a set of semistructured views V, finds rewriting queries, i.e., queries that access the views and produce the same result as q. Our algorithm is based on appropriately generalizing containment mappings, the chase, and query composition- techniques that were developed for structured, relational data. We also develop an algorithm for equivalence checking of TSL queries. We show that the algorithm is sound and complete for TSL, i.e., it always finds every non-trivial TSL rewriting query of q, and we discuss its complexity. We extend the rewriting algorithm to use some forms of structural constraints (such as DTDs) and find more opportunities for query rewriting.
Mind Your Vocabulary: Query Mapping Across Heterogeneous Information Sources
, 1999
"... In this paper we present a mechanism for translating constraint queries, i.e., Boolean expressions of constraints, across heterogeneous information sources. Integrating such systems is difficult in part because they use a wide range of constraints as the vocabulary for formulating queries. We desc ..."
Abstract
-
Cited by 52 (7 self)
- Add to MetaCart
In this paper we present a mechanism for translating constraint queries, i.e., Boolean expressions of constraints, across heterogeneous information sources. Integrating such systems is difficult in part because they use a wide range of constraints as the vocabulary for formulating queries. We describe algorithms that apply user-provided mapping rules to translate query constraints into ones that are understood and supported in another context, e.g., that use the proper operators and value formats. We show that the translated queries minimally subsume the original ones. Furthermore, the translated queries are also the most compact possible. Unlike other query mapping work, we effectively consider inter-dependencies among constraints, i.e., we handle constraints that cannot be translated independently. Furthermore, when constraints are not fully supported, our framework explores relaxations (semantic rewritings) into the closest supported version. Our most sophisticated algorith...
Cost Models DO Matter: Providing Cost Information for Diverse Data Sources in a Federated System
- In Proc. of the Conf. on Very Large Data Bases (VLDB
"... : An important issue for federated systems of diverse data sources is how to optimize cross-source queries, without building knowledge of individual sources into the optimizer. Garlic is a federated system with an emphasis on extensibility and diverse sources. To achieve these goals, data sources ar ..."
Abstract
-
Cited by 51 (2 self)
- Add to MetaCart
: An important issue for federated systems of diverse data sources is how to optimize cross-source queries, without building knowledge of individual sources into the optimizer. Garlic is a federated system with an emphasis on extensibility and diverse sources. To achieve these goals, data sources are attached to Garlic by means of a wrapper. Wrappers participate in query planning, telling Garlic what parts of a query a data source can do and how much it will cost. This paper describes a framework through which wrappers provide the necessary cost and cardinality information for optimization, and the facilities Garlic provides to make this task easier. Our framework makes it easy for wrappers to provide cost information, requires few changes to a conventional bottomup optimizer and is easily extensible to a broad range of sources. We believe that our framework for costing is the first to allow accurate cost estimates for diverse sources within the context of a traditional cost-based opti...
Computing Capabilities of Mediators
"... Existing data-integration systems based on the mediation architecture employ avariety of mechanisms to describe the query-processing capabilities of sources. However, these systems do not compute the capabilities of the mediators based on the capabilities of the sources they integrate. In this paper ..."
Abstract
-
Cited by 45 (11 self)
- Add to MetaCart
Existing data-integration systems based on the mediation architecture employ avariety of mechanisms to describe the query-processing capabilities of sources. However, these systems do not compute the capabilities of the mediators based on the capabilities of the sources they integrate. In this paper, we propose a framework to capture a rich variety of query-processing capabilities of data sources and mediators. We present algorithms to compute the set of supported queries of a mediator, based on the capability limitations of its sources. Our algorithms take into consideration a variety of query-processing techniques employed by mediators to enhance the set of supported queries.
Capability Based Mediation in TSIMMIS
- In Proc. of ACM SIGMOD
, 1998
"... Introduction The TSIMMIS system [1] integrates data from multiple heterogeneous sources and provides users with seamless integrated views of the data. It translates a user query on the integrated views into a set of source queries and postprocessing steps that compute the answer to the user query f ..."
Abstract
-
Cited by 43 (10 self)
- Add to MetaCart
Introduction The TSIMMIS system [1] integrates data from multiple heterogeneous sources and provides users with seamless integrated views of the data. It translates a user query on the integrated views into a set of source queries and postprocessing steps that compute the answer to the user query from the results of the source queries. TSIMMIS uses a mediation architecture [11] to accomplish this (Figure 1). User Source 1 Source 2 Source N Mediator Figure 1: The TSIMMIS Architecture Many other data integration systems like Garlic [2, 9] and Information Manifold [4] employ a similar architecture. One of the distinguishing features of TSIMMIS is its use of a semi-structured data model (called the Object Exchange Model or OEM [7]) for dealing with the heterogeneity of the data sources. In particular, it employs source wrappers [3] that provide a uniform OEM interface to the mediator. In SIGMO

