Representing and reasoning about semantic conflicts in heterogeneous information systems
, 1997
Accessing Heterogeneous Data Through Homogenization and Integration Mediators
, 1997
Cited by 19 (4 self)
The AURORA mediator system employs a novel 2-tier, plug-and-play mediation model that is designed to facilitate access to a large number of heterogeneous data sources. This paper describes AURORA's mediation model and a suite of techniques used by a specific AURORA mediator, AURORA-RH. This suite includes a mediation methodology provided via an interactive mediator author's toolkit (MAT), a mediation enabling algebra, a query rewriting algorithm, and transformation rules that facilitate query optimization.
1. Introduction
The advent of the Internet gives rise to new types of applications, such as electronic commerce and virtual enterprise, that require integrated access to a large number of heterogeneous data sources around the globe. Much is known about schema integration, but the impact of this process on query processing efficiency is seldom discussed. In the Internet age, it is crucial that this impact be considered. The AURORA project builds integrated access to heterogeneous data sou...
Building Regression Cost Models for Multidatabase Systems
, 1996
Cited by 18 (3 self)
A major challenge for performing global query optimization in a multidatabase system (MDBS) is the lack of cost models for local database systems at the global level. In this paper we present a statistical procedure based on multiple regression analysis for building cost models for local database systems in an MDBS. Explanatory variables that can be included in a regression model are identified and a mixed forward and backward method for selecting significant explanatory variables is presented. Measures for developing useful regression cost models, such as removing outliers, eliminating multicollinearity, validating regression model assumptions, and checking significance of regression models, are discussed. Experimental results demonstrate that the presented statistical procedure can develop useful local cost models in an MDBS.
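The core fitting step behind a regression cost model can be sketched as follows. This is a minimal illustration, not the paper's procedure: it covers only ordinary least squares, not variable selection, outlier removal, or assumption checks. The explanatory variables (operand and result sizes, in thousands of tuples), the function names, and the observations are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # Perfectly collinear explanatory variables make the system singular.
            raise ValueError("singular system: explanatory variables are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_cost_model(observations, costs):
    """Fit cost = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    rows = [[1.0] + [float(v) for v in obs] for obs in observations]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, costs)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def estimate_cost(coef, obs):
    """Apply a fitted cost formula to the variables of a new query."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], obs))


# Made-up observations: (operand size, result size), both in thousands of tuples.
sample_queries = [(1, 0.1), (2, 0.5), (4, 0.2), (8, 4.0), (16, 1.0)]
# Made-up observed costs in seconds, generated from a known linear formula.
sample_costs = [0.5 + 0.8 * n + 2.0 * r for n, r in sample_queries]

coef = fit_cost_model(sample_queries, sample_costs)
predicted = estimate_cost(coef, (3, 0.3))
```

The collinearity check in `solve_linear_system` only catches the degenerate case; near-collinear variables would still need the kind of screening the abstract describes.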
Solving Local Cost Estimation Problem for Global Query Optimization in Multidatabase Systems
- Distributed and Parallel Databases
, 1998
Cited by 15 (5 self)
To meet users' growing needs for accessing pre-existing heterogeneous databases, a multidatabase system (MDBS) integrating multiple databases has attracted many researchers recently. A key feature of an MDBS is local autonomy. For a query retrieving data from multiple databases, global query optimization should be performed to achieve good system performance. There are a number of new challenges for global query optimization in an MDBS. Among them, a major one is that some local optimization information, such as local cost parameters, may not be available at the global level because of local autonomy. This creates difficulties for finding a good decomposition of a global query during query optimization. To tackle this challenge, a new query sampling method is proposed in this paper. The idea is to group component queries into homogeneous classes, draw a sample of queries from each class, and use observed costs of sample queries to derive a cost formula for each class by multiple regres...
Multidatabase Query Optimization
- Distributed and Parallel Databases
, 1997
Cited by 13 (3 self)
A multidatabase system (MDBS) allows users to simultaneously access heterogeneous and autonomous databases using an integrated schema and a single global query language. The query optimization problem in MDBSs is quite different from the query optimization problem in distributed homogeneous databases due to schema heterogeneity and autonomy of local database systems. In this work, we consider the optimization of query distribution in case of data replication and the optimization of intersite joins, that is, the join of the results returned by the local sites in response to the global subqueries. The algorithms presented for the optimization of intersite joins try to maximize the parallelism in execution and take the federated nature of the problem into account. It has also been shown through a comparative performance study that the proposed intersite join optimization algorithms are efficient. The approach presented can easily be generalized to any operation required for intersi...
An Integrated Method for Estimating Selectivities in a Multidatabase System
- In Proceedings of the 1993 conference of the Centre for Advanced Studies on Collaborative research
, 1993
Cited by 10 (5 self)
A multidatabase system (MDBS) integrates information from autonomous local databases managed by different database management systems (DBMSs) in a distributed environment. A number of challenges are raised for query optimization in such an MDBS. One of the major challenges is that some local optimization information may not be available at the global level. We recently proposed a query sampling method to derive cost estimation formulas for local databases in an MDBS [22]. To use the derived formulas to estimate the costs of queries, we need to know the selectivities of the qualifications of the queries. Unfortunately, existing methods for estimating selectivities cannot be used efficiently in an MDBS environment. This paper discusses difficulties of estimating selectivities in an MDBS. Based on the discussion, this paper presents an integrated method to estimate selectivities in an MDBS. The method integrates and extends several existing methods so that they can be used in an MDBS eff...
An Optimal Cache for a Federated Database System
, 1997
Cited by 9 (2 self)
Federated database systems allow users to query different autonomous databases with a single request. The answer to those requests must be found on the underlying databases. This answering process can be improved if some data are cached within the federated database system. The article presents an approach that allows the definition of an optimal cache for a federated database system according to a set of parameters. We show the types of objects to be cached, the cost model used to decide which ones are worth caching and the method to find the optimal set of objects to cache. Moreover, this approach continuously updates the set of parameter values and periodically redefines the optimal cache in order to reflect changes in the user requirements or in the implementation features of the underlying databases. The article also presents how cached data can be used to answer a user query. Furthermore, the advantages of using a Knowledge Representation System based on Description Logics in o...
Ontologies, Contexts, and Mediation: Representing and Reasoning about Semantic Conflicts . . .
- CONFLICTS IN HETEROGENEOUS AND AUTONOMOUS SYSTEMS, SLOAN SCHOOL OF MANAGEMENT, WORKING PAPER #3848; ALSO CISL WORKING PAPER CISL
, 1995
Cited by 8 (0 self)
The Context Interchange strategy has been proposed as an approach for achieving interoperability among heterogeneous and autonomous data sources and receivers [25]. We have suggested [10] that this strategy has many advantages over traditional loose- and tight-coupling approaches. In this paper, we present an underlying theory describing how those features can be realized by showing (1) how domain and context specific knowledge can be represented and organized for maximal sharing; and (2) how these bodies of knowledge can be used to facilitate the detection and resolution of semantic conflicts between different systems. Within this framework, ontologies exist as conceptualizations of particular domains and contexts as "idiosyncratic" constraints on these shared conceptualizations. In adopting a clausal representation for ontologies and contexts, we show that these have an elegant logical interpretation which provides a unifying framework for context mediation: i.e., the detection and resolution of semantic conflicts. The practicality of this approach is exemplified through a description of a prototype implementation of a context interchange system which takes advantage of an existing information infrastructure (the World Wide Web) for
The CORDS Multidatabase Project
, 1995
Cited by 7 (5 self)
In virtually every organization, data is stored in a variety of ways and managed by different database and file systems. Applications that require data from multiple sources are complex because they must be aware of and deal with the specifics of each data source. They must also perform any data integration needed, for example, joining data from multiple sources. The objective of a multidatabase system is to provide application developers and end users with an integrated view of and a uniform interface to all the required data. The view and the interface should be independent of where the data is stored and how it is managed. CORDS is a research project focussed on distributed applications. It is a collaborative effort involving IBM and several universities. As part of this project, we are designing and prototyping a multidatabase system. This paper provides an overview of its architecture and describes the approach taken in the following areas: management of catalog information, sch...
Performance Analysis of Several Algorithms for Processing Joins between Textual Attributes
, 1996
Cited by 4 (0 self)
Three algorithms for processing joins on attributes of textual type are presented and analyzed in this paper. Since such joins often involve very large document collections, it is very important to find efficient algorithms to process them. The three algorithms differ on whether the documents themselves or the inverted files on the documents are used to process the join. Our analysis and the simulation results indicate that the relative performance of these algorithms depends on the input document collections, system characteristics and the input query. For each algorithm, the type of input document collections with which the algorithm is likely to perform well is identified.
Keywords: Query processing, textual database, join, multidatabase
1. Introduction
Research on multidatabase systems has intensified in recent years [1,2,6,7,9,10,13,15]. In this paper, we consider a multidatabase system that contains both local systems that manage structured data (e.g., relational...
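The contrast between joining on the documents themselves and joining through an inverted file can be sketched roughly as below. This is not a reconstruction of the paper's three algorithms. The join predicate (two documents match when they share at least `min_shared` terms), the whitespace tokenizer, and all names and sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def tokenize(text):
    """Hypothetical tokenizer: lowercase, whitespace-separated, unique terms."""
    return set(text.lower().split())


def nested_loop_join(left, right, min_shared=2):
    """Compare every pair of documents directly."""
    pairs = set()
    for left_id, left_text in left.items():
        left_terms = tokenize(left_text)
        for right_id, right_text in right.items():
            if len(left_terms & tokenize(right_text)) >= min_shared:
                pairs.add((left_id, right_id))
    return pairs


def build_inverted_file(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def inverted_file_join(left, right, min_shared=2):
    """Probe an inverted file on `right` with the terms of each left document."""
    index = build_inverted_file(right)
    pairs = set()
    for left_id, text in left.items():
        shared = defaultdict(int)  # right doc id -> number of shared terms
        for term in tokenize(text):
            for right_id in index.get(term, ()):
                shared[right_id] += 1
        pairs.update((left_id, r) for r, n in shared.items() if n >= min_shared)
    return pairs


# Made-up sample collections.
left_docs = {
    "a1": "query optimization in multidatabase systems",
    "a2": "textual join processing",
}
right_docs = {
    "b1": "global query optimization",
    "b2": "join processing with inverted files",
    "b3": "cache design",
}

direct = nested_loop_join(left_docs, right_docs)
indexed = inverted_file_join(left_docs, right_docs)
```

Both functions return the same pairs. The inverted-file variant only touches right-hand documents that share at least one term with the probing document, which is why the better choice depends on the shape of the input collections.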

