Using Schema Matching to Simplify Heterogeneous Data Translation
, 1998
Cited by 187 (5 self)
A broad spectrum of data is available on the Web in distinct heterogeneous sources, and stored under different formats. As the number of systems that utilize this heterogeneous data grows, the importance of data translation and conversion mechanisms increases greatly. In this paper we present a new translation system, based on schema matching, aimed to simplify the intricate task of data conversion. We observe that in many cases the schema of the data in the source system is very similar to that of the target system. In such cases, much of the translation work can be done automatically, based on the schemas' similarity. This saves a lot of effort for the user, limiting the amount of programming needed. We define common schema and data models, in which schemas and data (resp.) from many common models can be represented. Using a rule-based method, the source schema is compared with the target one, and each component in the source schema is matched with a corresponding compone...
Object fusion in mediator systems
- INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES
, 1996
Cited by 155 (29 self)
One of the main tasks of mediators is to fuse information from heterogeneous information sources. This may involve, for example, removing redundancies, and resolving inconsistencies in favor of the most reliable source. The problem becomes harder when the sources are unstructured/semistructured and we do not have complete knowledge of their contents and structure. In this paper we show how many common fusion operations can be specified non-procedurally and succinctly. The key to our approach is to assign semantically meaningful object ids to objects as they are "imported" into the mediator.
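The idea of fusing objects through semantically meaningful ids can be illustrated with a minimal Python sketch. It is not the paper's specification language: the `semantic_id` scheme (a normalized name), the record layout, and the numeric reliability ranking are all assumptions made for illustration.

```python
def semantic_id(record):
    """Hypothetical semantic object id: the normalized name (an assumption)."""
    return "person:" + record["name"].strip().lower()


def fuse(sources):
    """Fuse records given as (reliability, records) pairs.

    Records that map to the same semantic id are merged into one object.
    Sources are processed from least to most reliable, so values from a
    more reliable source overwrite conflicting values from a weaker one.
    """
    fused = {}
    for _reliability, records in sorted(sources, key=lambda s: s[0]):
        for record in records:
            obj = fused.setdefault(semantic_id(record), {})
            obj.update(record)
    return fused


sources = [
    (1, [{"name": "Ann Lee", "phone": "555-0100", "dept": "CS"}]),
    (2, [{"name": "ann lee ", "phone": "555-0199"}]),
]
fused = fuse(sources)
```

Here both records collapse into one object: the redundant entry disappears, the conflicting `phone` is taken from the more reliable source, and `dept` survives from the weaker one.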
The Use of Information Capacity in Schema Integration and Translation
- In VLDB
, 1993
Cited by 67 (9 self)
In this paper, we carefully explore the assumptions behind using information capacity equivalence as a measure of correctness for judging transformed schemas in schema integration and translation methodologies. We present a classification of common integration and translation tasks based on their operational goals and derive from them the relative information capacity requirements of the original and transformed schemas. We show that for many tasks, information capacity equivalence of the schemas is not strictly required. Based on this, we present a new definition of correctness that reflects each undertaken task. We then examine existing methodologies and show how anomalies can arise when using those that do not meet the proposed correctness criteria. 1 Introduction Formal work on schema equivalence has largely been ignored within practical schema integration and translation tools. Practitioners have felt that theoretical work is too narrow in scope to be applicable to the problems ...
A General Formal Framework for Schema Transformation
- DATA AND KNOWLEDGE ENGINEERING
, 1998
Cited by 67 (17 self)
Several methodologies for integrating database schemas have been proposed in the literature, using various common data models (CDMs). As part of these methodologies transformations have been defined that map between schemas which are in some sense equivalent. This paper describes a general framework for formally underpinning the schema transformation process. Our formalism clearly identifies which transformations apply for any instance of the schema and which only for certain instances. We illustrate the applicability of the framework by showing how to define a set of primitive transformations for an extended ER model and by defining some of the common schema transformations as sequences of these primitive transformations. The same approach could be used to formally define transformations on other CDMs.
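As a rough illustration of building common transformations from sequences of primitives, the following Python sketch models a schema as a mapping from entity names to attribute sets. The primitives `add_entity`, `add_attribute`, `delete_attribute` and the composite `attribute_to_entity` are hypothetical names chosen here, not the paper's formal definitions, and the sketch does not model the distinction between transformations valid for any instance and those valid only for certain instances.

```python
def add_entity(name):
    """Primitive: add an entity with no attributes."""
    def apply(schema):
        if name in schema:
            raise ValueError(f"entity {name!r} already exists")
        return {**schema, name: frozenset()}
    return apply


def add_attribute(entity, attr):
    """Primitive: add an attribute to an existing entity."""
    def apply(schema):
        return {**schema, entity: schema[entity] | {attr}}
    return apply


def delete_attribute(entity, attr):
    """Primitive: remove an attribute from an existing entity."""
    def apply(schema):
        return {**schema, entity: schema[entity] - {attr}}
    return apply


def sequence(*steps):
    """Compose primitives into one transformation, applied left to right."""
    def apply(schema):
        for step in steps:
            schema = step(schema)
        return schema
    return apply


def attribute_to_entity(entity, attr):
    """Composite (illustrative): promote an attribute to its own entity."""
    return sequence(
        add_entity(attr),
        add_attribute(attr, "value"),
        delete_attribute(entity, attr),
    )


schema = {"person": frozenset({"name", "city"})}
result = attribute_to_entity("person", "city")(schema)
```

Each primitive returns a new schema rather than mutating its input, so a composite can be replayed or inspected step by step.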
Challenges in Integrating Biological Data Sources
- Journal of Computational Biology
, 1995
Cited by 62 (4 self)
In this report, we examine the technical challenges to integration, critique the available tools and resources, and compare the cost and advantages of various methodologies. We begin by analyzing the basic steps in strict and complete integration: 1) transformation of the various schemas to a common data model; 2) matching of semantically related schema objects; 3) schema integration; 4) transformation of data to the federated database on demand; and 5) matching of semantically equivalent data. Some progress has been made on generic problems such as (1) and (3) within the wider database community, but issues of semantics (steps (2) and (5)) have been addressed with any degree of success only by domain experts within the biological community. We then look at the solution space of integration strategies as defined by two axes, the "tightness" of federation and the "degree" of instantiation, discuss where various solutions fall on this plane, and examine their cost and advantages/disadvantages. Finally, we examine technical challenges that are not ...
Computing Capabilities of Mediators
Cited by 45 (11 self)
Existing data-integration systems based on the mediation architecture employ a variety of mechanisms to describe the query-processing capabilities of sources. However, these systems do not compute the capabilities of the mediators based on the capabilities of the sources they integrate. In this paper, we propose a framework to capture a rich variety of query-processing capabilities of data sources and mediators. We present algorithms to compute the set of supported queries of a mediator, based on the capability limitations of its sources. Our algorithms take into consideration a variety of query-processing techniques employed by mediators to enhance the set of supported queries.
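A toy version of deriving a mediator's supported queries from its sources can be sketched in Python. This is a deliberately simplified model and not the paper's framework: a source capability is assumed to be a set of binding patterns (sets of attributes the source accepts as selection conditions), the mediator is assumed to union all of its sources, and the only enhancement technique modeled is post-filtering at the mediator.

```python
from itertools import combinations


def answerable(source_patterns, bound, mediator_filters=True):
    """Can a source answer a query that binds the attributes in `bound`?"""
    bound = frozenset(bound)
    if mediator_filters:
        # Send the source a pattern it accepts; the mediator applies the
        # remaining conditions itself after the results come back.
        return any(pattern <= bound for pattern in source_patterns)
    return bound in source_patterns


def mediator_supported(sources, attributes, mediator_filters=True):
    """Binding sets a union mediator supports: every source must answer."""
    supported = set()
    for size in range(len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if all(answerable(s, combo, mediator_filters) for s in sources):
                supported.add(frozenset(combo))
    return supported


s1 = {frozenset({"title"}), frozenset({"author"})}
s2 = {frozenset({"title"})}
attrs = {"title", "author", "year"}

strict = mediator_supported([s1, s2], attrs, mediator_filters=False)
enhanced = mediator_supported([s1, s2], attrs)
```

Without post-filtering the mediator supports only queries that bind exactly `title`; with it, any query that binds `title` plus further attributes becomes supported, which is the kind of enhancement the abstract alludes to.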
WOL: A Language for Database Transformations and Constraints
- In IEEE Int. Conf. on Data Engineering
, 1997
Cited by 31 (4 self)
The need to transform data between heterogeneous databases arises from a number of critical tasks in data management. These tasks are complicated by schema evolution in the underlying databases, and by the presence of non-standard database constraints. We describe a declarative language, WOL, for specifying such transformations, and its implementation in a system called Morphase. WOL is designed to allow transformations between the complex data structures which arise in object-oriented databases as well as in complex relational databases, and to allow for reasoning about the interactions between database transformations and constraints. ... integrating the US Cities-and-States and European-Cities-and-Countries databases shown in Figures 1 and 2. The graphical notation used here is inspired by [2]: the boxes represent classes which are finite sets of objects; the arrows represent attributes, or functions on classes.
A Formalisation of Semantic Schema Integration
- INFORMATION SYSTEMS
, 1998
Cited by 27 (7 self)
Several methodologies for the semantic integration of databases have been proposed in the literature. These often use a variant of the Entity-Relationship (ER) model as the common data model. To aid the schema conforming, merging and restructuring phases of the semantic integration process, various transformations have been defined that map between ER representations which are in some sense equivalent. Our work aims to formalise the notion of schema equivalence and to provide a formal underpinning for the schema integration process. We show how transformational, mapping and behavioural schema equivalence are all variants of a more general definition of schema equivalence. We propose a semantically sound set of primitive transformations and show how they can be used to express the transformations commonly used during the schema integration process and to define new transformations. We differentiate between transformations which apply to any instance of a schema and those whic...
Model-independent schema and data translation
- In EDBT, volume 3896 of LNCS
, 2006
Cited by 19 (2 self)
We describe MIDST, an implementation of the model management operator ModelGen, which translates schemas from one model to another, for example from OO to SQL or from SQL to XSD. It extends past approaches by translating database instances, not just their schemas. The operator can be used to generate database wrappers (e.g. OO or XML to relational), default user interfaces (e.g. relational to forms), or default database schemas from other representations. The approach translates both schemas and data: given a source instance I of a schema S expressed in a source model, and a target model TM, it generates a schema S′ expressed in TM that is “equivalent” to S and an instance I′ of S′ “equivalent” to I. A wide family of models is handled by using a metamodel in which models can be succinctly and precisely described. The approach expresses the translation as Datalog rules and exposes the source and target of the translation in a generic relational dictionary. This makes the translation transparent, easy to customize and model-independent.
A Formal Framework for ER Schema Transformation
, 1997
Cited by 17 (5 self)
Several methodologies for semantic schema integration have been proposed in the literature, often using some variant of the ER model as the common data model. As part of these methodologies, various transformations have been defined that map between ER schemas which are in some sense equivalent. This paper gives a unifying formalisation of the ER schema transformation process and shows how some common schema transformations can be expressed within this single framework. Our formalism clearly identifies which transformations apply for any instance of the schema and which only for certain instances.

