Don't Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources (1997)

by Mary Tork Roth

Results 1 - 10 of 246

The state of the art in distributed query processing

by Donald Kossmann - ACM Computing Surveys, 2000
Abstract - Cited by 320 (3 self)
Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of issues which still make distributed data processing a complex undertaking: (1) distributed systems can become very large, involving thousands of heterogeneous sites including PCs and mainframe server machines; (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system; (3) legacy systems need to be integrated; such legacy systems usually have not been designed for distributed data processing and now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the "textbook" architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intra-query parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses different kinds of distributed systems such as client-server, middleware (multi-tier), and heterogeneous database systems and shows how query processing works in these systems.
Categories and subject descriptors: E.5 [Data]: Files; H.2.4 [Database Management Systems]: distributed databases, query processing; H.2.5 [Heterogeneous Databases]: data translation
General terms: algorithms; performance
Additional key words and phrases: query optimization; query execution; client-server databases; middleware; multi-tier architectures; database application systems; wrappers; replication; caching; economic models for query processing; dissemination-based information systems

Optimizing Queries across Diverse Data Sources

by Laura M. Haas, Donald Kossmann, Edward L. Wimmers, Jun Yang - In Proc. of VLDB, 1997
Abstract - Cited by 284 (15 self)
Businesses today need to interrelate data stored in diverse systems with differing capabilities, ideally via a single high-level query interface. We present the design of a query optimizer for Garlic [C+95], a middleware system designed to integrate data from a broad range of data sources with very different query capabilities. Garlic's optimizer extends the rule-based approach of [Loh88] to work in a heterogeneous environment, by defining generic rules for the middleware and using wrapper-provided rules to encapsulate the capabilities of each data source. This approach offers great advantages in terms of plan quality, extensibility to new sources, incremental implementation of rules for new sources, and the ability to express the capabilities of a diverse set of sources. We describe the design and implementation of this optimizer, and illustrate its actions through an example.

Citation Context

...collections which are the targets of queries in Garlic. The wrapper further provides a description of its query processing capabilities in the form of a set of rules (encapsulated as planning methods [RS97]). Different sources may vary greatly in their query processing capabilities, and thus will provide different rules. A wrapper does not have to reflect the full query functionality of its data sources...

Data Cleaning: Problems and Current Approaches

by Erhard Rahm, Hong Hai Do - IEEE Data Engineering Bulletin, 2000
Abstract - Cited by 279 (8 self)
We classify data quality problems that are addressed by data cleaning and provide an overview of the main solution approaches. Data cleaning is especially required when integrating heterogeneous data sources and should be addressed together with schema-related data transformations. In data warehouses, data cleaning is a major part of the so-called ETL process. We also discuss current tool support for data cleaning.

Citation Context

...based information systems face data transformation steps similar to those of data warehouses. In particular, there is typically a wrapper per data source for extraction and a mediator for integration [32, 31]. So far, these systems provide only limited support for data cleaning, focusing...

Semantic Integration of Semistructured and Structured Data Sources

by S. Bergamaschi, S. Castano, M. Vincini - SIGMOD Record, 1999
Abstract - Cited by 166 (19 self)
... this paper is to describe the MOMIS [4, 5] (Mediator envirOnment for Multiple Information Sources) approach to the integration and query of multiple, heterogeneous information sources, containing structured and semistructured data. MOMIS has been conceived as a joint collaboration between University of Milano and Modena in the framework of the INTERDATA national research project, aiming at providing methods and tools for data management in Internet-based information systems. Like other integration projects [1, 10, 14], MOMIS follows a "semantic approach" to information integration based on the conceptual schema, or metadata, of the information sources, and on the following architectural elements: i) a common object-oriented data model, defined according to the ODL_I3 language, to describe source schemas for integration purposes. The data model and ODL_I3 have been defined in MOMIS as a subset of the ODMG-93 ones, following the proposal for a standard mediator language developed by the I...

Citation Context

...nd Modena in the framework of the INTERDATA national research project, aiming at providing methods and tools for data management in Internet-based information systems. Like other integration projects [1, 10, 14], MOMIS follows a "semantic approach" to information integration based on the conceptual schema, or metadata, of the information sources, and on the following architectural elements: i) a common objec...

The Clio Project: Managing Heterogeneity.

by Renée J. Miller, Mauricio A. Hernández, Laura M. Haas, Lingling Yan, C. T. Howard Ho, Ronald Fagin, Lucian Popa - In SIGMOD Record, 2001
Abstract - Cited by 143 (3 self)
Clio is a system for managing and facilitating the complex tasks of heterogeneous data transformation and integration. In Clio, we have collected together a powerful set of data management techniques that have proven invaluable in tackling these difficult problems. In this paper, we present the underlying themes of our approach and present a brief case study.

Citation Context

...oading one or more schemas into the system. These schemas are read from either an underlying Object-Relational database, a legacy source that has been wrapped with a Garlic Object-Relational wrapper [TS97], or from an XML file with an associated XML schema. The schemas may be legacy schemas or they may include an integrated schema produced manually or by an integration tool. The schema engine is used to a...

SHOE: A Knowledge Representation Language for Internet Applications

by Jeff Heflin, James Hendler, Sean Luke, 1999
Abstract - Cited by 99 (2 self)
It is our contention that the World Wide Web poses challenges to knowledge representation systems that fundamentally change the way we should design KR languages. In this paper, we describe the Simple HTML Ontology Extensions (SHOE), a KR language which allows web pages to be annotated with semantics. We present a formalism for the language and discuss the features which make it well suited for the Web. We describe the syntax and semantics of this language, and discuss the differences from traditional KR systems that make it more suited to modern web applications. We also describe some generic tools for using the language and demonstrate its capabilities by describing two prototype systems that use it. We also discuss some future tools currently being developed for the language. The language, tools, and details of the applications are all available on the World Wide Web at http://www.cs.umd.edu/projects/plus/SHOE.

Citation Context

...or phrases, and thus suffer from the limitations of keyword search. Another approach involves mediators (or wrappers), custom software that serves as an interface between middleware and a data source [27, 21, 23]. When applied to the Web, wrappers allow users to query a page's contents as if it was a database. However, the heterogeneity of the Web requires that a multitude of custom wrappers must be developed...

Data Integration by Bi-Directional Schema Transformation Rules

by Peter McBrien, Alexandra Poulovassilis, 2003
Abstract - Cited by 89 (13 self)
In this paper we describe a new approach to data integration which subsumes the previous approaches of local as view (LAV) and global as view (GAV). Our method, which we term both as view (BAV), is based on the use of reversible schema transformation sequences. We show how LAV and GAV view definitions can be fully derived from BAV schema transformation sequences, and how BAV transformation sequences may be partially derived from LAV or GAV view definitions. We also show how BAV supports the evolution of both global and local schemas, and we discuss ongoing implementation of the BAV approach within the AutoMed project.

Answering XML Queries over Heterogeneous Data Sources

by Ioana Manolescu, Daniela Florescu, Donald Kossmann, 2001
Abstract - Cited by 88 (2 self)
This work describes an architecture for integrating heterogeneous data sources under an XML global schema, following the local-as-view approach (local sources' schemas are described as views over the global schema). In this context, we focus on the problem of translating the user's query against the XML global schema into a SQL query over the local data sources.

Describing and using query capabilities of heterogeneous sources

by Vasilis Vassalos, Yannis Papakonstantinou, 1997
Abstract - Cited by 87 (9 self)
Information integration systems have to cope with the different and limited query interfaces of the underlying information sources. First, the integration systems need descriptions of the query capabilities of each source, i.e., the set of queries supported by each source. Second, the integration systems need algorithms for deciding how a query can be answered given the capabilities of the sources. Third, they need to translate a query into the format that the source understands. We present two languages suitable for descriptions of query capabilities of sources and compare their expressive power. We also describe algorithms for deciding whether a query “matches” the description and show their application to the problem of translating user queries into source-specific queries and commands. Finally, we propose new improved algorithms for the problem of answering queries using these descriptions.

Scaling access to heterogeneous data sources with DISCO

by Anthony Tomasic, Louiqa Raschid, Patrick Valduriez - IEEE Transactions on Knowledge and Data Engineering, 1998
Abstract - Cited by 86 (3 self)
Accessing many data sources aggravates problems for users of heterogeneous distributed databases. Database administrators must deal with fragile mediators, that is, mediators with schemas and views that must be significantly changed to incorporate a new data source. When implementing translators of queries from mediators to data sources, database implementors must deal with data sources that do not support all the functionality required by mediators. Application programmers must deal with graceless failures for unavailable data sources. Queries simply return failure and no further information when data sources are unavailable for query processing. The Distributed Information Search COmponent (Disco) addresses these problems. Data modeling techniques manage the connections to data sources, and sources can be added transparently to the users and applications. The interface between mediators and data sources flexibly handles different query languages and different data source functionality. Query rewriting and optimization techniques rewrite queries so they are efficiently evaluated by sources. Query processing and evaluation semantics are developed to process queries over unavailable data sources. In this article we describe (a) the distributed mediator architecture of Disco; (b) the data model and its modeling of data source connections; (c) the interface to underlying data sources and the query rewriting process; and (d) query processing semantics. We describe several advantages of our system.

Citation Context

...ying capability of different data sources, and propose techniques for query reformulation that resolve this mismatch. These techniques are not incorporated into the data model. The Garlic system [36], [37], [38], research described in [39], [40], [30] and the Information Manifold project [41], [42], [43], all assume a mediator environment based on a common data model. In the Information Manifold projec...
