Query processing in heterogeneous information sources (1996)

by Y Papakonstantinou

Results 1 - 8 of 8

Information integration using logical views

by Jeffrey D. Ullman, 1997
Cited by 395 (4 self)
Abstract. A number of ideas concerning information-integration tools can be thought of as constructing answers to queries using views that represent the capabilities of information sources. We review the formal basis of these techniques, which are closely related to containment algorithms for conjunctive queries and/or Datalog programs. Then we compare the approaches taken by AT&T Labs' "Information Manifold" and the Stanford "Tsimmis" project in these terms.

1 Theoretical Background

Before addressing information-integration issues, let us review some of the basic ideas concerning conjunctive queries, Datalog programs, and their containment. To begin, we use the logical rule notation from [Ull88].

Example 1. The following:

    p(X,Z) :- a(X,Y) & a(Y,Z).

is a rule that talks about a, an EDB predicate ("Extensional DataBase," or stored relation), and p, an IDB predicate ("Intensional DataBase," or predicate whose relation is constructed by rules). In this and several other examples, it is useful to think of a as an "arc" predicate defining a graph, while other predicates define certain structures that might exist in the graph. That is, a(X, Y) means there is an arc from node X to node Y. In this case, the rule says "p(X, Z) is true if there is an arc from node X to node Y and also an arc from Y to Z." That is, p represents paths of length 2.

In general, there is one atom, the head, on the left of the "if" sign, :-, and zero or more atoms, called subgoals, on the right side (the body). The head always has an IDB predicate; the subgoals can have IDB or EDB predicates. Thus, here p(X, Z) is the head, while a(X, Y) and a(Y, Z) are subgoals. We assume that each variable appearing in the head also appears somewhere in the body. This "safety" requirement assures that when we use a rule, we are not left with undefined variables in the head when we try to infer a fact about the head's predicate. We also assume that atoms consist of a predicate and zero or more arguments. An argument can be either a variable or a constant. However, we exclude function symbols from arguments.
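As an illustration of the rule in Example 1, the following minimal Python sketch evaluates p(X,Z) :- a(X,Y) & a(Y,Z) over a small arc relation. The function name `paths_of_length_two` and the sample arcs are assumptions made for this example; they do not come from the paper.

```python
def paths_of_length_two(arcs):
    """Evaluate p(X,Z) :- a(X,Y) & a(Y,Z), where `arcs` is the EDB relation a."""
    # Index arcs by source node so the join on the shared variable Y is a lookup.
    successors = {}
    for x, y in arcs:
        successors.setdefault(x, set()).add(y)

    p = set()
    for x, y in arcs:                      # subgoal a(X,Y)
        for z in successors.get(y, ()):    # subgoal a(Y,Z), joined on Y
            p.add((x, z))                  # head p(X,Z)
    return p


# Hypothetical graph: 1 -> 2 -> 3 -> 4
a = {(1, 2), (2, 3), (3, 4)}
p = paths_of_length_two(a)  # p == {(1, 3), (2, 4)}
```

Both head variables X and Z also appear in the body, so the rule satisfies the safety requirement and every tuple added to `p` is fully defined.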

Query Rewriting for Semistructured Data

by Yannis Papakonstantinou, Vasilis Vassalos
Cited by 60 (9 self)
We address the problem of query rewriting for TSL, a language for querying semistructured data. We develop and present an algorithm that, given a semistructured query q and a set of semistructured views V, finds rewriting queries, i.e., queries that access the views and produce the same result as q. Our algorithm is based on appropriately generalizing containment mappings, the chase, and query composition, techniques that were developed for structured, relational data. We also develop an algorithm for equivalence checking of TSL queries. We show that the algorithm is sound and complete for TSL, i.e., it always finds every non-trivial TSL rewriting query of q, and we discuss its complexity. We extend the rewriting algorithm to use some forms of structural constraints (such as DTDs) and find more opportunities for query rewriting.
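The containment mappings mentioned above come from the relational setting. The sketch below shows only that classical test for conjunctive queries, not the TSL algorithm of the paper: a query q1 is contained in q2 exactly when the variables of q2 can be mapped to terms of q1 so that the head of q2 becomes the head of q1 and every subgoal of q2 becomes some subgoal of q1. The query encoding (tuples, upper-case strings as variables) and all names are assumptions made for this example.

```python
def is_var(term):
    # Assumed convention: variables are strings starting with an upper-case letter.
    return isinstance(term, str) and term[:1].isupper()


def extend(mapping, src_atom, dst_atom):
    """Extend `mapping` so that src_atom maps onto dst_atom; return None if impossible."""
    (src_pred, src_args), (dst_pred, dst_args) = src_atom, dst_atom
    if src_pred != dst_pred or len(src_args) != len(dst_args):
        return None
    new = dict(mapping)
    for s, d in zip(src_args, dst_args):
        if is_var(s):
            if new.setdefault(s, d) != d:  # variable already bound to another term
                return None
        elif s != d:                       # constants must match exactly
            return None
    return new


def contained_in(q1, q2):
    """True if q1 is contained in q2, i.e. a containment mapping from q2 to q1 exists."""
    head1, body1 = q1
    head2, body2 = q2
    start = extend({}, head2, head1)
    if start is None:
        return False

    def search(i, mapping):
        # Backtracking search: map the i-th subgoal of q2 onto some subgoal of q1.
        if i == len(body2):
            return True
        for target in body1:
            nxt = extend(mapping, body2[i], target)
            if nxt is not None and search(i + 1, nxt):
                return True
        return False

    return search(0, start)


# p(X,Z) :- a(X,Y) & a(Y,Z)   (paths of length 2)
q_path2 = (("p", ("X", "Z")), [("a", ("X", "Y")), ("a", ("Y", "Z"))])
# p(X,X) :- a(X,X)            (nodes with a self-loop)
q_loop = (("p", ("X", "X")), [("a", ("X", "X"))])

loop_in_path2 = contained_in(q_loop, q_path2)   # True
path2_in_loop = contained_in(q_path2, q_loop)   # False
```

The backtracking search is exponential in the worst case, which is consistent with the known NP-completeness of conjunctive-query containment; the sketch makes no attempt at optimization.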

Fusion Queries over Internet Databases

by Ramana Yerneni, Yannis Papakonstantinou, Serge Abiteboul, Hector Garcia-Molina
Cited by 17 (0 self)
Fusion queries search for information integrated from distributed, autonomous sources over the Internet. We investigate techniques for efficient processing of fusion queries. First, we focus on a very wide class of query plans that capture the spirit of many techniques usually considered in existing systems. We show how to efficiently find good query plans within this large class. We provide additional heuristics that, by considering plans outside our target class of plans, yield further performance improvements.

1 Introduction

In distributed information systems on the Internet, data sources often provide incomplete and overlapping information on a set of entities. A fusion query searches over these entities, looking for ones that satisfy given conditions. To illustrate, consider databases operated by the Departments of Motor Vehicles (DMVs) of several states. Conceptually, each state database can be thought of as a relation R_i with the following attributes, among others: Driver's...

Query Rewriting using Semistructured Views

by Yannis Papakonstantinou, Vasilis Vassalos - In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, 1998
Cited by 12 (1 self)
We address the problem of query rewriting for MSL, a semistructured language developed at Stanford in the TSIMMIS project for information integration. We develop and present an algorithm that, given a semistructured query q and a set of semistructured views V, finds rewriting queries, i.e., queries that access the views and produce the same result as q. Our algorithm is based on appropriately generalizing containment mappings, the chase, and unification, techniques which were developed for structured, relational data. At the same time we develop an algorithm for equivalence checking of MSL queries. We show that the rewriting algorithm is sound and complete, i.e., it always finds every conjunctive MSL rewriting query of q, and we discuss its complexity. We currently incorporate the algorithm in the TSIMMIS system.

1 Introduction

Recently, many semistructured data models, query and view definition languages have been proposed [GM+97, MAG+97, BDHS96, AV97a, MM97, KS95, PGMU96,...

The Model-Theoretic Semantics of TRIPLE

by Stefan Decker, Michael Sintek, Wolfgang Nejdl, 2002
Cited by 5 (1 self)
In this paper we investigate and formalize the notion of RDF Models (or context) for RDF data by defining syntax and a model theoretic semantics of TRIPLE, a logic for dealing with RDF and contexts. A unique feature of the language is that one context can be used as parameters for other contexts. The definitions are used to derive a query and reasoning language based on deductive database technology. Applications include querying of data with respect to multiple semantics and information integration.

Fusion Queries over Internet Databases

by Serge Abiteboul, Hector Garcia-Molina, Yannis Papakonstantinou, Ramana Yerneni - In Proc. of the Conf. on Extending Database Technology (EDBT), 1997
Cited by 3 (0 self)
Fusion queries search for information integrated from distributed, autonomous sources over the Internet. In this context, data is not cleanly fragmented as in traditional distributed databases, and the number of sources participating in a typical query is large. We investigate techniques for efficient processing of fusion queries. First, we focus on a very wide class of query plans that capture the spirit of many techniques usually considered in existing systems. We show how to efficiently find, under various realistic scenarios, good query plans within this large class. We evaluate the performance of these plans and provide additional heuristics that, by considering plans outside our target class of plans, yield further performance improvements.

1 Introduction

In distributed information systems on the Internet, one often encounters sources that provide incomplete and possibly overlapping information on a set of entities. A fusion query searches over these entities, looking for ones t...

TRIPLE: A Logic for Reasoning with Parameterized Views over Semi-Structured Data

by Stefan Decker, Michael Sintek, Wolfgang Nejdl, DFKI GmbH Kaiserslautern, 2002
Cited by 2 (0 self)
In this paper we investigate and formalize parameterized views for semi-structured data by defining syntax and a model theoretic semantics of TRIPLE, a logic for dealing with views. A unique feature of the language is that views can be used as parameters for other views. The definitions are used to derive a query and reasoning language based on deductive database technology. Applications include querying of data with respect to multiple semantics and information integration.

IDB: Toward the Scalable Integration of Queryable Internet Data Sources

by Jaewoo Kang, Mong Li Lee, Lee Jeffrey, Jeffrey F. Naughton, 2000
Cited by 1 (0 self)
As the number of databases accessible on the Web grows, the ability to execute queries spanning multiple heterogeneous queryable sources is becoming increasingly important. To date, research in this area has focused on providing semantic completeness, and has generated solutions that work well when querying over a relatively small number of databases that have static and well-defined schemas. Unfortunately, these solutions do not extend to the scale of the present Internet, let alone the Internet of the future. In this paper, we present an approach that makes the opposite tradeoff: it provides a scalable, unified view over large numbers of queryable information sources by sacrificing some expressive power in the set of queries supported. We have developed a prototype system, IDB, which implements this approach. The IDB system provides scalability through three main techniques. First, it uses a collection of ontologies organized into hierarchical namespaces as a medium for ex...