Information integration using logical views
, 1997
Cited by 395 (4 self)
Abstract. A number of ideas concerning information-integration tools can be thought of as constructing answers to queries using views that represent the capabilities of information sources. We review the formal basis of these techniques, which are closely related to containment algorithms for conjunctive queries and/or Datalog programs. Then we compare the approaches taken by AT&T Labs' "Information Manifold" and the Stanford "Tsimmis" project in these terms.

1 Theoretical Background

Before addressing information-integration issues, let us review some of the basic ideas concerning conjunctive queries, Datalog programs, and their containment. To begin, we use the logical rule notation from [Ull88].

Example 1. The following:

p(X,Z) :- a(X,Y) & a(Y,Z).

is a rule that talks about a, an EDB predicate ("Extensional DataBase," or stored relation), and p, an IDB predicate ("Intensional DataBase," or predicate whose relation is constructed by rules). In this and several other examples, it is useful to think of a as an "arc" predicate defining a graph, while other predicates define certain structures that might exist in the graph. That is, a(X, Y) means there is an arc from node X to node Y. In this case, the rule says "p(X, Z) is true if there is an arc from node X to node Y and also an arc from Y to Z." That is, p represents paths of length 2.

In general, there is one atom, the head, on the left of the "if" sign, :-, and zero or more atoms, called subgoals, on the right side (the body). The head always has an IDB predicate; the subgoals can have IDB or EDB predicates. Thus, here p(X, Z) is the head, while a(X, Y) and a(Y, Z) are subgoals.

We assume that each variable appearing in the head also appears somewhere in the body. This "safety" requirement assures that when we use a rule, we are not left with undefined variables in the head when we try to infer a fact about the head's predicate. We also assume that atoms consist of a predicate and zero or more arguments. An argument can be either a variable or a constant. However, we exclude function symbols from arguments.
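
As an illustration of Example 1, the rule p(X,Z) :- a(X,Y) & a(Y,Z) can be evaluated by joining the arc relation with itself on the shared variable Y. The following is a minimal sketch, not taken from the paper: the relation `arcs` and the function name `paths_of_length_two` are hypothetical and chosen only for illustration.

```python
# Hypothetical EDB relation a: the arcs of a small directed graph.
arcs = {(1, 2), (2, 3), (2, 4), (4, 1)}

def paths_of_length_two(a):
    """Evaluate p(X,Z) :- a(X,Y) & a(Y,Z) by joining a with itself on Y."""
    return {(x, z) for (x, y1) in a for (y2, z) in a if y1 == y2}

# IDB relation p: every pair of nodes connected by a path of length 2.
p = paths_of_length_two(arcs)
print(sorted(p))  # [(1, 3), (1, 4), (2, 1), (4, 2)]
```

Because both head variables X and Z also appear in the body, the rule is safe, and every tuple produced here is fully determined by the stored arcs.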
Query Rewriting for Semistructured Data
Cited by 60 (9 self)
We address the problem of query rewriting for TSL, a language for querying semistructured data. We develop and present an algorithm that, given a semistructured query q and a set of semistructured views V, finds rewriting queries, i.e., queries that access the views and produce the same result as q. Our algorithm is based on appropriately generalizing containment mappings, the chase, and query composition, techniques that were developed for structured, relational data. We also develop an algorithm for equivalence checking of TSL queries. We show that the algorithm is sound and complete for TSL, i.e., it always finds every non-trivial TSL rewriting query of q, and we discuss its complexity. We extend the rewriting algorithm to use some forms of structural constraints (such as DTDs) and find more opportunities for query rewriting.
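
The containment mappings mentioned in this abstract originate in the relational setting. The abstract does not show the TSL generalization, so the following is only a sketch of the classical test for plain conjunctive queries: Q1 is contained in Q2 when some mapping of Q2's variables sends Q2's head to Q1's head and every subgoal of Q2 to a subgoal of Q1. The query encoding (a head atom plus a list of body atoms, with capitalized strings as variables) and all function names are assumptions made for illustration.

```python
def is_var(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def extend(mapping, src_atom, dst_atom):
    """Extend mapping so that src_atom maps onto dst_atom; return None on conflict."""
    (src_pred, src_args), (dst_pred, dst_args) = src_atom, dst_atom
    if src_pred != dst_pred or len(src_args) != len(dst_args):
        return None
    new_mapping = dict(mapping)
    for s, d in zip(src_args, dst_args):
        if is_var(s):
            if new_mapping.setdefault(s, d) != d:
                return None  # variable already mapped to a different term
        elif s != d:
            return None  # constants must map to themselves
    return new_mapping

def contained_in(q1, q2):
    """Return True if q1 is contained in q2 (a containment mapping q2 -> q1 exists)."""
    (head1, body1), (head2, body2) = q1, q2
    start = extend({}, head2, head1)
    if start is None:
        return False

    def search(i, mapping):
        # Backtracking search: map the i-th subgoal of q2 onto some subgoal of q1.
        if i == len(body2):
            return True
        for atom in body1:
            candidate = extend(mapping, body2[i], atom)
            if candidate is not None and search(i + 1, candidate):
                return True
        return False

    return search(0, start)

# Paths of length 2 that continue with one more arc, versus plain paths of length 2.
q_long = (("p", ("X", "Z")), [("a", ("X", "Y")), ("a", ("Y", "Z")), ("a", ("Z", "W"))])
q_short = (("p", ("X", "Z")), [("a", ("X", "Y")), ("a", ("Y", "Z"))])
print(contained_in(q_long, q_short))  # True
print(contained_in(q_short, q_long))  # False
```

The search is exponential in the worst case, which is consistent with conjunctive-query containment being NP-complete; this brute-force version is meant only to make the definition concrete.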
Fusion Queries over Internet Databases
Cited by 17 (0 self)
Fusion queries search for information integrated from distributed, autonomous sources over the Internet. We investigate techniques for efficient processing of fusion queries. First, we focus on a very wide class of query plans that capture the spirit of many techniques usually considered in existing systems. We show how to efficiently find good query plans within this large class. We provide additional heuristics that, by considering plans outside our target class of plans, yield further performance improvements.

1 Introduction

In distributed information systems on the Internet, data sources often provide incomplete and overlapping information on a set of entities. A fusion query searches over these entities, looking for ones that satisfy given conditions. To illustrate, consider databases operated by the Departments of Motor Vehicles (DMVs) of several states. Conceptually, each state database can be thought of as a relation R_i with the following attributes, among others: Driver's...
Query Rewriting using Semistructured Views
- In Proc. of the ACM SIGMOD Int. Conf. on Management of Data
, 1998
Cited by 12 (1 self)
We address the problem of query rewriting for MSL, a semistructured language developed at Stanford in the TSIMMIS project for information integration. We develop and present an algorithm that, given a semistructured query q and a set of semistructured views V, finds rewriting queries, i.e., queries that access the views and produce the same result as q. Our algorithm is based on appropriately generalizing containment mappings, the chase, and unification -- techniques which were developed for structured, relational data. At the same time we develop an algorithm for equivalence checking of MSL queries. We show that the rewriting algorithm is sound and complete, i.e., it always finds every conjunctive MSL rewriting query of q, and we discuss its complexity. We currently incorporate the algorithm in the TSIMMIS system.

1 Introduction

Recently, many semistructured data models, query and view definition languages have been proposed [GM+97, MAG+97, BDHS96, AV97a, MM97, KS95, PGMU96,...
The Model-Theoretic Semantics of TRIPLE
, 2002
Cited by 5 (1 self)
In this paper we investigate and formalize the notion of RDF Models (or contexts) for RDF data by defining the syntax and a model-theoretic semantics of TRIPLE, a logic for dealing with RDF and contexts. A unique feature of the language is that one context can be used as a parameter for other contexts. The definitions are used to derive a query and reasoning language based on deductive database technology. Applications include querying of data with respect to multiple semantics and information integration.
Fusion Queries over Internet Databases
- In Proc. of the Conf. on Extending Database Technology (EDBT
, 1997
Cited by 3 (0 self)
Fusion queries search for information integrated from distributed, autonomous sources over the Internet. In this context, data is not cleanly fragmented as in traditional distributed databases, and the number of sources participating in a typical query is large. We investigate techniques for efficient processing of fusion queries. First, we focus on a very wide class of query plans that capture the spirit of many techniques usually considered in existing systems. We show how to efficiently find, under various realistic scenarios, good query plans within this large class. We evaluate the performance of these plans and provide additional heuristics that, by considering plans outside our target class of plans, yield further performance improvements.

1 Introduction

In distributed information systems on the Internet, one often encounters sources that provide incomplete and possibly overlapping information on a set of entities. A fusion query searches over these entities, looking for ones t...
TRIPLE: A Logic for Reasoning with Parameterized Views over Semi-Structured Data.
, 2002
Cited by 2 (0 self)
In this paper we investigate and formalize parameterized views for semi-structured data by defining the syntax and a model-theoretic semantics of TRIPLE, a logic for dealing with views. A unique feature of the language is that views can be used as parameters for other views. The definitions are used to derive a query and reasoning language based on deductive database technology. Applications include querying of data with respect to multiple semantics and information integration.
IDB: Toward the Scalable Integration of Queryable Internet Data Sources
, 2000
Cited by 1 (0 self)
As the number of databases accessible on the Web grows, the ability to execute queries spanning multiple heterogeneous queryable sources is becoming increasingly important. To date, research in this area has focused on providing semantic completeness, and has generated solutions that work well when querying over a relatively small number of databases that have static and well-defined schemas. Unfortunately, these solutions do not extend to the scale of the present Internet, let alone the Internet of the future. In this paper, we present an approach that makes the opposite tradeoff: it provides a scalable, unified view over large numbers of queryable information sources by sacrificing some expressive power in the set of queries supported. We have developed a prototype system, IDB, which implements this approach. The IDB system provides scalability through three main techniques. First, it uses a collection of ontologies organized into hierarchical namespaces as a medium for ex...

