Querying Heterogeneous Information Sources Using Source Descriptions
1996
Cited by 638 (33 self)
We witness a rapid increase in the number of structured information sources that are available online, especially on the WWW. These sources include commercial databases on product information, stock market information, real estate, automobiles, and entertainment. We would like to use the data stored in these databases to answer complex queries that go beyond keyword searches. We face the following challenges: (1) Several information sources store interrelated data, and any query-answering system must understand the relationships between their contents. (2) Many sources are not full-featured database systems and can answer only a small set of queries over their data (for example, forms on the WWW restrict the set of queries one can ask). (3) Since the number of sources is very large, effective techniques are needed to prune the set of information sources accessed to answer a query. (4) The details of interacting with each source vary greatly. We describe the Information Manifold, an imp...
Information integration using logical views
1997
Cited by 395 (4 self)
Abstract. A number of ideas concerning information-integration tools can be thought of as constructing answers to queries using views that represent the capabilities of information sources. We review the formal basis of these techniques, which are closely related to containment algorithms for conjunctive queries and/or Datalog programs. Then we compare the approaches taken by AT&T Labs' "Information Manifold" and the Stanford "Tsimmis" project in these terms.
1 Theoretical Background
Before addressing information-integration issues, let us review some of the basic ideas concerning conjunctive queries, Datalog programs, and their containment. To begin, we use the logical rule notation from [Ull88]. Example 1. The following:
p(X,Z) :- a(X,Y) & a(Y,Z).
is a rule that talks about a, an EDB predicate ("Extensional DataBase," or stored relation), and p, an IDB predicate ("Intensional DataBase," or predicate whose relation is constructed by rules). In this and several other examples, it is useful to think of a as an "arc" predicate defining a graph, while other predicates define certain structures that might exist in the graph. That is, a(X, Y) means there is an arc from node X to node Y. In this case, the rule says "p(X, Z) is true if there is an arc from node X to node Y and also an arc from Y to Z." That is, p represents paths of length 2.
In general, there is one atom, the head, on the left of the "if" sign, :-, and zero or more atoms, called subgoals, on the right side (the body). The head always has an IDB predicate; the subgoals can have IDB or EDB predicates. Thus, here p(X, Z) is the head, while a(X, Y) and a(Y, Z) are subgoals. We assume that each variable appearing in the head also appears somewhere in the body. This "safety" requirement assures that when we use a rule, we are not left with undefined variables in the head when we try to infer a fact about the head's predicate. We also assume that atoms consist of a predicate and zero or more arguments. An argument can be either a variable or a constant. However, we exclude function symbols from arguments.
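The rule in Example 1 and the safety requirement can be illustrated with a small Python sketch. It is not taken from the paper: the function names, the representation of the EDB relation a as a set of (X, Y) pairs, and the sample graph are all assumptions made for illustration, and every argument is treated as a variable.

```python
from collections import defaultdict


def paths_of_length_two(arcs):
    """Evaluate p(X, Z) :- a(X, Y) & a(Y, Z) over the stored relation `arcs`."""
    # Index the arc relation by source node so the join on Y is a lookup.
    successors = defaultdict(set)
    for x, y in arcs:
        successors[x].add(y)

    result = set()
    for x, y in arcs:                      # subgoal a(X, Y)
        for z in successors.get(y, ()):    # subgoal a(Y, Z), joined on Y
            result.add((x, z))             # head p(X, Z)
    return result


def is_safe(head_vars, body_atoms):
    """Safety check: every head variable must occur in some body atom.

    `body_atoms` is a list of (predicate, arguments) pairs; all arguments
    are assumed to be variables in this sketch.
    """
    body_vars = {var for _, args in body_atoms for var in args}
    return set(head_vars) <= body_vars


# Hypothetical sample graph, used only to exercise the rule.
arcs = {(1, 2), (2, 3), (3, 1), (2, 4)}
p = paths_of_length_two(arcs)

body = [("a", ("X", "Y")), ("a", ("Y", "Z"))]
safe_rule = is_safe(("X", "Z"), body)      # head p(X, Z)
unsafe_rule = is_safe(("X", "W"), body)    # W never appears in the body
```

Here p holds the pairs connected by a two-arc path, and the second call to is_safe shows a head that would leave a variable undefined.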
Answering Queries Using Views: A Survey
2000
Cited by 395 (27 self)
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously defined materialized views over the database, rather than accessing the database relations. The problem has recently received significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result, finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views. Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it and the relevant theoretical results.
Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity
1998
Cited by 193 (13 self)
Most databases contain "name constants" like course numbers, personal names, and place names that correspond to entities in the real world. Previous work in integration of heterogeneous databases has assumed that local name constants can be mapped into an appropriate global domain by normalization. However, in many cases, this assumption does not hold; determining if two name constants should be considered identical can require detailed knowledge of the world, the purpose of the user's query, or both. In this paper, we reject the assumption that global domains can be easily constructed, and assume instead that the names are given in natural language text. We then propose a logic called WHIRL which reasons explicitly about the similarity of local names, as measured using the vector-space model commonly adopted in statistical information retrieval. We describe an efficient implementation of WHIRL and evaluate it experimentally on data extracted from the World Wide Web. We show that WHIR...
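As a rough illustration of the vector-space model mentioned in this abstract, the sketch below scores the similarity of name constants with TF-IDF weights and cosine similarity. It is not WHIRL's implementation: the weighting formula, the whitespace tokenizer, the function names, and the sample names are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(names):
    """Map each name to a unit-length TF-IDF vector (term -> weight)."""
    tokenized = [name.lower().split() for name in names]
    n = len(tokenized)
    # Document frequency: number of names in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumed weighting: (1 + log tf) * log(1 + n / df); always positive.
        vec = {t: (1 + math.log(c)) * math.log(1 + n / df[t])
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


# Hypothetical name constants from two sources.
names = ["Acme Corporation", "ACME Corp", "Apex Industries"]
vecs = tfidf_vectors(names)
sim_related = cosine(vecs[0], vecs[1])    # share the term "acme"
sim_unrelated = cosine(vecs[0], vecs[2])  # share no terms
```

Names that share rare terms score higher than names that share none, which is the kind of graded match a similarity-based join can rank instead of requiring exact equality.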
AL-log: Integrating Datalog and Description Logics
Journal of Intelligent Information Systems, 1998
Cited by 133 (12 self)
We present an integrated system for knowledge representation, called AL-log, based on description logics and the deductive database language Datalog. AL-log embodies two subsystems, called structural and relational. The former allows for the definition of structural knowledge about classes of interest (concepts) and the membership relation between objects and classes. The latter allows for the definition of relational knowledge about objects described in the structural component. The interaction between the two components is obtained by allowing constraints within Datalog clauses, thus requiring the variables in the clauses to range over the set of instances of a specified concept. We propose a method for query answering in AL-log based on constrained resolution, where the usual deduction procedure defined for Datalog is integrated with a method for reasoning on the structural knowledge.
Navigational Plans For Data Integration
In Proceedings of the National Conference on Artificial Intelligence (AAAI), 1999
Cited by 112 (3 self)
We consider the problem of building data integration systems when the data sources are webs of data, rather than sets of relations. Previous approaches to modeling data sources are inappropriate in this context because they do not capture the relationships between linked data and the need to navigate through paths in the data source in order to obtain the data. We describe a language for modeling data sources in this new context. We show that our language has the required expressive power, and that minor extensions to it would make query answering intractable. We provide a sound and complete algorithm for reformulating a user query into a query over the data sources, and we show how to create query execution plans that both query and navigate the data sources. Introduction The purpose of data integration is to provide a uniform interface to a multitude of data sources. Data integration applications arise frequently as corporations attempt to provide their customers and employees wit...
CARIN: A Representation Language Combining Horn Rules and Description Logics
1996
Cited by 94 (1 self)
We describe CARIN, a novel family of representation languages, which integrate the expressive power of Horn rules and of description logics. We address the key issue in designing such a language, namely, providing a sound and complete inference procedure. We identify existential entailment as a core problem in reasoning in CARIN, and describe an existential entailment algorithm for CARIN languages whose description logic component is ALCNR. This algorithm entails several important results for reasoning in CARIN, most notably: (1) a sound and complete inference procedure for non-recursive CARIN-ALCNR, and (2) an algorithm for determining rule subsumption over ALCNR. 1 Introduction Horn rule languages have formed the basis for many Artificial Intelligence application languages because their expressive power is sufficient for many applications, and they have good computational properties. One of the significant limitations of Horn rules is that they are not expressive enough to mod...
P-CLASSIC: A tractable probabilistic description logic
In Proceedings of AAAI-97, 1997
Cited by 91 (4 self)
Knowledge representation languages invariably reflect a trade-off between expressivity and tractability. Evidence suggests that the compromise chosen by description logics is a particularly successful one. However, description logic (as for all variants of first-order logic) is severely limited in its ability to express uncertainty. In this paper, we present P-CLASSIC, a probabilistic version of the description logic CLASSIC. In addition to terminological knowledge, the language utilizes Bayesian networks to express uncertainty about the basic properties of an individual, the number of fillers for its roles, and the properties of these fillers. We provide a semantics for P-CLASSIC and an effective inference procedure for probabilistic subsumption: computing the probability that a random individual in class C is also in class D. The effectiveness of the algorithm relies on independence assumptions and on our ability to execute lifted inference: reasoning about similar individuals as a gr...
Distributed Description Logics: Assimilating Information from Peer Sources
2003
Cited by 91 (13 self)
Due to the availability on the Internet of a wide variety of sources of information on closely related topics, the problem of providing seamless, integrated access to such sources has become (again) a major research challenge. Although this problem has been studied for several decades, especially in the database community, there is a need for a more refined approach in those cases where the original sources maintain their own independent view of the world. In particular, we motivate with examples the utility of directed non-injective mappings between the individuals in the domains of multiple Information Sources.
Logic-Based Techniques In Data Integration
1999
Cited by 87 (0 self)
The data integration problem is to provide uniform access to multiple heterogeneous information sources available online (e.g., databases on the WWW). This problem has recently received considerable attention from researchers in the fields of Artificial Intelligence and Database Systems. The data integration problem is complicated by the facts that (1) sources contain closely related and overlapping data, (2) data is stored in multiple data models and schemas, and (3) data sources have differing query processing capabilities. A key element in a data integration system is the language used to describe the contents and capabilities of the data sources. While such a language needs to be as expressive as possible, it should also make it possible to address efficiently the main inference problem that arises in this context: translating a user query formulated over a mediated schema into a query on the local schemas. This paper describes several languages for describing contents of data sources, ...

