Results 1 - 10
of
347
The Lorel Query Language for Semistructured Data
- International Journal on Digital Libraries
, 1997
"... We present the Lorel language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inapprop ..."
Abstract
-
Cited by 631 (25 self)
- Add to MetaCart
We present the Lorel language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inappropriate, since semistructured data often is irregular, some data is missing, similar concepts are represented using different types, heterogeneous sets are present, or object structure is not fully known. Lorel is a user-friendly language in the SQL/OQL style for querying such data effectively. For wide applicability, the simple object model underlying Lorel can be viewed as an extension of ODMG and the language as an extension of OQL. The main novelties of the Lorel language are: (i) extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data
Querying Semi-Structured Data
, 1997
"... The amount of data of all kinds available electronically has increased dramatically in recent years. The data resides in different forms, ranging from unstructured data in file systems to highly structured in relational database systems. Data is accessible through a variety of interfaces including W ..."
Abstract
-
Cited by 467 (19 self)
- Add to MetaCart
The amount of data of all kinds available electronically has increased dramatically in recent years. The data resides in different forms, ranging from unstructured data in file systems to highly structured in relational database systems. Data is accessible through a variety of interfaces including Web browsers, database query languages, application-specific interfaces, or data exchange formats. Some of this data is raw data, e.g. images or sound. Some of it has structure even if the structure is often implicit, and not as rigid or regular as that found in standard database systems. Sometimes the structure exists but has to be extracted from the data. Sometimes also it exists but we prefer to ignore it for certain purposes such as browsing. We call here semi-structured data this data that is (from a particular viewpoint) neither raw data nor strictly typed, i.e., not table-oriented as in a relational model or sorted-graph as in object databases...
DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases
, 1997
"... In semistructured databases there is no schema fixed in advance. To provide the benefits of a schema in such environments, we introduce DataGuides: concise and accurate structural summaries of semistructured databases. DataGuides serve as dynamic schemas, generated from the database; they are ..."
Abstract
-
Cited by 459 (14 self)
- Add to MetaCart
In semistructured databases there is no schema fixed in advance. To provide the benefits of a schema in such environments, we introduce DataGuides: concise and accurate structural summaries of semistructured databases. DataGuides serve as dynamic schemas, generated from the database; they are useful for browsing database structure, formulating queries, storing information such as statistics and sample values, and enabling query optimization. This paper presents the theoretical foundations of DataGuides along with an algorithm for their creation and an overview of incremental maintenance. We provide performance results based on our implementation of DataGuides in the Lore DBMS for semistructured data. We also describe the use of DataGuides in Lore, both in the user interface to enable structure browsing and query formulation, and as a means of guiding the query processor and optimizing query execution.
The TSIMMIS Project: Integration of Heterogeneous Information Sources
"... The goal of the Tsimmis Project is to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured data. This paper gives an overview of the project, describing components that extract properties from unstructured objects, ..."
Abstract
-
Cited by 451 (16 self)
- Add to MetaCart
The goal of the Tsimmis Project is to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured data. This paper gives an overview of the project, describing components that extract properties from unstructured objects, that translate information into a common object model, that combine information from several sources, that allow browsing of information, and that manage constraints across heterogeneous sites. Tsimmis is a joint project between Stanford and the IBM Almaden Research Center.
Information integration using logical views
, 1997
"... Abstract. A number of ideas concerning information-integration tools can be thought of as constructing answers to queries using views that represent the capabilities of information sources. We review the formal basis of these techniques, which are closely related to containment algo-rithms for conju ..."
Abstract
-
Cited by 395 (4 self)
- Add to MetaCart
Abstract. A number of ideas concerning information-integration tools can be thought of as constructing answers to queries using views that represent the capabilities of information sources. We review the formal basis of these techniques, which are closely related to containment algo-rithms for conjunctive queries and/or Datalog programs. Then we com-pare the approaches taken by AT&T Labs ' "Information Manifold " and the Stanford "Tsimmis " project in these terms. 1 Theoretical Background Before addressing information-integration issues, let us review some of the basic ideas concerning conjunctive queries, Datalog programs, and their containment. To begin, we use the logical rule notation from [Ull88]. Example 1. The following: p(X,Z):- a(X,Y) & a(Y,Z). is a rule that talks about a, an EDB predicate ("Extensional DataBase, " or stored relation), and p, an IDB predicate ("Intensional DataBase, " or predicate whose relation is constructed by rules). In this and several other examples, it is useful to think of a as an "arc " predicate defining a graph, while other predicates define certain structures that might exist in the graph. That is, a(X, Y) means there is an arc from node X to node Y. In this case, the rule says "p(X, Z) is true if there is an arc from node X to node Y and also an arc from Y to Z." That is, p represents paths of length 2. In general, there is one atom, the head, on the left of the "if " sign,:- and zero of more atoms, called subgoals, on the right side (the body). The head always has an IDB predicate; the subgoals can have IDB or EDB predicates. Thus, here p(X, Z) is the head, while a(X, Y) and a(Y, Z) are subgoals. We assume that each variable appearing in the head also appears somewhere in the body. This "safety " requirement assures that when we use a rule, we are not left with undefined variables in the head when we try to infer a fact about the head's predicate. We also assume that atoms consist of a predicate and zero or more arguments. An argument can be either a variable or a constant. However, we exclude function symbols from arguments.
A Query Language and Optimization Techniques for Unstructured Data
, 1996
"... A new kind of data model has recently emerged in which the database is not constrained by a conventional schema. Systems like ACeDB, which has become very popular with biologists, and the recent Tsimmis proposal for data integration organize data in tree-like structures whose components can be used ..."
Abstract
-
Cited by 368 (34 self)
- Add to MetaCart
A new kind of data model has recently emerged in which the database is not constrained by a conventional schema. Systems like ACeDB, which has become very popular with biologists, and the recent Tsimmis proposal for data integration organize data in tree-like structures whose components can be used equally well to represent sets and tuples. Such structures allow great flexibility in data representation What query language is appropriate for such structures? Here we propose a simple language UnQL for querying data organized as a rooted, edge-labeled graph. In this model, relational data may be represented as fixed-depth trees, and on such trees UnQL is equivalent to the relational algebra. The novelty of UnQL consists in its programming constructs for arbitrarily deep data and for cyclic structures. While strictly more powerful than query languages with path expressions like XSQL, UnQL can still be efficiently evaluated. We describe new optimization techniques for the deep or "vertical" dimension of UnQL queries. Furthermore, we show that known optimization techniques for operators on flat relations apply to the "horizontal" dimension of UnQL.
The TSIMMIS Approach to Mediation: Data Models and Languages
- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
, 1997
"... TSIMMIS -- The Stanford-IBM Manager of Multiple Information Sources -- is a system for integrating information. It o ers a data model and a common query language that are designed to support the combining of information from many different sources. It also o ers tools for generating automatically th ..."
Abstract
-
Cited by 344 (8 self)
- Add to MetaCart
TSIMMIS -- The Stanford-IBM Manager of Multiple Information Sources -- is a system for integrating information. It o ers a data model and a common query language that are designed to support the combining of information from many different sources. It also o ers tools for generating automatically the components that are needed to build systems for integrating information. In this paper we shall discuss the principal architectural features and their rationale.
Similarity flooding: A versatile graph matching algorithm
, 2002
"... Matching elements of two data schemas or two data instances plays a key role in data warehousing, e-business, or even biochemical applications. In this paper we present a matching algorithm based on a fixpoint computation that is usable across different scenarios. The algorithm takes two graphs (sch ..."
Abstract
-
Cited by 322 (9 self)
- Add to MetaCart
Matching elements of two data schemas or two data instances plays a key role in data warehousing, e-business, or even biochemical applications. In this paper we present a matching algorithm based on a fixpoint computation that is usable across different scenarios. The algorithm takes two graphs (schemas, catalogs, or other data structures) as input, and produces as output a mapping between corresponding nodes of the graphs. Depending on the matching goal, a subset of the mapping is chosen using filters. After our algorithm runs, we expect a human to check and if necessary adjust the results. As a matter of fact, we evaluate the ‘accuracy ’ of the algorithm by counting the number of needed adjustments. We conducted a user study, in which our accuracy metric was used to estimate the labor savings that the users could obtain by utilizing our algorithm to obtain an initial matching. Finally, we illustrate how our matching algorithm is deployed as one of several high-level operators in an implemented testbed for managing information models and mappings. 1.
Lore: A database management system for semistructured data
- SIGMOD Record
, 1997
"... Lore (for Lightweight Object Repository) is a DBMS designed specifically for managing semistructured information. Implementing Lore has required rethinking all aspects of a DBMS, including storage management, indexing, query processing and optimization, and user interfaces. This paper provides an ov ..."
Abstract
-
Cited by 297 (21 self)
- Add to MetaCart
Lore (for Lightweight Object Repository) is a DBMS designed specifically for managing semistructured information. Implementing Lore has required rethinking all aspects of a DBMS, including storage management, indexing, query processing and optimization, and user interfaces. This paper provides an overview of these aspects of the Lore system, as well as other novel features such as dynamic structural summaries and seamless access to data from external sources.
Why and Where: A Characterization of Data Provenance
- In ICDT
, 2001
"... With the proliferation of database views and curated databases, the issue of data provenance # where a piece of data came from and the process by which it arrived in the database # is becoming increasingly important, especially in scienti#c databases where understanding provenance is crucial to ..."
Abstract
-
Cited by 254 (18 self)
- Add to MetaCart
With the proliferation of database views and curated databases, the issue of data provenance # where a piece of data came from and the process by which it arrived in the database # is becoming increasingly important, especially in scienti#c databases where understanding provenance is crucial to the accuracy and currency of data. In this paper we describe an approach to computing provenance when the data of interest has been created by a database query.We adopt a syntactic approach and present results for a general data model that applies to relational databases as well as to hierarchical data such as XML. A novel aspect of our work is a distinction between #why" provenance #refers to the source data that had some in#uence on the existence of the data# and #where" provenance #refers to the location#s# in the source databases from which the data was extracted#.

