Results 1 - 10 of 88
The Lorel Query Language for Semistructured Data
- International Journal on Digital Libraries, 1997
Cited by 631 (25 self)
We present the Lorel language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inappropriate, since semistructured data often is irregular, some data is missing, similar concepts are represented using different types, heterogeneous sets are present, or object structure is not fully known. Lorel is a user-friendly language in the SQL/OQL style for querying such data effectively. For wide applicability, the simple object model underlying Lorel can be viewed as an extension of ODMG and the language as an extension of OQL. The main novelties of the Lorel language are: (i) extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data
The TSIMMIS Approach to Mediation: Data Models and Languages
- Journal of Intelligent Information Systems, 1997
Cited by 344 (8 self)
TSIMMIS -- The Stanford-IBM Manager of Multiple Information Sources -- is a system for integrating information. It offers a data model and a common query language that are designed to support the combining of information from many different sources. It also offers tools for automatically generating the components that are needed to build systems for integrating information. In this paper we shall discuss the principal architectural features and their rationale.
Using Schema Matching to Simplify Heterogeneous Data Translation
- 1998
Cited by 187 (5 self)
A broad spectrum of data is available on the Web in distinct heterogeneous sources, and stored under different formats. As the number of systems that utilize this heterogeneous data grows, the importance of data translation and conversion mechanisms increases greatly. In this paper we present a new translation system, based on schema matching, aimed at simplifying the intricate task of data conversion. We observe that in many cases the schema of the data in the source system is very similar to that of the target system. In such cases, much of the translation work can be done automatically, based on the schemas' similarity. This saves a lot of effort for the user, limiting the amount of programming needed. We define common schema and data models, in which schemas and data (resp.) from many common models can be represented. Using a rule-based method, the source schema is compared with the target one, and each component in the source schema is matched with a corresponding compone...
RQL: A Declarative Query Language for RDF
Cited by 174 (19 self)
Real-scale Semantic Web applications, such as Web Portals and E-Marketplaces, require the management of voluminous metadata repositories containing descriptive information (i.e., metadata) about the available Web resources and services. Better knowledge about the meaning, usage, accessibility or quality of these resources and services will considerably facilitate the automated processing of both Web content and services. In this context, the Resource Description Framework (RDF) enables the creation and exchange of metadata as any other Web data. Although large volumes of RDF descriptions are already appearing (e.g., as exported Portal catalogs or service descriptions), sufficiently expressive declarative languages for querying both RDF descriptions and schemas are still missing. In this paper, we propose RQL, a new RDF query language, relying on a formal graph model that permits the interpretation of superimposed resource descriptions. RQL is an OQL-inspired adaptation of XML query languages to the peculiarities of RDF but, foremost, is an extension of this functionality for uniformly querying both descriptions and schemas. We illustrate the syntax, semantics and core functionality of RQL by means of a set of benchmark queries and report on the performance of RSSDB, our persistent RDF Store, for storing and querying voluminous RDF descriptions.
Object fusion in mediator systems
- International Conference on Very Large Data Bases, 1996
Cited by 155 (29 self)
One of the main tasks of mediators is to fuse information from heterogeneous information sources. This may involve, for example, removing redundancies, and resolving inconsistencies in favor of the most reliable source. The problem becomes harder when the sources are unstructured/semistructured and we do not have complete knowledge of their contents and structure. In this paper we show how many common fusion operations can be specified non-procedurally and succinctly. The key to our approach is to assign semantically meaningful object ids to objects as they are "imported" into the mediator.
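The fusion idea in this abstract can be illustrated with a minimal Python sketch. It is not the paper's specification language: the source names, the `RELIABILITY` ranking, and the rule that a person's id is derived from a normalized name are all assumptions made up for illustration. Records that map to the same semantic id are merged into one object, and conflicting attribute values are resolved in favor of the more reliable source.

```python
from typing import Dict, List, Tuple

# Hypothetical reliability ranking: a lower number means a more trusted source.
RELIABILITY = {"whois": 0, "cs_dept": 1}

def semantic_id(record: dict) -> str:
    """Derive an object id from the data itself (assumed rule: normalized name)."""
    return "person:" + " ".join(record["name"].lower().split())

def fuse(records: List[Tuple[str, dict]]) -> Dict[str, dict]:
    """Merge (source, record) pairs that share a semantic id into one object."""
    fused: Dict[str, dict] = {}
    chosen_rank: Dict[Tuple[str, str], int] = {}  # (oid, attribute) -> rank of the winning source
    for source, record in records:
        oid = semantic_id(record)
        obj = fused.setdefault(oid, {})
        rank = RELIABILITY[source]
        for attr, value in record.items():
            key = (oid, attr)
            # Keep the value from the most reliable source seen so far.
            if key not in chosen_rank or rank < chosen_rank[key]:
                obj[attr] = value
                chosen_rank[key] = rank
    return fused

records = [
    ("cs_dept", {"name": "Joe Chung", "office": "Gates 252", "email": "chung@cs"}),
    ("whois", {"name": "joe  chung", "office": "Gates 434"}),
]
fused = fuse(records)
```

Both records collapse into the single object `person:joe chung`; the office comes from the more trusted `whois` source, while the email, known only to `cs_dept`, is retained.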
Optimizing Regular Path Expressions Using Graph Schemas
- 1998
Cited by 136 (5 self)
Several languages, such as LOREL and UnQL, support querying of semi-structured data. Others, such as WebSQL and WebLog, query Web sites. All these languages model data as labeled graphs and use regular path expressions to express queries that traverse arbitrary paths in graphs. Naive execution of path expressions is inefficient, however, because it often requires exhaustive graph search. We describe two optimization techniques for queries with regular path expressions, which we call regular queries. Both rely on graph schemas, which specify partial knowledge of a graph's structure. Query pruning restricts search to a fragment of the graph; we give an efficient algorithm for rewriting any regular query into a pruned one. Query rewriting using state extents can entirely eliminate or substantially reduce graph traversal; it is reminiscent of optimizing relational queries using indices. There may be several ways to optimize a query using state extents; we give an exponential-time algorith...
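As a rough illustration of the naive baseline this abstract refers to (not the paper's pruning or state-extent techniques), the following Python sketch evaluates a regular path expression over a labeled graph by exhaustively searching pairs of graph node and automaton state. The graph, its labels, and the hand-written automaton for `dept.(subdept)*.member` are assumptions chosen for the example.

```python
from collections import deque

def eval_regular_path(graph, start, nfa, nfa_start, nfa_accept):
    """Return nodes reachable from `start` along a path whose labels the NFA accepts.

    graph: {node: [(label, target), ...]}
    nfa:   {state: [(label, next_state), ...]}; the label "_" matches any edge label.
    """
    seen = {(start, nfa_start)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in nfa_accept:
            answers.add(node)
        # Exhaustive step: try every outgoing edge against every automaton transition.
        for label, target in graph.get(node, []):
            for wanted, next_state in nfa.get(state, []):
                if wanted == "_" or wanted == label:
                    pair = (target, next_state)
                    if pair not in seen:
                        seen.add(pair)
                        queue.append(pair)
    return answers

# Hypothetical labeled graph.
graph = {
    "root": [("dept", "d1"), ("dept", "d2")],
    "d1": [("member", "p1"), ("subdept", "d3")],
    "d2": [("name", "n1")],
    "d3": [("member", "p2")],
}
# Hand-built automaton for the expression dept.(subdept)*.member
nfa = {0: [("dept", 1)], 1: [("subdept", 1), ("member", 2)]}
result = eval_regular_path(graph, "root", nfa, 0, {2})
```

Here `result` is the set containing `p1` and `p2`. The search still visits `d2` even though no path through it can match, which is the kind of wasted traversal that knowledge of the graph's structure could avoid.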
A Query Translation Scheme for Rapid Implementation of Wrappers
- 1995
Cited by 123 (22 self)
Wrappers provide access to heterogeneous information sources by converting application queries into source-specific queries or commands. In this paper we present a wrapper implementation toolkit that facilitates rapid development of wrappers. We focus on the query translation component of the toolkit, called the converter. The converter takes as input a Query Description and Translation Language (QDTL) description of the queries that can be processed by the underlying source. Based on this description the converter decides if an application query is (a) directly supported, i.e., it can be translated to a query of the underlying system following instructions in the QDTL description; (b) logically supported, i.e., logically equivalent to a directly supported query; (c) indirectly supported, i.e., it can be computed by applying a filter, automatically generated by the converter, to the result of a directly supported query.
Scaling Heterogeneous Databases and the Design of Disco
- 1995
Cited by 116 (13 self)
Access to large numbers of data sources introduces new problems for users of heterogeneous distributed databases. End users and application programmers must deal with unavailable data sources. Database administrators must deal with incorporating each new data source into the system. Database implementors must deal with the transformation of queries between query languages and schemas. The Distributed Information Search COmponent (DISCO) addresses these problems. Query processing semantics give meaning to queries that reference unavailable data sources. Data modeling techniques manage connections to data sources. The component interface to data sources flexibly handles different query languages and different interface functionalities. This paper describes in detail (a) the distributed mediator architecture of DISCO, (b) its query processing semantics, (c) the data model and its modeling of data source connections, and (d) the interface to underlying data sources. We describe several advantages of our system and describe the internal architecture of our planned prototype.
Querying Semistructured Heterogeneous Information
- 1995
Cited by 104 (10 self)
Semistructured data has no absolute schema fixed in advance and its structure may be irregular or incomplete. Such data commonly arises in sources that do not impose a rigid structure (such as the World-Wide Web) and when data is combined from several heterogeneous sources. Data models and query languages designed for well-structured data are inappropriate in such environments. Starting with a "lightweight" object model adopted for the TSIMMIS project at Stanford, in this paper we describe a query language and object repository designed specifically for semistructured data. Our language provides meaningful query results in cases where conventional models and languages do not: when some data is absent, when data does not have regular structure, when similar concepts are represented using different types, when heterogeneous sets are present, and when object structure is not fully known. This paper motivates the key concepts behind our approach, describes the language through a series o...
Querying Documents in Object Databases
- 1997
Cited by 82 (13 self)
We consider the problem of storing and accessing documents (SGML and HTML, in particular) using database technology. To specify the database image of documents, we use structuring schemas that consist of grammars annotated with database programs. To query documents, we introduce an extension of OQL, the ODMG standard query language for object databases. Our extension (named OQL-doc) allows documents to be queried without precise knowledge of their structure, using in particular generalized path expressions and pattern matching. This allows us to introduce navigational and information retrieval styles of accessing data into a declarative language (in the style of SQL or OQL). Query processing in the context of documents and path expressions leads to challenging implementation issues. We extend an object algebra with new operators to deal with generalized path expressions. We then consider two essential complementary optimization techniques: 1. we show that almost standard database optim...

