The Lorel Query Language for Semistructured Data
- International Journal on Digital Libraries
, 1997
Cited by 731 (29 self)
We present the Lorel language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inappropriate, since semistructured data often is irregular, some data is missing, similar concepts are represented using different types, heterogeneous sets are present, or object structure is not fully known. Lorel is a user-friendly language in the SQL/OQL style for querying such data effectively. For wide applicability, the simple object model underlying Lorel can be viewed as an extension of ODMG and the language as an extension of OQL. The main novelties of the Lorel language are: (i) extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data
Answering Queries Using Views: A Survey
, 2000
Cited by 562 (32 self)
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously defined materialized views over the database, rather than accessing the database relations. The problem has recently received significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result, finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views. Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it and the relevant theoretical results.
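The core idea in the abstract above, answering a query from a materialized view instead of the base relations, can be illustrated with a minimal sketch. The relations `employees` and `departments`, the view `emp_city`, and all function names are hypothetical and chosen only for illustration; they do not come from the survey, and real rewriting algorithms are far more general than this single hand-written rewriting.

```python
# Hypothetical base relations, stored as lists of tuples.
employees = [("ana", "sales"), ("bo", "sales"), ("cy", "research")]  # (name, dept)
departments = [("sales", "Paris"), ("research", "Oslo")]             # (dept, city)


def query_from_base(employees, departments, city):
    """Answer 'names of employees working in `city`' by joining the base relations."""
    return {
        name
        for (name, dept) in employees
        for (d, c) in departments
        if dept == d and c == city
    }


def materialize_view(employees, departments):
    """Precompute the view emp_city(name, dept, city) as a join of the base relations."""
    return [
        (name, dept, c)
        for (name, dept) in employees
        for (d, c) in departments
        if dept == d
    ]


def query_from_view(view, city):
    """Equivalent rewriting of the query that reads only the materialized view."""
    return {name for (name, _dept, c) in view if c == city}


emp_city = materialize_view(employees, departments)
print(query_from_view(emp_city, "Paris"))
```

Here the rewriting is equivalent because the view retains every attribute the query needs; once `emp_city` is materialized, the query no longer touches `employees` or `departments`.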
A Query Language for XML
, 1998
Cited by 340 (22 self)
An important application of XML is the interchange of electronic data (EDI) between multiple data sources on the Web. As XML data proliferates on the Web, applications will need to integrate and aggregate data from multiple sources and clean and transform data to facilitate exchange. Data extraction, conversion, transformation, and integration are all well-understood database problems, and their solutions rely on a query language. We present a query language for XML, called XML-QL, which we argue is suitable for performing the above tasks. XML-QL is a declarative, "relational complete" query language and is simple enough that it can be optimized. XML-QL can extract data from existing XML documents and construct new XML documents.
Keywords: XML, query languages, electronic-data interchange (EDI)
1. Introduction
The goal of XML is to provide many of SGML's benefits not available in HTML and to provide them in a language that is easier to learn and use than complete SGML. These benefits...
Lore: A database management system for semistructured data
- SIGMOD Record
, 1997
Cited by 339 (24 self)
Lore (for Lightweight Object Repository) is a DBMS designed specifically for managing semistructured information. Implementing Lore has required rethinking all aspects of a DBMS, including storage management, indexing, query processing and optimization, and user interfaces. This paper provides an overview of these aspects of the Lore system, as well as other novel features such as dynamic structural summaries and seamless access to data from external sources.
Index Structures for Path Expressions
, 1997
Cited by 333 (7 self)
In recent years there has been an increased interest in managing data which does not conform to traditional data models, like the relational or object-oriented model. The reasons for this non-conformance are diverse. On the one hand, data may not conform to such models at the physical level: it may be stored in data exchange formats, fetched from the Internet, or stored as structured files. On the other hand, it may not conform at the logical level: data may have missing attributes, some attributes may be of different types in different data items, there may be heterogeneous collections, or the data may be simply specified by a schema which is too complex or changes too often to be described easily as a traditional schema. The term semistructured data has been used to refer to such data. The data model proposed for this kind of data consists of an edge-labeled graph, in which nodes correspond to objects and edges to attributes or values. Figure 1 illustrates a semistructured database providing information about a city. Relational databases are traditionally queried with associative queries, retrieving tuples based on the value of some attributes. To answer such queries efficiently, database management systems support indexes for translating attribute values into tuple ids (e.g. B-trees or hash tables). In object-oriented databases, path queries replace the simpler associative queries. Several data structures have been proposed for answering path queries efficiently: e.g., access support relations [14] and path indexes [4]. In the case of semistructured data, queries are even more complex, because they may contain generalized path expressions [1, 7, 8, 16]. The additional flexibility is needed in order to traverse data whose structure is irregular, or partially unknown to the user.
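The edge-labeled graph model and path expressions described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's index structure: the `Graph` class, the wildcard conventions (`"_"` for any single label, `"*"` for any sequence of zero or more edges), and the sample city data are all hypothetical, and evaluation is a plain traversal with no indexing.

```python
from collections import defaultdict


class Graph:
    """Edge-labeled graph: nodes are object ids, edges carry attribute labels."""

    def __init__(self):
        self.edges = defaultdict(list)  # node id -> list of (label, target id)
        self.values = {}                # leaf node id -> atomic value

    def add_edge(self, source, label, target, value=None):
        self.edges[source].append((label, target))
        if value is not None:
            self.values[target] = value

    def follow(self, nodes, label):
        """One traversal step; the label "_" matches any edge label."""
        return {
            target
            for node in nodes
            for (edge_label, target) in self.edges.get(node, [])
            if label == "_" or edge_label == label
        }

    def reachable(self, nodes):
        """All nodes reachable from `nodes` via zero or more edges."""
        seen = set(nodes)
        stack = list(nodes)
        while stack:
            node = stack.pop()
            for (_label, target) in self.edges.get(node, []):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen

    def evaluate(self, root, path):
        """Evaluate a path expression given as a list of steps, starting at `root`."""
        current = {root}
        for step in path:
            current = self.reachable(current) if step == "*" else self.follow(current, step)
        return current

    def values_of(self, nodes):
        return {self.values[n] for n in nodes if n in self.values}


# Hypothetical, irregular city data: only one restaurant has a phone number.
g = Graph()
g.add_edge(0, "restaurant", 1)
g.add_edge(1, "name", 2, value="Chez Paul")
g.add_edge(0, "restaurant", 3)
g.add_edge(3, "name", 4, value="Sushi Go")
g.add_edge(3, "phone", 5, value="555-0100")
g.add_edge(0, "museum", 6)
g.add_edge(6, "name", 7, value="City Museum")

print(g.values_of(g.evaluate(0, ["restaurant", "name"])))
print(g.values_of(g.evaluate(0, ["*", "name"])))
```

The `"*"` step is what makes the second query work without knowing whether names sit under `restaurant` or `museum`, which is the kind of flexibility the abstract attributes to generalized path expressions.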
The state of the art in distributed query processing
- ACM Computing Surveys
, 2000
Cited by 320 (3 self)
Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of issues which still make distributed data processing a complex undertaking: (1) distributed systems can become very large, involving thousands of heterogeneous sites including PCs and mainframe server machines; (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system; (3) legacy systems need to be integrated; such legacy systems usually have not been designed for distributed data processing and now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the "textbook" architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intra-query parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses different kinds of distributed systems such as client-server, middleware (multi-tier), and heterogeneous database systems and shows how query processing works in these systems.
Categories and subject descriptors: E.5 [Data]: Files; H.2.4 [Database Management Systems]: distributed databases, query processing; H.2.5 [Heterogeneous Databases]: data translation
General terms: algorithms; performance
Additional key words and phrases: query optimization; query execution; client-server databases; middleware; multi-tier architectures; database application systems; wrappers; replication; caching; economic models for query processing; dissemination-based information systems
DBXplorer: A system for keyword-based search over relational databases
- In ICDE
, 2002
Cited by 306 (6 self)
Internet search engines have popularized the keyword-based search paradigm. While traditional database management systems offer powerful query languages, they do not allow keyword-based search. In this paper, we discuss DBXplorer, a system that enables keyword-based search in relational databases. DBXplorer has been implemented using a commercial relational database and web server and allows users to interact via a browser front-end. We outline the challenges and discuss the implementation of our system including results of extensive experimental evaluation.
Semistructured data
, 1997
Cited by 281 (0 self)
In semistructured data, the information that is normally associated with a schema is contained within the data, which is sometimes called "self-describing". In some forms of semistructured data there is no separate schema, in others it exists but only places loose constraints on the data. Semistructured data has recently emerged as an important topic of study for a variety of reasons. First, there are data sources such as the Web, which we would like to treat as databases but which cannot be constrained by a schema. Second, it may be desirable to have an extremely flexible format for data exchange between disparate databases. Third, even when dealing with structured data, it may be helpful to view it as semistructured for the purposes of browsing. This tutorial will cover a number of issues surrounding such data: finding a concise formulation, building a sufficiently expressive language for querying and transformation, and optimization problems.
1 The motivation
The topic of semistructured data (also called unstructured data) is relatively recent, and a tutorial on the topic may well be premature. It represents, if anything, the convergence of a number of lines of thinking about new ways to represent and query data that do not completely fit with conventional data models. The purpose of this tutorial is to describe this motivation and to suggest areas in which further research may be fruitful. For a similar exposition, the reader is referred to Serge Abiteboul's recent survey paper [1]. The slides for this tutorial will be made available from a section of the Penn database home page
Wrapper Induction: Efficiency and Expressiveness
- Artificial Intelligence
, 2000
Cited by 267 (11 self)
The Internet presents numerous sources of useful information: telephone directories, product catalogs, stock quotes, event listings, etc. Recently, many systems have been built that automatically gather and manipulate such information on a user's behalf. However, these resources are usually formatted for use by people (e.g., the relevant content is embedded in HTML pages), so extracting their content is difficult. Most systems use customized wrapper procedures to perform this extraction task. Unfortunately, writing wrappers is tedious and error-prone. As an alternative, we advocate wrapper induction, a technique for automatically constructing wrappers. In this article, we describe six wrapper classes, and use a combination of empirical and analytical techniques to evaluate the computational tradeoffs among them. We first consider expressiveness: how well the classes can handle actual Internet resources, and the extent to which wrappers in one class can mimic those in another. We then...