On supporting containment queries in relational database management systems
2001
Cited by 187 (4 self)
Virtually all proposals for querying XML include a class of query we term “containment queries”. It is also clear that in the foreseeable future, a substantial amount of XML data will be stored in relational database systems. This raises the question of how to support these containment queries. The inverted list technology that underlies much of Information Retrieval is well-suited to these queries, but should we implement this technology (a) in a separate loosely-coupled IR engine, or (b) using the native tables and query execution machinery of the RDBMS? With option (b), more than twenty years of work on RDBMS query optimization, query execution, scalability, and concurrency control and recovery immediately extend to the queries and structures that implement these new operations. But all this will be irrelevant if the performance of option (b) lags that of (a) by too much. In this paper, we explore some performance implications of both options using native implementations in two commercial relational database systems and in a special purpose inverted list engine. Our performance study shows that while RDBMSs are generally poorly suited for such queries, under certain conditions they can outperform an inverted list engine. Our analysis further identifies two significant causes that differentiate the performance of the IR and RDBMS implementations: the join algorithms employed and the hardware cache utilization. Our results suggest that contrary to most expectations, with some modifications, a native implementation in an RDBMS can support this class of query much more efficiently.
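As a rough illustration of what a containment query over inverted lists can look like, the sketch below joins a list of element occurrences against a list of word occurrences. The interval encoding (`doc_id`, `begin`, `end`) and the function name `containment_join` are assumptions made for this example; they are not taken from the paper, and the code does not reproduce either the RDBMS or the IR-engine implementation it studies.

```python
from typing import List, Tuple

Element = Tuple[int, int, int]  # (doc_id, begin, end): hypothetical interval encoding
Term = Tuple[int, int]          # (doc_id, position) of a word occurrence


def containment_join(elements: List[Element],
                     terms: List[Term]) -> List[Tuple[Element, Term]]:
    """Return (element, term) pairs where the term lies strictly inside the element."""
    elements = sorted(elements)
    terms = sorted(terms)
    results = []
    start = 0
    for doc, begin, end in elements:
        # Terms ordered before (doc, begin) cannot fall inside this element
        # or any later one, because elements are sorted by (doc, begin).
        while start < len(terms) and terms[start] < (doc, begin):
            start += 1
        # Rescan from `start` so that nested elements see the same terms.
        i = start
        while i < len(terms) and terms[i][0] == doc and terms[i][1] < end:
            if terms[i][1] > begin:
                results.append(((doc, begin, end), terms[i]))
            i += 1
    return results


if __name__ == "__main__":
    section_list = [(1, 1, 10), (1, 2, 5), (2, 1, 4)]
    word_list = [(1, 3), (1, 7), (2, 5)]
    for element, term in containment_join(section_list, word_list):
        print(element, "contains", term)
```

Both inputs are sorted once and scanned in order, which is the merge-style access pattern that inverted lists favor; a relational engine could express the same predicate as a join on `doc_id` with range conditions on the positions.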
Query Optimization for XML
In Proceedings of VLDB, 1999
Cited by 173 (2 self)
XML is an emerging standard for data representation and exchange on the World-Wide Web. Due to the nature of information on the Web and the inherent flexibility of XML, we expect that much of the data encoded in XML will be semistructured: the data may be irregular or incomplete, and its structure may change rapidly or unpredictably. This paper describes the query processor of Lore, a DBMS for XML-based data supporting an expressive query language. We focus primarily on Lore's cost-based query optimizer. While all of the usual problems associated with cost-based query optimization apply to XML-based query languages, a number of additional problems arise, such as new kinds of indexing, more complicated notions of database statistics, and vastly different query execution strategies for different databases. We define appropriate logical and physical query plans, database statistics, and a cost model, and we describe plan enumeration including heuristics for reducing the large search space. Our optimizer is fully implemented in Lore and preliminary performance results are reported.
XIRQL: A Query Language for Information Retrieval in XML Documents
2001
Cited by 140 (6 self)
Based on the document-centric view of XML, we present the query language XIRQL. Current proposals for XML query languages lack most IR-related features, which are weighting and ranking, relevance-oriented search, datatypes with vague predicates, and semantic relativism. XIRQL integrates these features by using ideas from logic-based probabilistic IR models, in combination with concepts from the database area. For processing XIRQL queries, a path algebra is presented that also serves as a starting point for query optimization.
XML-GL: a Graphical Language for Querying and Restructuring XML Documents
1999
Cited by 104 (11 self)
The spread of XML as a standard for semi-structured documents on the Web opens up challenging opportunities for Web query languages. In this paper we introduce XML-GL, a graphical query language for XML documents. The use of a visual formalism for representing both the content of XML documents (and of their DTDs) and the syntax and semantics of queries enables an intuitive expression of queries, even when they are rather complex. XML-GL is inspired by G-log, a general purpose, logic-based language for querying structured and semistructured data. The paper presents the basic capabilities of XML-GL through a sequence of examples of increasing complexity and discusses advanced query primitives like order-sensitive conditions, arithmetic and aggregate functions, and complex document construction.
1 Introduction and Motivations
XML [Con98] is a recent recommendation of the World Wide Web Consortium for a meta-language to define markups for content publishing on the Web. Th...
Comparative Analysis of Five XML Query Languages
SIGMOD Record, 2000
Cited by 73 (2 self)
XML is becoming the most relevant new standard for data representation and exchange on the WWW. Novel languages for extracting and restructuring the XML content have been proposed, some in the tradition of database query languages (i.e. SQL, OQL), others more closely inspired by XML. No standard for XML query language has yet been decided, but the discussion is ongoing within the World Wide Web Consortium and within many academic institutions and Internet-related major companies. We present a comparison of five representative query languages for XML, highlighting their common features and differences.
1 Introduction to the five languages
1.1 LOREL
LOREL was originally designed for querying semistructured data and has now been extended to XML data; it was conceived and implemented at Stanford University (S. Abiteboul, D. Quass, J. McHugh, J. Widom, J. Wiener) and its prototype is at http://www-db.stanford.edu/lore. It is a user-friendly language in the SQL/OQL style, it includes a...
Change-Centric Management of Versions in an XML Warehouse
In Proceedings of VLDB, 2001
Cited by 60 (6 self)
We present a change-centric method to manage versions in a Web WareHouse of XML data. The starting point is a sequence of snapshots of XML documents we obtain from the web. By running a diff algorithm, we compute the changes between two consecutive versions. We then represent the sequence using a novel representation of changes based on completed deltas and persistent identifiers. We present the foundations of the logical representation and some aspects of the physical storage policy. The work presented here was developed in the context of the Xyleme project, a massive warehouse for XML data from the Web. It has been implemented and tested. We briefly discuss the implementation.
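To make the idea of a delta that records enough information to be applied in either direction more concrete, here is a minimal sketch. It flattens a document to a dictionary from a persistent identifier to a node value, which is a strong simplification assumed for this example; the operation format and the names `diff`, `apply_delta`, and `invert` are hypothetical and do not reproduce the paper's representation.

```python
from typing import Dict, List, Optional, Tuple

Doc = Dict[int, str]  # persistent id -> node value (flat stand-in for an XML tree)
Op = Tuple[str, int, Optional[str], Optional[str]]  # (kind, id, old_value, new_value)


def diff(old: Doc, new: Doc) -> List[Op]:
    """Compute a delta that keeps both the old and the new value of each change."""
    ops: List[Op] = []
    for xid in sorted(old.keys() - new.keys()):
        ops.append(("delete", xid, old[xid], None))
    for xid in sorted(new.keys() - old.keys()):
        ops.append(("insert", xid, None, new[xid]))
    for xid in sorted(old.keys() & new.keys()):
        if old[xid] != new[xid]:
            ops.append(("update", xid, old[xid], new[xid]))
    return ops


def apply_delta(doc: Doc, delta: List[Op]) -> Doc:
    """Return a new version obtained by applying the delta to `doc`."""
    result = dict(doc)
    for kind, xid, _old_value, new_value in delta:
        if kind == "delete":
            del result[xid]
        else:  # "insert" and "update" both store the new value
            result[xid] = new_value
    return result


def invert(delta: List[Op]) -> List[Op]:
    """Swap old and new values so the delta can be replayed backwards."""
    swapped = {"insert": "delete", "delete": "insert", "update": "update"}
    return [(swapped[kind], xid, new_value, old_value)
            for kind, xid, old_value, new_value in delta]


if __name__ == "__main__":
    v1 = {1: "title", 2: "draft text"}
    v2 = {1: "title", 2: "final text", 3: "appendix"}
    delta = diff(v1, v2)
    print(apply_delta(v1, delta) == v2)          # forward
    print(apply_delta(v2, invert(delta)) == v1)  # backward
```

Because each operation carries both values, only one snapshot plus the chain of deltas needs to be stored, and any version can be reached from it by replaying deltas forwards or backwards.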
K2/Kleisli and GUS: Experiments in Integrated Access to Genomic Data Sources
2000
Cited by 52 (4 self)
The integration of heterogeneous data sources and software systems is a major issue in the biomedical community and several approaches have been explored: linking databases, "on-the-fly" integration through views, and integration through warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: an integration system called K2, which has primarily been used to provide views over multiple external data sources and software systems; and a data warehouse called GUS which downloads, cleans, integrates and annotates data from multiple external data sources. Although the view and warehouse approaches each have their advantages, there is no clear "winner". Therefore, users must consider how the data is to be used, what the performance guarantees must be, and how much programmer time and expertise is available to choose the best strategy for a particular application.
The Valid Web: an XML/XSL Infrastructure for Temporal Management of Web Documents
2000
Cited by 38 (4 self)
In this paper we present a temporal extension of the World Wide Web based on a complete XML/XSL infrastructure to support valid time. The proposed technique enables the explicit definition of temporal information within HTML/XML documents, whose contents can then be selectively accessed according to their valid time. By acting on a navigation validity context, the proposed solution makes it possible to "travel in time" in a given virtual environment with any XML-compliant browser; this makes it possible, for instance, to cut personalized visit routes for a specific epoch in a virtual museum or a digital historical library, to visualize the evolution of an archaeological site through successive ages, to selectively access past issues of magazines, to browse historical time series (e.g. stock quote archives), etc. The proposed Web extensions have been tested on a demo prototype showing, as an application example, the functionalities of a temporal Web museum.
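The selection of content by valid time can be pictured with a small sketch. The paper describes an XML/XSL infrastructure; the Python below only illustrates the filtering idea, and the item tuples, the closed-interval convention, and the function name `visible_at` are assumptions made for this example.

```python
from datetime import date
from typing import List, Tuple

# (label, valid_from, valid_to): hypothetical annotated document fragments
Item = Tuple[str, date, date]


def visible_at(items: List[Item], context: date) -> List[str]:
    """Keep the fragments whose validity interval contains the context date."""
    return [label for label, start, end in items if start <= context <= end]


if __name__ == "__main__":
    exhibits = [
        ("Room A: early layout", date(1990, 1, 1), date(1995, 12, 31)),
        ("Room A: renovated layout", date(1996, 1, 1), date(2000, 12, 31)),
        ("Room B", date(1990, 1, 1), date(2000, 12, 31)),
    ]
    print(visible_at(exhibits, date(1993, 6, 1)))
```

Changing the context date plays the role of the navigation validity context: the same collection yields a different view for each chosen epoch.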
An Architecture for Secure Wide-Area Service Discovery
2002
Cited by 36 (2 self)
This paper presents the architecture and implementation of a secure wide-area Service Discovery Service (SDS). Service providers use the SDS to advertise descriptions of available or already running services, while clients use the SDS to compose complex queries for locating these services. Service descriptions and queries use the eXtensible Markup Language (XML) to encode such factors as cost, performance, location, and device- or service-specific capabilities. The SDS provides a fault-tolerant, incrementally scalable service for locating services in the wide-area. Security is a core component of the SDS: communications are both encrypted and authenticated where necessary, and the system uses a hybrid access control list and capability system to control access to service information. Wide-area query routing is also a core component of the SDS: all information in the system is potentially reachable by all clients

