Dynamic XML Documents with Distribution and Replication
In Proc. of ACM SIGMOD, 2003
Cited by 61 (9 self)
The advent of XML as a universal exchange format, and of Web services as a basis for distributed computing, has fostered the apparition of a new class of documents: dynamic XML documents. These are XML documents where some data is given explicitly while other parts are given only intensionally by means of embedded calls to web services that can be called to generate the required information. By the sole presence of Web services, dynamic documents already include inherently some form of distributed computation. A higher level of distribution that also allows (fragments of) dynamic documents to be distributed and/or replicated over several sites is highly desirable in today's Web architecture, and in fact is also relevant for regular (non dynamic) documents.
MARS: A System for Publishing XML from Mixed and Redundant Storage
In VLDB, 2003
Cited by 57 (9 self)
We present a system for publishing as XML data from mixed (relational+XML) proprietary storage, while supporting redundancy in storage for tuning purposes. The correspondence between public and proprietary schemas is given by a combination of LAV- and GAV-style views expressed in XQuery.
Crossing the Structure Chasm
In CIDR, 2003
Cited by 42 (15 self)
It has frequently been observed that most of the world's data lies outside database systems. The reason is that database systems focus on structured data, leaving the unstructured realm to others. The world of unstructured data has several very appealing properties, such as ease of authoring, querying and data sharing. In contrast, authoring, querying and sharing structured data require significant effort, albeit with the benefit of rich query languages and exact answers. We argue
Bloom histogram: Path selectivity estimation for xml data with updates
In VLDB, 2004
Cited by 32 (0 self)
Cost-based XML query optimization calls for accurate estimation of the selectivity of path expressions. Some other interactive and internet applications can also benefit from such estimations. While there are a number of estimation techniques proposed in the literature, almost none of them has any guarantee on the estimation accuracy within a given space limit. In addition, most of them assume that the XML data are more or less static, i.e., with few updates. In this paper, we present a framework for XML path selectivity estimation in a dynamic context. Specifically, we propose a novel data structure, bloom histogram, to approximate XML path frequency distribution within a small space budget and to estimate the path selectivity accurately with the bloom histogram. We obtain the upper bound of its estimation error and discuss the trade-offs between the accuracy and the space limit. To support updates of bloom histograms efficiently when underlying XML data change, a dynamic summary layer is used to keep exact or more detailed XML path information. We demonstrate through our extensive experiments that the new solution can
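The bloom histogram idea in the abstract above can be pictured with a small sketch. This is only one plausible reading, not the paper's construction: it assumes paths are sorted by frequency, cut into equal-sized buckets, and that each bucket keeps a Bloom filter of its paths plus their average frequency. The names `BloomFilter`, `build_bloom_histogram`, and `estimate_frequency`, the bucketing rule, and the filter parameters are all hypothetical, and the dynamic summary layer for updates is not modeled.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; bit count and hash count are illustrative defaults."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_bloom_histogram(path_counts, num_buckets):
    """Assumed layout: a list of (bloom_filter, average_frequency) buckets."""
    ordered = sorted(path_counts.items(), key=lambda kv: kv[1])
    if not ordered:
        return []
    size = -(-len(ordered) // num_buckets)  # ceiling division
    histogram = []
    for start in range(0, len(ordered), size):
        group = ordered[start:start + size]
        bloom = BloomFilter()
        for path, _ in group:
            bloom.add(path)
        average = sum(count for _, count in group) / len(group)
        histogram.append((bloom, average))
    return histogram


def estimate_frequency(histogram, path):
    """Return the average frequency of the first bucket that claims the path."""
    for bloom, average in histogram:
        if path in bloom:
            return average
    return 0.0  # no bucket matched: the path is treated as absent


counts = {"/a": 1, "/a/b": 2, "/a/c": 100, "/a/c/d": 120}
hist = build_bloom_histogram(counts, num_buckets=2)
print(estimate_frequency(hist, "/a/b"))
```

Because Bloom filters can report false positives but never false negatives, an estimate in this sketch can be attributed to the wrong bucket with small probability, which is the kind of accuracy versus space trade-off the abstract refers to.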
XSEED: Accurate and fast cardinality estimation for XPath queries
In Proc. 22nd Int. Conf. on Data Engineering (ICDE), 2006 (to appear)
Cited by 15 (1 self)
We propose XSEED, a synopsis of path queries for cardinality estimation that is accurate, robust, efficient, and adaptive to memory budgets. XSEED starts from a very small kernel, and then incrementally updates information of the synopsis. With such an incremental construction, a synopsis structure can be dynamically configured to accommodate different memory budgets. Cardinality estimation based on XSEED can be performed very efficiently and accurately. Extensive experiments on both synthetic and real data sets show that even with less memory, XSEED could achieve accuracy that is an order of magnitude better than that of other synopsis structures. The cardinality estimation time is under 2% of the actual querying time for a wide range of queries in all test cases.
Learning Deterministic Regular Expressions for the Inference of Schemas from XML Data
2008
Cited by 13 (4 self)
Inferring an appropriate DTD or XML Schema Definition (XSD) for a given collection of XML documents essentially reduces to learning deterministic regular expressions from sets of positive example words. Unfortunately, there is no algorithm capable of learning the complete class of deterministic regular expressions from positive examples only, as we will show. The regular expressions occurring in practical DTDs and XSDs, however, are such that every alphabet symbol occurs only a small number of times. As such, in practice it suffices to learn the subclass of regular expressions in which each alphabet symbol occurs at most k times, for some small k. We refer to such expressions as k-occurrence regular expressions (k-OREs for short). Motivated by this observation, we provide a probabilistic algorithm that learns k-OREs for increasing values of k, and selects the one that best describes the sample based on a Minimum Description Length argument. The effectiveness of the method is empirically validated both on real world and synthetic data. Furthermore, the method is shown to be conservative over the simpler classes of expressions considered in previous work.
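To make the k-ORE definition above concrete, the following minimal sketch computes the smallest k for which a content model has each symbol occurring at most k times. It assumes DTD-like content models written as plain strings in which element names are identifiers; the function names `occurrence_bound` and `is_k_ore` are hypothetical, and the sketch checks only the occurrence count, not determinism, and does not perform any learning.

```python
import re
from collections import Counter


def occurrence_bound(expression):
    """Largest number of times any single element name occurs in the expression."""
    # Assumption: element names look like identifiers; operators are , | ? * + ( ).
    symbols = re.findall(r"[A-Za-z_][\w\-]*", expression)
    counts = Counter(symbols)
    return max(counts.values(), default=0)


def is_k_ore(expression, k):
    """True if every element name occurs at most k times."""
    return occurrence_bound(expression) <= k


print(occurrence_bound("title, author+, (year | date)?"))
print(occurrence_bound("a, (b | c)*, a"))
```

In the second expression the symbol `a` occurs twice, so it satisfies the occurrence condition for k = 2 but not for k = 1.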
Searching for Efficient XML-to-Relational Mappings
In Proc. of 1st Intl. XML Database Symp. (XSym), 2003
Cited by 12 (5 self)
Cost-based strategies to derive relational configurations for XML applications have been recently proposed and shown to provide substantially better configurations than heuristic methods. These strategies make use of schema transformations to a canonical schema in order to derive various relational configurations.
An Efficient Infrastructure for Native Transactional XML Processing
In Data & Knowledge Engineering, Elsevier, 2007
Cited by 9 (3 self)
Implementation techniques for relational database management systems (DBMSs) have proven their efficiency and robustness in many existing systems. However, many of these concepts and mechanisms cannot be used when implementing a native XML DBMS (XDBMS) because of substantial differences in the processing properties of natively stored XML documents as compared to relational tables. Therefore, we have to develop new and appropriate techniques with ACID transaction guarantees tailored to the processing characteristics of tree documents and the operations on them. For this reason, we want to provide for an efficient infrastructure of XDBMSs consisting of tree node addressing and indexing together with fine-grained locking of tree nodes. In this respect, our prime and novel contribution is to reveal the potential of our prefix-based node labeling called DeweyIDs supporting record addressing, indexing, and locking protocols. In this paper, we first sketch our version of prefix-based node labeling and summarize a quantitative study on them. An overview of our layered XDBMS architecture indicates the concepts and functionalities to be reused from relational DBMS implementations. The core part of the paper describes the infrastructural services for XML document storage with compressed DeweyIDs, the principles and methods for navigational and declarative processing of queries, as well as the lock modes and protocols to enable efficient collaboration. Selected empirical experiments evaluate the XTC system performance and support our system assessment.
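Prefix-based node labels of the kind mentioned above make some structural tests purely syntactic. The sketch below assumes the generic Dewey scheme, where a label such as `1.3.5` lists the sibling positions on the path from the root; it does not model the compressed encoding or any other detail of the DeweyIDs used in the paper, and the helper names are hypothetical.

```python
def parse_dewey(label):
    """Turn a dotted label such as '1.3.5' into a tuple of integers."""
    return tuple(int(part) for part in label.split("."))


def is_ancestor(ancestor, descendant):
    """A node is a proper ancestor exactly when its label is a proper prefix."""
    a, d = parse_dewey(ancestor), parse_dewey(descendant)
    return len(a) < len(d) and d[:len(a)] == a


def parent(label):
    """Label of the parent node, or None for the root."""
    parts = parse_dewey(label)
    if len(parts) <= 1:
        return None
    return ".".join(str(p) for p in parts[:-1])


def in_document_order(labels):
    """Comparing labels component-wise as integers yields document order."""
    return sorted(labels, key=parse_dewey)


print(is_ancestor("1.3", "1.3.5.7"))
print(in_document_order(["1.10", "1.2", "1.2.1"]))
```

Since a label encodes the whole ancestor path, a locking protocol could in principle identify every ancestor of a node from its label alone, without touching the stored document.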
Structural selectivity estimation for XML documents
In ICDE, 2007
Cited by 9 (3 self)
Estimating the selectivity of queries is a crucial problem in database systems. Virtually all database systems rely on the use of selectivity estimates to choose amongst the many possible execution plans for a particular query. In terms of XML databases, the problem of selectivity estimation of queries presents new challenges: many evaluation operators are possible, such as simple navigation, structural joins, or twig joins, and many different indexes are possible ranging from traditional B-trees to complicated XML-specific graph indexes. A new synopsis for XML documents is introduced which can be effectively used to estimate the selectivity of complex path queries. The synopsis is based on a lossy compression of the document tree that underlies the XML document, and can be computed in one pass from the document. It has several advantages over existing approaches: (1) it allows one to estimate the selectivity of queries containing all XPath axes, including the order-sensitive ones, (2) the estimator returns a range within which the actual selectivity is guaranteed to lie, with the size of this range implicitly providing a confidence measure of the estimate, and (3) the synopsis can be incrementally updated to reflect changes in the XML database.
Towards cost-based optimization for data-intensive Web service computations
In Proceedings of SBBD, 2004
Cited by 8 (5 self)
The recent popularity of XML and Web services has led to a surge in models and platforms for distributed XML data management applications. This work investigates performance issues involved in the deployment of the ActiveXML (AXML) platform for such applications. AXML documents are XML documents, part of which is extensional (present in the document), while part is intensional (specified as calls to Web services). Materializing an AXML document thus requires activating all service calls, and gathering the call results in the document. In this work, we demonstrate that many distributed materialization strategies exist for a given AXML document; basically, these strategies may differ in the choice of the peer that executes each service call, or of the peer that makes the call. The AXML system has to choose among the strategies in order to generate efficient materialization plans. We formally characterize the optimization search space, and provide some heuristics to improve plan generation. We describe our optimization model and how it fits in the actual AXML P2P architecture. We describe the possible decision models; the parameters of this model, and the required infrastructure to provide them. Finally, we present empirical results that validate the proposed methodology.

