Results 11 - 20
of
179
Query Containment for Conjunctive Queries With Regular Expressions
"... The management of semistructured data has recently received significant attention because of the need of several applications to model and query large volumes of irregular data. This paper considers the problem of query containment for a query language over semistructured data, StruQL0 , that contai ..."
Abstract
-
Cited by 65 (9 self)
- Add to MetaCart
The management of semistructured data has recently received significant attention because of the need of several applications to model and query large volumes of irregular data. This paper considers the problem of query containment for a query language over semistructured data, StruQL0 , that contains the essential feature common to all such languages, namely the ability to specify regular path expressions over the data. We show here that containment of StruQL0 queries is decidable. First, we give a semantic criterion for StruQL0 query containment: we show that it suffices to check containment on only finitely many canonical databases. Second, we give a syntactic criteria for query containment, based on a notion of query mappings, which extends containment mappings for conjunctive queries. Third, we consider a certain fragment of StruQL0 , obtained by imposing restrictions on the regular path expressions, and show that query containment for this fragment of StruQL0 is NP complete. 1 ...
Secure XML Querying with Security Views
, 2004
"... The prevalent use of XML highlights the need for a generic, flexible access-control mechanism for XML documents that supports efficient and secure query access, without revealing sensitive information to unauthorized users. This paper introduces a novel paradigm for specifying XML security constrain ..."
Abstract
-
Cited by 62 (6 self)
- Add to MetaCart
The prevalent use of XML highlights the need for a generic, flexible access-control mechanism for XML documents that supports efficient and secure query access, without revealing sensitive information to unauthorized users. This paper introduces a novel paradigm for specifying XML security constraints and investigates the enforcement of such constraints during XML query evaluation. Our approach is based on the novel concept of security views, which provide for each user group (a) an XML view consisting of all and only the information that the users are authorized to access, and (b) a view DTD that the XML view conforms to. Security views effectively protect sensitive data from access and potential inferences by unauthorized users, and provide authorized users with necessary schema information to facilitate effective query formulation and optimization. We propose an efficient algorithm for deriving security view definitions from security policies (defined on the original document DTD) for different user groups. We also develop novel algorithms for XPath query rewriting and optimization such that queries over security views can be efficiently answered without materializing the views. Our algorithms transform a query over a security view to an equivalent query over the original document, and effectively prune query nodes by exploiting the structural properties of the document DTD in conjunction with approximate XPath containment tests. Our work is the first to study a flexible, DTD-based access-control model for XML and its implications on the XML query-execution engine. Furthermore, it is among the first efforts for query rewriting and optimization in the presence of general DTDs for a rich class of XPath queries. An empirical study based on real-life DTDs verifies the ef...
Efficient Structural Joins on Indexed XML Documents
- In VLDB
, 2002
"... Queries on XML documents typically combine selections on element contents, and, via path expressions, the structural relationships between tagged elements. Structural joins are used to find all pairs of elements satisfying the primitive structural relationships specified in the query, namely, ..."
Abstract
-
Cited by 60 (3 self)
- Add to MetaCart
Queries on XML documents typically combine selections on element contents, and, via path expressions, the structural relationships between tagged elements. Structural joins are used to find all pairs of elements satisfying the primitive structural relationships specified in the query, namely, parent child and ancestor descendant relationships. Efficient support for structural joins is thus the key to efficient implementations of XML queries. Recently proposed node numbering schemes enable the capturing of the XML document structure us- ing traditional indices (such as B+-trees or R-trees). This paper proposes efficient struc- tural join algorithms in the presence of tag indices. We first concentrate on using B+- trees and show how to expedite a structural join by avoiding collections of elements that do not participate in the join. We then intro- duce an enhancement (based on sibling pointers) that further improves performance. Such sibling pointers are easily implemented and dynamically maintainable. We also present a structural join algorithm that utilizes R-trees.
Semistructured Data and XML
, 1998
"... This paper argues that the research on semistructured data is receiving a new set of challenges with the advent of XML (Extensible Mark-up Language [Bos97, Con98]). This is a new standard approved by the World Wide Web Consortium that many believe will become the de facto data exchange format for th ..."
Abstract
-
Cited by 59 (1 self)
- Add to MetaCart
This paper argues that the research on semistructured data is receiving a new set of challenges with the advent of XML (Extensible Mark-up Language [Bos97, Con98]). This is a new standard approved by the World Wide Web Consortium that many believe will become the de facto data exchange format for the Web. XML supports the electronic exchange of machine-readable data (while HTML is designed primarily for human-readable documents). XML data shares many features of semistructured data: its structure can be irregular, is not always known ahead of time, and may change frequently and without notice. On the other hand it is easy to convert data from any source into XML which will make it attractive for organizations to "publish" their information sources in XML, and thus make them available to other XML applications on the Web. For XML applications to reach their full potential however, we need to build the right tools to process data in this new format. Existing Web tools (browsers, search engines) are oriented toward document operations . For XML we need database operations , like data extraction, data integration, data translation, data storage. The research done so far on semistructured data may offer some solutions to the database problems posed by XML. For example the recently proposed query language for XML, called XML-QL [DFF
Answering regular path queries using views
- IN PROC. OF THE 16TH IEEE INT. CONF. ON DATA ENGINEERING (ICDE
, 2000
"... Query answering using views amounts to computing the answer to a query having information only on the extension of a set of views. This problem is relevant in several fields, such as information integration, data warehousing, query optimization, mobile computing, and maintaining physical data indepe ..."
Abstract
-
Cited by 55 (23 self)
- Add to MetaCart
Query answering using views amounts to computing the answer to a query having information only on the extension of a set of views. This problem is relevant in several fields, such as information integration, data warehousing, query optimization, mobile computing, and maintaining physical data independence. We address query answering using views in a context where queries and views are regular path queries, i.e., regular expressions that denote the pairs of objects in the database connected by a matching path. Regular path queries are the basic query mechanism when the database is conceived as a graph, such as in semistructured data and data on the web. We study algorithms for answering regular path queries using views under different assumptions, namely, closed and open domain, and sound, complete, and exact information on view extensions. We characterize data, expression, and combined complexity of the problem, showing that the proposed algorithms are essentially optimal. Our results are the first to exhibit decidability in cases where the language for expressing the query and the views allows for recursion.
An XML Query Engine for Network-Bound Data
, 2001
"... XML has become the lingua franca for data exchange and integration across administrative and enterprise boundaries. Nearly all data providers are adding XML import or export capabilities, and standard XML Schemas and DTDs are being promoted for all types of data sharing. The ubiquity of XML has rem ..."
Abstract
-
Cited by 54 (7 self)
- Add to MetaCart
XML has become the lingua franca for data exchange and integration across administrative and enterprise boundaries. Nearly all data providers are adding XML import or export capabilities, and standard XML Schemas and DTDs are being promoted for all types of data sharing. The ubiquity of XML has removed one of the major obstacles to integrating data from widely disparate sources -- namely, the heterogeneity of data formats. However, general-purpose integration of data across the wide area also requires a query processor that can query data sources on demand, receive streamed XML data from them, and combine and restructure the data into new XML output --- while providing good performance for both batch-oriented and ad-hoc, interactive queries. This is the goal of the Tukwila data integration system, the first system that focuses on network-bound, dynamic XML data sources. In contrast to previous approaches, which must read, parse, and often store entire XML objects before querying them, Tukwila can return query results even as the data is streaming into the system. Tukwila is built with a new system architecture that extends adaptive query processing and relational-engine techniques into the XML realm, as facilitated by a pair of operators that incrementally evaluate a query's input path expressions as data is read. In this paper, we describe the Tukwila architecture and its novel aspects, and we experimentally demonstrate that Tukwila provides better overall query performance and faster initial answers than existing systems, and has excellent scalability.
Accelerating XPath Evaluation in any RDBMS
- ACM Transactions on Database Systems
, 2003
"... This article is a proposal for a database index structure, the XPath accelerator, that has been specifically designed to support the evaluation of XPath path expressions. As such, the index is capable to support all XPath axes (including ancestor, following, preceding-sibling, descendant-or-self, et ..."
Abstract
-
Cited by 50 (8 self)
- Add to MetaCart
This article is a proposal for a database index structure, the XPath accelerator, that has been specifically designed to support the evaluation of XPath path expressions. As such, the index is capable to support all XPath axes (including ancestor, following, preceding-sibling, descendant-or-self, etc.). This feature lets the index stand out among related work on XML indexing structures which had a focus on the child and descendant axes only. The index has been designed with a close eye on the XPath semantics as well as the desire to engineer its internals so that it can be supported well by existing relational database query processing technology: the index (a) permits set-oriented (or, rather, sequence-oriented) path evaluation, and (b) can be implemented and queried using well-established relational index structures, notably B-trees and R-trees
On the Integration of Structure Indexes and Inverted Lists
- In SIGMOD
, 2004
"... Recently, there has been a great deal of interest in the development of techniques to evaluate path expressions over collections of XML documents. In general, these path expressions contain both structural and keyword components. Several methods have been proposed for processing path expressions ove ..."
Abstract
-
Cited by 44 (0 self)
- Add to MetaCart
Recently, there has been a great deal of interest in the development of techniques to evaluate path expressions over collections of XML documents. In general, these path expressions contain both structural and keyword components. Several methods have been proposed for processing path expressions over graph/tree-structured XML data. These methods can be classified into two broad classes. The first involves graph traversal where the input query is evaluated by traversing the data graph or some compressed representation. The other class involves information-retrieval style processing using inverted lists. In this framework, structure indexes have been proposed to be used as a substitute for graph traversal. These structure indexes are proven to be very effective when applied to queries that examine the “coarse ” structure of documents. For example, for many
Exploiting Local Similarity for Indexing Paths in Graph-Structured Data
- In ICDE
, 2002
"... XML and other semi-structured data may have partially specified or missing schema information, motivating the use of a structural summary which can be automatically computed from the data. These summaries also serve as indices for evaluating the complex path expressions common to XML and semi-struct ..."
Abstract
-
Cited by 43 (0 self)
- Add to MetaCart
XML and other semi-structured data may have partially specified or missing schema information, motivating the use of a structural summary which can be automatically computed from the data. These summaries also serve as indices for evaluating the complex path expressions common to XML and semi-structured query languages. However, to answer all path queries accurately, summaries must encode information about long, seldom-queried paths, leading to increased size and complexity with little added value. We introduce the A(k)-indices, a family of approximate structural summaries. They are based on the concept of k-bisimilarity, in which nodes are grouped based on local structure, i.e., the incoming paths of length up to k. The parameter k thus smoothly varies the level of detail (and accuracy) of the A(k)-index. For small values of k, the size of the index is substantially reduced. While smaller, the A(k) index is approximate, and we describe techniques for efficiently extracting exact answers to regular path queries. Our experiments show that, for moderate values of k, path evaluation using the A(k)-index ranges from being very efficient for simple queries to competitive for most complex queries, while using significantly less space than comparable structures. 1.

