From region encoding to extended Dewey: On efficient processing of XML twig pattern matching
- In VLDB, 2005
- Cited by 47 (10 self)
Finding all the occurrences of a twig pattern in an XML database is a core operation for efficient evaluation of XML queries. A number of algorithms have been proposed to process a twig query based on the region encoding labeling scheme. While region encoding supports efficient determination of the ancestor-descendant (or parent-child) relationship between two elements, we observe that the information within a single label is very limited. In this paper, we propose a new labeling scheme, called extended Dewey. This is a powerful labeling scheme, since from the label of an element alone, we can derive the names of all elements along the path from the root to the element. Based on extended Dewey, we design a novel holistic twig join algorithm, called TJ-Fast. Unlike all previous algorithms based on region encoding, to answer a twig query, TJ-Fast only needs to access the labels of the leaf query nodes. Through this, not only do we reduce disk access, but we also support the efficient evaluation of queries with wildcards in branching nodes, which are very difficult to answer with algorithms based on region encoding. Finally, we report our experimental results to show that our algorithms are superior to previous approaches in terms of the number of elements scanned, the size of intermediate results and query performance.
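For readers unfamiliar with region encoding, the structural test mentioned in the abstract can be sketched as follows. This is a minimal illustration, assuming each element carries a hypothetical (start, end, level) triple assigned during a document-order traversal. The names `Region`, `is_ancestor`, and `is_parent` are made up for this sketch and do not come from the paper, and the sketch does not implement extended Dewey or TJ-Fast.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical region label: positions of the open and close tags, plus depth."""
    start: int
    end: int
    level: int


def is_ancestor(a: Region, d: Region) -> bool:
    # a is an ancestor of d when a's interval strictly contains d's interval.
    return a.start < d.start and d.end < a.end


def is_parent(a: Region, d: Region) -> bool:
    # A parent is an ancestor exactly one level above.
    return is_ancestor(a, d) and d.level == a.level + 1


# Labels for the toy document <a><b><c/></b><d/></a>
a = Region(1, 8, 1)
b = Region(2, 5, 2)
c = Region(3, 4, 3)
d = Region(6, 7, 2)
```

Note that a label such as `Region(3, 4, 3)` says nothing about the names of the enclosing elements, which is the limitation the abstract points out.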
Extending XPath to support linguistic queries
- In: Workshop on Programming Language Technologies for XML (PLAN-X), 2005
- Cited by 14 (3 self)
Linguistic research and language technology development employ large repositories of ordered trees. XML, a standard ordered tree model, and XPath, its associated language, are natural choices for storing and querying linguistic data. However, several important expressive features required for linguistic queries are missing in XPath. In this paper, we motivate and illustrate these features with a variety of linguistic queries. Then we define extensions to XPath which support linguistic tree queries. We provide a relational representation for trees, and define an SQL translation for queries. Experiments demonstrate that the query system is significantly faster than other linguistic tree query systems for a wide range of queries.
QED: a novel quaternary encoding to completely avoid re-labeling in XML updates
- In Proc. of CIKM, 2005
- Cited by 9 (5 self)
The method of assigning labels to the nodes of the XML tree is called a labeling scheme. Based on the labels only, both ordered and un-ordered queries can be processed without accessing the original XML file. Another important consideration for a labeling scheme is the cost of updating labels when a node is inserted into or deleted from the XML tree. All the current labeling schemes have a high update cost; in this paper we therefore propose a novel quaternary encoding approach for the labeling schemes. Based on this encoding approach, we need not re-label any existing nodes when the update is performed. Extensive experimental results on the XML datasets illustrate that our QED works much better than the existing labeling schemes on label updates when considering either the number of nodes or the time for re-labeling.
Value-based notification conditions in large-scale publish/subscribe systems
- In VLDB, 2007
- Cited by 9 (4 self)
We address the problem of providing scalable support for subscriptions with personalized value-based notification conditions in wide-area publish/subscribe systems. Notification conditions can be fine-tuned by subscribers, allowing precise and flexible control of when events are delivered to the subscribers. For example, a user may specify that she should be notified if and only if the price of a particular stock moves outside a “radius” around her last notified value. Naive techniques for handling notification conditions are not scalable. It is challenging to share subscription processing and notification dissemination of subscriptions with personalized value-based notification conditions, because two subscriptions may see two completely different sequences of notifications even if they specify the same radius. We develop and experimentally evaluate scalable processing and dissemination techniques for these subscriptions. Our approach uses standard network substrates for notification dissemination, and avoids pushing complex application processing into the network. Compared with other alternatives, our approach generates orders of magnitude lower network traffic, and incurs lower server processing cost.
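The "radius" condition and the reason it is hard to share across subscribers can be illustrated with a small sketch. This is the naive per-subscriber evaluation, not the scalable technique the paper develops. The function name `notifications`, the sample prices, and the assumption that each subscriber starts from a different last notified value are all hypothetical.

```python
def notifications(prices, radius, last_notified):
    """Return the prices a subscriber is notified about.

    A notification is sent only when the price moves strictly outside
    `radius` around the subscriber's last notified value, which then
    becomes the new reference point.
    """
    sent = []
    for price in prices:
        if abs(price - last_notified) > radius:
            sent.append(price)
            last_notified = price
    return sent


prices = [100, 104, 106, 103, 111, 107]

# Same radius, different starting reference values (assumed for illustration).
alice = notifications(prices, radius=5, last_notified=100)
bob = notifications(prices, radius=5, last_notified=103)
```

Here `alice` is notified at 106 while `bob` is notified at 111, so the two subscriptions see different notification sequences despite specifying the same radius.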
Efficient Processing of Updates in Dynamic XML data
- Proceedings of the International Conference on Data Engineering, 2006
- Cited by 5 (2 self)
It is important to process updates efficiently when nodes are inserted into or deleted from the XML tree. All the existing labeling schemes have a high update cost; in this paper we therefore propose a novel Compact Dynamic Binary String (CDBS) encoding to process updates efficiently. CDBS has two important properties which form the foundations of this paper: (1) a new code can be inserted between any two consecutive CDBS codes while preserving their order and without re-encoding the existing codes; (2) CDBS is orthogonal to specific labeling schemes, so it can be applied broadly to different labeling schemes or other applications to process updates efficiently. We report our experimental results to show that CDBS is superior to previous approaches to processing updates in terms of the number of nodes to re-label and the time for updating.
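Property (1) can be made concrete with a small sketch. It assumes codes are binary strings that end in "1" and are compared lexicographically; under that assumption, a new code can always be placed between two neighbours without touching either. The function `insert_between` and its rule are an illustration of the idea, not necessarily the exact algorithm of the paper.

```python
def insert_between(left: str, right: str) -> str:
    """Return a binary string m with left < m < right (lexicographic order).

    Assumes both codes consist of '0'/'1', end with '1', and left < right.
    """
    assert left < right and left.endswith("1") and right.endswith("1")
    if len(left) >= len(right):
        # Extending the left code keeps it below right, because the two
        # codes already differ at an earlier position.
        return left + "1"
    # Otherwise replace the final '1' of the right code with '01', which
    # yields a code just below right but still above left.
    return right[:-1] + "01"


codes = ["01", "1"]
middle = insert_between(codes[0], codes[1])
codes.insert(1, middle)
# Insert again between the first two codes; existing codes are never rewritten.
codes.insert(1, insert_between(codes[0], codes[1]))
```

Because Python compares strings lexicographically, `codes == sorted(codes)` remains true after every insertion in this sketch.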
DeweyIDs -- The Key to Fine-Grained Management of XML Documents
- In Proc. 20th Brazilian Symposium on Databases, 2005
- Cited by 5 (1 self)
Because XML documents tend to be very large and are more and more collaboratively processed, their fine-grained storage and management is a must for which, in turn, a flexible tree representation is mandatory. Performance requirements dictate efficient query and update processing in multi-user environments. For this reason, three aspects are of particular importance: index support to directly access each internal document node if needed; navigation along the parent, child, and sibling axes; and selective and direct locking of minimal document granules. The key to effectively accelerating all of them is the DeweyID. DeweyIDs identify the tree nodes, avoid relabeling them even under heavy node insertions and deletions, and allow, at the same time, the derivation of all ancestor node IDs without accessing the document. In this paper, we explore the concept of DeweyIDs, refine the ORDPATH addressing scheme, illustrate its implementation, and give an exhaustive performance evaluation of its practical use.
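The claim that all ancestor IDs can be derived from a label alone can be sketched with plain dot-separated Dewey labels. This is a simplified assumption: the DeweyID and ORDPATH schemes discussed in the paper add rules for insertions that are not modelled here, and the helper names `ancestor_ids` and `is_ancestor` are hypothetical.

```python
def ancestor_ids(label: str) -> list[str]:
    """Return the labels of all ancestors, from the root down to the parent."""
    parts = label.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def is_ancestor(a: str, d: str) -> bool:
    """True when label a is a proper prefix of label d, compared per component."""
    pa, pd = a.split("."), d.split(".")
    return len(pa) < len(pd) and pd[:len(pa)] == pa
```

For example, `ancestor_ids("1.3.4.2")` yields the root, grandparent, and parent labels without any access to the document itself. Comparing whole components rather than raw string prefixes keeps `1.3` from being mistaken for an ancestor of `1.30.4`.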
Compact reachability labeling for graph-structured data
- In Proceedings of the 2005 ACM International Conference on Information and Knowledge Management (CIKM), 2005
- Cited by 4 (2 self)
Testing reachability between nodes in a graph is a well-known problem with many important applications, including knowledge representation, program analysis, and more recently, inferencing in biological and ontology databases as well as XML query processing. Various approaches have been proposed to encode graph reachability information using node labeling schemes, but most existing schemes only work well for specific types of graphs. In this paper, we propose a novel approach, HLSS (Hybrid Labeling of Sub-Structures), which identifies different types of substructures within a graph and encodes each using techniques suited to its characteristics. We implement HLSS with an efficient two-phase algorithm, where the first phase identifies and encodes strongly connected components as well as tree substructures, and the second phase encodes the remaining reachability relationships by compressing dense rectangular submatrices in the transitive closure matrix. For the important subproblem of finding densest submatrices, we demonstrate the hardness of the problem and propose several practical algorithms. Experiments show that HLSS handles different types of graphs well, while existing approaches fall prey to graphs with substructures they are not designed to handle. Finally, we also discuss how to update reachability labels when the graph is updated, and qualitatively show that HLSS supports more efficient updates than existing approaches.
Lazy XML updates: Laziness as a virtue of update and structural join efficiency
- In SIGMOD, 2005
- Cited by 4 (0 self)
XML documents are normally stored as plain text files. Hence, the natural and most convenient way to update XML documents is to simply edit the text files. But efficient query evaluation algorithms require XML documents to be indexed. Every element is given a unique identifier based on its location in the document or its preorder-traversal order, and this identifier is later used as (part of) the key in the index. Reassigning the orders of a possibly large number of elements is therefore necessary when the original XML documents are updated. Immutable dynamic labeling schemes have been proposed to solve this problem; however, they require very long labels and may decrease query performance. If we consider a real-world scenario, we note that many relatively small ad-hoc XML segments are inserted into or deleted from an existing XML database. In this paper, we start from this consideration and we propose a new lazy approach to handle XML updates that also improves query performance. The lazy approach: (i) completely avoids reassigning existing element orders after updates; (ii) improves query processing by taking advantage of segments. Experimental results show that our approach is much more efficient in handling updates than using immutable labeling and, at the same time, it also improves the performance of recently defined structural join algorithms.
XML Document
This article identifies current practices and trends, offering insight into how developers can improve query processing and select the best solution for particular contexts.

