ViST: A Dynamic Index Method for Querying XML Data by Tree Structures
 In SIGMOD
, 2003
Cited by 93 (6 self)
Much research has been done in providing flexible query facilities to extract data from structured XML documents. In this paper, we propose ViST, a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, we show that querying XML data is equivalent to finding subsequence matches. Unlike index methods that disassemble a query into multiple subqueries, and then join the results of these subqueries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure. ViST supports dynamic index update, and it relies solely on B+Trees without using any specialized data structures that are not well supported by DBMSs. Our experiments show that ViST is effective, scalable, and efficient in supporting structural queries.
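As a rough illustration of the idea of structure-encoded sequences, the sketch below turns a small tree into a preorder list of (symbol, prefix) pairs and answers a tree query by a naive subsequence test. The tuple layout of the tree, the function names `encode` and `is_subsequence`, and the sample data are assumptions made for this example. It does not reproduce the ViST index itself, and plain subsequence matching of this kind can in general report false matches.

```python
from typing import List, Tuple

# Assumed tree layout for this sketch: (label, [children]).
Pair = Tuple[str, Tuple[str, ...]]  # (symbol, labels on the path from the root)


def encode(node, prefix: Tuple[str, ...] = ()) -> List[Pair]:
    """Return the preorder list of (symbol, prefix) pairs for a tree."""
    label, children = node
    seq = [(label, prefix)]
    for child in children:
        seq.extend(encode(child, prefix + (label,)))
    return seq


def is_subsequence(query: List[Pair], data: List[Pair]) -> bool:
    """True if query occurs in data in order, not necessarily contiguously."""
    it = iter(data)
    # Each `any` consumes the shared iterator, so matches must appear in order.
    return all(any(q == d for d in it) for q in query)


# Hypothetical document and queries.
doc = ("P", [("S", [("N", []), ("L", [])]), ("B", [("L", [])])])
matching_query = ("P", [("B", [("L", [])])])
failing_query = ("P", [("S", [("B", [])])])

print(is_subsequence(encode(matching_query), encode(doc)))
print(is_subsequence(encode(failing_query), encode(doc)))
```

The point of the encoding is that one pass over a sequence replaces a join over separately matched path fragments.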
Labeling Dynamic XML Trees
, 2002
Cited by 87 (2 self)
We present algorithms to label the nodes of an XML tree which is subject to insertions and deletions of nodes. The labeling is done such that (1) we label each node immediately when it is inserted and this label remains unchanged, and (2) from a pair of labels alone, we can decide whether one node is an ancestor of the other. This problem arises in the context of XML databases that support queries on the structure of the documents as well as on the changes made to the documents over time. We prove that our algorithms assign the shortest possible labels (up to a constant factor) which satisfy these requirements. We also consider the same problem when "clues" that provide guarantees on possible future insertions are given together with newly inserted nodes. Such clues can be derived from the DTD or from statistics on similar XML trees. We present algorithms that use the clues to assign shorter labels. We also prove that the length of our labels is close to the minimum possible.
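The two requirements above (labels fixed at insertion time, ancestor tests from two labels alone) can be illustrated with a simple prefix scheme. This is a minimal sketch under assumptions: the class `PrefixLabeler` and the function `is_ancestor` are hypothetical names, and the scheme shown is a plain child-counter prefix labeling, not the algorithms of the paper, whose contribution is bounding the label length.

```python
class PrefixLabeler:
    """Assigns each new node the label parent_label + (child_number,)."""

    def __init__(self):
        # Maps a label to the number of children inserted under it so far.
        self.child_count = {(): 0}

    def root(self):
        return ()

    def insert_child(self, parent):
        """Label a new child of `parent`; existing labels are never changed."""
        self.child_count[parent] += 1
        label = parent + (self.child_count[parent],)
        self.child_count[label] = 0
        return label


def is_ancestor(a, b):
    """Decide from the two labels alone whether a is a proper ancestor of b."""
    return len(a) < len(b) and b[:len(a)] == a


labeler = PrefixLabeler()
r = labeler.root()
a = labeler.insert_child(r)
b = labeler.insert_child(r)
c = labeler.insert_child(a)
print(is_ancestor(a, c), is_ancestor(b, c))
```

Deleting a node simply leaves its number unused, so no relabeling is needed. The cost is that labels can grow long in deep or wide trees.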
Nearest Common Ancestors: A survey and a new distributed algorithm
, 2002
Cited by 76 (12 self)
Several papers describe linear time algorithms to preprocess a tree, such that one can answer subsequent nearest common ancestor queries in constant time. Here, we survey these algorithms and related results. A common idea used by all the algorithms for the problem is that a solution for complete balanced binary trees is straightforward. Furthermore, for complete balanced binary trees we can easily solve the problem in a distributed way by labeling the nodes of the tree such that from the labels of two nodes alone one can compute the label of their nearest common ancestor. Whether it is possible to distribute the data structure into short labels associated with the nodes is important for several applications such as routing. Therefore, related labeling problems have received a lot of attention recently.
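For the complete balanced binary tree case mentioned above, a short sketch shows how the label of the nearest common ancestor can be computed from two labels alone. It assumes heap-style labels (root 1, children of i are 2i and 2i+1), which is a choice made for this example. The loop runs in time proportional to the depth, so it illustrates the labeling idea rather than the constant-time algorithms the survey covers.

```python
def nca(u: int, v: int) -> int:
    """Nearest common ancestor in a complete binary tree with heap labels."""
    # The bit length of a heap label equals the depth of the node plus one,
    # so lift the deeper node until both are on the same level.
    while u.bit_length() > v.bit_length():
        u >>= 1
    while v.bit_length() > u.bit_length():
        v >>= 1
    # Move both nodes up in lockstep until they meet.
    while u != v:
        u >>= 1
        v >>= 1
    return u


print(nca(4, 5), nca(4, 6), nca(8, 5))
```

Equivalently, the result is the longest common prefix of the two binary representations.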
HOPI: An efficient connection index for complex XML document collections
 In 9th Int. Conference on Extending Database Technology (EDBT
, 2004
Cited by 37 (4 self)
In this paper we present HOPI, a new connection index for XML documents based on the concept of the 2-hop cover of a directed graph introduced by Cohen et al. In contrast to most of the prior work on XML indexing, we consider not only paths with child or parent relationships between the nodes, but also provide space- and time-efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in our XXL search engine. We improve the theoretical concept of a 2-hop cover by developing scalable methods for index creation on very large XML data collections with long paths and extensive cross-linkage. Our experiments show substantial savings in the query performance of the HOPI index over previously proposed index structures in combination with low space requirements.
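The reachability test behind a 2-hop cover can be sketched as follows: each node u carries two sets, here called `l_out[u]` and `l_in[u]`, and u reaches v exactly when the two sets share a node. The graph, the hand-written labels, and all names below are assumptions for this example. Nothing here reflects how HOPI constructs or stores its cover.

```python
from typing import Dict, List, Set

# Hypothetical graph: a and b link to a hub h, which links to x and y.
graph: Dict[str, List[str]] = {"a": ["h"], "b": ["h"], "h": ["x", "y"], "x": [], "y": []}

# Hand-written 2-hop labels with h as the only shared hop node.
# Every node is also kept in its own two sets.
l_out: Dict[str, Set[str]] = {"a": {"a", "h"}, "b": {"b", "h"}, "h": {"h"}, "x": {"x"}, "y": {"y"}}
l_in: Dict[str, Set[str]] = {"a": {"a"}, "b": {"b"}, "h": {"h"}, "x": {"x", "h"}, "y": {"y", "h"}}


def reaches(u: str, v: str) -> bool:
    """u reaches v iff some node lies in both l_out[u] and l_in[v]."""
    return not l_out[u].isdisjoint(l_in[v])


def reachable_from(start: str) -> Set[str]:
    """Plain depth-first search, used only to check the labels."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


print(reaches("a", "x"), reaches("x", "a"), reaches("a", "b"))
```

In this example the labels hold 12 entries in total, while storing the full reachability relation (each node with every node it reaches, itself included) would need 13. The saving grows with the number of paths that pass through a shared hub.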
Labeling schemes for small distances in trees
 In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms
, 2003
Cited by 27 (2 self)
We consider labeling schemes for trees, supporting various relationships between nodes at small distance. For instance, we show that given a tree T and an integer k we can assign labels to each node of T such that given the labels of two nodes v and w we can decide, from these two labels alone, if the distance between v and w is at most k and, if so, compute it. For trees with n nodes and k ≥ 2, we give a lower bound on the maximum label length of log n + Ω(log log n) bits, and for constant k, we give an upper bound of log n + O(log log n). Bounds for ancestor, sibling, connectivity, and bi- and triconnectivity labeling schemes are also presented. Key words. Labeling schemes, trees. AMS subject classifications. 68R10, 68W01
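To make the query concrete, here is a deliberately naive labeling that answers "is the distance at most k, and if so what is it" from two labels alone. Each label lists the node and its k nearest ancestors, so it uses about k log n bits, far more than the log n + O(log log n) bound stated above. The function names and the sample tree are assumptions for this sketch, not the scheme from the paper.

```python
from typing import Dict, List, Optional


def make_labels(parent: Dict[str, Optional[str]], k: int) -> Dict[str, List[str]]:
    """Label each node with itself followed by up to k of its nearest ancestors."""
    labels = {}
    for v in parent:
        chain = [v]
        while len(chain) <= k and parent[chain[-1]] is not None:
            chain.append(parent[chain[-1]])
        labels[v] = chain
    return labels


def distance_at_most_k(label_u: List[str], label_v: List[str], k: int) -> Optional[int]:
    """Return the distance if it is at most k, otherwise None."""
    pos_in_v = {node: j for j, node in enumerate(label_v)}
    for i, node in enumerate(label_u):
        if node in pos_in_v:
            # The first shared entry is the nearest common ancestor.
            d = i + pos_in_v[node]
            return d if d <= k else None
    return None


# Hypothetical tree: r has children a and b; c is a child of a; d is a child of c.
parent = {"r": None, "a": "r", "b": "r", "c": "a", "d": "c"}
labels = make_labels(parent, 2)
print(distance_at_most_k(labels["a"], labels["b"], 2))
print(distance_at_most_k(labels["c"], labels["b"], 2))
```

If two nodes are within distance k, their nearest common ancestor is within k steps of both, which is why truncating the ancestor chains at k loses nothing for this query.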
Adaptive searching in succinctly encoded binary relations and tree-structured documents (Extended Abstract)
 THEORETICAL COMPUTER SCIENCE
, 2005
Cited by 26 (9 self)
This paper deals with succinct representations of data types motivated by applications in posting lists for search engines, in querying XML documents, and in the more general setting (which extends XML) of multi-labeled trees, where several labels can be assigned to each node of a tree. To find the set of references corresponding to a set of keywords, one typically intersects the lists of references associated with each keyword. We view this instead as having a single list of objects [n] = {1,..., n} (the references), each of which has a subset of the labels [σ] = {1,..., σ} (the keywords) associated with it. We are able to find the objects associated with an arbitrary set of keywords in time O(δk lg lg σ) using a data structure requiring only t(lg σ + o(lg σ)) bits, where δ is the number of steps required by a nondeterministic algorithm to check the answer, k is the number of keywords in the query, σ is the size of the set from which the keywords are chosen, and t is the number of associations between references and keywords. The data structure is succinct in that it differs from the space needed to write down all t occurrences of keywords by only a lower order term. An XML document is, for our purpose, a labeled rooted tree. We deal primarily with “non-recursive labeled trees”, where no label occurs more than once on any root-to-leaf path. We find the set of nodes whose path from the root includes a set of keywords in the same time, O(δk lg lg σ), on a representation of the tree using essentially minimum space, 2n + n(lg σ + o(lg σ)) bits, where n is the number of nodes in the tree. If we permit nodes to have multiple
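The conventional approach that the abstract contrasts itself with, intersecting one sorted posting list per keyword, can be sketched with a doubling ("galloping") search so that the work adapts to how the lists interleave. This is a classical technique on plain Python lists, shown here only as background. The function names and posting lists are assumptions, and the succinct data structure of the paper is not reproduced.

```python
from bisect import bisect_left
from typing import List


def gallop(lst: List[int], target: int, lo: int) -> int:
    """Smallest index i >= lo with lst[i] >= target (len(lst) if none)."""
    step, hi = 1, lo
    # Probe at exponentially growing offsets until we overshoot the target.
    while hi < len(lst) and lst[hi] < target:
        lo = hi + 1
        hi += step
        step *= 2
    # Finish with a binary search inside the last probed window.
    return bisect_left(lst, target, lo, min(hi, len(lst)))


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Intersect two sorted lists, skipping ahead with gallop."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = gallop(a, b[j], i)
        else:
            j = gallop(b, a[i], j)
    return out


def intersect_all(lists: List[List[int]]) -> List[int]:
    """Intersect a non-empty collection of sorted lists, shortest first."""
    ordered = sorted(lists, key=len)
    result = ordered[0]
    for other in ordered[1:]:
        result = intersect(result, other)
    return result


# Hypothetical posting lists: keyword -> sorted reference ids.
postings = {"xml": [1, 3, 5, 8, 13, 21], "index": [2, 3, 8, 9, 21, 34], "tree": [3, 4, 8, 21]}
print(intersect_all(list(postings.values())))
```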
Yifeng Zheng, BLAS: an efficient XPath processing system
 Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD
, 2004
Cited by 25 (1 self)
We present BLAS, a Bi-LAbeling based System, for efficiently processing complex XPath queries over XML data. BLAS uses P-labeling to process queries involving consecutive child axes, and D-labeling to process queries involving descendant axes traversal. The XML data is stored in labeled form, and indexed to optimize descendant axis traversals. Three algorithms are presented for translating complex XPath queries to SQL expressions, and two alternate query engines are provided. Experimental results demonstrate that the BLAS system has a substantial performance improvement compared to traditional XPath processing using D-labeling.
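A common way to support descendant-axis tests with labels is an interval scheme: each node gets a (start, end, level) triple from one depth-first pass, and containment of intervals decides ancestry. The sketch below assumes that D-labeling takes roughly this form. The function names, the dictionary tree layout, and the sample document are all hypothetical, and P-labeling and the SQL translation are not shown.

```python
from typing import Dict, List, Tuple

Label = Tuple[int, int, int]  # (start, end, level)


def d_labels(tree: Dict[str, List[str]], root: str) -> Dict[str, Label]:
    """Assign interval labels with a single depth-first traversal."""
    labels: Dict[str, Label] = {}
    counter = 0

    def visit(node: str, level: int) -> None:
        nonlocal counter
        counter += 1
        start = counter
        for child in tree.get(node, []):
            visit(child, level + 1)
        counter += 1
        labels[node] = (start, counter, level)

    visit(root, 1)
    return labels


def is_descendant(d: Label, a: Label) -> bool:
    """d is a proper descendant of a iff d's interval lies strictly inside a's."""
    return a[0] < d[0] and d[1] < a[1]


def is_child(d: Label, a: Label) -> bool:
    """A child is a descendant exactly one level below."""
    return is_descendant(d, a) and d[2] == a[2] + 1


# Hypothetical document: book has title and chapter; chapter has section.
tree = {"book": ["title", "chapter"], "chapter": ["section"]}
labels = d_labels(tree, "book")
print(is_descendant(labels["section"], labels["book"]))
print(is_child(labels["section"], labels["book"]))
```

Because both tests are simple comparisons on integer columns, they translate naturally into join predicates in a relational engine.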
On the Sequencing of Tree Structures for XML Indexing
, 2003
Cited by 22 (4 self)
Sequence-based XML indexing aims at avoiding expensive join operations in query processing. It transforms structured XML data into sequences so that a structured query can be answered holistically through subsequence matching. In this paper, we address the problem of query equivalence with respect to this transformation, and we introduce a performance-oriented principle for sequencing tree structures. With query equivalence, XML queries can be performed through subsequence matching without join operations, post-processing, or other special handling for problems such as false alarms. We identify a class of sequencing methods for this purpose, and we present a novel subsequence matching algorithm that observes query equivalence. Still, query equivalence is just a prerequisite for sequence-based XML indexing. Our goal is to find the best sequencing strategy with regard to the time and space complexity in indexing and querying XML data. To this end, we introduce a performance-oriented principle to guide the sequencing of tree structures. For any given XML dataset, the principle finds an optimal sequencing strategy according to its schema and its data distribution. We present a novel method that realizes this principle. In our experiments, we show the advantages of sequence-based indexing over traditional XML indexing methods, and we compare several sequencing strategies and demonstrate the benefit of the performance-oriented sequencing principle.
Labeling Schemes for Dynamic Tree Networks
 Theory of Computing Systems
, 2002
Cited by 16 (12 self)
Distance labeling schemes are composed of a marker algorithm for labeling the vertices of a graph with short labels, coupled with a decoder algorithm allowing one to compute the distance between any two vertices directly from their labels (without using any additional information). As applications for distance labeling schemes concern mainly large and dynamically changing networks, it is of interest to study distributed dynamic labeling schemes. The current paper considers the problem on dynamic trees, and proposes efficient distributed schemes for it. The paper first presents a labeling scheme for distances in the dynamic tree model, with amortized message complexity O(log^2 n) per operation, where n is the size of the tree at the time the operation takes place. The protocol maintains O(log^2 n)-bit labels. This label size is known to be optimal even in the static scenario. A more general labeling scheme is then introduced for the dynamic tree model, based on extending an existing static tree labeling scheme to the dynamic setting. The approach fits a number of natural tree functions, such as distance, separation level and flow. The main resulting scheme incurs an overhead of an O(log n) multiplicative factor in both the label size and amortized message complexity in the case of dynamically growing trees (with no vertex deletions). If an upper bound on n is known in advance, this method yields a different tradeoff, with an O(log^2 n / log log n) multiplicative overhead on the label size but only an O(log n / log log n) overhead on the amortized message complexity. In the fully dynamic model the scheme also incurs an increased additive overhead in amortized communication, of O(log^2 n) messages per operation.
Labeling Schemes for Weighted Dynamic Trees
 In Proc. 30th Int. Colloq. on Automata, Languages & Prog
, 2003
Cited by 16 (11 self)
A distance labeling scheme is a type of localized network representation in which short labels are assigned to the vertices, allowing one to infer the distance between any two vertices directly from their labels, without using any additional information sources. As most applications for network representations in general, and distance labeling schemes in particular, concern large and dynamically changing networks, it is of interest to focus on distributed dynamic labeling schemes. The paper considers dynamic weighted trees where the vertices of the trees are fixed but the (positive integral) weights of the edges may change. The two models considered are the edge-dynamic model, where from time to time some edge changes its weight by a fixed quantum, and the increasing-dynamic model, in which edge weights can only grow. The paper presents distributed approximate distance labeling schemes for the two dynamic models, which are efficient in terms of the required label size and the communication complexity involved in updating the labels following the weight changes.