Random Access to Grammar-Compressed Strings
, 2011
Cited by 11 (0 self)
Let S be a string of length N compressed into a context-free grammar 𝒮 of size n. We present two representations of 𝒮 achieving O(log N) random access time, and either O(n · α_k(n)) construction time and space on the pointer machine model, or O(n) construction time and space on the RAM. Here, α_k(n) is the inverse of the k-th row of Ackermann’s function. Our representations also efficiently support decompression of any substring in S: we can decompress any substring of length m in the same complexity as a single random access query and additional O(m) time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern P with at most k errors in time O(n(min{|P|k, k^4 + |P|} + log N) + occ), where occ is the number of occurrences of P in S. Finally, we are able to generalize our results to navigation and other operations on grammar-compressed trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two “biased” weighted ancestor data structures, and a compact representation of heavy paths in grammars.
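To make the random access query concrete, here is a minimal sketch, assuming a straight-line grammar in which every rule is either a single terminal or a pair of nonterminals. It answers S[i] by walking down from the start symbol using precomputed expansion lengths, so its cost is proportional to the grammar's height rather than the O(log N) bound stated in the abstract, and it uses none of the paper's data structures. The names `build_lengths` and `access` and the rule encoding are assumptions made for this illustration.

```python
from typing import Dict, Sequence, Tuple, Union

# A rule is either one terminal character or a pair of nonterminal names.
Rule = Union[str, Tuple[str, str]]


def build_lengths(rules: Dict[str, Rule], order: Sequence[str]) -> Dict[str, int]:
    """Compute the expansion length of every nonterminal.

    `order` must list nonterminals bottom-up: both children of a rule
    appear before the rule itself.
    """
    length: Dict[str, int] = {}
    for name in order:
        rhs = rules[name]
        if isinstance(rhs, str):
            length[name] = 1
        else:
            left, right = rhs
            length[name] = length[left] + length[right]
    return length


def access(rules: Dict[str, Rule], length: Dict[str, int], start: str, i: int) -> str:
    """Return character i (0-based) of the string generated by `start`."""
    if not 0 <= i < length[start]:
        raise IndexError(i)
    node = start
    while True:
        rhs = rules[node]
        if isinstance(rhs, str):
            return rhs
        left, right = rhs
        if i < length[left]:
            node = left           # position falls in the left child
        else:
            i -= length[left]     # skip the left child's expansion
            node = right


# Hypothetical example grammar (Fibonacci words); F6 expands to 8 characters.
RULES: Dict[str, Rule] = {
    "F1": "b",
    "F2": "a",
    "F3": ("F2", "F1"),
    "F4": ("F3", "F2"),
    "F5": ("F4", "F3"),
    "F6": ("F5", "F4"),
}
ORDER = ["F1", "F2", "F3", "F4", "F5", "F6"]
LENGTHS = build_lengths(RULES, ORDER)
```

Calling `access(RULES, LENGTHS, "F6", i)` for each position reads the string without ever materializing it; only the per-nonterminal lengths are stored alongside the grammar.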
Word problems and membership problems on compressed words
 SIAM J. Comput., 35(5):1210
Cited by 10 (5 self)
We consider a compressed form of the word problem for finitely presented monoids, where the input consists of two compressed representations of words over the generators of a monoid M, and we ask whether these two words represent the same monoid element of M. Words are compressed using straight-line programs, i.e., context-free grammars that generate exactly one word. For several classes of finitely presented monoids we obtain completeness results for complexity classes in the range from P to EXPSPACE. As a by-product of our results on compressed word problems we obtain a fixed deterministic context-free language with a PSPACE-complete compressed membership problem. The existence of such a language was open so far. Finally, we will investigate the complexity of the compressed membership problem for various circuit complexity classes.
Key words: grammar-based compression, word problems for monoids, context-free languages, complexity.
AMS subject classifications: 20F10, 68Q17, 68Q42.
Structural selectivity estimation for XML documents
 In ICDE
, 2007
Cited by 10 (3 self)
Estimating the selectivity of queries is a crucial problem in database systems. Virtually all database systems rely on the use of selectivity estimates to choose amongst the many possible execution plans for a particular query. In terms of XML databases, the problem of selectivity estimation of queries presents new challenges: many evaluation operators are possible, such as simple navigation, structural joins, or twig joins, and many different indexes are possible, ranging from traditional B-trees to complicated XML-specific graph indexes. A new synopsis for XML documents is introduced which can be effectively used to estimate the selectivity of complex path queries. The synopsis is based on a lossy compression of the document tree that underlies the XML document, and can be computed in one pass from the document. It has several advantages over existing approaches: (1) it allows one to estimate the selectivity of queries containing all XPath axes, including the order-sensitive ones, (2) the estimator returns a range within which the actual selectivity is guaranteed to lie, with the size of this range implicitly providing a confidence measure of the estimate, and (3) the synopsis can be incrementally updated to reflect changes in the XML database.
Monadic second-order unification is NP-complete
 In RTA’04, volume 3091 of LNCS
, 2004
Cited by 9 (6 self)
Bounded Second-Order Unification is the problem of deciding, for a given second-order equation t ≟ u and a positive integer m, whether there exists a unifier σ such that, for every second-order variable F, the terms instantiated for F have at most m occurrences of every bound variable. It is already known that Bounded Second-Order Unification is decidable and NP-hard, whereas general Second-Order Unification is undecidable. We prove that Bounded Second-Order Unification is NP-complete, provided that m is given in unary encoding, by proving that a size-minimal solution can be represented in polynomial space, and then applying a generalization of Plandowski’s polynomial algorithm that compares compacted terms in polynomial time.
Context matching for compressed terms
 In LICS 2008
, 2008
Cited by 8 (6 self)
This paper is an investigation of the matching problem for term equations s = t where s contains context variables, and both terms s and t are given using some kind of compressed representation. In this setting, term representation with dags, but also with the more general formalism of singleton tree grammars (STGs), is considered. The main result is a polynomial time algorithm for context matching with dags, when the number of different context variables is fixed for the problem. NP-completeness is obtained when the terms are represented using singleton tree grammars. The special cases of first-order matching and also unification with STGs are shown to be decidable in PTIME.
XQueC: A query-conscious compressed XML database
 ACM Trans. Internet Tech
Cited by 8 (1 self)
XML compression has gained prominence recently because it counters the disadvantage of the “verbose” representation XML gives to data. In many applications, such as data exchange and data archiving, entirely compressing and decompressing a document is acceptable. In other applications, where queries must be run over compressed documents, compression may not be beneficial since the performance penalty in running the query processor over compressed data outweighs the data compression benefits. While balancing the interests of compression and query processing has received significant attention in the domain of relational databases, these results do not immediately translate to XML data. In this paper, we address the problem of embedding compression into XML databases without degrading query performance. Since the setting is rather different from relational databases, the choice of compression granularity and compression algorithms must be revisited. Query execution in the compressed domain must also be rethought in the framework of XML query processing, due to the richer structure of XML data. Indeed, a proper storage design for the compressed data plays a crucial role here. The XQueC system (standing for XQuery Processor and Compressor) covers a wide set of
Stratified context unification is NP-complete
 In Proc. of the 3rd International Joint Conference on Automated Reasoning, IJCAR’06
, 2006
Cited by 8 (2 self)
Context Unification is the problem of deciding, for a given set of second-order equations E where all second-order variables are unary, whether there exists a unifier such that, for every second-order variable X, the abstraction λx.r instantiated for X has exactly one occurrence of the bound variable x in r. Stratified Context Unification is a specialization where the nesting of second-order variables in E is restricted. It is already known that Stratified Context Unification is decidable, NP-hard, and in PSPACE, whereas the decidability and the complexity of Context Unification are unknown. We prove that Stratified Context Unification is in NP by proving that a size-minimal solution can be represented in a singleton tree grammar of polynomial size, and then applying a generalization of Plandowski’s polynomial algorithm that compares compacted terms in polynomial time. This also demonstrates the high potential of singleton tree grammars for optimizing programs maintaining large terms. A corollary of our result is that solvability of rewrite constraints is NP-complete.
Parameter reduction in grammar-compressed trees
 In 12th FoSSaCS, volume 5504 of LNCS
, 2009
Cited by 5 (3 self)
Trees can be conveniently compressed with linear straight-line context-free tree grammars. Such grammars generalize straight-line context-free string grammars, which are widely used in the development of algorithms that execute directly on compressed structures (without prior decompression). It is shown that every linear straight-line context-free tree grammar can be transformed in polynomial time into a monadic (and linear) one. A tree grammar is monadic if each nonterminal uses at most one context parameter. Based on this result, a polynomial time algorithm is presented for testing whether a given nondeterministic tree automaton with sibling constraints accepts a tree given by a linear straight-line context-free tree grammar. It is shown that if tree grammars are nondeterministic or nonlinear, then reducing their numbers of parameters cannot be done without an exponential blow-up in grammar size.
XML tree structure compression
 In: XANTEC
Cited by 5 (0 self)
In an XML document a considerable fraction consists of markup, that is, begin- and end-element tags describing the document’s tree structure. XML compression tools such as XMill separate the tree structure from the data content and compress each separately. The main focus in these compression tools is how to group similar data content together prior to performing standard data compression such as gzip, bzip2, or ppm. In contrast, the focus of this paper is on compressing the tree structure part of an XML document. We use a known algorithm to derive a grammar representation of the tree structure which factors out the repetition of tree patterns. We then investigate several succinct binary encodings of these grammars. Our experiments show that we can be consistently smaller than the tree structure compression carried out by XMill, using the same backend compressors as XMill on our encodings. However, the most surprising result is that our own Huffman-like encoding of the grammars (without any backend compressor whatsoever) consistently outperforms XMill with gzip backend. This is of particular interest because our Huffman-like encoding can be queried without prior decompression. To the best of our knowledge this offers the smallest queryable XML tree structure representation currently available.
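The idea of factoring out repeated tree patterns can be illustrated with its simplest special case, sharing identical subtrees so that each distinct subtree is stored once (a minimal DAG). The sketch below is not the grammar-derivation algorithm the abstract refers to, which the listing does not name; the function names and the `(label, children)` tree encoding are assumptions chosen for this example.

```python
from typing import Dict, List, Tuple


def minimal_dag(tree) -> Tuple[int, List[Tuple[str, Tuple[int, ...]]]]:
    """Share identical subtrees of `tree`, given as (label, children).

    Returns (root_id, nodes), where nodes[i] = (label, child_ids).
    Uses recursion, so very deep trees would need an iterative variant.
    """
    ids: Dict[Tuple[str, Tuple[int, ...]], int] = {}
    nodes: List[Tuple[str, Tuple[int, ...]]] = []

    def visit(t) -> int:
        label, children = t
        # Two subtrees are identical iff label and shared child ids match.
        key = (label, tuple(visit(c) for c in children))
        if key not in ids:
            ids[key] = len(nodes)
            nodes.append(key)
        return ids[key]

    root = visit(tree)
    return root, nodes


def tree_size(tree) -> int:
    """Number of nodes in the uncompressed tree."""
    _, children = tree
    return 1 + sum(tree_size(c) for c in children)


# Hypothetical document structure: three identical <book> elements.
BOOK = ("book", [("title", []), ("author", [])])
LIBRARY = ("lib", [BOOK, BOOK, BOOK])
ROOT, NODES = minimal_dag(LIBRARY)
```

Here the ten-node tree collapses to four shared nodes. Grammar-based tree compression goes further by also sharing patterns that are not complete subtrees.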