## Complexity of decision problems for simple regular expressions (2004)

Venue: | In Proceedings of the 29th International Symposium on Mathematical Foundations of Computer Science (MFCS 2004) |

Citations: | 26 - 12 self |

### BibTeX

@INPROCEEDINGS{Martens04complexityof,

author = {Wim Martens and Frank Neven and Thomas Schwentick},

title = {Complexity of decision problems for simple regular expressions},

booktitle = {In Proceedings of the 29th International Symposium on Mathematical Foundations of Computer Science (MFCS 2004)},

year = {2004},

pages = {889--900},

publisher = {Springer}

}

### Abstract

We study the complexity of the inclusion, equivalence, and intersection problems for simple regular expressions arising in practical XML schemas. These basically consist of the concatenation of factors, where each factor is a disjunction of strings possibly extended with ‘∗’ or ‘?’. We obtain lower and upper bounds for various fragments of simple regular expressions. Although we show that inclusion and intersection are already intractable for very weak expressions, we also identify some tractable cases. For equivalence, we only prove an initial tractability result, leaving the complexity of more general cases open. The main motivation for this research comes from database theory, or more specifically XML and semi-structured data. In particular, we show that all lower and upper bounds for inclusion and equivalence carry over to the corresponding decision problems for extended context-free grammars and single-type tree grammars, which are abstractions of DTDs and XML Schemas, respectively. For intersection, we show that the complexity only carries over for DTDs.
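To make the shape of these expressions concrete, here is a minimal Python sketch. It is not the authors' method: it models a simple regular expression as a list of factors, each a disjunction of strings with an optional `*` or `?`, and tests inclusion only by brute force over all words up to a chosen length. The names `factor_to_regex`, `simple_regex`, and `included_up_to` are hypothetical helpers introduced for illustration, and the bounded check can refute inclusion but cannot prove it in general.

```python
import re
from itertools import product


def factor_to_regex(words, modifier=""):
    """Render one factor: a disjunction of strings, optionally followed by '*' or '?'."""
    if modifier not in ("", "*", "?"):
        raise ValueError("modifier must be '', '*' or '?'")
    alternatives = "|".join(re.escape(w) for w in words)
    return f"(?:{alternatives}){modifier}"


def simple_regex(factors):
    """Concatenate factors, each given as a (list_of_strings, modifier) pair."""
    return re.compile("".join(factor_to_regex(ws, m) for ws, m in factors))


def included_up_to(r1, r2, alphabet, max_len):
    """Bounded check (an assumption of this sketch, exponential in max_len):
    every word of length <= max_len matched by r1 is also matched by r2."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if r1.fullmatch(word) and not r2.fullmatch(word):
                return False
    return True


# Example: a* b?  versus  (a + b)*
e1 = simple_regex([(["a"], "*"), (["b"], "?")])
e2 = simple_regex([(["a", "b"], "*")])
print(included_up_to(e1, e2, "ab", 5))
print(included_up_to(e2, e1, "ab", 5))
```

In this example the first check finds no counterexample, while the second fails on the word `ba`, which `(a + b)*` accepts and `a* b?` rejects.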

### Citations

320 | Word problems requiring exponential time, STOC
- Stockmeyer, Meyer
- 1973
Citation Context ...ressions are summarized in Table 2. We denote by RE(S), the set of all simple regular expressions. Recall that the three decision problems are pspace-complete for the class of all regular expressions =-=[13, 24]-=-. We briefly discuss our results. – We show that inclusion is already conp-complete for very innocent expressions: where every factor is of the form a or a ∗ , or of the form a or a? with a an arbitra... |

303 | Index structures for path expressions
- Milo, Suciu
- 1999
Citation Context ...he latter is much more general than, e.g., our RE(w ∗ ). More recently, two fragments of simple regular expressions have been shown to be tractable: inclusion for RE(a?,(+a) ∗ ) [1], and RE(a,Σ,Σ ∗ ) =-=[17]-=-. This last result should be contrasted with the pspace-completeness of inclusion for RE(a,(+a),(+a) ∗ ). We conclude by a remark on one-unambiguous or deterministic regular expressions. Basically, the... |

208 | Taxonomy of XML schema languages using formal language theory
- Murata, Lee, et al.
- 2005
Citation Context ...h their abbreviated notation). Further, Murata et al. argued that XML Schemas do not correspond to the full class of tree automata or SDTDs, but to a strict subset of those, namely, single-type SDTDs =-=[19]-=-. Clearly, complexity lower bounds for the inclusion, equivalence, or the intersection problem for a class of regular expressions R imply lower bounds for the corresponding decision problems for DTDs ... |

163 | XDuce: A statically typed XML processing language
- Hosoya, Pierce
Citation Context ...ormat. The presence of such a schema improves the efficiency of many tasks like, for instance, query processing, query optimization, and automatic data integration. For typechecking or type inference =-=[11, 14, 18, 21]-=-, schema information is even crucial. As standard decision problems of schema languages, like inclusion, equivalence, and non-emptiness of intersection, are among the basic building blocks for many of... |

162 | Typechecking for XML transformers
- Milo, Suciu, et al.
- 2000
Citation Context ...ormat. The presence of such a schema improves the efficiency of many tasks like, for instance, query processing, query optimization, and automatic data integration. For typechecking or type inference =-=[11, 14, 18, 21]-=-, schema information is even crucial. As standard decision problems of schema languages, like inclusion, equivalence, and non-emptiness of intersection, are among the basic building blocks for many of... |

120 | DTD inference for views of XML data
- Papakonstantinou, Vianu
- 2000
Citation Context ...ormat. The presence of such a schema improves the efficiency of many tasks like, for instance, query processing, query optimization, and automatic data integration. For typechecking or type inference =-=[11, 14, 18, 21]-=-, schema information is even crucial. As standard decision problems of schema languages, like inclusion, equivalence, and non-emptiness of intersection, are among the basic building blocks for many of... |

111 | One-unambiguous regular languages
- Brüggemann-Klein, Wood
- 1998
Citation Context ...clusion for RE(a,(+a),(+a) ∗ ). We conclude by a remark on one-unambiguous or deterministic regular expressions. Basically, these are regular expressions which have a deterministic Glushkov automaton =-=[4]-=-. The XML specification requires DTD content models to be deterministic because of compatibility with SGML (Section 3.2.1 of [8]). Of course, for such expressions, inclusion and equivalence are in pti... |

110 | Regular tree and regular hedge languages over unranked alphabets. Unpublished manuscript, version 1
- Brüggemann-Klein, Murata, et al.
- 2001
Citation Context ...-free grammars with regular expressions as right-hand sides of rules, while the latter are a natural extension of classical tree automata to trees where nodes can have an unbounded number of children =-=[3]-=-. A formalism equivalent to such tree automata but which is grammar based are specialized DTDs (SDTDs) [21]. The complexity of the three aforementioned problems is known and is pspace-complete for DTD... |

87 | Lower Bounds for Natural Proof Systems
- Kozen
- 1977
Citation Context ...ressions are summarized in Table 2. We denote by RE(S), the set of all simple regular expressions. Recall that the three decision problems are pspace-complete for the class of all regular expressions =-=[13, 24]-=-. We briefly discuss our results. – We show that inclusion is already conp-complete for very innocent expressions: where every factor is of the form a or a ∗ , or of the form a or a? with a an arbitra... |

85 | A Web odyssey: from Codd to XML
- Vianu
Citation Context ...ition) and XML Schema Definitions (XSDs) [8, 9] are the most widely spread. Generally these languages are abstracted by extended context-free grammars (ECFGs) and unranked tree automata, respectively =-=[20, 26]-=-. The former are Table 1. Possible factors in simple regular expressions and how they are denoted (a ∈ Σ, w ∈ Σ ∗ ). Factor Abbr. a a a ∗ a ∗ a? a? (a1 + · · · + an) (+a) Factor Abbr. (a1 + · · · + an... |

78 | Deciding equivalence of finite tree automata - Seidl - 1990 |

76 | On-the-fly analysis of systems with unbounded, lossy FIFO channels
- Abdulla, Bouajjani, et al.
- 1998
Citation Context ...should be noted that the latter is much more general than, e.g., our RE(w ∗ ). More recently, two fragments of simple regular expressions have been shown to be tractable: inclusion for RE(a?,(+a) ∗ ) =-=[1]-=-, and RE(a,Σ,Σ ∗ ) [17]. This last result should be contrasted with the pspace-completeness of inclusion for RE(a,(+a),(+a) ∗ ). We conclude by a remark on one-unambiguous or deterministic regular expr... |

69 | What are real DTDs like?
- Choi
- 2002
Citation Context ...heless, intersection remains pspace-hard (cf. Theorem 8). Unfortunately, the notion of deterministic content models is not a transparent one for the average user, as is witnessed by practical studies =-=[7, 2]-=- which found a number of non-deterministic content models in actual DTDs. Actually, for this very reason Clarke and Murata abandoned the notion in their Relax NG specification [25]. Hence, from a scie... |

58 | XPath with conditional axis relations
- Marx
- 2004
Citation Context ... Another consequence of our results, independent of the one-unambiguous issue, is that optimization problems for navigational queries as expressed by caterpillar expressions [5], X CPath reg and Xreg =-=[16]-=-, or regular path queries [6] quickly turn intractable. Due to space limitations many proofs are omitted. We refer the interested reader to [15]. 2 Definitions Regular expressions. For the rest of the... |

55 | DTDs versus XML Schema: A practical study
- Bex, Neven, et al.
- 2004
Citation Context ...ardness of inclusion of DTDs crucially depends on the presence of involved regular expressions that are quite unlikely to occur in realistic DTDs. Actually, a study by Bex, Neven, and Van den Bussche =-=[2]-=- reveals that more than 90 percent of the regular expressions occurring in practical DTDs and XSDs are of the following simple form: e1 · · · en where every ei is a factor of the form (w1+· · ·+wn) po... |

44 | Haskell overloading is DEXPTIME-complete
- Seidl
- 1994
Citation Context ...d in Theorem 2 as by Theorem 7, intersection of RE((+a),w?,(+a)?,(+a) ∗ ) is in np. The reduction is similar to the proof that intersection of deterministic top-down tree automata is exptime-complete =-=[23]-=-. However, single-type DTDs and the latter automata are incomparable. Indeed, the tree language consisting of the trees {a(bc),a(cb)} is not definable by a top-down deterministic tree automaton, while... |

39 | The Complexity Theory Companion
- Hemaspaandra, Ogihara
- 2002
Citation Context ...ith oracle O (denoted L′ = L(M^O)). Let M further have the property that L(M^A) ⊆ L(M^B) whenever A ⊆ B. Then L′ is also in C. For a more precise definition of this notion we refer the reader to =-=[10]-=-. For our purposes it is sufficient that all important complexity classes like ptime, np, conp, and pspace have this property and that every such class contains ptime. Theorem 1. Let R be a class of r... |

39 | On the Equivalence, Containment, and Covering Problems for the Regular and Context-Free
- Rosenkrantz, Szymanski
- 1976
Citation Context ...le fragment we obtain is when each factor is restricted to a or a + . The complexities of equivalence, inclusion and intersection for general regular expressions and several fragments were studied in =-=[12, 13, 24]-=-. From these, the most related result is the conp-completeness of equivalence and inclusion of bounded languages [12]. A language L is bounded if there are strings v1,...,vn such that L ⊆ v ∗ 1 · · · ... |

35 | Caterpillars : A context specification technique
- Brüggemann-Klein, Wood
Citation Context ...tical regular expressions. Another consequence of our results, independent of the one-unambiguous issue, is that optimization problems for navigational queries as expressed by caterpillar expressions =-=[5]-=-, X CPath reg and Xreg [16], or regular path queries [6] quickly turn intractable. Due to space limitations many proofs are omitted. We refer the interested reader to [15]. 2 Definitions Regular expre... |

33 | Typechecking top-down uniform unranked tree transducers
- Martens, Neven
- 2003 |

19 | Reasoning on regular path queries
- Calvanese, Giacomo, et al.
- 2003
Citation Context ...sults, independent of the one-unambiguous issue, is that optimization problems for navigational queries as expressed by caterpillar expressions [5], X CPath reg and Xreg [16], or regular path queries =-=[6]-=- quickly turn intractable. Due to space limitations many proofs are omitted. We refer the interested reader to [15]. 2 Definitions Regular expressions. For the rest of the paper let Σ denote a finite ... |