Results 1 - 10
of
37
Adding nesting structure to words
- In Developments in Language Theory, LNCS 4036
, 2006
"... We propose the model of nested words for representation of data with both a linear ordering and a hierarchically nested matching of items. Examples of data with such dual linear-hierarchical structure include executions of structured programs, annotated linguistic data, and HTML/XML documents. Neste ..."
Abstract
-
Cited by 119 (17 self)
- Add to MetaCart
(Show Context)
We propose the model of nested words for representation of data with both a linear ordering and a hierarchically nested matching of items. Examples of data with such dual linear-hierarchical structure include executions of structured programs, annotated linguistic data, and HTML/XML documents. Nested words generalize both words and ordered trees, and allow both word and tree operations. We define nested word automata—finite-state acceptors for nested words, and show that the resulting class of regular languages of nested words has all the appealing theoretical properties that the classical regular word languages enjoys: deterministic nested word automata are as expressive as their nondeterministic counterparts; the class is closed under union, intersection, complementation, concatenation, Kleene-*, prefixes, and language homomorphisms; membership, emptiness, language inclusion, and language equivalence are all decidable; and definability in monadic second order logic corresponds exactly to finite-state recognizability. We also consider regular languages of infinite nested words and show that the closure properties, MSO-characterization, and decidability of decision problems carry over. The linear encodings of nested words give the class of visibly pushdown languages of words, and this class lies between balanced languages and deterministic context-free languages. We argue that for algorithmic verification of structured programs, instead of viewing the program as a context-free language over words, one should view it as a regular language of nested words (or equivalently, a visibly pushdown language), and this would allow model checking of many properties (such as stack inspection, pre-post conditions) that are not expressible in existing specification logics. We also study the relationship between ordered trees and nested words, and the corresponding automata: while the analysis complexity of nested word automata is the same as that of classical tree automata, they combine both bottom-up and top-down traversals, and enjoy expressiveness and succinctness benefits over tree automata. 1
First-order and temporal logics for nested words
- In LICS 2007
"... Nested words are a structured model of execution paths in procedural programs, reflecting their call and return nesting structure. Finite nested words also capture the structure of parse trees and other tree-structured data, such as XML. We provide new temporal logics for finite and infinite nested ..."
Abstract
-
Cited by 27 (4 self)
- Add to MetaCart
(Show Context)
Nested words are a structured model of execution paths in procedural programs, reflecting their call and return nesting structure. Finite nested words also capture the structure of parse trees and other tree-structured data, such as XML. We provide new temporal logics for finite and infinite nested words, which are natural extensions of LTL, and prove that these logics are first-order expressivelycomplete. One of them is based on adding a ”within” modality, evaluating a formula on a subword, to a logic CaRet previously studied in the context of verifying properties of recursive state machines. The other logic is based on the notion of a summary path that combines the linear and nesting structures. For that logic, both model-checking and satisfiability are shown to be EXPTIME-complete. Finally, we prove that first-order logic over nested words has the three-variable property, and we present a temporal logic for nested words which is complete for the twovariable fragment of first-order. 1
Reasoning about XML with Temporal Logics and Automata
- In LPAR’08
"... We show that problems arising in static analysis of XML specifications and transformations can be dealt with using techniques similar to those developed for static analysis of programs. Many properties of interest in the XML context are related to navigation, and can be formulated in temporal logics ..."
Abstract
-
Cited by 25 (4 self)
- Add to MetaCart
(Show Context)
We show that problems arising in static analysis of XML specifications and transformations can be dealt with using techniques similar to those developed for static analysis of programs. Many properties of interest in the XML context are related to navigation, and can be formulated in temporal logics for trees. We choose a logic that admits a simple single-exponential translation into unranked tree automata, in the spirit of the classical LTL-to-Büchi automata translation. Automata arising from this translation have a number of additional properties; in particular, they are convenient for reasoning about unary node-selecting queries, which are important in the XML context. We give two applications of such reasoning: one deals with a classical XML problem of reasoning about navigation in the presence of schemas, and the other relates to verifying security properties of XML views.
Marrying words and trees
- PODS
, 2007
"... Traditionally, data that has both linear and hierarchical structure, such as annotated linguistic data, is modeled using ordered trees and queried using tree automata. In this paper, we argue that nested words and automata over nested words offer a better way to capture and process the dual structur ..."
Abstract
-
Cited by 24 (1 self)
- Add to MetaCart
Traditionally, data that has both linear and hierarchical structure, such as annotated linguistic data, is modeled using ordered trees and queried using tree automata. In this paper, we argue that nested words and automata over nested words offer a better way to capture and process the dual structure. Nested words generalize both words and ordered trees, and allow both word and tree operations. We study various classes of automata over nested words, and show that while they enjoy expressiveness and succinctness benefits over word and tree automata, their analysis complexity and closure properties are analogous to the corresponding word and tree special cases. In particular, we show that finite-state nested word automata can be exponentially more succinct than tree automata, and pushdown nested word automata include the two incomparable classes of context-free word languages and context-free tree languages.
SUCCINCTNESS OF THE COMPLEMENT AND INTERSECTION OF REGULAR EXPRESSIONS
, 2008
"... We study the succinctness of the complement and intersection of regular expressions. In particular, we show that when constructing a regular expression defining the complement of a given regular expression, a double exponential size increase cannot be avoided. Similarly, when constructing a regular ..."
Abstract
-
Cited by 22 (5 self)
- Add to MetaCart
(Show Context)
We study the succinctness of the complement and intersection of regular expressions. In particular, we show that when constructing a regular expression defining the complement of a given regular expression, a double exponential size increase cannot be avoided. Similarly, when constructing a regular expression defining the intersection of a fixed and an arbitrary number of regular expressions, an exponential and double exponential size increase, respectively, can in worst-case not be avoided. All mentioned lower bounds improve the existing ones by one exponential and are tight in the sense that the target expression can be constructed in the corresponding time class, i.e., exponential or double exponential time. As a by-product, we generalize a theorem by Ehrenfeucht and Zeiger stating that there is a class of DFAs which are exponentially more succinct than regular expressions, to a fixed four-letter alphabet. When the given regular expressions are one-unambiguous, as for instance required by the XML Schema specification, the complement can be computed in polynomial time whereas the bounds concerning intersection continue to hold. For the subclass of single-occurrence regular expressions, we prove a tight exponential lower bound for intersection.
Forest Algebras
- LOGIC AND AUTOMATA (2007) 107-132
, 2007
"... There are at least as many interesting classes of regular tree languages as there are of regular word languages. However, much less is known about the former ones. In particular, very few decidable characterizations of tree language classes are known. For words, most known characterizations are obta ..."
Abstract
-
Cited by 22 (8 self)
- Add to MetaCart
(Show Context)
There are at least as many interesting classes of regular tree languages as there are of regular word languages. However, much less is known about the former ones. In particular, very few decidable characterizations of tree language classes are known. For words, most known characterizations are obtained using algebra. With this in mind, the present paper proposes an algebraic framework for classifying regular languages of finite unranked labeled trees. If in a transformation semigroup we assume that the set being acted upon has a semigroup structure, then the transformation semigroup can be used to recognize languages of unranked trees. This observation allows us to examine the relationship connecting tree languages with standard algebraic concepts such as aperiodicity idempotency, or commutativity. The new algebraic setting is used to give several examples of decidable algebraic characterizations.
Complexity of decision problems for XML schemas and chain regular expressions
- Siam J. Comp
"... Abstract. We study the complexity of the inclusion, equivalence, and intersection problem of extended CHAin Regular Expressions (eCHAREs). These are regular expressions with a very simple structure: they basically consist of the concatenation of factors, where each factor is a disjunction of strings ..."
Abstract
-
Cited by 19 (8 self)
- Add to MetaCart
Abstract. We study the complexity of the inclusion, equivalence, and intersection problem of extended CHAin Regular Expressions (eCHAREs). These are regular expressions with a very simple structure: they basically consist of the concatenation of factors, where each factor is a disjunction of strings, possibly extended with “∗”, “+”, or “?”. Though of a very simple from, the usage of such expressions is widespread as eCHAREs, for instance, constitute a super class of the regular expressions most frequently used in practice in schema languages for XML. In particular, we show that all our lower and upper bounds for the inclusion and equivalence problem carry over to the corresponding decision problems for extended context-free grammars, and to single-type and restrained competition tree grammars. These grammars form abstractions of Document Type Definitions (DTDs), XML Schema definitions (XSDs) and the class of one-pass preorder typeable XML schemas, respectively. For the intersection problem, we show that obtained complexities only carry over to DTDs. In this respect, we also study two other classes of regular expressions related to XML: deterministic expressions and expressions where the number of occurrences of alphabet symbols is bounded by a constant. 1. Introduction. Although
Learning twig and path queries
- In ICDT
, 2012
"... We investigate the problem of learning XML queries, path queries and twig queries, from examples given by the user. A learning algorithm takes on the input a set of XML docu-ments with nodes annotated by the user and returns a query that selects the nodes in a manner consistent with the an-notation. ..."
Abstract
-
Cited by 16 (6 self)
- Add to MetaCart
(Show Context)
We investigate the problem of learning XML queries, path queries and twig queries, from examples given by the user. A learning algorithm takes on the input a set of XML docu-ments with nodes annotated by the user and returns a query that selects the nodes in a manner consistent with the an-notation. We study two learning settings that differ with the types of annotations. In the first setting the user may only indicate required nodes that the query must select (i.e., positive examples). In the second, more general, setting, the user may also indicate forbidden nodes that the query must not select (i.e., negative examples). The query may or may not select any node with no annotation. We formalize what it means for a class of queries to be learn-able. One requirement is the existence of a learning algorithm that is sound i.e., always returning a query consistent with the examples given by the user. Furthermore, the learning algorithm should be complete i.e., able to produce every query with sufficiently rich examples. Other requirements in-volve tractability of the learning algorithm and its robustness to nonessential examples. We identify practical classes of Boolean and unary, path and twig queries that are learnable from positive examples. We also show that adding negative examples to the picture renders learning unfeasible.
Automata with nested pebbles capture first-order logic with transitive closure
- LOGICAL METHODS IN COMPUTER SCIENCE VOL. 3 (2:3) 2007, PP. 1–27
, 2007
"... ..."
Tree Pattern Rewriting Systems
"... Classical verification often uses abstraction when dealing with data. On the other hand, dynamic XML-based applications have become pervasive, for instance with the ever growing importance of web services. We define here Tree Pattern Rewriting Systems (TPRS) as an abstract model of dynamic XML-based ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
(Show Context)
Classical verification often uses abstraction when dealing with data. On the other hand, dynamic XML-based applications have become pervasive, for instance with the ever growing importance of web services. We define here Tree Pattern Rewriting Systems (TPRS) as an abstract model of dynamic XML-based documents. TPRS systems generate infinite transition systems, where states are unranked and unordered trees (hence possibly modeling XML documents). Their guarded transition rules are described by means of tree patterns. Our main result is that given a TPRS system (T, R), a tree pattern P and some integer k such that any reachable document from T has depth at most k, it is decidable (albeit of non elementary complexity) whether some tree matching P is reachable from T.