Monadic Datalog and the Expressive Power of Languages for Web Information Extraction
- J. ACM, 2002
Cited by 64 (10 self)
Research on information extraction from Web pages (wrapping) has seen much activity in recent times (particularly systems implementations), but little work has been done on formally studying the expressiveness of the formalisms proposed or on the theoretical foundations of wrapping. In this paper, we first study monadic datalog as a wrapping language (over ranked or unranked tree structures). Using previous work by Neven and Schwentick, we show that this simple language is equivalent to full monadic second order logic (MSO) in its ability to specify wrappers. We believe that MSO has the right expressiveness required for Web information extraction and thus propose MSO as a yardstick for evaluating and comparing wrappers. Using the above result, we study the kernel fragment Elog- of the Elog wrapping language used in the Lixto system (a visual wrapper generator). The striking fact here is that Elog- exactly captures MSO, yet is easier to use. Indeed, programs in this language can be entirely visually specified. We also formally compare Elog to other wrapping languages proposed in the literature.
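To make "monadic datalog as a wrapping language" concrete, here is a minimal sketch of naive bottom-up evaluation of a monadic datalog program over a tree encoded by the relations firstchild and nextsibling. The tree, the rule encoding, and the predicate names (`intable`, `answer`, `label_td`) are hypothetical illustrations, not Elog or Lixto syntax. Every derived (intensional) predicate is unary, which is what makes the program monadic.

```python
from collections import defaultdict

# Hypothetical example document: node id -> label, and ordered child lists.
labels = {0: "html", 1: "table", 2: "tr", 3: "td", 4: "td", 5: "p", 6: "td"}
children = {0: [1, 5], 1: [2], 2: [3, 4], 5: [6]}

def edb(labels, children):
    """Extensional relations: label_<l>(x), firstchild(y, x), nextsibling(y, x)."""
    unary = defaultdict(set)
    for n, lab in labels.items():
        unary["label_" + lab].add(n)
    binary = {"firstchild": set(), "nextsibling": set()}
    for parent, kids in children.items():
        if kids:
            binary["firstchild"].add((parent, kids[0]))
        for a, b in zip(kids, kids[1:]):
            binary["nextsibling"].add((a, b))
    return unary, binary

# Rule shapes (an assumed encoding, not Elog syntax):
#   ("step", head, rel, p)  stands for  head(x) :- rel(y, x), p(y).
#   ("and", head, [p, q])   stands for  head(x) :- p(x), q(x).
PROGRAM = [
    ("step", "intable", "firstchild", "label_table"),  # first child of a table
    ("step", "intable", "firstchild", "intable"),      # first child of a node below a table
    ("step", "intable", "nextsibling", "intable"),     # right siblings stay below the table
    ("and", "answer", ["intable", "label_td"]),        # extract td nodes below a table
]

def evaluate(program, unary, binary):
    """Naive bottom-up evaluation: apply all rules until nothing new is derived."""
    facts = defaultdict(set, {k: set(v) for k, v in unary.items()})
    changed = True
    while changed:
        changed = False
        for rule in program:
            if rule[0] == "step":
                _, head, rel, p = rule
                new = {x for (y, x) in binary[rel] if y in facts[p]}
            else:
                _, head, preds = rule
                new = set.intersection(*(facts[p] for p in preds))
            if not new <= facts[head]:
                facts[head] |= new
                changed = True
    return facts

result = evaluate(PROGRAM, *edb(labels, children))
print(sorted(result["answer"]))  # [3, 4]
```

The `td` node 6 is not extracted because it sits under a `p` node rather than a `table`; the wrapper's "context" is expressed purely through unary predicates propagated along tree edges.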
The Regular Viewpoint on PA-Processes
- Theoretical Computer Science, 1999
Cited by 35 (1 self)
PA is the process algebra allowing non-determinism, sequential and parallel compositions, and recursion. We suggest viewing PA-processes as trees, and using tree-automata techniques for verification problems on PA. Our main result is that the set of iterated predecessors of a regular set of PA-processes is a regular tree language, and similarly for iterated successors. Furthermore, the corresponding tree automata can be built effectively in polynomial time. This has many immediate applications to verification problems for PA-processes, including a simple and general model-checking algorithm.
Tree-Walking Pebble Automata
- Jewels are Forever: Contributions to Theoretical Computer Science in Honor of Arto Salomaa, 1999
Cited by 30 (2 self)
... this paper is to investigate the power of tree-walking automata with pebbles. Obviously, the unrestricted use of pebbles leads to a class of tree languages much larger than the regular tree languages, in fact to all tree languages in NSPACE(log n). Thus, we restrict the automaton to the recursive use of pebbles, in the sense that the lifetimes of pebbles, i.e., the times between dropping a pebble and lifting it again, are properly nested. A similar, but stronger, nesting requirement is studied in [13] for 2-way automata on strings. We prove in Section 5 that our restriction indeed guarantees that all tree languages recognized by the tree-walking pebble automaton are regular, but we conjecture that the automaton is not powerful enough to recognize all regular tree languages. In Section 6 we generalize the notion of pebble to that of a "set-pebble", in such a way that the tree-walking set-pebble automaton recognizes exactly the regular tree languages.
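The walking model that pebbles extend can be made concrete with a small simulator. This is a minimal sketch of a plain deterministic tree-walking automaton without pebbles, assuming full binary trees (every internal node has two children) and a hypothetical transition function `delta` that sees only the current state, the node label, whether the node is the root or a left or right child, and whether it is a leaf. The example automaton performs a depth-first walk and accepts iff some node is labelled `a`.

```python
class Node:
    """Binary-tree node with a parent pointer, so the automaton can walk up."""
    def __init__(self, label, left=None, right=None):
        self.label, self.left, self.right, self.parent = label, left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self

def position(n):
    """The local information a tree-walking automaton may observe."""
    if n.parent is None:
        return "root"
    return "left" if n.parent.left is n else "right"

def delta(state, label, pos, leaf):
    """Hypothetical DFS automaton: accept iff some node is labelled 'a'."""
    if state == "down":                # first visit of this node
        if label == "a":
            return "stay", "accept"
        if not leaf:
            return "left", "down"
        state = "done"                 # a leaf's subtree is finished at once
    if state == "fromleft":            # left subtree explored, go right
        return "right", "down"
    # state is "done" or "fromright": the subtree rooted here is finished
    if pos == "root":
        return "stay", "reject"
    return "up", ("fromleft" if pos == "left" else "fromright")

def run(delta, root, max_steps=10_000):
    """Simulate the automaton from the root; True iff it halts accepting."""
    node, state = root, "down"
    for _ in range(max_steps):
        if state in ("accept", "reject"):
            return state == "accept"
        move, state = delta(state, node.label, position(node), node.left is None)
        if move == "left":
            node = node.left
        elif move == "right":
            node = node.right
        elif move == "up":
            node = node.parent
    raise RuntimeError("automaton did not halt within max_steps")

print(run(delta, Node("b", Node("b"), Node("b", Node("c"), Node("a")))))  # True
print(run(delta, Node("b", Node("b"), Node("c"))))                        # False
```

The automaton keeps no memory beyond its finite state and current position; a pebble would add the ability to mark a node and later test for the mark, which is where the nesting restriction in the abstract comes in.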
Extensions of Attribute Grammars for Structured Document Queries
- 1999
Cited by 29 (6 self)
Document specification languages such as XML model documents using extended context-free grammars. These differ from standard context-free grammars in that they allow arbitrary regular expressions on the right-hand side of productions. To query such documents, we introduce a new form of attribute grammars (extended AGs) that work directly over extended context-free grammars rather than over standard context-free grammars. Viewed as a query language, extended AGs are particularly relevant because they can take into account the inherent order of the children of a node in a document.
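The grammar side of this setting, an extended context-free grammar whose right-hand sides are regular expressions over child labels, can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical grammar and a tree encoding as `(label, [children])` tuples; it only checks conformance and does not model the attribute rules of extended AGs.

```python
import re

# Hypothetical extended context-free grammar: each label maps to a regular
# expression over the labels of its children, each child written as <label>.
GRAMMAR = {
    "doc": r"<title>(<section>)+",
    "section": r"<title>(<para>|<section>)*",
    "title": r"",
    "para": r"",
}

def conforms(tree, grammar):
    """True iff every node's child sequence matches its label's content model."""
    label, kids = tree
    model = grammar.get(label)
    if model is None:                      # label not declared in the grammar
        return False
    word = "".join(f"<{child[0]}>" for child in kids)
    if re.fullmatch(model, word) is None:
        return False
    return all(conforms(child, grammar) for child in kids)

doc = ("doc", [("title", []),
               ("section", [("title", []), ("para", []),
                            ("section", [("title", [])])])])
print(conforms(doc, GRAMMAR))                                      # True
print(conforms(("doc", [("section", [("title", [])])]), GRAMMAR))  # False
```

Because a node's children form a word matched against a regular expression, their order matters, which is the property the abstract highlights for extended AGs as a query language.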
Caterpillars: A Context Specification Technique
- Markup Languages, 2000
Cited by 29 (7 self)
We present a novel, yet simple, technique for the specification of context in structured documents that we call caterpillar expressions. Although we are primarily applying this technique in the specification of context-dependent style sheets for HTML, SGML and XML documents, it can also be used for query specification for structured documents, as we shall demonstrate, and for the specification of computer program transformations. From a conceptual point of view, structured documents are trees, and one of the oldest and best-established techniques for processing trees, and hence structured documents, is the tree automaton. We present a number of theoretical results that allow us to compare the expressive power of caterpillar expressions and their companions, caterpillar automata, to that of tree automata. In particular, we demonstrate that each caterpillar expression describes a regular tree language, which is hence recognizable by a tree automaton. Finally, we empl...
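A caterpillar expression can be read as a regular expression over single-step moves in the tree and label tests. The sketch below evaluates such expressions on sets of nodes, assuming a hypothetical tuple encoding (`move`, `test`, `seq`, `alt`, `star`) and move names `up`, `first`, `last`, `left`, `right`; these are loosely modeled on the idea in the abstract and are not the paper's exact syntax. Star is computed as a fixpoint over reachable nodes.

```python
class Tree:
    """Ordered tree: node id -> label, node id -> list of child ids."""
    def __init__(self, labels, children):
        self.labels = labels
        self.children = children
        self.parent = {c: p for p, cs in children.items() for c in cs}

    def move(self, n, m):
        """One caterpillar step from node n, or None if the move is impossible."""
        kids = self.children.get(n, [])
        if m == "first":
            return kids[0] if kids else None
        if m == "last":
            return kids[-1] if kids else None
        if m == "up":
            return self.parent.get(n)
        # remaining moves are "left" and "right" (sibling steps)
        sibs = self.children[self.parent[n]] if n in self.parent else [n]
        i = sibs.index(n) + (1 if m == "right" else -1)
        return sibs[i] if 0 <= i < len(sibs) else None

def follow(t, e, nodes):
    """Set of nodes reachable from `nodes` along some move sequence matching e."""
    kind = e[0]
    if kind == "move":
        return {r for r in (t.move(n, e[1]) for n in nodes) if r is not None}
    if kind == "test":
        return {n for n in nodes if t.labels[n] == e[1]}
    if kind == "seq":
        return follow(t, e[2], follow(t, e[1], nodes))
    if kind == "alt":
        return follow(t, e[1], nodes) | follow(t, e[2], nodes)
    if kind == "star":                 # zero or more repetitions, as a fixpoint
        reached, frontier = set(nodes), set(nodes)
        while frontier:
            frontier = follow(t, e[1], frontier) - reached
            reached |= frontier
        return reached
    raise ValueError(f"unknown expression kind: {kind}")

t = Tree({0: "body", 1: "h1", 2: "table", 3: "tr", 4: "td"},
         {0: [1, 2], 2: [3], 3: [4]})
inside_table = ("seq", ("star", ("move", "up")), ("test", "table"))
after_h1 = ("seq", ("move", "left"), ("test", "h1"))
print(follow(t, inside_table, {4}))  # {2}
print(follow(t, after_h1, {2}))      # {1}
```

A style sheet rule such as "cells inside a table" then amounts to asking whether `follow(t, inside_table, {n})` is non-empty for a node `n`.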
Frontiers of tractability for typechecking simple XML transformations
- PODS, 2004
Cited by 29 (5 self)
Typechecking consists of statically verifying whether the output of an XML transformation always conforms to an output type for documents satisfying a given input type. We focus on complete algorithms, which always produce the correct answer. We consider top-down XML transformations incorporating XPath expressions and abstract document types as grammars and tree automata. By restricting schema languages and transformations, we identify several practical settings for which typechecking is in polynomial time. Moreover, the resulting framework provides a rather complete picture, as we show that most scenarios cannot be enlarged without rendering the typechecking problem intractable. The present research thus sheds light on when to use fast complete algorithms and when to resort to sound but incomplete ones.
On the power of tree-walking automata
- Information and Computation, 2000
Cited by 28 (3 self)
Tree-walking automata (TWAs) have recently received renewed attention in the fields of formal languages and databases. Towards a better understanding of their expressiveness, we characterize them in terms of transitive closure logic formulas in normal form. Engelfriet and Hoogeboom conjectured that TWAs cannot define all regular tree languages, or equivalently, all of monadic second-order logic. We prove this conjecture for a restricted, but powerful, class of TWAs. In particular, we show that 1-bounded TWAs, that is, TWAs that may traverse every edge of the input tree at most once in each direction, cannot define all regular languages. We then extend this result to a class of TWAs that can simulate first-order logic (FO) and is capable of expressing properties not definable in FO extended with regular path expressions; the latter logic is a valid abstraction of current query languages for XML and semi-structured data.
Bottom-up and Top-down Tree Series Transformations
- J. Autom. Lang. Combin., 2000
Cited by 28 (6 self)
We generalize bottom-up tree transducers and top-down tree transducers to the concept of bottom-up tree series transducer and top-down tree series transducer, respectively, by allowing formal tree series as output rather than trees, where a formal tree series is a mapping from output trees to some semiring. We associate two semantics with a tree series transducer: a mapping which transforms trees into tree series (for short: tree to tree series transformation or t-ts transformation), and a mapping which transforms tree series into tree series (for short: tree series transformation or ts-ts transformation). We show that the standard case of tree transducers is reobtained by choosing the boolean semiring under the t-ts semantics. Also, for each of the two types of tree series transducers and for both types of semantics, we prove a characterization which generalizes in a straightforward way the corresponding characterization result for the underlying tree transducer class. Mo...
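The central object here, a formal tree series mapping trees to values in a semiring, can be illustrated with a simpler device than a transducer. This is a minimal sketch of a bottom-up weighted tree automaton (a recognizer, not one of the paper's tree series transducers), assuming a hypothetical transition table keyed by `(symbol, child_states, target_state)` and trees encoded as `(symbol, [children])`. The example automaton counts leaves labelled `a` over the natural numbers; run over the boolean semiring, the same table just decides whether such a leaf exists, echoing the abstract's remark that the boolean semiring recovers the unweighted case.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any

BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
NATURAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)

def weight(tree, trans, final, sr):
    """Value of the tree series at `tree`: sum over runs of products of weights."""
    def vec(t):
        sym, kids = t
        kid_vecs = [vec(k) for k in kids]
        out = {}
        for (s, qs, q), w in trans.items():
            if s != sym or len(qs) != len(kids):
                continue
            for kv, qi in zip(kid_vecs, qs):   # multiply in each child's weight
                w = sr.times(w, kv.get(qi, sr.zero))
            out[q] = sr.plus(out.get(q, sr.zero), w)
        return out
    total = sr.zero
    for q, w in vec(tree).items():
        total = sr.plus(total, sr.times(w, final.get(q, sr.zero)))
    return total

def count_a(one):
    """Hypothetical automaton: state 'm' means exactly one 'a' leaf was marked."""
    return {
        ("a", (), "n"): one, ("a", (), "m"): one, ("b", (), "n"): one,
        ("f", ("n", "n"), "n"): one,
        ("f", ("m", "n"), "m"): one, ("f", ("n", "m"), "m"): one,
    }

t = ("f", [("a", []), ("f", [("b", []), ("a", [])])])
print(weight(t, count_a(1), {"m": 1}, NATURAL))           # 2
print(weight(t, count_a(True), {"m": True}, BOOLEAN))     # True
```

Swapping the semiring changes what the same transition structure computes, which is the kind of parameterization the abstract applies to transducers.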
Typechecking Top-Down Uniform Unranked Tree Transducers
- In 9th International Conference on Database Theory, LNCS
Cited by 28 (3 self)
We investigate the typechecking problem for XML queries: statically verifying that every answer to a query conforms to a given output schema, for inputs satisfying a given input schema. As typechecking quickly turns undecidable for query languages capable of testing equality of data values, we return to the limited framework where we abstract XML documents as labeled ordered trees. We focus on simple top-down recursive transformations motivated by XSLT and structural recursion on trees. We parameterize the problem by several restrictions on the transformations (deleting, non-deleting, bounded width) and consider both tree automata and DTDs as output schemas. The complexity of the typechecking problems in this scenario ranges from PTIME to EXPTIME.

