Results 1-10 of 108
Monadic Datalog and the Expressive Power of Languages for Web Information Extraction
J. ACM, 2002
Cited by 75 (11 self)
Research on information extraction from Web pages (wrapping) has seen much activity in recent times (particularly systems implementations), but little work has been done on formally studying the expressiveness of the formalisms proposed or on the theoretical foundations of wrapping. In this paper, we first study monadic datalog as a wrapping language (over ranked or unranked tree structures). Using previous work by Neven and Schwentick, we show that this simple language is equivalent to full monadic second-order logic (MSO) in its ability to specify wrappers. We believe that MSO has the right expressiveness required for Web information extraction and thus propose MSO as a yardstick for evaluating and comparing wrappers. Using the above result, we study the kernel fragment Elog⁻ of the Elog wrapping language used in the Lixto system (a visual wrapper generator). The striking fact here is that Elog⁻ exactly captures MSO, yet is easier to use. Indeed, programs in this language can be entirely visually specified. We also formally compare Elog to other wrapping languages proposed in the literature.
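To make the idea of monadic datalog as a wrapping language more concrete, here is a minimal Python sketch. It assumes an unranked tree given as nested `(label, children)` pairs and evaluates one small, hypothetical program by naive fixpoint iteration over the `firstchild` and `nextsibling` relations. The predicate names (`desc`, `result`), the example rules, and the helper functions are illustrative assumptions, not taken from the paper.

```python
def index_tree(tree):
    """Flatten a (label, children) tree into preorder node ids plus the
    binary relations firstchild and nextsibling."""
    labels, firstchild, nextsibling = {}, set(), set()
    counter = 0

    def visit(node):
        nonlocal counter
        nid = counter
        counter += 1
        label, children = node
        labels[nid] = label
        prev = None
        for child in children:
            cid = visit(child)
            if prev is None:
                firstchild.add((nid, cid))
            else:
                nextsibling.add((prev, cid))
            prev = cid
        return nid

    visit(tree)
    return labels, firstchild, nextsibling


def italics_inside_tables(tree):
    """Evaluate a hypothetical monadic datalog program (all intensional
    predicates are unary) that selects 'i' nodes below a 'table' node."""
    labels, firstchild, nextsibling = index_tree(tree)
    desc = set()
    changed = True
    while changed:  # naive fixpoint iteration
        changed = False
        for p, x in firstchild:
            # desc(x) :- label_table(p), firstchild(p, x).
            # desc(x) :- desc(p), firstchild(p, x).
            if (labels[p] == "table" or p in desc) and x not in desc:
                desc.add(x)
                changed = True
        for s, x in nextsibling:
            # desc(x) :- desc(s), nextsibling(s, x).
            if s in desc and x not in desc:
                desc.add(x)
                changed = True
    # result(x) :- desc(x), label_i(x).
    return sorted(x for x in desc if labels[x] == "i")
```

The function returns the preorder ids of the selected nodes. The naive loop is only meant to show the semantics of the rules and is not an efficient evaluation strategy.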
The Regular Viewpoint on PA-Processes
Theoretical Computer Science, 1999
Cited by 40 (1 self)
PA is the process algebra allowing nondeterminism, sequential and parallel compositions, and recursion. We suggest viewing PA-processes as trees, and using tree-automata techniques for verification problems on PA. Our main result is that the set of iterated predecessors of a regular set of PA-processes is a regular tree language, and similarly for iterated successors. Furthermore, the corresponding tree-automata can be built effectively in polynomial time. This has many immediate applications to verification problems for PA-processes, among which a simple and general model-checking algorithm.
Tree-Walking Pebble Automata
Jewels are forever, contributions to Theoretical Computer Science in honor of Arto Salomaa, 1999
Cited by 38 (2 self)
... this paper is to investigate the power of tree-walking automata with pebbles. Obviously, the unrestricted use of pebbles leads to a class of tree languages much larger than the regular tree languages, in fact to all tree languages in NSPACE(log n). Thus, we restrict the automaton to the recursive use of pebbles, in the sense that the lifetimes of pebbles, i.e., the times between dropping a pebble and lifting it again, are properly nested. A similar, but stronger, nesting requirement is studied in [13] for 2-way automata on strings. We prove in Section 5 that our restriction indeed guarantees that all tree languages recognized by the tree-walking pebble automaton are regular, but we conjecture that the automaton is not powerful enough to recognize all regular tree languages. In Section 6 we generalize the notion of pebble to that of a "set-pebble", in such a way that the tree-walking set-pebble automaton recognizes exactly the regular tree languages.
Bottom-up and Top-down Tree Series Transformations
J. Autom. Lang. Combin., 2000
Cited by 37 (6 self)
We generalize bottom-up tree transducers and top-down tree transducers to the concept of bottom-up tree series transducer and top-down tree series transducer, respectively, by allowing formal tree series as output rather than trees, where a formal tree series is a mapping from output trees to some semiring. We associate two semantics with a tree series transducer: a mapping which transforms trees into tree series (for short: tree to tree series transformation or t-ts transformation), and a mapping which transforms tree series into tree series (for short: tree series transformation or ts-ts transformation). We show that the standard case of tree transducers is reobtained by choosing the boolean semiring under the t-ts semantics. Also, for each of the two types of tree series transducers and for both types of semantics, we prove a characterization which generalizes in a straightforward way the corresponding characterization result for the underlying tree transducer class. Mo...
Caterpillars: A Context Specification Technique
Markup Languages, 2000
Cited by 34 (7 self)
We present a novel, yet simple, technique for the specification of context in structured documents that we call caterpillar expressions. Although we are primarily applying this technique in the specification of context-dependent style sheets for HTML, SGML and XML documents, it can also be used for query specification for structured documents, as we shall demonstrate, and for the specification of computer program transformations. From a conceptual point of view, structured documents are trees, and one of the oldest and best-established techniques to process trees and, hence, structured documents is tree automata. We present a number of theoretical results that allow us to compare the expressive power of caterpillar expressions and caterpillar automata, their companions, to the expressive power of tree automata. In particular, we demonstrate that each caterpillar expression describes a regular tree language that is, hence, recognizable by a tree automaton. Finally, we empl...
Extensions of Attribute Grammars for Structured Document Queries
1999
Cited by 32 (6 self)
Document specification languages, like for instance XML, model documents using extended context-free grammars. These differ from standard context-free grammars in that they allow arbitrary regular expressions on the right-hand side of productions. To query such documents, we introduce a new form of attribute grammars (extended AGs) that work directly over extended context-free grammars rather than over standard context-free grammars. Viewed as a query language, extended AGs are particularly relevant as they can take into account the inherent order of the children of a node in a document.
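As a small illustration of what "regular expressions on the right-hand side of productions" means, the following Python sketch checks a document tree against a hypothetical extended context-free grammar. The grammar `PRODUCTIONS`, the element names, and the function `conforms` are assumptions made up for this example. The sketch only covers conformance to the grammar and does not model the attribute-grammar querying the abstract describes.

```python
import re

# Hypothetical extended context-free grammar: each element name maps to a
# regular expression over the names of its children. Every child name is
# followed by one space so that names cannot run into each other.
PRODUCTIONS = {
    "book": r"(title )(author )+(chapter )*",
    "chapter": r"(title )(para )*",
    "title": r"",
    "author": r"",
    "para": r"",
}


def conforms(node, productions=PRODUCTIONS):
    """Return True if the (label, children) tree is derivable from the grammar."""
    label, children = node
    if label not in productions:
        return False
    # The ordered sequence of child labels is the word that the production's
    # regular expression has to match in full.
    word = "".join(child[0] + " " for child in children)
    if re.fullmatch(productions[label], word) is None:
        return False
    return all(conforms(child, productions) for child in children)
```

Because the child labels are matched as an ordered word, the check is sensitive to the order of the children of a node, which is the property the abstract highlights.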
Frontiers of tractability for typechecking simple XML transformations
PODS, 2004
Cited by 32 (6 self)
Typechecking consists of statically verifying whether the output of an XML transformation always conforms to an output type for documents satisfying a given input type. We focus on complete algorithms which always produce the correct answer. We consider top-down XML transformations incorporating XPath expressions and abstract document types by grammars and tree automata. By restricting schema languages and transformations, we identify several practical settings for which typechecking is in polynomial time. Moreover, the resulting framework provides a rather complete picture as we show that most scenarios cannot be enlarged without rendering the typechecking problem intractable. So, the present research sheds light on when to use fast complete algorithms and when to resort to sound but incomplete ones.
Typechecking Top-Down Uniform Unranked Tree Transducers
in 9th International Conference on Database Theory, ser. LNCS
Cited by 31 (3 self)
We investigate the typechecking problem for XML queries: statically verifying that every answer to a query conforms to a given output schema, for inputs satisfying a given input schema. As typechecking quickly turns undecidable for query languages capable of testing equality of data values, we return to the limited framework where we abstract XML documents as labeled ordered trees. We focus on simple top-down recursive transformations motivated by XSLT and structural recursion on trees. We parameterize the problem by several restrictions on the transformations (deleting, non-deleting, bounded width) and consider both tree automata and DTDs as output schemas. The complexity of the typechecking problems in this scenario ranges from PTIME to EXPTIME.
On the power of tree-walking automata
Information and Computation, 2000
Cited by 30 (4 self)
Tree-walking automata (TWAs) recently received new attention in the fields of formal languages and databases. Towards a better understanding of their expressiveness, we characterize them in terms of transitive closure logic formulas in normal form. It is conjectured by Engelfriet and Hoogeboom that TWAs cannot define all regular tree languages, or equivalently, all of monadic second-order logic. We prove this conjecture for a restricted, but powerful, class of TWAs. In particular, we show that 1-bounded TWAs, that is, TWAs that are only allowed to traverse every edge of the input tree at most once in every direction, cannot define all regular languages. We then extend this result to a class of TWAs that can simulate first-order logic (FO) and is capable of expressing properties not definable in FO extended with regular path expressions; the latter logic being a valid abstraction of current query languages for XML and semistructured data.
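For readers unfamiliar with the model, the sketch below simulates one simple deterministic tree-walking automaton in Python. It assumes full binary trees (every node has zero or two children) and accepts exactly those trees whose leaves are all labeled `a`. The class `BNode`, the state names, and the chosen language are illustrative assumptions and are not drawn from the paper. The automaton holds a single state, sits at one node at a time, and moves one edge per step based on its state, the node's label, and whether the node is the root, a left child, or a right child. This particular walk is a depth-first traversal that crosses each edge once downward and once upward.

```python
class BNode:
    """Node of a full binary tree with a parent pointer."""

    def __init__(self, label, left=None, right=None):
        self.label, self.left, self.right = label, left, right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self


def child_kind(node):
    """Local test available to the automaton: root, left child, or right child."""
    if node.parent is None:
        return "root"
    return "left" if node.parent.left is node else "right"


def all_leaves_are_a(root):
    """Run the walking automaton; return True iff every leaf is labeled 'a'."""
    node, state = root, "down"
    while True:
        if state == "down" and node.left is not None:
            # Arrived at an inner node from above: descend to the left child.
            node = node.left
        elif state == "up_from_left":
            # Left subtree finished: descend to the right child.
            node, state = node.right, "down"
        else:
            # Either a leaf reached in state "down", or the right subtree
            # of the current node has just been finished.
            if state == "down" and node.label != "a":
                return False  # reject: a leaf carries a different label
            kind = child_kind(node)
            if kind == "root":
                return True  # whole tree explored without rejecting
            node, state = node.parent, "up_from_" + kind
```

The simulation uses only constant memory besides the current node and state, which is what distinguishes a tree-walking automaton from a bottom-up tree automaton that assigns a state to every node.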