Results 1-10 of 71
Monadic Datalog and the Expressive Power of Languages for Web Information Extraction
J. ACM, 2002
Cited by 75 (11 self)
Research on information extraction from Web pages (wrapping) has seen much activity in recent times (particularly systems implementations), but little work has been done on formally studying the expressiveness of the formalisms proposed or on the theoretical foundations of wrapping. In this paper, we first study monadic datalog as a wrapping language (over ranked or unranked tree structures). Using previous work by Neven and Schwentick, we show that this simple language is equivalent to full monadic second-order logic (MSO) in its ability to specify wrappers. We believe that MSO has the right expressiveness required for Web information extraction and thus propose MSO as a yardstick for evaluating and comparing wrappers. Using the above result, we study the kernel fragment Elog⁻ of the Elog wrapping language used in the Lixto system (a visual wrapper generator). The striking fact here is that Elog⁻ exactly captures MSO, yet is easier to use. Indeed, programs in this language can be entirely visually specified. We also formally compare Elog to other wrapping languages proposed in the literature.
Monadic Queries over Tree-Structured Data
2002
Cited by 70 (9 self)
Monadic query languages over trees currently receive considerable interest in the database community, as the problem of selecting nodes from a tree is the most basic and widespread database query problem in the context of XML. Partly a survey of recent work done by the authors and their group on logical query languages for this problem and their expressiveness, this paper provides a number of new results related to the complexity of such languages over so-called axis relations (such as "child" or "descendant"), which are motivated by their presence in the XPath standard or by their utility for data extraction (wrapping).
Conjunctive Queries over Trees
2004
Cited by 66 (7 self)
We study the complexity and expressive power of conjunctive queries over unranked labeled trees, where the tree structures are represented using "axis relations" such as "child", "descendant", and "following" (we consider a superset of the XPath axes) as well as unary relations for node labels. (Cyclic) conjunctive queries over trees occur in a wide range of data management scenarios related to XML, the Web, and computational linguistics. We establish a framework for characterizing structures representing trees for which conjunctive queries can be evaluated efficiently. Then we completely chart the tractability frontier of the problem for our axis relations, i.e., we find all subset-maximal sets of axes for which query evaluation is in polynomial time. All polynomial-time results are obtained immediately using the proof techniques from our framework. Finally, we study the expressiveness of conjunctive queries over trees and compare it to the expressive power of fragments of XPath. We show that for each conjunctive query, there is an equivalent acyclic positive query (i.e., a set of acyclic conjunctive queries), but that in general this query is not of polynomial size.
The Lixto Data Extraction Project - Back and Forth between Theory and Practice
PODS 2004, 2004
Cited by 37 (2 self)
We present the Lixto project, which is both a research project in database theory and a commercial enterprise that develops Web data extraction (wrapping) and Web service definition software. We discuss the project's main motivations and ideas, in particular the use of a logic-based framework for wrapping. Then we present theoretical results on monadic datalog over trees and on Elog, its close relative which is used as the internal wrapper language in the Lixto system. These results include both a characterization of the expressive power and the complexity of these languages. We describe the visual wrapper specification process in Lixto and various practical aspects of wrapping. We discuss work on the complexity of query languages for trees that was inseminated by our theoretical study of logic-based languages for wrapping. Then we return to the practice of wrapping and the Lixto Transformation Server, which allows for streaming integration of data extracted from Web pages. This is a natural requirement in complex services based on Web wrapping. Finally, we discuss industrial applications of Lixto and point to open problems for future study.
A completeness result for reasoning with incomplete first-order knowledge bases
In Proc. of KR98, 1998
Cited by 30 (9 self)
In previous work, Levesque proposed an extension to classical databases that would allow for a certain form of incomplete first-order knowledge. Since this extension was sufficient to make full logical deduction undecidable, he also proposed an alternative reasoning scheme with desirable logical properties. He also claimed (without proof) that this reasoning could be implemented efficiently using database techniques such as projections and joins. In this paper, we substantiate this claim and show how to adapt a bottom-up database query evaluation algorithm for this purpose, thus obtaining a tractability result comparable to those that exist for databases. The rest of the paper is organized as follows. In the next section, we review proper KBs and V, prove a new property of V, i.e. locality, and define answers to open queries. In Section 3, we review the complexity of database query evaluation, and present a polynomial time algorithm for evaluating K-guarded formulas. In Section 4, we show how to use this algorithm to evaluate queries wrt proper KBs and hence obtain a tractability result. In Section 5, we illustrate this query evaluation method for proper KBs with some example queries. Finally in Section 6, we describe some future work.
Querying the web reconsidered: Design principles for versatile web query languages
Journal of Semantic Web and Information Systems, 2005
Cited by 30 (18 self)
A decade of experience with research proposals as well as standardized query languages for the conventional Web and the recent emergence of query languages for the Semantic Web call for a reconsideration of design principles for Web and Semantic Web query languages. This article first argues that a new generation of versatile Web query languages is needed for solving the challenges posed by the changing Web: We call versatile those query languages able to cope with both Web and Semantic Web data expressed in any (Web or Semantic Web) markup language. This article further suggests that (well-known) referential transparency and (novel) answer-closedness are essential features of versatile query languages. Indeed, they allow queries to be considered like forms and answers like form-fillings in the spirit of the “query-by-example” paradigm. This article finally suggests that the decentralized and heterogeneous nature of the Web requires incomplete data specifications (or “incomplete queries”) and incomplete data selections (or “incomplete answers”): the form-like query can be specified without precise knowledge of the queried data and answers can be restricted to contain only an excerpt of the queried data.
Fixed-parameter tractability, definability, and model checking
SIAM Journal on Computing, 2001
Cited by 29 (11 self)
In this article, we study parameterized complexity theory from the perspective of logic, or more specifically, descriptive complexity theory. We propose to consider parameterized model-checking problems for various fragments of first-order logic as generic parameterized problems and show how this approach can be useful in studying both fixed-parameter tractability and intractability. For example, we establish the equivalence between the model-checking for existential first-order logic, the homomorphism problem for relational structures, and the substructure isomorphism problem. Our main tractability result shows that model-checking for first-order formulas is fixed-parameter tractable when restricted to a class of input structures with an excluded minor. On the intractability side, for every t ≥ 0 we prove an equivalence between model-checking for first-order formulas with t quantifier alternations and the parameterized halting problem for alternating Turing machines with t alternations. We discuss the close connection between this alternation hierarchy and Downey and Fellows' W-hierarchy. On a more abstract level, we consider two forms of definability, called Fagin definability and slicewise definability, that are appropriate for describing parameterized problems. We give a characterization of the class FPT of all fixed-parameter tractable problems in terms of slicewise definability in finite variable least fixed-point logic, which is reminiscent of the Immerman-Vardi Theorem characterizing the class PTIME in terms of definability in least fixed-point logic.
Hypertree decompositions: A survey
In: MFCS ’01: Proceedings of the 26th International Symposium on Mathematical Foundations of Computer Science, 2001
Cited by 29 (4 self)
This paper surveys recent results related to the concept of hypertree decomposition and the associated notion of hypertree width. A hypertree decomposition of a hypergraph (similar to a tree decomposition of a graph) is a suitable clustering of its hyperedges yielding a tree or a forest. Important NP-hard problems become tractable if restricted to instances whose associated hypergraphs are of bounded hypertree width. We also review a number of complexity results on problems whose structure is described by acyclic or nearly acyclic hypergraphs.
Computing crossing numbers in quadratic time
J. Comput. Syst. Sci., 2004
Cited by 28 (0 self)
We show that for every fixed k ≥ 0 there is a quadratic time algorithm that decides whether a given graph has crossing number at most k and, if this is the case, computes a drawing of the graph in the plane with at most k crossings.