Incremental Maintenance of Views with Duplicates

Cited by 188 (11 self)
We study the problem of efficient maintenance of materialized views that may contain duplicates. This problem is particularly important when queries against such views involve aggregate functions, which need duplicates to produce correct results. Unlike most work on the view maintenance problem that is based on an algorithmic approach, our approach is algebraic and based on equational reasoning. This approach has a number of advantages: it is robust and easily extendible to new language constructs, it produces output that can be used by query optimizers, and it simplifies correctness proofs. We use a natural extension of the relational algebra operations to bags (multisets) as our basic language. We present an algorithm that propagates changes from base relations to materialized views. This algorithm is based on reasoning about equivalence of bag-valued expressions. We prove that it is correct and preserves a certain notion of minimality that ensures that no unnecessary tuples are computed. Although it is generally only a heuristic that computing changes to the view rather than recomputing the view from scratch is more efficient, we prove results saying that under normal circumstances one should expect the change propagation algorithm to be significantly faster and more space efficient than complete recomputing of the view. We also show that our approach interacts nicely with aggregate functions, allowing their correct evaluation on views that change.
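The idea of propagating changes to a bag-valued view, rather than recomputing it, can be illustrated with a small Python sketch. This is not the algorithm from the paper: it is a minimal illustration for a single selection-plus-projection view, and the relation `orders`, the helper names `select_project`, `propagate`, and `bag_sum`, and the sample rows are all hypothetical. It assumes the deleted rows are actually present in the base relation.

```python
from collections import Counter

def select_project(bag, predicate, project):
    """Bag-semantics selection followed by projection: multiplicities are kept."""
    out = Counter()
    for row, count in bag.items():
        if predicate(row):
            out[project(row)] += count
    return out

def propagate(view, inserted, deleted, predicate, project):
    """Update the materialized view using only the base-relation deltas."""
    view_minus = select_project(deleted, predicate, project)
    view_plus = select_project(inserted, predicate, project)
    # Counter subtraction floors at zero, which matches bag difference (monus).
    return (view - view_minus) + view_plus

def bag_sum(bag):
    """SUM aggregate over a bag: each value counts once per duplicate."""
    return sum(value * count for value, count in bag.items())

# Hypothetical base relation of (customer, amount) rows, with duplicates.
orders = Counter({("ann", 10): 2, ("bob", 5): 1, ("ann", 7): 1})
is_large = lambda row: row[1] >= 7
amount_only = lambda row: row[1]

view = select_project(orders, is_large, amount_only)

# One change set: a deletion and two insertions on the base relation.
deleted = Counter({("ann", 10): 1})
inserted = Counter({("cat", 10): 1, ("bob", 3): 1})
new_orders = (orders - deleted) + inserted

maintained = propagate(view, inserted, deleted, is_large, amount_only)
recomputed = select_project(new_orders, is_large, amount_only)
```

Here `maintained` and `recomputed` hold the same bag, but `propagate` only touches the changed rows. The projected amount 10 occurs twice in the view, so `bag_sum` gives a different answer than summing the distinct values would, which is why the duplicates have to be kept.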
Query Languages for Bags and Aggregate Functions
 Journal of Computer and System Sciences
, 1997

Cited by 60 (34 self)
Theoretical foundations for querying databases based on bags are studied in this paper. We fully determine the strength of many polynomial-time bag operators relative to an ambient query language. Then we obtain BQL, a query language for bags, by picking the strongest combination of these operators. The relationship between the nested relational algebra and various fragments of BQL is investigated. The precise amount of extra power that BQL possesses over the nested relational algebra is determined. It is shown that the additional expressiveness of BQL amounts to adding aggregate functions to a relational language. The expressive power of BQL and related languages is investigated in depth. We prove that these languages possess the conservative extension property. That is, the expressibility of queries in these languages is independent of the nesting height of intermediate data. Using this result, we show that recursive queries, such as transitive closure, are not definable in BQL. A ne...
On the Complexity of Nonrecursive XQuery and Functional Query Languages on Complex Values
 In Proc. PODS’05

Cited by 48 (2 self)
This article studies the complexity of evaluating functional query languages for complex values such as monad algebra and the recursion-free fragment of XQuery. We show that monad algebra with equality restricted to atomic values is complete for the class TA[2^O(n), O(n)] of problems solvable in linear exponential time with a linear number of alternations. The monotone fragment of monad algebra with atomic value equality but without negation is complete for nondeterministic exponential time. For monad algebra with deep equality, we establish TA[2^O(n), O(n)] lower and exponential-space upper bounds. We also study a fragment of XQuery, Core XQuery, that seems to incorporate all the features of a query language on complex values that are traditionally deemed essential. A close connection between monad algebra on lists and Core XQuery (with “child” as the only axis) is exhibited, and it is shown that these languages are expressively equivalent up to representation issues. We show that Core XQuery is just as hard as monad algebra w.r.t. query and combined complexity, and that it is in TC0 if the query is assumed fixed. As Core XQuery is NEXPTIME-hard, it is commonly believed that any algorithm for evaluating Core XQuery has to require exponential amounts of working memory and doubly exponential time in the worst case. We present a property of queries (the lack of a certain form of composition) that virtually all real-world XQueries have and that allows for query evaluation in singly exponential time and polynomial space. Still, we are able to show for an important special case (Core XQuery with equality testing restricted to atomic values) that the composition-free language is just as expressive as the language with composition. Thus, under widely held complexity-theoretic assumptions, the composition-free language is an exponentially less succinct version of the language with composition.
Some Properties of Query Languages for Bags
 In Proceedings of 4th International Workshop on Database Programming Languages
, 1993

Cited by 48 (33 self)
In this paper we study the expressive power of query languages for nested bags. We define the ambient bag language by generalizing the constructs of the relational language of Breazu-Tannen, Buneman and Wong, which is known to have precisely the power of the nested relational algebra. Relative strength of additional polynomial constructs is studied, and the ambient language endowed with the strongest combination of those constructs is chosen as a candidate for the basic bag language, which is called BQL (Bag Query Language). We prove that achieving the power of BQL in the relational language amounts to adding simple arithmetic to the latter. We show that BQL shares a shortcoming of the relational algebra: it cannot express recursive queries. In particular, parity test is not definable in BQL. We consider augmenting BQL with powerbag and structural recursion to overcome this deficiency. In contrast to the relational case, where powerset and structural recursion are equivalent...
Top–Down Induction of Decision Trees Classifiers–A survey
 Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on
, 2005

Cited by 47 (4 self)
Decision trees are considered to be one of the most popular approaches for representing classifiers. Researchers from various disciplines such as statistics, machine learning, pattern recognition, and data mining considered the issue of growing a decision tree from available data. This paper presents an updated survey of current methods for constructing decision tree classifiers in a top-down manner. The paper suggests a unified algorithmic framework for presenting these algorithms and describes the various splitting criteria and pruning methodologies. Index Terms: Classification, decision trees, pruning methods, splitting criteria.
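As a rough illustration of what a splitting criterion does in top-down induction, the Python sketch below scores candidate features by information gain, one well-known criterion. The abstract does not say which criteria the survey covers in detail, so the choice of information gain is an assumption, and the helper names and the toy weather-style data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class labels at a node."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction from partitioning the rows on one categorical feature."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def best_split(rows, labels):
    """Pick the feature with the highest gain, as a top-down learner would at each node."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))

# Hypothetical training data: two categorical features, two classes.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

chosen = best_split(rows, labels)
```

In this toy data `outlook` separates the classes perfectly while `windy` carries no information, so the learner would split on `outlook` first and then recurse on each partition.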
New Techniques for Studying Set Languages, Bag Languages and Aggregate Functions
, 1994

Cited by 44 (28 self)
We provide new techniques for the analysis of the expressive power of query languages for nested collections. These languages may use set or bag semantics and may be further complicated by the presence of aggregate functions. We exhibit certain classes of graphs and prove that the properties of these graphs that can be tested in such languages are either finite or cofinite. This result settles the conjectures of Grumbach, Milo, and Paredaens that parity test, transitive closure, and balanced binary tree test are not expressible in bag languages like the PTIME fragment of BALG of Grumbach and Milo and BQL of Libkin and Wong. Moreover, it implies that many recursive queries, including simple ones like the test for a chain, cannot be expressed in a nested relational language even when aggregate functions are available. In an attempt to generalize the finite-cofiniteness result, we study the bounded degree property which says that the number of distinct in- and out-degrees in the output of...
Local Properties of Query Languages

Cited by 33 (23 self)
... predetermined portion of the input. Examples include all relational calculus queries. We start by proving a general result describing outputs of local queries. This result leads to many easy inexpressibility proofs for local queries. We then consider a closely related property, namely, the bounded degree property. It describes the outputs of local queries on structures that locally look "simple." Every query that is local is shown to have the bounded degree property. Since every relational calculus (first-order) query is local, the general results proved for local queries can be viewed as "off-the-shelf" strategies for proving inexpressibility results, which are often easier to apply than Ehrenfeucht-Fraïssé games. We also show that some generalizations of the bounded degree property that were conjectured to hold, fail for relational calculus. We then prove that the language obtained from relational calculus by adding grouping and aggregates, which is essentially plain SQL, has the bounded degree property, thus answering a question that has been open for several years. Consequently, first-order queries with Härtig or Rescher quantifiers also have the bounded degree property. Finally, we apply our results to incremental maintenance of views, and show that SQL and relational calculus are incapable of maintaining the ...
Sequences, Datalog and Transducers
, 1996

Cited by 26 (5 self)
This paper develops a query language for sequence databases, such as genome databases and text databases. The language, called Sequence Datalog, extends classical Datalog with interpreted function symbols for manipulating sequences. It has both a clear operational and declarative semantics, based on a new notion called the extended active domain of a database. The extended domain contains all the sequences in the database and all their subsequences. This idea leads to a clear distinction between safe and unsafe recursion over sequences: safe recursion stays inside the extended active domain, while unsafe recursion does not. By carefully limiting the amount of unsafe recursion, the paper develops a safe and expressive subset of Sequence Datalog. As part of the development, a new type of transducer is introduced, called a generalized sequence transducer. Unsafe recursion is allowed only within these generalized transducers. Generalized transducers extend ordinary transducers by allowing them to invoke other transducers as "subroutines." Generalized transducers can be implemented in Sequence Datalog in a straightforward way. Moreover, their introduction into the language leads to simple conditions that guarantee safety and finiteness. This paper develops two such conditions. The first condition expresses exactly the class of ptime sequence functions; and the second expresses exactly the class of elementary sequence functions.
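The extended active domain can be pictured with a short Python sketch. It assumes that "subsequences" means contiguous subsequences (substrings) and models sequences as Python strings; both are assumptions made for illustration, and the function names and the toy database are hypothetical rather than taken from the paper.

```python
def subsequences(seq):
    """All contiguous subsequences of seq, including the empty one."""
    return {seq[i:j] for i in range(len(seq) + 1) for j in range(i, len(seq) + 1)}

def extended_active_domain(database):
    """Every sequence in the database together with all of its subsequences."""
    domain = set()
    for seq in database:
        domain |= subsequences(seq)
    return domain

def stays_inside(result, database):
    """True if a derived sequence does not leave the extended active domain."""
    return result in extended_active_domain(database)

# Hypothetical two-sequence database.
db = {"acg", "gt"}
domain = extended_active_domain(db)

extracted = "acg"[1:]           # taking a suffix stays inside the domain
concatenated = "acg" + "t"      # building a longer sequence can leave it
```

Extracting a piece of a stored sequence always lands in the domain, which is finite for a finite database, while concatenation can produce sequences outside it. That is the intuition behind treating the first kind of recursion as safe and restricting the second.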
An Algebra for Pomsets
, 1995

Cited by 21 (3 self)
We study languages for manipulating partially ordered structures with duplicates (e.g. trees, lists). As a general framework, we consider the pomset (partially ordered multiset) data type. We introduce an algebra for pomsets, which generalizes traditional algebras for (nested) sets, bags and lists. This paper is motivated by the study of the impact of different language primitives on the expressive power. We show that the use of partially ordered types increases the expressive power significantly. Surprisingly, it turns out that the algebra when restricted to both unordered (bags) and totally ordered (lists) intermediate types, yields the same expressive power as fixpoint logic with counting on relational databases. It therefore constitutes a rather robust class of relational queries. On the other hand, we obtain a characterization of PTIME queries on lists by considering only totally ordered types.