Incremental Maintenance of Views with Duplicates
Cited by 186 (9 self)
We study the problem of efficient maintenance of materialized views that may contain duplicates. This problem is particularly important when queries against such views involve aggregate functions, which need duplicates to produce correct results. Unlike most work on the view maintenance problem that is based on an algorithmic approach, our approach is algebraic and based on equational reasoning. This approach has a number of advantages: it is robust and easily extendible to new language constructs, it produces output that can be used by query optimizers, and it simplifies correctness proofs. We use a natural extension of the relational algebra operations to bags (multisets) as our basic language. We present an algorithm that propagates changes from base relations to materialized views. This algorithm is based on reasoning about equivalence of bag-valued expressions. We prove that it is correct and preserves a certain notion of minimality that ensures that no unnecessary tuples are computed. Although it is generally only a heuristic that computing changes to the view rather than recomputing the view from scratch is more efficient, we prove results saying that under normal circumstances one should expect the change propagation algorithm to be significantly faster and more space efficient than complete recomputing of the view. We also show that our approach interacts nicely with aggregate functions, allowing their correct evaluation on views that change.
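The change-propagation idea in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes a single hypothetical view defined as a selection over the additive union of two bags, models bags with `collections.Counter`, and assumes the deleted bag is contained in the base relation. The names `select`, `recompute_view`, `apply_change`, and `propagate` are invented for this illustration.

```python
from collections import Counter

def select(pred, bag):
    """Bag selection: keep each tuple that satisfies pred, with its multiplicity."""
    return Counter({t: n for t, n in bag.items() if pred(t)})

def recompute_view(pred, r, s):
    """Recompute the hypothetical view select(pred, r (+) s) from scratch."""
    return select(pred, r + s)

def apply_change(base, deleted, inserted):
    """Apply a change to a base bag; Counter '-' is monus, '+' is additive union."""
    return (base - deleted) + inserted

def propagate(view, pred, deleted, inserted):
    """Update the view from the change to r alone, without reading r or s.

    Assumes 'deleted' is a sub-bag of the old r, so monus and additive
    union commute as the equational reasoning requires.
    """
    return (view - select(pred, deleted)) + select(pred, inserted)
```

Under the stated assumption, `propagate(view, pred, deleted, inserted)` should equal `recompute_view(pred, apply_change(r, deleted, inserted), s)`, and an aggregate such as a count can then be read off as `sum(view.values())` because multiplicities are preserved.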
A Query Language for Multidimensional Arrays: Design, Implementation, and Optimization Techniques
1996
Cited by 72 (2 self)
While much recent research has focused on extending databases beyond the traditional relational model, relatively little has been done to develop database tools for querying data organized in (multidimensional) arrays. The scientific computing community has made little use of available database technology. Instead, multidimensional scientific data is typically stored in local files conforming to various data exchange formats and queried via specialized access libraries tied in to general purpose programming languages. To allow such data to be queried using known database techniques, we design and implement a query language for multidimensional arrays. Our main design decision is to treat arrays as functions from index sets to values rather than as collection types. This leads to clean syntax and semantics as well as simple but powerful optimization rules. We present a calculus for arrays that extends standard calculi for complex objects. We derive a higher-level comprehension style query language based on this calculus and describe its implementation, including a data driver for the NetCDF data exchange format. Next, we explore some optimization rules obtained from the equational laws of our core calculus. Finally, we study the expressiveness of our calculus and prove that it essentially corresponds to adding ranking to a query language for complex objects. 1 Introduction: Data organized into multidimensional arrays arises naturally in a variety of scientific disciplines. Yet the array type has received little attention in most recent ...
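The design decision of treating arrays as functions from index sets to values can be sketched in Python as follows. This is only an illustration of the idea, not the paper's calculus or syntax: the `Array` class and the helpers `tabulate`, `amap`, `transpose`, and `materialize` are hypothetical names, and indices are assumed to be tuples of non-negative integers.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable, Tuple

Index = Tuple[int, ...]

@dataclass(frozen=True)
class Array:
    """An array as a bounded index set (shape) plus a function on indices."""
    shape: Index
    at: Callable[[Index], Any]

    def __getitem__(self, index: Index) -> Any:
        # Subscripting is function application, guarded by a bounds check.
        in_bounds = len(index) == len(self.shape) and all(
            0 <= i < n for i, n in zip(index, self.shape)
        )
        if not in_bounds:
            raise IndexError(index)
        return self.at(index)

def tabulate(shape: Index, f: Callable[[Index], Any]) -> Array:
    """Build an array by giving its value at every index."""
    return Array(shape, f)

def amap(g: Callable[[Any], Any], arr: Array) -> Array:
    """Map g over an array by composing functions; nothing is copied."""
    return Array(arr.shape, lambda idx: g(arr.at(idx)))

def transpose(arr: Array) -> Array:
    """Transpose a two-dimensional array by swapping index components."""
    rows, cols = arr.shape
    return Array((cols, rows), lambda idx: arr.at((idx[1], idx[0])))

def materialize(arr: Array) -> dict:
    """Evaluate the array at every index of its index set."""
    return {idx: arr.at(idx) for idx in product(*(range(n) for n in arr.shape))}
```

Because `amap` and `transpose` only compose functions, a chain such as `transpose(amap(g, a))` builds no intermediate array. That is one way to read the claim that the functional view leads to simple optimization rules.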
Query Languages for Bags and Aggregate Functions
Journal of Computer and System Sciences, 1997
Cited by 57 (32 self)
Theoretical foundations for querying databases based on bags are studied in this paper. We fully determine the strength of many polynomial-time bag operators relative to an ambient query language. Then we obtain BQL, a query language for bags, by picking the strongest combination of these operators. The relationship between the nested relational algebra and various fragments of BQL is investigated. The precise amount of extra power that BQL possesses over the nested relational algebra is determined. It is shown that the additional expressiveness of BQL amounts to adding aggregate functions to a relational language. The expressive power of BQL and related languages is investigated in depth. We prove that these languages possess the conservative extension property. That is, the expressibility of queries in these languages is independent of the nesting height of intermediate data. Using this result, we show that recursive queries, such as transitive closure, are not definable in BQL. A ne...
Top–Down Induction of Decision Trees Classifiers–A survey
IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2005
Cited by 52 (4 self)
Decision trees are considered to be one of the most popular approaches for representing classifiers. Researchers from various disciplines such as statistics, machine learning, pattern recognition, and data mining have considered the issue of growing a decision tree from available data. This paper presents an updated survey of current methods for constructing decision tree classifiers in a top-down manner. The paper suggests a unified algorithmic framework for presenting these algorithms and describes the various splitting criteria and pruning methodologies. Index Terms: classification, decision trees, pruning methods, splitting criteria.
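As a rough illustration of top-down induction with a splitting criterion, the sketch below grows a tree greedily using information gain. It is not the unified framework of the survey: information gain is just one commonly used criterion chosen here as an assumption, pruning is omitted, rows are assumed to be dictionaries of categorical attributes, and class labels are assumed to be strings. All function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on a categorical attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def grow_tree(rows, labels, attrs):
    """Grow a tree top-down: pick the best split, then recurse on each part."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    rest = [a for a in attrs if a != best]
    children = {}
    for value in {row[best] for row in rows}:
        part = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        children[value] = grow_tree([r for r, _ in part], [l for _, l in part], rest)
    return (best, children)

def classify(tree, row):
    """Follow the branches matching the row until a leaf label is reached."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[row[attr]]  # raises KeyError for an unseen value
    return tree
```

Swapping `information_gain` for another criterion changes only the `key` function in `grow_tree`, which is the sense in which splitting criteria are interchangeable within one top-down procedure.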
On the Complexity of Nonrecursive XQuery and Functional Query Languages on Complex Values
In Proc. PODS’05
Cited by 47 (2 self)
This article studies the complexity of evaluating functional query languages for complex values such as monad algebra and the recursion-free fragment of XQuery. We show that monad algebra with equality restricted to atomic values is complete for the class TA[2^O(n), O(n)] of problems solvable in linear exponential time with a linear number of alternations. The monotone fragment of monad algebra with atomic value equality but without negation is complete for nondeterministic exponential time. For monad algebra with deep equality, we establish TA[2^O(n), O(n)] lower and exponential-space upper bounds. We also study a fragment of XQuery, Core XQuery, that seems to incorporate all the features of a query language on complex values that are traditionally deemed essential. A close connection between monad algebra on lists and Core XQuery (with “child” as the only axis) is exhibited, and it is shown that these languages are expressively equivalent up to representation issues. We show that Core XQuery is just as hard as monad algebra w.r.t. query and combined complexity, and that it is in TC^0 if the query is assumed fixed. As Core XQuery is NEXPTIME-hard, it is commonly believed that any algorithm for evaluating Core XQuery has to require exponential amounts of working memory and doubly exponential time in the worst case. We present a property of queries – the lack of a certain form of composition – that virtually all real-world XQueries have and that allows for query evaluation in singly exponential time and polynomial space. Still, we are able to show for an important special case – Core XQuery with equality testing restricted to atomic values – that the composition-free language is just as expressive as the language with composition. Thus, under widely-held complexity-theoretic assumptions, the composition-free language is an exponentially less succinct version of the language with composition.
Some Properties of Query Languages for Bags
In Proceedings of the 4th International Workshop on Database Programming Languages, 1993
Cited by 42 (28 self)
In this paper we study the expressive power of query languages for nested bags. We define the ambient bag language by generalizing the constructs of the relational language of Breazu-Tannen, Buneman and Wong, which is known to have precisely the power of the nested relational algebra. Relative strength of additional polynomial constructs is studied, and the ambient language endowed with the strongest combination of those constructs is chosen as a candidate for the basic bag language, which is called BQL (Bag Query Language). We prove that achieving the power of BQL in the relational language amounts to adding simple arithmetic to the latter. We show that BQL has shortcomings of the relational algebra: it cannot express recursive queries. In particular, parity test is not definable in BQL. We consider augmenting BQL with powerbag and structural recursion to overcome this deficiency. In contrast to the relational case, where powerset and structural recursion are equivalent...
New Techniques for Studying Set Languages, Bag Languages and Aggregate Functions
1994
Cited by 37 (22 self)
We provide new techniques for the analysis of the expressive power of query languages for nested collections. These languages may use set or bag semantics and may be further complicated by the presence of aggregate functions. We exhibit certain classes of graphs and prove that the properties of these graphs that can be tested in such languages are either finite or cofinite. This result settles the conjectures of Grumbach, Milo, and Paredaens that parity test, transitive closure, and balanced binary tree test are not expressible in bag languages like the PTIME fragment of BALG of Grumbach and Milo and BQL of Libkin and Wong. Moreover, it implies that many recursive queries, including simple ones like the test for a chain, cannot be expressed in a nested relational language even when aggregate functions are available. In an attempt to generalize the finite-cofiniteness result, we study the bounded degree property which says that the number of distinct in- and out-degrees in the output of...
Local Properties of Query Languages
Cited by 31 (21 self)
... predetermined portion of the input. Examples include all relational calculus queries. We start by proving a general result describing outputs of local queries. This result leads to many easy inexpressibility proofs for local queries. We then consider a closely related property, namely, the bounded degree property. It describes the outputs of local queries on structures that locally look "simple." Every query that is local is shown to have the bounded degree property. Since every relational calculus (first-order) query is local, the general results proved for local queries can be viewed as "off-the-shelf" strategies for proving inexpressibility results, which are often easier to apply than Ehrenfeucht-Fraïssé games. We also show that some generalizations of the bounded degree property that were conjectured to hold, fail for relational calculus. We then prove that the language obtained from relational calculus by adding grouping and aggregates, which is essentially plain SQL, has the bounded degree property, thus answering a question that has been open for several years. Consequently, first-order queries with Härtig or Rescher quantifiers also have the bounded degree property. Finally, we apply our results to incremental maintenance of views, and show that SQL and relational calculus are incapable of maintaining the ...
Sequences, Datalog and Transducers
1996
Cited by 26 (5 self)
This paper develops a query language for sequence databases, such as genome databases and text databases. The language, called Sequence Datalog, extends classical Datalog with interpreted function symbols for manipulating sequences. It has both a clear operational and declarative semantics, based on a new notion called the extended active domain of a database. The extended domain contains all the sequences in the database and all their subsequences. This idea leads to a clear distinction between safe and unsafe recursion over sequences: safe recursion stays inside the extended active domain, while unsafe recursion does not. By carefully limiting the amount of unsafe recursion, the paper develops a safe and expressive subset of Sequence Datalog. As part of the development, a new type of transducer is introduced, called a generalized sequence transducer. Unsafe recursion is allowed only within these generalized transducers. Generalized transducers extend ordinary transducers by allowing them to invoke other transducers as "subroutines." Generalized transducers can be implemented in Sequence Datalog in a straightforward way. Moreover, their introduction into the language leads to simple conditions that guarantee safety and finiteness. This paper develops two such conditions. The first condition expresses exactly the class of PTIME sequence functions; and the second expresses exactly the class of elementary sequence functions.
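The notion of an extended active domain can be made concrete with a small Python sketch. It rests on assumptions the abstract does not spell out: sequences are modeled as Python strings, "subsequence" is read as contiguous subsequence, and the empty sequence is included in the domain. The function names are hypothetical.

```python
def extended_active_domain(sequences):
    """All database sequences plus all their contiguous subsequences.

    Assumption: the empty sequence is part of the domain.
    """
    domain = {""}
    for s in sequences:
        for start in range(len(s)):
            for end in range(start + 1, len(s) + 1):
                domain.add(s[start:end])
    return domain

def stays_in_domain(domain, derived):
    """True if a derived sequence stays inside the extended active domain."""
    return derived in domain
```

Under these assumptions, taking a slice of a stored sequence always stays inside the domain, while concatenating two stored sequences can leave it. This mirrors the safe versus unsafe distinction drawn in the abstract.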