Results 1–10 of 11
Approximation in Databases
 In PPCP'93, First International Workshop on Principles and Practice of Constraint Programming
, 1995
"... One source of partial information in databases is the need to combine information from several databases. Even if each database is complete for some "world", the combined databases will not be, and answers to queries against such combined databases can only be approximated. In this paper w ..."
Abstract

Cited by 136 (14 self)
 Add to MetaCart
(Show Context)
One source of partial information in databases is the need to combine information from several databases. Even if each database is complete for some "world", the combined databases will not be, and answers to queries against such combined databases can only be approximated. In this paper we describe various situations in which a precise answer cannot be obtained for a query asked against multiple databases. Based on an analysis of these situations, we propose a classification of constructs that can be used to model approximations. A major goal is to obtain universality properties for these models of approximations. Universality properties suggest syntax for languages with approximations based on the operations which are naturally associated with them. We prove universality properties for most of the approximation constructs. Then we use them to design languages built around datatypes given by the approximation constructs. A straightforward approach results in languages that have a numb...
Offering a Precision-Performance Tradeoff for Aggregation Queries over Replicated Data
, 2000
"... Strict consistency of replicated data is infeasible or not required by many distributed applications, so current systems often permit stale replication,inwhich cached copies of data values are allowed to become out of date. Queries over cached data return an answer quickly, but the stale answer ..."
Abstract

Cited by 104 (8 self)
 Add to MetaCart
(Show Context)
Strict consistency of replicated data is infeasible or not required by many distributed applications, so current systems often permit stale replication, in which cached copies of data values are allowed to become out of date. Queries over cached data return an answer quickly, but the stale answer may be unboundedly imprecise. Alternatively, queries over remote master data return a precise answer, but with potentially poor performance. To bridge the gap between these two extremes, we propose a new class of replication systems called TRAPP (Tradeoff in Replication Precision and Performance). TRAPP systems give each user fine-grained control over the tradeoff between precision and performance: Caches store ranges that are guaranteed to bound the current data values, instead of storing stale exact values. Users supply a quantitative precision constraint along with each query. To answer a query, TRAPP systems automatically select a combination of locally cached bounds and exact master data stored remotely to deliver a bounded answer consisting of a range that is no wider than the specified precision constraint, that is guaranteed to contain the precise answer, and that is computed as quickly as possible. This paper defines the architecture of TRAPP replication systems and covers some mechanics of caching data ranges. It then focuses on queries with aggregation, presenting optimization algorithms for answering queries with precision constraints, and reporting on performance experiments that demonstrate the fine-grained control of the precision-performance tradeoff offered by TRAPP systems.
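The range-caching idea can be illustrated with a small sketch. This is a toy with hypothetical names (`bounded_sum`, a greedy widest-range-first refresh policy), not the paper's cost-based optimization algorithms: each cached item stores a [lo, hi] range guaranteed to contain the current master value, and exact values are fetched from the master only until a SUM query's answer range meets the user's precision constraint.

```python
# Sketch of a TRAPP-style bounded SUM, assuming each cached item stores a
# [lo, hi] range guaranteed to contain the current master value. If the
# combined range is wider than the precision constraint, we refresh the
# widest ranges from the (simulated) remote master until it is tight enough.

def bounded_sum(cache, master, precision):
    """cache: dict key -> (lo, hi); master: dict key -> exact value."""
    intervals = dict(cache)
    width = sum(hi - lo for lo, hi in intervals.values())
    # Greedy policy: refresh the widest intervals first. (The paper instead
    # optimizes which masters to contact based on cost.)
    for key in sorted(intervals, key=lambda k: intervals[k][1] - intervals[k][0], reverse=True):
        if width <= precision:
            break
        lo, hi = intervals[key]
        exact = master[key]            # simulated remote fetch of the exact value
        width -= hi - lo
        intervals[key] = (exact, exact)
    lo = sum(lo for lo, _ in intervals.values())
    hi = sum(hi for _, hi in intervals.values())
    return lo, hi                      # range guaranteed to contain the exact sum

cache = {"a": (0, 10), "b": (4, 6), "c": (7, 7)}
master = {"a": 3, "b": 5, "c": 7}
print(bounded_sum(cache, master, 2))   # -> (14, 16), after refreshing "a" only
```

With precision 2, only the widest range ("a") needs refreshing; the exact sum 15 lies inside the returned range [14, 16].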
Containment of aggregate queries
 In Proc. of ICDT
, 2003
"... Abstract. The problem of deciding containment of aggregate queries is investigated. Containment is reduced to equivalence for queries with expandable aggregation functions. Many common aggregation functions, such as max, cntd (count distinct), count, sum, avg, median and stdev (standard deviation) a ..."
Abstract

Cited by 21 (3 self)
 Add to MetaCart
The problem of deciding containment of aggregate queries is investigated. Containment is reduced to equivalence for queries with expandable aggregation functions. Many common aggregation functions, such as max, cntd (count distinct), count, sum, avg, median and stdev (standard deviation), are shown to be expandable. It is shown that even in the presence of integrity constraints, containment can be reduced to equivalence. For conjunctive count- and sum-queries, simpler characterizations for containment are given that do not require checking equivalence. These results are built upon in order to solve the problem of finding maximally-contained sets of rewritings for conjunctive count-queries.
Linear approximation of planar spatial databases using transitive-closure logic
, 2000
"... We consider spatial databases in the plane that can be defined by polynomial constraint formulas. Motivated by applications in geographic information systems, we investigate linear approximations of spatial databases and study in which language they can be expressed effectively. Specifically, we sho ..."
Abstract

Cited by 10 (5 self)
 Add to MetaCart
We consider spatial databases in the plane that can be defined by polynomial constraint formulas. Motivated by applications in geographic information systems, we investigate linear approximations of spatial databases and study in which language they can be expressed effectively. Specifically, we show that they cannot be expressed in the standard first-order query language for polynomial constraint databases but that an extension of this first-order language with transitive closure suffices to express the approximation query in an effective manner. Furthermore, we introduce an extension of transitive-closure logic and show that this logic is complete for the computable queries on linear spatial databases. This result together with our first result implies that this extension of transitive-closure logic can express all computable topological queries on arbitrary spatial databases in the plane.
Aggregate Operators in Constraint Query Languages
 J. Comput. System Sci
, 2002
"... We investigate the problem of how to extend constraint query languages with aggregate operators. We deal with standard relational aggregation, and also with aggregates specific to spatial data, such as volume. We study several approaches, including the addition of a new class of approximate aggregat ..."
Abstract

Cited by 6 (0 self)
 Add to MetaCart
(Show Context)
We investigate the problem of how to extend constraint query languages with aggregate operators. We deal with standard relational aggregation, and also with aggregates specific to spatial data, such as volume. We study several approaches, including the addition of a new class of approximate aggregate operators which allow an error tolerance in the computation. We show how techniques of [23, 25] based on VC-dimension can be used to give languages with approximation operators, but also show that these languages have a number of shortcomings. We then give a set of results showing that it is impossible to get constraint-based languages that admit definable aggregation operators, both for exact operators and for approximate ones. These results are quite robust, in that they show that closure under aggregation is problematic even when the class of functions permitted in constraints is expanded. This motivates a different approach to the aggregation problem. We introduce a language FO + Poly + Sum, which permits standard discrete aggregation operators to be applied to the outputs of range-restricted constraint queries. We show that this language has a number of attractive closure and expressivity properties, and that it can compute volumes of linear-constraint databases.
Variable Independence in Constraint Databases
 IEEE Transactions on Knowledge and Data Engineering
, 1998
"... In this paper we study constraint databases with variable independence conditions (vics). Such databases occur naturally in the context of temporal and spatiotemporal database applications. Using computational geometry techniques, we show that variable independence is decidable for linear constra ..."
Abstract

Cited by 5 (2 self)
 Add to MetaCart
In this paper we study constraint databases with variable independence conditions (vics). Such databases occur naturally in the context of temporal and spatio-temporal database applications. Using computational geometry techniques, we show that variable independence is decidable for linear constraint databases. We also present a set of rules for inferring vics in relational algebra expressions. Using vics we define a subset of relational algebra that is closed under restricted aggregation. Keywords: Constraint databases, aggregation, closure, integrity constraints.
Estimating the Output Cardinality of Partial Preaggregation with a Measure of Clusteredness
 In Proc. Int. Conf. on Very Large Data Bases (VLDB)
, 2003
"... We introduce a new parameter, the clusteredness of data, and show how it can be used for estimating the output cardinality of a partial preaggregation operator. ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
We introduce a new parameter, the clusteredness of data, and show how it can be used for estimating the output cardinality of a partial preaggregation operator.
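The link between clusteredness and output cardinality can be seen in a minimal sketch of partial pre-aggregation (hypothetical names such as `partial_preaggregate`; a toy evict-oldest policy over a fixed-size table, not the paper's estimator): when equal grouping keys arrive close together, more tuples are absorbed into existing table entries and fewer partial groups are emitted.

```python
# Sketch of a partial preaggregation operator: tuples are aggregated in a
# bounded hash table; when the table is full, one group is evicted and
# emitted as a partial result. The more clustered the input, the smaller
# the output cardinality.
from collections import OrderedDict

def partial_preaggregate(stream, table_size):
    table = OrderedDict()          # group key -> partial SUM
    out = []                       # emitted partial aggregates
    for key, value in stream:
        if key in table:
            table[key] += value    # absorbed: no extra output tuple
        else:
            if len(table) >= table_size:
                out.append(table.popitem(last=False))  # evict oldest group
            table[key] = value
    out.extend(table.items())      # flush remaining groups at end of input
    return out

clustered = [("a", 1)] * 3 + [("b", 1)] * 3 + [("c", 1)] * 3
scattered = [("a", 1), ("b", 1), ("c", 1)] * 3
print(len(partial_preaggregate(clustered, 2)))   # -> 3 partial groups
print(len(partial_preaggregate(scattered, 2)))   # -> 9 partial groups
```

The two inputs contain the same tuples; only their clusteredness differs, and so does the output cardinality (3 vs. 9 with a table of size 2).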
Equivalence, containment and rewriting of aggregate queries
, 2004
"... The primary goal of this thesis is to lay the theoretical foundations for a formal study of aggregate query optimization. This requires gaining a coherent understanding of equivalences and containments between aggregate queries of varied forms. A secondary goal of this thesis is to solve the view us ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
The primary goal of this thesis is to lay the theoretical foundations for a formal study of aggregate query optimization. This requires gaining a coherent understanding of equivalences and containments between aggregate queries of varied forms. A secondary goal of this thesis is to solve the view usability problem for varied types of aggregate queries. The view usability problem is that of determining whether it is possible to find a reformulation of a query that uses other queries, instead of base relations. This is of importance in query optimization, since it enables reuse of previously computed results. It is also a necessity when integrating information from heterogeneous sources, when each source is described as a view over a global schema. In order to achieve the primary goal, the general problem of determining equivalences among aggregate queries is considered. Our aggregate queries may contain disjunctions, comparisons and negated subgoals. First, the bounded equivalence problem, i.e., the problem of determining equivalence of queries with respect to databases
Uniform Generation in Spatial Constraint Databases and Applications (Extended Abstract)
, 2000
"... We study the efficient approximation of queries in linear constraint databases using sampling techniques. We define the notion of an almost uniform generator for a generalized relation and extend the classical generator of Dyer, Frieze and Kannan for convex sets to the union and the projection of re ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
We study the efficient approximation of queries in linear constraint databases using sampling techniques. We define the notion of an almost uniform generator for a generalized relation and extend the classical generator of Dyer, Frieze and Kannan for convex sets to the union and the projection of relations. For the intersection and the difference, we give sufficient conditions for the existence of such generators. We show how such generators give relative estimations of the volume and approximations of generalized relations as the composition of convex hulls obtained from the samples.
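The volume-estimation idea can be illustrated with a crude Monte Carlo sketch (hypothetical names like `union_area`; plain rejection sampling from a bounding box rather than the almost uniform generator of Dyer, Frieze and Kannan): sample points uniformly, test membership in the union of linear-constraint regions, and scale the hit rate by the box area.

```python
# Sketch: estimate the area of a union of linear-constraint regions in the
# plane by rejection sampling uniform points from a bounding box. This is a
# naive stand-in for the almost uniform generators discussed in the paper.
import random

def satisfies(point, constraints):
    # Each constraint is a triple (a, b, c) encoding a*x + b*y <= c.
    x, y = point
    return all(a * x + b * y <= c for a, b, c in constraints)

def union_area(regions, box, samples=100_000, seed=0):
    (x0, x1), (y0, y1) = box
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        p = (rng.uniform(x0, x1), rng.uniform(y0, y1))
        if any(satisfies(p, region) for region in regions):
            hits += 1
    return hits / samples * (x1 - x0) * (y1 - y0)

# Two overlapping unit squares: true union area is 1.5.
sq = lambda x0: [(1, 0, x0 + 1), (-1, 0, -x0), (0, 1, 1), (0, -1, 0)]
est = union_area([sq(0.0), sq(0.5)], box=((0, 2), (0, 1)))
print(est)   # roughly 1.5
```

With 100,000 samples the relative error of the estimate is small; the paper's point is that almost uniform generators make such sampling efficient even for unions, projections, and compositions of generalized relations.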
An Efficient Transformation for Klee's Measure Problem in the Streaming Model
"... Abstract Given a stream of rectangles over a discrete space, we consider the problem of computing the total number of distinct points covered by the rectangles. This can be seen as the discrete version of the twodimensional Klee's measure problem for streaming inputs. We provide an ( , δ)app ..."
Abstract
 Add to MetaCart
(Show Context)
Given a stream of rectangles over a discrete space, we consider the problem of computing the total number of distinct points covered by the rectangles. This can be seen as the discrete version of the two-dimensional Klee's measure problem for streaming inputs. We provide an (ε, δ)-approximation for fat rectangles. For the case of arbitrary rectangles, we provide an O(√log U)-approximation, where U is the total number of discrete points in the two-dimensional space. The time to process each rectangle, the total required space, and the time to answer a query for the total area are polylogarithmic in U. Our approximations are based on an efficient transformation technique which projects rectangle areas to one-dimensional ranges, and then uses a streaming algorithm for the Klee's measure problem in the one-dimensional space. The projection is deterministic and to our knowledge it is the first approach of this kind which provides efficiency and accuracy tradeoffs in the streaming model.
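The target quantity of the one-dimensional reduction can be made concrete with a short sketch (hypothetical name `discrete_klee_1d`; an exact offline sweep, not the polylog-space streaming algorithm): the discrete 1-D Klee measure is the number of distinct integer points covered by a collection of ranges.

```python
# Exact offline computation of the discrete one-dimensional Klee measure:
# the number of distinct integer points covered by a set of inclusive
# integer ranges. Streaming algorithms approximate this same quantity in
# polylogarithmic space.

def discrete_klee_1d(ranges):
    """ranges: iterable of inclusive integer intervals (lo, hi)."""
    covered = 0
    current_end = None                   # rightmost point counted so far
    for lo, hi in sorted(ranges):
        if current_end is None or lo > current_end:
            covered += hi - lo + 1       # disjoint interval: count it whole
            current_end = hi
        elif hi > current_end:
            covered += hi - current_end  # count only the newly covered points
            current_end = hi
    return covered

print(discrete_klee_1d([(1, 5), (4, 8), (10, 10)]))  # -> 9 distinct points
```

Here the ranges cover the points 1..8 and 10, so the measure is 9; the paper's transformation projects each rectangle onto such ranges and feeds them to a streaming estimator of this quantity.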