Results 1 - 10 of 14
Trio: a system for integrated management of data, accuracy, and lineage
- PRESENTED AT CIDR 2005
, 2005
Cited by 174 (11 self)
Trio is a new database system that manages not only data, but also the accuracy and lineage of the data. Inexact (uncertain, probabilistic, fuzzy, approximate, incomplete, and imprecise!) databases have been proposed in the past, and the lineage problem also has been studied. The goals of the Trio project are to combine and distill previous work into a simple and usable model, design a query language as an understandable extension to SQL, and most importantly build a working system: a system that augments conventional data management with both accuracy and lineage as an integral part of the data. This paper provides numerous motivating applications for Trio and lays out preliminary plans for the data model, query language, and prototype system.
Partial Results for Online Query Processing
, 2002
Cited by 28 (3 self)
Traditional query processors generate full, accurate query results, either in batch or in pipelined fashion. We argue that this strict model is too rigid for exploratory queries over diverse and distributed data sources, such as sources on the Internet. Instead, we propose a looser model of querying in which a user submits a broad initial query outline, and the system continually generates partial result tuples that may contain values for only some of the output fields. The user can watch these partial results accumulate at the user interface, and accordingly refine the query by specifying their interest in different kinds of partial results. After
Prairie: A rule specification framework for query optimizers
- In Proceedings 11th International Conference on Data Engineering
, 1995
Cited by 13 (6 self)
From our experience, current rule-based query optimizers do not provide a very intuitive and well-defined framework to define rules and actions. To remedy this situation, we propose an extensible and structured algebraic framework called Prairie for specifying rules. Prairie facilitates rule-writing by enabling a user to write rules and actions more quickly, correctly and in an easy-to-understand and easy-to-debug manner. Query optimizers consist of three major parts: a search space, a cost model and a search strategy. The approach we take is only to develop the algebra which defines the search space and the cost model; we do not propose a search engine (i.e., search strategy) to drive the rules. We have chosen the Volcano optimizer generator as our search engine, because it is publicly available, and also because it has an efficient branch-and-bound search strategy. Using Prairie as a front-end, we translate Prairie rules to Volcano to validate our claim that Prairie makes it easier to write rules. We describe our algebra and present experimental results which show that using a high-level framework like Prairie to design large-scale optimizers does not sacrifice efficiency.
Approximation techniques for spatial data
, 2004
Cited by 8 (0 self)
Spatial Database Management Systems (SDBMS), e.g., Geographical Information Systems, that manage spatial objects such as points, lines, and hyper-rectangles, often have very high query processing costs. Accurate selectivity estimation during query optimization therefore is crucially important for finding good query plans, especially when spatial joins are involved. Selectivity estimation has been studied for relational database systems, but to date has only received little attention in SDBMS. In this paper, we introduce novel methods that permit high-quality selectivity estimation for spatial joins and range queries. Our techniques can be constructed in a single scan over the input, handle inserts and deletes to the database incrementally, and hence they can also be used for processing of streaming spatial data. In contrast to previous approaches, our techniques return approximate results that come with provable probabilistic quality guarantees. We present a detailed analysis and experimentally demonstrate the efficacy of the proposed techniques.
Cost-Based Optimization of Decision Support Queries using Transient-Views
- ACM SIGMOD Conf
, 1998
Cited by 7 (0 self)
Next generation decision support applications, besides being capable of processing huge amounts of data, require the ability to integrate and reason over data from multiple, heterogeneous data sources. Often, these data sources differ in a variety of aspects such as their data models, the query languages they support, and their network protocols. Also, typically they are spread over a wide geographical area. The cost of processing decision support queries in such a setting is quite high. However, processing these queries often involves redundancies such as repeated access to the same data source and multiple executions of similar processing sequences. Minimizing these redundancies would significantly reduce the query processing cost. In this paper, we (1) propose an architecture for processing complex decision support queries involving multiple, heterogeneous data sources
Efficient dynamic mining of constrained frequent sets
- ACM Trans. Database Syst
, 2003
Cited by 6 (2 self)
Data mining is supposed to be an iterative and exploratory process. In this context, we are working on a project with the overall objective of developing a practical computing environment for the human-centered exploratory mining of frequent sets. One critical component of such an environment is the support for the dynamic mining of constrained frequent sets of items. Constraints enable users to impose a certain focus on the mining process; dynamic means that, in the middle of the computation, users are able to (i) change (such as tighten or relax) the constraints and/or (ii) change the minimum support threshold, thus having a decisive influence on subsequent computations. In a real-life situation, the available buffer space may be limited, thus adding another complication to the problem. In this article, we develop an algorithm, called DCF, for Dynamic Constrained Frequent-set computation. This algorithm is enhanced with a few optimizations, exploiting a lightweight structure called a segment support map. It enables DCF to (i) obtain sharper bounds on the support of sets of items, and to (ii) better exploit properties of constraints. Furthermore, when handling dynamic changes to constraints, DCF relies on the concept of a delta member generating function, which generates precisely the sets of items that satisfy the new but not the old constraints. Our experimental results show the effectiveness of these enhancements.
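The idea of a delta member set, i.e. the sets of items that satisfy the new but not the old constraints, can be illustrated with a small sketch. This is a naive brute-force enumeration written only to show what is being computed; it is not the DCF algorithm or its generating function, and the names `all_itemsets`, `delta_members`, `prices`, and the two budget constraints are hypothetical.

```python
from itertools import combinations

def all_itemsets(items):
    """Yield every non-empty subset of items, smallest sets first (brute force)."""
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def delta_members(items, old_constraint, new_constraint):
    """Return the itemsets that satisfy the new constraint but not the old one."""
    return [s for s in all_itemsets(items)
            if new_constraint(s) and not old_constraint(s)]

# Hypothetical example: the user relaxes a total-price budget from 30 to 50.
prices = {"a": 10, "b": 20, "c": 40}

def total(itemset):
    return sum(prices[i] for i in itemset)

def old_budget(itemset):
    return total(itemset) <= 30

def new_budget(itemset):
    return total(itemset) <= 50

delta = delta_members(prices, old_budget, new_budget)
print(delta)
```

Only the itemsets in `delta` need fresh support counting after the constraint change; sets that already satisfied the old constraint were handled earlier.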
Optimization and Evaluation of Disjunctive Queries
- IEEE Trans. on Knowledge and Data Engineering
, 2000
Cited by 5 (3 self)
In this paper, we propose a novel technique, called bypass processing, for evaluating such disjunctive queries. The bypass processing technique is based on new selection and join operators that produce two output streams: the true-stream with tuples satisfying the selection (join) predicate and the false-stream with tuples not satisfying the corresponding predicate. Splitting the tuple streams in this way enables us to bypass costly predicates whenever the fate of the corresponding tuple (stream) can be determined without evaluating this predicate. In the paper, we show how to systematically generate bypass evaluation plans utilizing a bottom-up building block approach. We show that our evaluation technique allows us to incorporate the standard SQL semantics of null values. For this, we devise two different approaches: One is based on explicitly incorporating three-valued logic into the evaluation plans; the other one relies on two-valued logic by moving all negations to atomic conditions of the selection predicate. We describe how to extend an iterator-based query engine to support bypass evaluation with little extra overhead. This query engine was used to quantitatively evaluate the bypass evaluation plans against the traditional evaluation techniques utilizing a CNF- or DNF-based query predicate
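A minimal sketch of the two-stream selection idea, for a disjunction of one cheap and one costly predicate. It assumes in-memory lists and plain two-valued logic (no SQL null handling), and is not the paper's iterator-based engine; `bypass_select`, `evaluate_disjunction`, and the sample predicates are hypothetical names.

```python
def bypass_select(rows, pred):
    """Split rows into (satisfying, not satisfying) lists for pred."""
    satisfied, unsatisfied = [], []
    for row in rows:
        (satisfied if pred(row) else unsatisfied).append(row)
    return satisfied, unsatisfied

def evaluate_disjunction(rows, cheap, costly):
    """Evaluate `cheap OR costly`, running costly only where cheap failed."""
    accepted, undecided = bypass_select(rows, cheap)
    # Rows accepted by the cheap predicate bypass the costly one entirely.
    also_accepted, _rejected = bypass_select(undecided, costly)
    return accepted + also_accepted

# Hypothetical predicates; the counter shows how often the costly one runs.
calls = {"costly": 0}

def is_small(row):
    return row["qty"] < 10

def costly_check(row):
    calls["costly"] += 1
    return row["name"].startswith("x")

rows = [
    {"name": "xa", "qty": 5},
    {"name": "b", "qty": 50},
    {"name": "xc", "qty": 70},
]
result = evaluate_disjunction(rows, is_small, costly_check)
print([r["name"] for r in result], calls["costly"])
```

Here the first row is decided by the cheap predicate alone, so the costly predicate is evaluated for only two of the three rows.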
PR-Join: A Non-Blocking Join Achieving Higher Early Result Rate with Statistical Guarantees
Cited by 2 (1 self)
Online aggregation is a promising solution to achieving fast early responses for interactive ad-hoc queries that compute aggregates on a large amount of data. Essential to the success of online aggregation is a good non-blocking join algorithm that enables both (i) high early result rates with statistical guarantees and (ii) fast end-to-end query times. We analyze existing non-blocking join algorithms and find that they all provide sub-optimal early result rates, and those with fast end-to-end times achieve them only by further sacrificing their early result rates. We propose a new non-blocking join algorithm, Partitioned expanding Ripple Join (PR-Join), which achieves considerably higher early result rates than previous non-blocking joins, while also delivering fast end-to-end query times. PR-Join performs separate, ripple-like join operations on individual hash partitions, where the width of a ripple expands multiplicatively over time. This contrasts with the non-partitioned, fixed-width ripples of Block Ripple Join. Assuming, as in previous non-blocking join studies, that the input relations are in random order, PR-Join ensures representative early results that are amenable to statistical guarantees. We show both analytically and with real-machine experiments that PR-Join achieves over an order of magnitude higher early result rates than previous non-blocking joins. We also discuss the benefits of using a flash-based SSD for temporary storage, showing that PR-Join can then achieve close to optimal end-to-end performance. Finally, we consider the joining of finite data streams that arrive over time, and find that PR-Join achieves similar or higher result rates than RPJ, the state-of-the-art algorithm specialized for that domain.
A Plan-Operator Concept for Client-Based Knowledge Processing
- Proc. 19th VLDB Conference
, 1993
Cited by 1 (0 self)
In knowledge processing frameworks for advanced DBMS (such as OODBMS or KBMS) suitable for client-server architectures, the efficient realization of client-based, main-memory query processing represents a promising and important step towards an effective support of application processing. In this paper we present a plan-operator concept developed along these lines as part of an algebraic query-processing framework for the KBMS KRISYS [24]. We motivate the main goals (support of extensibility, client-based query processing, and dynamic query optimization) as well as the resulting design objectives followed and give a detailed description of the specification and implementation of the resulting plan-operator concept. It can be characterized as highly modular and orthogonal w.r.t. the overall functionality and allows a flexible and extensive utilization of precompiled code fragments. Additionally, we sketch the advantages of the concept w.r.t. our design objectives and point out that the presented ideas are not limited to KRISYS but also apply in a more general setting.

