Trio: a system for integrated management of data, accuracy, and lineage
- PRESENTED AT CIDR 2005
, 2005
Cited by 174 (11 self)
Trio is a new database system that manages not only data, but also the accuracy and lineage of the data. Inexact (uncertain, probabilistic, fuzzy, approximate, incomplete, and imprecise!) databases have been proposed in the past, and the lineage problem also has been studied. The goals of the Trio project are to combine and distill previous work into a simple and usable model, design a query language as an understandable extension to SQL, and most importantly build a working system, one that augments conventional data management with both accuracy and lineage as an integral part of the data. This paper provides numerous motivating applications for Trio and lays out preliminary plans for the data model, query language, and prototype system.
Management of Uncertainty in Database Systems
- In Modern Database Systems, Won
, 1995
Cited by 13 (0 self)
Databases are models of the real world. Yet, our knowledge of the real world is often imperfect, thus challenging our ability to create databases of integrity. To uphold the integrity of a database in situations where knowledge of the real world is imperfect, one may either (1) restrict the model to that portion of the real world about which perfect information is available, or (2) develop formalisms that allow the representation of imperfect information. This paper surveys some of the better-known database formalisms for capturing imperfect information. Imperfections in the specification and processing of transactions also have an important impact on the quality of the information delivered to users, and this survey discusses them as well.
Query Processing over Incomplete Autonomous Databases
Cited by 10 (1 self)
Incompleteness due to missing attribute values (aka “null values”) is very common in autonomous web databases, on which user accesses are usually supported through mediators. Traditional query processing techniques that focus on the strict soundness of answer tuples often ignore tuples with critical missing attributes, even if they wind up being relevant to a user query. Ideally we would like the mediator to retrieve such possible answers and gauge their relevance by assessing their likelihood of being pertinent answers to the query. The autonomous nature of web databases poses several challenges in realizing this objective. Such challenges include the restricted access privileges imposed on the data, the limited support for query patterns, and the bounded pool of database and network resources in the web environment. We introduce a novel query rewriting and optimization framework QPIAD that tackles these challenges. Our technique involves reformulating the user query based on mined correlations among the database attributes. The reformulated queries are aimed at retrieving the relevant possible answers in addition to the certain answers. QPIAD is able to gauge the relevance of such queries allowing tradeoffs in reducing the costs of database query processing and answer transmission. To support this framework, we develop methods for mining attribute correlations (in terms of Approximate Functional Dependencies), value distributions (in the form of Naïve Bayes Classifiers), and selectivity estimates. We present empirical studies to demonstrate that our approach is able to effectively retrieve relevant possible answers with high precision, high recall, and manageable cost.
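The rewriting idea in this abstract can be illustrated with a small sketch. It is only one possible reading, not the QPIAD implementation: the toy `CARS` relation, the attribute names, the function `possible_answers`, and the assumed approximate dependency `model -> body` are all hypothetical, and a plain frequency estimate stands in for the Naïve Bayes classifiers the abstract mentions.

```python
from collections import Counter, defaultdict

# Hypothetical toy relation; None marks a missing ("null") attribute value.
CARS = [
    {"id": 1, "model": "Z4", "body": "Convt"},
    {"id": 2, "model": "Z4", "body": "Convt"},
    {"id": 3, "model": "Z4", "body": None},
    {"id": 4, "model": "A4", "body": "Sedan"},
    {"id": 5, "model": "A4", "body": "Convt"},
    {"id": 6, "model": "A4", "body": "Sedan"},
    {"id": 7, "model": "A4", "body": None},
    {"id": 8, "model": "Civic", "body": None},
]


def possible_answers(rows, target, value, determinant):
    """Answer the query `target == value` over incomplete rows.

    Assumes an approximate dependency determinant -> target (an assumption
    of this sketch). Returns (certain, possible), where `possible` is a list
    of (estimated probability, id) pairs, most likely first.
    """
    # Certain answers: the target attribute is present and matches.
    certain = [r for r in rows if r[target] == value]

    # Estimate P(target = value | determinant) from complete tuples only.
    counts = defaultdict(Counter)
    for r in rows:
        if r[target] is not None:
            counts[r[determinant]][r[target]] += 1

    possible = []
    # One rewritten query per determinant value seen among certain answers:
    # "determinant == d AND target IS NULL".
    for d in sorted({r[determinant] for r in certain}):
        prob = counts[d][value] / sum(counts[d].values())
        for r in rows:
            if r[determinant] == d and r[target] is None:
                possible.append((prob, r["id"]))

    possible.sort(key=lambda p: (-p[0], p[1]))
    return certain, possible


certain, possible = possible_answers(CARS, "body", "Convt", "model")
```

In this sketch the tuple with model `Civic` is never retrieved, because no certain answer suggests a rewritten query for it; that is the kind of cost/recall tradeoff the abstract alludes to.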
Query processing over incomplete autonomous databases
- In Proc. ICDE
, 2006
Cited by 6 (5 self)
Incompleteness due to missing attribute values (aka “null values”) is very common in autonomous web databases, on which user accesses are usually supported through mediators. Traditional query processing techniques that focus on the strict soundness of answer tuples often ignore tuples with critical missing attributes, even if they wind up being relevant to a user query. Ideally we would like the mediator to retrieve such relevant uncertain answers and gauge their relevance by assessing their likelihood of being relevant answers to the query. The autonomous nature of the databases poses several challenges in realizing this idea. Such challenges include the restricted access privileges, limited query patterns, and sensitivity of database and network resource consumption in the web environment. We introduce a novel query rewriting and optimization framework that tackles these challenges. Our technique involves reformulating the user query based on approximate functional dependencies (AFDs) among the database attributes. The reformulated queries are aimed at retrieving the relevant uncertain answers in addition to the certain answers. Our query processing framework QPIAD is able to gauge the relevance of such reformulated queries to manage the cost of database query processing and answer transmission. To support this framework, we develop methods for mining attribute correlations (in terms of AFDs) and value distributions (using Naïve Bayes Classifiers). We present empirical studies to demonstrate that our approach is effective in retrieving relevant uncertain answers with high precision, high recall, and manageable cost.
Data Summarization in Relational Databases Through Fuzzy Dependencies
- Information Sciences
, 1999
In this paper we deal with the problem of data summarization through the concept of fuzzy dependency. We introduce a projection operator which leads us to partition a database into two projections with a smaller amount of information. Thus, we can replace the original relation by those projections. In this process we must guarantee that we can recover the original relation, through the projections, using a special join operator. This process can be done whenever the relation satisfies a fuzzy dependency. The projection represents a set of fuzzy rules explaining such a dependency. Also, we show that this definition of dependency maintains the good properties of completeness as in the classical case.
Keywords: Fuzzy Dependencies, Relational Databases, Resemblance, Fuzzy Projection, Fuzzy Join
1 Preliminaries
1.1 Introduction
The treatment of non-crisp information in databases has been accomplished over the last decade by several authors. The study of incomplete information has been addre...
THE ANALYTICAL MEDIATOR FOR MULTI-DIMENSIONAL DATA
The Analytical Mediator System (AMS) is a fully functional experimental prototype system for importing, cleansing, transforming and integrating, and analyzing heterogeneous data. The AMS provides a straightforward, SQL-like model for managing the entire data management lifecycle. Other critical features of the AMS include: its facility for constructing a domain model suitable for representing the combined semantics of an integrated data set, its easy-to-use facility for reducing the dimensionality of a large data set, its ability to interface with standard data mining tools, its simple GUI, and the easy-to-understand reports it generates.
Supporting Ranking Queries on Uncertain and Incomplete Data
Large databases with uncertain information are becoming more common in many applications including data integration, location tracking, and Web search. In these applications, ranking records with uncertain attributes introduces new problems that are fundamentally different from conventional ranking. Specifically, uncertainty in records’ scores induces a partial order over records, as opposed to the total order that is assumed in the conventional ranking settings. In this paper, we present a new probabilistic model, based on partial orders, to encapsulate the space of possible rankings originating from score uncertainty. Under this model, we formulate several ranking query types with different semantics. We describe and analyze a set of efficient query evaluation algorithms. We show that our techniques can be used to solve the problem of rank aggregation in partial orders under two widely-adopted distance metrics. In addition, we design sampling techniques based on Markov chains to compute approximate query answers. Our experimental evaluation uses both real and synthetic data. The experimental study demonstrates the efficiency and effectiveness of our techniques under various configurations.
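To make the "partial order induced by score uncertainty" concrete, here is a minimal sketch. It assumes, purely for illustration, that each uncertain score is a closed interval, and it enumerates the compatible rankings by brute force; it does not reproduce the paper's probabilistic model or its algorithms. The names `SCORES`, `dominates`, and `possible_rankings` are hypothetical.

```python
from itertools import permutations

# Hypothetical records with uncertain scores given as (low, high) intervals.
SCORES = {"a": (8, 10), "b": (5, 9), "c": (1, 4)}


def dominates(x, y, scores=SCORES):
    """x certainly outranks y when x's lowest score exceeds y's highest."""
    return x != y and scores[x][0] > scores[y][1]


def possible_rankings(scores=SCORES):
    """Enumerate the total orders consistent with the dominance partial order.

    Brute force over all permutations, so only suitable for tiny examples.
    """
    names = sorted(scores)
    result = []
    for perm in permutations(names):
        pos = {n: i for i, n in enumerate(perm)}
        consistent = all(
            pos[x] < pos[y]
            for x in names
            for y in names
            if dominates(x, y, scores)
        )
        if consistent:
            result.append(perm)
    return result


rankings = possible_rankings()
```

Here `a` and `b` have overlapping intervals and so are incomparable, while both certainly outrank `c`; the space of possible rankings therefore contains exactly the orders that place `c` last.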
QUERY PROCESSING OVER INCOMPLETE AUTONOMOUS WEB DATABASES
, August 2006
Incompleteness due to missing attribute values (aka “null values”) is very common in autonomous web databases, on which user accesses are usually supported through mediators. Traditional query processing techniques that focus on the strict soundness of answer tuples often ignore tuples with critical missing attributes, even if they wind up being relevant to the user query. Ideally, the mediator is expected to retrieve such relevant uncertain answers and gauge their relevance by assessing their likelihood of being relevant answers to the query. The autonomous nature of the databases poses several challenges in realizing this idea. Such challenges include restricted access privileges, limited query patterns, and cost sensitivity of database and network resource consumption in the web environment. This thesis presents QPIAD, a framework for query processing over incomplete autonomous databases. QPIAD is able to retrieve relevant uncertain answers with high precision, high recall, and manageable cost. Data integration over multiple autonomous data sources is an important task performed by a mediator. This thesis describes query rewriting techniques to perform data integration over multiple incomplete autonomous data sources on the web. Results of experimental evaluation on real-life databases demonstrate that our system retrieve relevant
Supporting ranking queries on uncertain and incomplete data
Large databases with uncertain information are becoming more common in many applications including data integration, location tracking, and Web search. In these applications, ranking records with uncertain attributes introduces new problems that are fundamentally different from conventional ranking. Specifically, uncertainty in records’ scores induces a partial order over records, as opposed to the total order that is assumed in the conventional ranking settings. In this paper, we present a new probabilistic model, based on partial orders, to encapsulate the space of possible rankings originating from score uncertainty. Under this model, we formulate several ranking query types with different semantics. We describe and analyze a set of efficient query evaluation algorithms. We show that our techniques can be used to solve the problem of rank aggregation in partial orders under two widely adopted distance metrics. In addition, we design sampling techniques based on Markov chains to compute approximate query answers. Our experimental evaluation uses both real and synthetic data. The experimental study demonstrates the efficiency and effectiveness of our techniques under various configurations.

