The Complexity of Causality and Responsibility for Query Answers and non-Answers
Cited by 12 (2 self)
An answer to a query has a well-defined lineage expression (alternatively called how-provenance) that explains how the answer was derived. Recent work has also shown how to compute the lineage of a non-answer to a query. However, the cause of an answer or non-answer is a more subtle notion and consists, in general, of only a fragment of the lineage. In this paper, we adapt Halpern, Pearl, and Chockler’s recent definitions of causality and responsibility to define the causes of answers and non-answers to queries, and their degree of responsibility. Responsibility captures the notion of degree of causality and serves to rank potentially many causes by their relative contributions to the effect. Then, we study the complexity of computing causes and responsibilities for conjunctive queries. It is known that computing causes is NP-complete in general. Our first main result shows that all causes to conjunctive queries can be computed by a relational query which may involve negation. Thus, causality can be computed in PTIME, and very efficiently so. Next, we study computing responsibility. Here, we prove that the complexity depends on the conjunctive query and demonstrate a dichotomy between PTIME and NP-complete cases. For the PTIME cases, we give a non-trivial algorithm, consisting of a reduction to the max-flow computation problem. Finally, we prove that, even when it is in PTIME, responsibility is complete for LOGSPACE, implying that, unlike causality, it cannot be computed by a relational query.
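As a rough illustration of the responsibility notion summarized in this abstract, the sketch below brute-forces the smallest contingency set for a tuple and returns 1 / (1 + size of that set), or 0 when the tuple is not a cause. It is not the paper's max-flow algorithm: it takes exponential time and is only meant to make the definition concrete. The Boolean query q :- R(x, y), S(y), the tagged-tuple encoding, and the names query_holds and responsibility are assumptions introduced here for the example.

```python
from fractions import Fraction
from itertools import combinations


def query_holds(db):
    """Hypothetical Boolean conjunctive query q :- R(x, y), S(y)."""
    s_values = {t[1] for t in db if t[0] == "S"}
    return any(t[0] == "R" and t[2] in s_values for t in db)


def responsibility(db, endogenous, t, holds=query_holds):
    """Brute-force responsibility of tuple t for the query being true.

    Searches contingency sets (subsets of the other endogenous tuples)
    in order of increasing size. t is a counterfactual cause once the
    contingency set is removed if the query still holds with t present
    and fails with t removed.
    """
    if t not in endogenous or not holds(db):
        return Fraction(0)
    candidates = list(endogenous - {t})
    for size in range(len(candidates) + 1):
        for gamma in combinations(candidates, size):
            reduced = db - set(gamma)
            if holds(reduced) and not holds(reduced - {t}):
                return Fraction(1, 1 + size)
    return Fraction(0)  # no contingency set exists, so t is not a cause


# Two independent derivations of the answer, plus an irrelevant tuple S(5).
example_db = {("R", 1, 2), ("R", 1, 3), ("S", 2), ("S", 3), ("S", 5)}
print(responsibility(example_db, example_db, ("S", 2)))  # 1/2
print(responsibility(example_db, example_db, ("S", 5)))  # 0
```

In this example S(2) becomes counterfactual only after one other tuple (for instance R(1, 3)) is removed, so its responsibility is 1/2, while S(5) never affects the answer and gets 0.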
Why so? or why no? Functional causality for explaining query answers
CoRR, 2009
Cited by 5 (3 self)
In this paper, we propose causality as a unified framework to explain query answers and non-answers, thus generalizing and extending several previously proposed definitions of provenance and missing query result explanations. Starting from the established definition of actual causes by Halpern and Pearl [12], we propose functional causes as a refined definition of causality with several desirable properties. These properties allow us to apply our notion of causality in a database context and apply it uniformly to define the causes of query results and their individual contributions in several ways: (i) we can model both provenance as well as non-answers, (ii) we can define explanations as either data in the input relations or relational operations in a query plan, and (iii) we can give graded degrees of responsibility to individual causes, thus allowing us to rank causes. In particular, our approach allows us to explain contributions to relational aggregate functions and to rank causes according to their respective responsibilities, aiding users in identifying errors in uncertain or untrusted data. Throughout the paper, we illustrate the applicability of our framework with several examples. This is the first work that treats “positive” and “negative” provenance under the same framework, and establishes the theoretical foundations of causality theory in a database context.
TAP: Time-aware provenance for distributed systems
In Proc. USENIX Workshop on the Theory and Practice of Provenance (TaPP), 2011
Cited by 4 (3 self)
In this paper, we explore the use of provenance for analyzing execution dynamics in distributed systems. We argue that provenance could have significant practical benefits for system administrators, e.g., for reasoning about changes in a system’s state, diagnosing protocol misconfigurations, detecting intrusions, and pinpointing performance bottlenecks. However, to realize this vision, we must revisit several aspects of provenance management. As a first step, we present time-aware provenance (TAP), an enhanced provenance model that explicitly represents time, distributed state, and state changes. We outline our research agenda towards developing novel query processing, languages, and optimization techniques that can be used to efficiently and securely query time-aware provenance, even in the presence of transient state or untrusted nodes.
Contexts and Data Quality Assessment
Cited by 1 (1 self)
The quality of data is context dependent. Starting from this intuition and experience, we propose and develop a conceptual framework that captures in formal terms the notion of context-dependent data quality. We start by proposing a generic and abstract notion of context, and also of its uses, in general and in data management in particular. On this basis, we investigate data quality assessment and quality query answering as context-dependent activities. A context for the assessment of a database D at hand is modeled as an external database schema, with possibly materialized or virtual data, and connections to external data sources. The database D is put in context via mappings to the contextual schema, which produces a collection C of alternative clean versions of D. The quality of D is measured in terms of its distance to C. The class C is also used to define and do quality query answering. The proposed model allows for natural extensions, like the use of data quality predicates, the optimization of the access by the context to external data sources, and also the representation of contexts by means of more expressive ontologies.
Bringing Provenance to its Full Potential using Causal Reasoning
Provenance information is often used to explain query results and outcomes, exploit results of prior reasoning, and establish trust in data. The generality of the notion makes it applicable in a variety of domains, including data warehousing [7], curated databases [4], and various scientific applications. The recent introduction of causal reasoning in a database setting exploits provenance in ways that expand its applicability to more complex problems, and establish new directions, making a step towards achieving provenance’s full potential. In this paper we explore through a variety of examples how causality improves on provenance information, discuss the challenges of building causality able systems, and propose some new directions.

