The Complexity of Causality and Responsibility for Query Answers and non-Answers
Cited by 12 (2 self)
An answer to a query has a well-defined lineage expression (alternatively called how-provenance) that explains how the answer was derived. Recent work has also shown how to compute the lineage of a non-answer to a query. However, the cause of an answer or non-answer is a more subtle notion and consists, in general, of only a fragment of the lineage. In this paper, we adapt Halpern, Pearl, and Chockler’s recent definitions of causality and responsibility to define the causes of answers and non-answers to queries, and their degree of responsibility. Responsibility captures the notion of degree of causality and serves to rank potentially many causes by their relative contributions to the effect. Then, we study the complexity of computing causes and responsibilities for conjunctive queries. It is known that computing causes is NP-complete in general. Our first main result shows that all causes to conjunctive queries can be computed by a relational query which may involve negation. Thus, causality can be computed in PTIME, and very efficiently so. Next, we study computing responsibility. Here, we prove that the complexity depends on the conjunctive query and demonstrate a dichotomy between PTIME and NP-complete cases. For the PTIME cases, we give a non-trivial algorithm, consisting of a reduction to the max-flow computation problem. Finally, we prove that, even when it is in PTIME, responsibility is complete for LOGSPACE, implying that, unlike causality, it cannot be computed by a relational query.
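The notions of cause and responsibility in this abstract can be illustrated with a minimal brute-force sketch. It assumes the commonly stated form of the definition: a tuple t is a cause of an answer if some contingency set Γ of other tuples can be removed so that the answer still holds but disappears once t is removed as well, and the responsibility of t is 1/(1+|Γ|) for a smallest such Γ. The lineage, the tuple names, and the functions `holds` and `responsibility` are hypothetical, all tuples are treated as endogenous, and the exponential search is for illustration only; it is not the paper's max-flow algorithm.

```python
from itertools import combinations


def holds(lineage, present):
    """True if at least one conjunct of the DNF lineage is fully present."""
    return any(conjunct <= present for conjunct in lineage)


def responsibility(lineage, tuples, t):
    """Return 1/(1+|G|) for a smallest contingency set G, or 0.0 if t is not a cause.

    Assumption: every tuple is endogenous (may appear in a contingency set).
    """
    others = [u for u in tuples if u != t]
    # Try contingency sets in order of increasing size, so the first hit is minimal.
    for k in range(len(others) + 1):
        for gamma in combinations(others, k):
            remaining = set(tuples) - set(gamma)
            # t is counterfactual under gamma: the answer holds with t, fails without it.
            if holds(lineage, remaining) and not holds(lineage, remaining - {t}):
                return 1.0 / (1 + k)
    return 0.0


# Hypothetical lineage of one answer: (r1 AND s1) OR (r2 AND s1) OR (r3 AND s2)
lineage = [frozenset({"r1", "s1"}), frozenset({"r2", "s1"}), frozenset({"r3", "s2"})]
tuples = ["r1", "r2", "r3", "s1", "s2"]

scores = {t: responsibility(lineage, tuples, t) for t in tuples}
for t in tuples:
    print(t, round(scores[t], 3))
```

In this toy instance every tuple in the lineage is a cause, but s1 ranks above r1 because removing a single other tuple (r3) is enough to make s1 counterfactual, while r1 needs two removals.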
Explaining Missing Answers to SPJUA Queries
2010
Cited by 6 (1 self)
This paper addresses the problem of explaining missing answers in queries that include selection, projection, join, union, aggregation and grouping (SPJUA). Explaining missing answers of queries is useful in various scenarios, including query understanding and debugging. We present a general framework for the generation of these explanations based on source data. We describe the algorithms used to generate a correct, finite, and, when possible, minimal set of explanations. These algorithms are part of Artemis, a system that assists query developers in analyzing queries by, for instance, allowing them to ask why certain tuples are not in the query results. Experimental results demonstrate that Artemis generates explanations of missing tuples at a pace that allows developers to effectively use them for query analysis.
Causality in Databases
Cited by 5 (3 self)
Provenance is often used to validate data, by verifying its origin and explaining its derivation. When searching for “causes” of tuples in the query results or in general observations, the analysis of lineage becomes an essential tool for providing such justifications. However, lineage can quickly grow very large, limiting its immediate use for providing intuitive explanations to the user. The formal notion of causality is a more refined concept that identifies causes for observations based on user-defined criteria, and that assigns to them gradual degrees of responsibility based on their respective contributions. In this paper, we initiate a discussion on causality in databases, give some simple definitions, and motivate this formalism through a number of example applications.
Refining Information Extraction Rules using Data Provenance
Bringing Provenance to its Full Potential using Causal Reasoning
Provenance information is often used to explain query results and outcomes, exploit results of prior reasoning, and establish trust in data. The generality of the notion makes it applicable in a variety of domains, including data warehousing [7], curated databases [4], and various scientific applications. The recent introduction of causal reasoning in a database setting exploits provenance in ways that expand its applicability to more complex problems, and establish new directions, making a step towards achieving provenance’s full potential. In this paper we explore through a variety of examples how causality improves on provenance information, discuss the challenges of building causality able systems, and propose some new directions.

