## Complexity and Approximation of Fixing Numerical Attributes in Databases Under Integrity Constraints (2005)

### Download Links

- [homepages.inf.ed.ac.uk]
- [www.inf.udec.cl]
- [www.scs.carleton.ca]
- [people.scs.carleton.ca:8008]
- [people.scs.carleton.ca]
- [arxiv.org]
- [www.inf.unibz.it]
- DBLP

### Other Repositories/Bibliography

Venue: | In International Workshop on Database Programming Languages |

Citations: | 31 - 13 self |

### BibTeX

@INPROCEEDINGS{Bertossi05complexityand,
  author = {Leopoldo Bertossi and Loreto Bravo and Enrico Franconi and Andrei Lopatenko},
  title = {Complexity and Approximation of Fixing Numerical Attributes in Databases Under Integrity Constraints},
  booktitle = {International Workshop on Database Programming Languages},
  year = {2005},
  pages = {262--278},
  publisher = {Springer, LNCS}
}

### Abstract

Consistent query answering is the problem of computing the answers from a database that are consistent with respect to certain integrity constraints that the database as a whole may fail to satisfy. Those answers are characterized as those that are invariant under minimal forms of restoring the consistency of the database. In this context, we study the problem of repairing databases by fixing integer numerical values at the attribute level with respect to denial and aggregate constraints. We introduce a quantitative definition of database fix, and investigate the complexity of several problems such as DFP, i.e. the existence of fixes within a given distance from the original instance, and CQA, i.e. deciding consistency of answers to aggregate conjunctive queries under different semantics. We provide sharp complexity bounds, identify relevant tractable cases, and introduce approximation algorithms for some of those that are intractable. More specifically, we obtain results like undecidability of existence of fixes for aggregate constraints; MAXSNP-hardness of DFP, but a good approximation algorithm for a relevant special case; and intractability but good approximation for CQA for aggregate queries for one-database-atom denials (plus built-ins).
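To make the notion of a quantitative fix concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical relation `(id, hours, rate)`, a hypothetical single-tuple denial constraint, and a distance defined as the sum of squared differences over the numerical attributes (chosen here for illustration). Because the assumed constraint involves one tuple at a time, each row can be fixed independently by brute force over a small window of integer changes.

```python
from itertools import product

# Hypothetical relation (id, hours, rate); the key `id` is never modified.
ROWS = [(1, 50, 10), (2, 30, 25), (3, 45, 8)]


def violates(row):
    """Hypothetical denial constraint: forbid hours > 40 together with rate < 12."""
    _, hours, rate = row
    return hours > 40 and rate < 12


def row_distance(old, new):
    """Sum of squared differences over the numerical attributes (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(old[1:], new[1:]))


def best_fix(rows, delta=10):
    """Brute force: try integer changes of at most `delta` per attribute.

    Rows are treated independently, which is only valid because the assumed
    constraint mentions a single tuple.
    """
    fixed, total = [], 0
    for row in rows:
        key, hours, rate = row
        candidates = [
            (key, hours + dh, rate + dr)
            for dh, dr in product(range(-delta, delta + 1), repeat=2)
        ]
        consistent = [c for c in candidates if not violates(c)]
        choice = min(consistent, key=lambda c: row_distance(row, c))
        fixed.append(choice)
        total += row_distance(row, choice)
    return fixed, total


def fix_exists_within(rows, k):
    """Toy analogue of the DFP question: is there a fix at distance <= k?

    Correct only while the optimum lies inside the search window `delta`.
    """
    return best_fix(rows)[1] <= k


if __name__ == "__main__":
    print(best_fix(ROWS))
    print(fix_exists_within(ROWS, 20), fix_exists_within(ROWS, 19))
```

The exhaustive search is exponential in general and is used here only to illustrate the definitions; the hardness results stated in the abstract concern the general problem, where constraints may span several tuples.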

### Citations

11403 |
Computers and Intractability: A Guide to the Theory of NP-Completeness
- Garey, Johnson
- 1979
Citation Context ...on a reduction of our problem to satisfying a subsystem with maximum weight of a system of weighted algebraic equations over the Galois field with two elements GF[2] (a generalization of problems in [20,33]). For the latter problem, a polynomial time approximation similar to the one for MAXSAT can be given [33]. The long proof of this theorem is given in Appendix A in [6]. 7 Extensions 7.1 Dependencies ... |

2418 | Computational complexity
- Papadimitriou
- 1994
Citation Context ...e stated, refer to data complexity [1], i.e. to the size of the database that here includes a binary representation for numbers. For complexity theoretic definitions and classical results we refer to [29]. Moving to the case of real numbers would certainly bring new issues that would require different approaches. They are left for ongoing and future research. Actually, it would be natural to investiga... |

1597 |
Foundations of Databases
- Abiteboul, Hull, et al.
- 1995
Citation Context ... for a relevant subclass of denials, we provide a polynomial time approximation within a constant factor. All the algorithmic and complexity results, unless otherwise stated, refer to data complexity [1], i.e. to the size of the database that here includes a binary representation for numbers. For complexity theoretic definitions and classical results we refer to [29]. Moving to the case of real numbe... |

945 |
Approximation Algorithms
- Vazirani
- 2001
Citation Context ...on a reduction of our problem to satisfying a subsystem with maximum weight of a system of weighted algebraic equations over the Galois field with two elements GF[2] (a generalization of problems in [20,33]). For the latter problem, a polynomial time approximation similar to the one for MAXSAT can be given [33]. The long proof of this theorem is given in Appendix A in [6]. 7 Extensions 7.1 Dependencies ... |

598 | Approximation Algorithms for NP-hard problems - Hochbaum - 1995 |

575 | Optimization, approximation, and complexity classes - Papadimitriou, Yannakakis - 1991 |

390 |
On the hardness of approximating minimization problems
- Lund, Yannakakis
- 1994
Citation Context ... a fixed set IC of local denials, we can solve the instances of DROP(IC) by transforming them into instances of a Minimum Weighted Set Cover Optimization Problem (MWSCP). This problem is MAXSNP-hard [28,29], and its general approximation algorithms approximate within a logarithmic factor [28,14]. By concentrating on local denials, we will be able to generate versions (footnote 7: We recall that an attribute is ass... |

389 |
Some simplified NP-complete graph problems
- Garey, Johnson, et al.
- 1976
Citation Context ...y bounded and polynomial-time verifiable certificate, as established in Theorem 1. Now we consider hardness. (a) For sum: By reduction from the NP-hard problem Independent Set for Cubic Planar Graphs [21], where the vertices of the graph have all degree 3. Given an undirected graph G = (V, E) of degree 3, and a lower bound k for the size for a maximum independent set, we create a predicate Ver(V, C1, ... |

303 | Consistent Query Answers in Inconsistent Databases
- Arenas, Bertossi, et al.
- 1999
Citation Context ...From the logical point of view, consistently answering a query on an inconsistent database amounts to evaluating the truth of a formula against a particular class of first-order relational structures [2]. This process is quite different from usual truth or query evaluation on a single structure, namely the relational database at hand. In our case, the class under consideration is formed by alternativ... |

190 | The complexity of optimization problems - Krentel - 1988 |

155 |
A greedy heuristic for the set covering problem
- Chvátal
- 1979
Citation Context ...them into instances of a Minimum Weighted Set Cover Optimization Problem (MWSCP). This problem is MAXSNP-hard [28,29], and its general approximation algorithms approximate within a logarithmic factor [28,14]. By concentrating on local denials, we will be able to generate versions (footnote 7: We recall that an attribute is associated to a unique database predicate and only one of its arguments.) of the MWSCP that... |

126 | On the Decidability and Complexity of Query Answering over Inconsistent and Incomplete Databases
- Cali, Lembo, et al.
- 2003
Citation Context ...answering has been carried out appealing to a tuple oriented repair semantics, i.e. minimal repairs are obtained through tuple insertions or deletions. Under the set-theoretic, tuple-based semantics, [12,11,19] present results on complexity of CQA for conjunctive queries, functional dependencies and foreign key constraints. A majority semantics was studied in [26] for database merging. The range semantics f... |

106 | Minimal-Change Integrity Maintenance Using Tuple Deletions
- Chomicki, Marcinkowski
- 2005
Citation Context ...or ic1, and {t1} and {t2} are both violation sets for ic2. In consequence, I(D, IC) = {({t1, t4}, ic1), ({t1, t5}, ic1), ({t1}, ic2), ({t2}, ic2)}. ✷ Notice that the conflict hypergraph introduced in [12] for studying classic CQA wrt denial constraints has as vertices the database tuples in D; and as hyperedges, the violation sets for elements ic of IC. In our case, each hyperedge is labelled with its... |

92 |
A Systematic Approach to Automatic Edit and Imputation
- Fellegi, Holt
- 1976
Citation Context ...ring is not addressed. There is interesting work in the area of statistical data editing [32]. Similar to integrity constraints, edits are used to express conditions that a data set should satisfy [15]. Edits can be expressed as linear inequalities. There are several alternative ways of modifying the data so that edits are satisfied [15,10,8,9]. Those methods are tailored to finding a single “repai... |

79 | Merging databases under constraints
- Lin, Mendelzon
- 1998
Citation Context ...set-theoretic, tuple-based semantics, [12,11,19] present results on complexity of CQA for conjunctive queries, functional dependencies and foreign key constraints. A majority semantics was studied in [26] for database merging. The range semantics for CQA of aggregate queries was introduced and investigated in [3]. In that paper, the NP-completeness of CQA for atomic aggregate queries, tuple-based and ... |

70 | A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification
- Bohannon, Flaster, et al.
- 2005
Citation Context ...s. In [27], optimizations, the implementation, and experiments of/with the approximation algorithm for DROP are presented. A repair semantic based on changes of attribute values is also considered in [7]. The ICs considered are functional and inclusion dependencies. Database tuples have numerical weights that may reflect provenance in data integration. In consequence, a repair has a weight that refle... |

65 | First-Order Query Rewriting for Inconsistent Databases - Fuxman, Miller - 2005 |

64 |
Consistent Query Answering: Five Easy Pieces
- Chomicki
- 2007
Citation Context ...gation. Database repairs have been studied in the context of consistent query answering (CQA), i.e. the process of obtaining the answers to a query that are consistent wrt a given set of ICs [2] (cf. [4,5,13] for surveys). An answer to a query is consistent if it can be obtained as a standard answer to the query from every possible repair. In most of the research on CQA, a repair is a new instance that sa... |

60 |
Consistent Query Answering in Databases
- Bertossi
- 2006
Citation Context ...gation. Database repairs have been studied in the context of consistent query answering (CQA), i.e. the process of obtaining the answers to a query that are consistent wrt a given set of ICs [2] (cf. [4,5,13] for surveys). An answer to a query is consistent if it can be obtained as a standard answer to the query from every possible repair. In most of the research on CQA, a repair is a new instance that sa... |

58 | Greedy is good: Approximating independent sets in sparse and bounded-degree graphs
- Halldorsson, Radhakrishnan
- 1997
Citation Context ... general Independent Set Problem has bad approximation properties [16, Chapter 10]. The Bounded Degree Independent Set has efficient approximations within a constant factor that depends on the degree [15]. Theorem 9 For any set of 1ADs and conjunctive query with sum over a nonnegative attribute, there is a polynomial time approximation algorithm with a constant factor for CQA under min-max range seman... |

50 | Census data repair: a challenging application of disjunctive logic programming
- Franconi, Palma, et al.
- 2001
Citation Context ...if the age of a mother is less than those of her offsprings. These restrictions can be expressed with denial integrity constraints, which prevent attributes from taking certain combinations of values [18]. Other restrictions may be expressed with aggregation ICs. For example, the maximum concentration of certain toxin in a sample may not exceed a known threshold; or the number of married men and marri... |

43 |
Condensed Representation of Database Repairs for Consistent Query Answering
- Wijsen
- 2003
Citation Context ...e attributes is not brought into the model, and the distance just counts the number of changes, no matter how big or small they are. Update-based repairs for restoring consistency are also studied in [34], where changing values in attributes in a tuple is made a primitive repair action. Semantic and computational problems around CQA are analyzed from this perspective. However, peculiarities of changin... |

29 | Scalar aggregation in inconsistent databases
- Arenas, Bertossi, et al.
Citation Context ... in this proof is acyclic and belongs to the class CTree. ✷ For queries Q returning numerical values, which is common in our framework, it is natural to use the range semantics for CQA, introduced in [3] for scalar aggregate queries and functional dependencies under classical repairs. Under this semantics, a consistent answer is the pair consisting of the min-max and max-min answers, i.e. the supremu... |

16 |
Consistent Query Answers on Numerical Databases under Aggregate Constraints
- Flesca, Furfaro, et al.
Citation Context ...se changes in attribute values are basic repair actions. However, the peculiarities of numerical values and quantitative distances between databases are not investigated. Recent research presented in [17] investigates the complexity of repair checking and CQA wrt aggregation constraints. In this case, the constraints impose linear restrictions on summarizations. The repair semantics is based on change... |

10 |
Efficient approximation algorithms for repairing inconsistent databases
- Lopatenko, Bravo
Citation Context ...ting the weights using the L1 instead of the L2 distance. Thus, the general complexity and approximability results still hold. For example, the optimization and implementation of the approximation in [27] of the algorithm for the DROP problem presented here uses the L1 distance, without any essential changes wrt the treatment based on the L2 distance. The edit distance (ED) between two strings is the ... |

7 |
Statistical Commission and Economic Commission for Europe (UN/ECE), “Terminology on Statistical Metadata”
- Nations
- 2000
Citation Context ...es that had not been addressed before in the context of consistent query answering. These problems are particularly relevant in census like applications, where the problem of statistical data editing [9,32] is a common and difficult task. Also our concentration on aggregate queries is particularly relevant for this kind of statistical applications. In this paper we have just started to investigate some ... |

6 | Error Correction for Massive Data Sets
- Bruni
- 2005
Citation Context ...are used to express conditions that a data set should satisfy [15]. Edits can be expressed as linear inequalities. There are several alternative ways of modifying the data so that edits are satisfied [15,10,8,9]. Those methods are tailored to finding a single “repair”. Consistent query answering has not been considered in that area, and, to the best of our knowledge, the complexity of the problem has not bee... |

6 | Fixing Numerical Attributes Under Integrity Constraints. CoRR paper cs.DB/0503032, arXiv.org e-Print archive - Bertossi, Bravo, et al. |

5 |
DART: a data acquisition and repairing tool
- Fazzinga, Flesca, et al.
- 2006
Citation Context ...nces does not consider the numerical values, but the set of changes wrt cardinality or set inclusion. Queries are atomic, without aggregation. Computational mechanisms are not considered. However, in [16] the authors present a system that uses linear programming techniques for computing a repair wrt aggregation constraints. The repair minimizes the number of changes of attribute values. In [27], optim... |

5 |
Making More Out of an Inconsistent Database
- Wijsen
Citation Context ...blems around CQA are analyzed from this perspective. However, peculiarities of changing numerical attributes are not considered, and more importantly, the distance between databases instances used in [34,35] is based on set-theoretic homomorphisms, but is not quantitative, as in this paper. We provide semantic foundations for repairs that are based on changes on numerical attributes in the presence of ke... |

1 |
Data Editing and Logic
- Boskovitz, Goré, et al.
- 2005
Citation Context ...es that had not been addressed before in the context of consistent query answering. These problems are particularly relevant in census like applications, where the problem of statistical data editing [9,32] is a common and difficult task. Also our concentration on aggregate queries is particularly relevant for this kind of statistical applications. In this paper we have just started to investigate some ... |

1 |
Data Editing and Logic. Proc. Work Session on Statistical Data Editing, United Nations Statistical Commission and Economic Commission for Europe, Conference of European Statisticians
- Boskovitz, Goré, et al.
- 2005
Citation Context ...es that had not been addressed before in the context of consistent query answering. These problems are particularly relevant in census like applications, where the problem of statistical data editing [6,24] is a common and difficult task. Also our concentration on aggregate queries is particularly relevant for this kind of statistical applications. In this paper we have just started to investigate some ... |