Results 1–10 of 63
Computing Iceberg Concept Lattices with TITANIC
, 2002
Cited by 83 (13 self)
We introduce the notion of iceberg concept lattices...
On Generating Near-Optimal Tableaux for Conditional Functional Dependencies
, 2008
Cited by 34 (4 self)
Conditional functional dependencies (CFDs) have recently been proposed as a useful integrity constraint to summarize data semantics and identify data inconsistencies. A CFD augments a functional dependency (FD) with a pattern tableau that defines the context (i.e., the subset of tuples) in which the underlying FD holds. While many aspects of CFDs have been studied, including static analysis and detecting and repairing violations, there has been no prior work on generating pattern tableaux, which is critical to realizing the full potential of CFDs. This paper is the first to formally characterize a “good” pattern tableau, based on the naturally desirable properties of support, confidence and parsimony. We show that the problem of generating an optimal tableau for a given FD is NP-complete but can be approximated in polynomial time via a greedy algorithm. For large data sets, we propose an “on-demand” algorithm providing the same approximation bound that outperforms the basic greedy algorithm in running time by an order of magnitude. For ordered attributes, we propose the range tableau as a generalization of the pattern tableau, which can achieve even more parsimony. The effectiveness and efficiency of our techniques are experimentally demonstrated on real data.
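The support and confidence notions above can be sketched on a toy relation. This is an expository sketch, not the paper's definitions: the attribute names, the `_` wildcard convention, and the majority-vote confidence measure are all illustrative assumptions.

```python
# Hypothetical sketch: evaluating one CFD pattern-tableau row over a
# toy relation. '_' marks a wildcard cell; constants must match exactly.

WILDCARD = "_"

def matches(tuple_, pattern, attrs):
    """A tuple matches a pattern row if it equals every constant in it."""
    return all(pattern[a] == WILDCARD or tuple_[a] == pattern[a] for a in attrs)

def support_confidence(relation, lhs, rhs, pattern):
    """Support: fraction of tuples matching the LHS pattern.
    Confidence: fraction of matching tuples kept after removing, per LHS
    value, the tuples disagreeing with that group's most frequent RHS value."""
    covered = [t for t in relation if matches(t, pattern, lhs)]
    if not covered:
        return 0.0, 1.0
    groups = {}
    for t in covered:
        groups.setdefault(tuple(t[a] for a in lhs), []).append(t[rhs])
    kept = sum(max(vals.count(v) for v in set(vals)) for vals in groups.values())
    return len(covered) / len(relation), kept / len(covered)

# Toy relation: [CC, ZIP] -> CITY holds only in the context CC = "UK".
rel = [
    {"CC": "UK", "ZIP": "EH4", "CITY": "Edinburgh"},
    {"CC": "UK", "ZIP": "EH4", "CITY": "Edinburgh"},
    {"CC": "US", "ZIP": "07974", "CITY": "Murray Hill"},
    {"CC": "US", "ZIP": "07974", "CITY": "Summit"},
]
sup, conf = support_confidence(rel, ["CC", "ZIP"], "CITY", {"CC": "UK", "ZIP": "_"})
```

On this data the UK pattern has support 0.5 and confidence 1.0, while the analogous US pattern only reaches confidence 0.5, which is the kind of trade-off a tableau generator must score.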
Closed Set Based Discovery of Small Covers for Association Rules
 PROC. 15EMES JOURNEES BASES DE DONNEES AVANCEES, BDA
, 1999
Cited by 33 (4 self)
In this paper, we address the problem of the usefulness of the set of discovered association rules. This problem is important since real-life databases usually yield several thousand rules with high confidence. We propose new algorithms based on Galois closed sets to reduce the extraction to small covers, or bases, for exact and approximate rules. Once the frequent closed itemsets, which constitute a generating set for both frequent itemsets and association rules, have been discovered, no additional database pass is needed to derive these bases. Experiments conducted on real-life databases show that these algorithms are efficient and valuable in practice.
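The Galois closure underlying these bases can be illustrated by brute force on a toy transaction database. This is an expository sketch of the closed-itemset concept, not the paper's algorithms:

```python
# Illustrative sketch: frequent closed itemsets of a tiny transaction
# database. The Galois closure of an itemset is the intersection of all
# transactions that contain it; an itemset is closed iff it equals its
# own closure. Brute-force enumeration, for exposition only.

from itertools import combinations

def closure(itemset, transactions):
    """Galois closure: common items of all transactions containing itemset."""
    supporting = [t for t in transactions if itemset <= t]
    if not supporting:
        return frozenset()
    return frozenset.intersection(*supporting)

def frequent_closed(transactions, minsup):
    items = sorted(set().union(*transactions))
    found = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            sup = sum(1 for t in transactions if s <= t)
            if sup >= minsup and closure(s, transactions) == s:
                found.add((s, sup))
    return found

db = [frozenset("ABC"), frozenset("AB"), frozenset("AC"), frozenset("BC")]
result = frequent_closed(db, 2)
```

Every frequent itemset's support can be read off the closed itemsets alone, which is why they form a generating set for the rules.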
Discovering Data Quality Rules
, 2008
Cited by 26 (0 self)
Dirty data is a serious problem for businesses, leading to incorrect decision making, inefficient daily operations, and ultimately wasting both time and money. Dirty data often arises when domain constraints and business rules, meant to preserve data consistency and accuracy, are enforced incompletely or not at all in application code. In this work, we propose a new data-driven tool that can be used within an organization’s data quality management process to suggest possible rules, and to identify conformant and nonconformant records. Data quality rules are known to be contextual, so we focus on the discovery of context-dependent rules. Specifically, we search for conditional functional dependencies (CFDs), that is, functional dependencies that hold only over a portion of the data. The output of our tool is a set of functional dependencies together with the context in which they hold (for example, a rule stating that for CS graduate courses, the course number and term functionally determine the room and instructor). Since the input to our tool will likely be a dirty database, we also search for CFDs that almost hold. We return these rules together with the nonconformant records (as these are potentially dirty records). We present effective algorithms for discovering CFDs and dirty values in a data instance. Our discovery algorithm searches for minimal CFDs among the data values and prunes redundant candidates. No universal objective measures of data quality or data quality rules are known. Hence, to avoid returning an unnecessarily large number of CFDs, and to return only those that are most interesting, we evaluate a set of interest metrics and present comparative results using real datasets. We also present an experimental study showing the scalability of our techniques.
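The "almost holds" idea can be sketched as follows. The majority-vote definition of a nonconformant record and all names below are invented for illustration; the paper's own algorithms and interest metrics are more involved.

```python
# Hedged sketch: for a candidate FD X -> A that almost holds, flag as
# potentially dirty the tuples that disagree with the most frequent
# A-value of their X-group.

from collections import Counter, defaultdict

def nonconformant(relation, lhs, rhs):
    """Return tuples whose RHS value deviates from their group's majority."""
    groups = defaultdict(list)
    for t in relation:
        groups[tuple(t[a] for a in lhs)].append(t)
    dirty = []
    for group in groups.values():
        majority, _ = Counter(t[rhs] for t in group).most_common(1)[0]
        dirty.extend(t for t in group if t[rhs] != majority)
    return dirty

# [course, term] -> room almost holds; the third record is suspect.
rows = [
    {"course": "CS6400", "term": "F08", "room": "101"},
    {"course": "CS6400", "term": "F08", "room": "101"},
    {"course": "CS6400", "term": "F08", "room": "999"},  # likely dirty
]
suspects = nonconformant(rows, ["course", "term"], "room")
```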
Translating between Horn Representations and their Characteristic Models
 JOURNAL OF AI RESEARCH
, 1995
Cited by 24 (5 self)
Characteristic models are an alternative, model-based representation for Horn expressions. It has been shown that these two representations are incomparable and each has its advantages over the other. It is therefore natural to ask what is the cost of translating, back and forth, between these representations. Interestingly, the same translation questions arise in database theory, where they have applications to the design of relational databases. We study the complexity of these problems and prove some positive and negative results. Our main result is that the two translation problems are equivalent under polynomial reductions, and that they are equivalent to the corresponding decision problem. Namely, translating is equivalent to deciding whether a given set of models is the set of characteristic models for a given Horn expression. We also relate these problems to translating between the CNF and DNF representations of monotone functions, a well-known problem for which no pol...
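The model-based view can be made concrete: a set of models is exactly the model set of some Horn expression iff it is closed under bitwise intersection, and the characteristic models are the intersection-irreducible ones. A brute-force sketch under those standard definitions (not the paper's translation algorithms):

```python
# Expository sketch: intersection closure and characteristic models.
# Models are 0/1 tuples; intersecting two models takes the bitwise AND.

from itertools import combinations

def intersect(m1, m2):
    return tuple(a & b for a, b in zip(m1, m2))

def intersection_closure(models):
    """Smallest superset of models closed under pairwise intersection."""
    closed = set(models)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(closed), 2):
            m = intersect(a, b)
            if m not in closed:
                closed.add(m)
                changed = True
    return closed

def characteristic_models(models):
    """Models not expressible as an intersection of the other models."""
    closed = intersection_closure(models)
    return {m for m in closed
            if m not in intersection_closure(closed - {m})}

# {110, 101} is not intersection-closed; its Horn closure adds 100,
# but only the two original models are characteristic.
models = {(1, 1, 0), (1, 0, 1)}
closed = intersection_closure(models)
```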
FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances
 To appear in Lecture Notes in Computer Science (Proceedings of the Third International Conference on Data Warehousing and Knowledge Discovery)
, 2001
Cited by 24 (1 self)
Discovering functional dependencies (FDs) from an existing relation instance is an important technique in data mining and database design. To date, even the most efficient solutions are exponential in the number of attributes of the relation (n), even when the size of the output is not exponential in n. Lopes et al. developed an algorithm, DepMiner, that works well for large n on randomly generated, integer-valued relation instances [LPL 00a]. DepMiner first reduces the FD discovery problem to that of finding minimal covers for hypergraphs, then employs a levelwise search strategy to determine these minimal covers. Our algorithm, FastFDs, instead employs a depth-first, heuristic-driven search strategy for generating minimal covers of hypergraphs. This type of search is commonly used to solve search problems in Artificial Intelligence (AI) [RN 95]. Our experimental results indicate that the levelwise strategy that is the hallmark of many successful data mining algorithms is in fact significantly surpassed by the depth-first, heuristic-driven strategy FastFDs employs, due to the inherent space efficiency of the search. Furthermore, we revisit the comparison between DepMiner and Tane, including FastFDs. We report several tests on distinct benchmark relation instances, comparing the DepMiner and FastFDs hypergraph approaches to Tane's partitioning approach for mining FDs from a relation instance. At the end of the paper (appendix A) we provide experimental data comparing FastFDs with a third algorithm, fdep [FS 99].
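The reduction from FD discovery to hypergraph covers rests on difference sets. A brute-force sketch of that connection (illustrative only, not the FastFDs search itself; relation and attribute names are made up):

```python
# Expository sketch: for a target attribute A, every pair of tuples that
# disagrees on A yields a "difference set" of the other attributes they
# disagree on. X -> A holds iff X intersects (covers) every such set;
# the minimal such X are minimal covers of this set system, which is
# what the hypergraph-based algorithms search for.

from itertools import combinations

def difference_sets(relation, attrs, target):
    diffs = set()
    for t, u in combinations(relation, 2):
        if t[target] != u[target]:
            diffs.add(frozenset(a for a in attrs
                                if a != target and t[a] != u[a]))
    return diffs

def fd_holds(relation, lhs, target):
    """X -> A holds iff X hits every difference set for A."""
    attrs = relation[0].keys()
    return all(set(lhs) & d for d in difference_sets(relation, attrs, target))

rel = [
    {"emp": "ann", "dept": "db", "mgr": "li"},
    {"emp": "bob", "dept": "db", "mgr": "li"},
    {"emp": "cam", "dept": "ai", "mgr": "yu"},
]
```

Here `dept -> mgr` holds; adding a second "ann" row with a different manager produces an empty difference set, so no FD with that target can hold, and the check correctly fails.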
Efficient Read-Restricted Monotone CNF/DNF Dualization by Learning with Membership Queries
, 1998
Cited by 20 (1 self)
We consider exact learning of monotone CNF formulas in which each variable appears at most some constant k times ("read-k" monotone CNF). Let ...
Reasoning with Examples: Propositional Formulae and Database Dependencies
Cited by 14 (4 self)
For humans, looking at how concrete examples behave is an intuitive way of deriving conclusions. The drawback of this method is that it does not necessarily give correct results. However, under certain conditions example-based deduction can be used to obtain a correct and complete inference procedure. This is the case for Boolean formulae (reasoning with models) and for certain types of database integrity constraints (the use of Armstrong relations). We show that these approaches are closely related, and use the relationship to prove new results about the existence and sizes of Armstrong relations for Boolean dependencies. Furthermore, we exhibit close relations between the question of finding keys in relational databases and that of finding abductive explanations. Further applications of the correspondence between these two approaches are also discussed. 1 Introduction One of the major tasks in database systems as well as artificial intelligence systems is to express some know...
A Multistrategy Approach to Relational Knowledge Discovery in Databases
 Machine Learning Journal
, 1996
Cited by 13 (9 self)
When learning from very large databases, the reduction of complexity is extremely important. Two extremes of making knowledge discovery in databases (KDD) feasible have been put forward. One extreme is to choose a very simple hypothesis language, thereby being capable of very fast learning on real-world databases. The opposite extreme is to select a small data set, thereby being able to learn very expressive (first-order logic) hypotheses. A multistrategy approach allows one to include most of these advantages and exclude most of the disadvantages. Simpler learning algorithms detect hierarchies which are used to structure the hypothesis space for a more complex learning algorithm. The better structured the hypothesis space is, the better learning can prune away uninteresting or losing hypotheses and the faster it becomes. We have combined inductive logic programming (ILP) directly with a relational database management system. The ILP algorithm is controlled in a model-driven way by t...
Fast Computation of Concept Lattices Using Data Mining Techniques
, 2000
Cited by 12 (3 self)
We present a new algorithm called Titanic for computing concept lattices. It is based on data mining techniques for computing frequent itemsets. The algorithm is experimentally evaluated and compared with B. Ganter's NextClosure algorithm.
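What these algorithms compute can be shown by brute force on a tiny formal context. This expository sketch enumerates all formal concepts, i.e. pairs (extent, intent) where the intent is exactly the attributes shared by the extent and vice versa; Titanic and NextClosure derive the same lattice far more efficiently.

```python
# Brute-force enumeration of the concept lattice of a small formal
# context (object -> set of attributes). For exposition only.

from itertools import combinations

def intent(objects, context):
    """Attributes shared by all given objects (all attributes if none)."""
    if not objects:
        return frozenset(set.union(*context.values()))
    return frozenset(set.intersection(*(context[o] for o in objects)))

def extent(attrs, context):
    """Objects possessing every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)

def concepts(context):
    objs = list(context)
    found = set()
    for k in range(len(objs) + 1):
        for combo in combinations(objs, k):
            i = intent(set(combo), context)
            found.add((extent(i, context), i))
    return found

ctx = {
    "duck": {"flies", "swims"},
    "eagle": {"flies", "hunts"},
    "shark": {"swims", "hunts"},
}
lattice = concepts(ctx)
```

An iceberg lattice would keep only the concepts whose extent is large enough, exactly as frequent-itemset mining keeps only itemsets above a support threshold.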