Results 1 - 10 of 63
Levelwise Search and Borders of Theories in Knowledge Discovery
, 1997
Cited by 177 (12 self)
One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all sentences of L deemed interesting by the selection predicate. We analyze the simple levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. For this, we introduce the concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm. We also consider the verification problem of a KDD process: given r and a set of sentences S ⊆ L, determine whether S is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.
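As an illustration of the levelwise idea described in this abstract, here is a minimal sketch for one concrete instance, assuming the sentences are itemsets and the selection predicate is "occurs in at least min_support rows". The function name, the itemset setting, and the parameter names are assumptions made for this example and are not taken from the paper. The sketch starts at singletons and leaves out the empty set.

```python
from itertools import combinations


def levelwise_frequent_sets(rows, min_support):
    """Levelwise search over itemsets (hypothetical helper, not the paper's code).

    Returns (theory, negative_border):
      theory          - all non-empty itemsets contained in >= min_support rows
      negative_border - the evaluated candidates that turned out not to be frequent
    """
    rows = [frozenset(row) for row in rows]
    items = sorted(set().union(*rows))

    def is_interesting(candidate):
        # One pass over the data per candidate: a "database access".
        return sum(1 for row in rows if candidate <= row) >= min_support

    theory, negative_border = set(), set()
    candidates = [frozenset([item]) for item in items]
    size = 1
    while candidates:
        level = set()
        for candidate in candidates:
            if is_interesting(candidate):
                level.add(candidate)
            else:
                negative_border.add(candidate)
        theory |= level

        # Next level: sets of size + 1 whose subsets of the current size
        # were all found interesting on this level.
        next_candidates = set()
        for base in level:
            for item in items:
                if item in base:
                    continue
                bigger = base | {item}
                if all(frozenset(sub) in level for sub in combinations(bigger, size)):
                    next_candidates.add(bigger)
        candidates = list(next_candidates)
        size += 1
    return theory, negative_border
```

In this sketch every candidate that is evaluated against the data ends up either in the returned theory or in the returned negative border, which is why the size of the border matters when counting database accesses.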
Minimal-Change Integrity Maintenance Using Tuple Deletions
- Information and Computation
, 2005
Cited by 67 (8 self)
We address the problem of minimal-change integrity maintenance in the context of integrity constraints in relational databases. We assume that integrity-restoration actions are limited to tuple deletions. We focus on two basic computational issues: repair checking (is a database instance a repair of a given database?) and consistent query answers [3] (is a tuple an answer to a given query in every repair of a given database?). We study the computational complexity of both problems, delineating the boundary between the tractable and the intractable cases. We consider denial constraints, general functional and inclusion dependencies, as well as key and foreign key constraints. Our results shed light on the computational feasibility of minimal-change integrity maintenance. The tractable cases should lead to practical implementations. The intractability results highlight the inherent limitations of any integrity enforcement mechanism, e.g., triggers or referential constraint actions, as a way of performing minimal-change integrity maintenance.
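To make the repair-checking question concrete, the following is a minimal sketch for the simplest case only: functional dependencies, with a repair taken to be a maximal consistent subset of the database obtained by deleting tuples. The tuple encoding (positional attributes), the function names, and the restriction to functional dependencies are assumptions for this example; the paper covers a wider range of constraints.

```python
def violates(t1, t2, fds):
    """True if the pair (t1, t2) violates some FD.

    Each FD is (lhs_positions, rhs_position), an encoding assumed for this sketch.
    """
    return any(
        all(t1[a] == t2[a] for a in lhs) and t1[rhs] != t2[rhs]
        for lhs, rhs in fds
    )


def is_consistent(tuples, fds):
    """True if no pair of tuples violates an FD."""
    ts = list(tuples)
    return not any(
        violates(ts[i], ts[j], fds)
        for i in range(len(ts))
        for j in range(i + 1, len(ts))
    )


def is_repair(database, candidate, fds):
    """Check that candidate is a maximal consistent subset of database."""
    db, cand = set(database), set(candidate)
    if not cand <= db or not is_consistent(cand, fds):
        return False
    # Maximality: putting back any deleted tuple must break consistency.
    return all(not is_consistent(cand | {t}, fds) for t in db - cand)
```

Checking single-tuple additions is enough here because, for functional dependencies, every subset of a consistent set of tuples is itself consistent.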
Methods and Problems in Data Mining
, 1997
Cited by 64 (2 self)
Knowledge discovery in databases and data mining aim at semiautomatic tools for the analysis of large data sets. We consider some methods used in data mining, concentrating on levelwise search for all frequently occurring patterns. We show how this technique can be used in various applications. We also discuss possibilities for compiling data mining queries into algorithms, and look at the use of sampling in data mining. We conclude by listing several open research problems in data mining and knowledge discovery.
Data mining, hypergraph transversals, and machine learning
, 1997
Cited by 59 (5 self)
Several data mining problems can be formulated as problems of finding maximally specific sentences that are interesting in a database. We first show that this problem has a close relationship with the hypergraph transversal problem. We then analyze two algorithms that have been previously used in data mining, proving upper bounds on their complexity. The first algorithm is useful when the maximally specific interesting sentences are "small". We show that this algorithm can also be used to efficiently solve a special case of the hypergraph transversal problem, improving on previous results. The second algorithm utilizes a subroutine for hypergraph transversals, and is applicable in more general situations, with complexity close to a lower bound for the problem. We also relate these problems to the model of exact learning in computational learning theory, and use the correspondence to derive some corollaries.
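For readers unfamiliar with the hypergraph transversal problem mentioned in this abstract, a brute-force sketch may help: a transversal is a vertex set that intersects every hyperedge, and the task is to list the minimal ones. The function below is a hypothetical illustration that enumerates candidates by increasing size; it is exponential and is not one of the algorithms analyzed in the paper.

```python
from itertools import combinations


def minimal_transversals(vertices, edges):
    """Enumerate all minimal transversals (hitting sets) by brute force."""
    edges = [frozenset(edge) for edge in edges]
    found = []
    for size in range(len(vertices) + 1):
        for combo in combinations(sorted(vertices), size):
            candidate = frozenset(combo)
            # A superset of an already found transversal cannot be minimal.
            if any(smaller <= candidate for smaller in found):
                continue
            # Keep the candidate if it intersects every hyperedge.
            if all(candidate & edge for edge in edges):
                found.append(candidate)
    return found
```

Because candidates are visited in order of increasing size, any hitting set that contains no previously found transversal is itself minimal.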
TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies
, 1999
Cited by 50 (0 self)
this paper, we also consider the approximate dependency inference task: given a relation r and a threshold ε, find all minimal non-trivial approximate dependencies
Efficient Discovery of Functional and Approximate Dependencies Using Partitions (Extended version)
- In ICDE
, 1997
Cited by 46 (1 self)
Discovery of functional dependencies from relations has been identified as an important database analysis technique. In this paper, we present a new approach for finding functional dependencies from large databases, based on partitioning the set of rows with respect to their attribute values. The use of partitions makes the discovery of approximate functional dependencies easy and efficient, and the erroneous or exceptional rows can be identified easily. Experiments show that the new algorithm is efficient in practice. For benchmark databases the running times are improved by several orders of magnitude over previously published results. The algorithm is also applicable to much larger datasets than the previous methods.
Computing Reviews Categories and Subject Descriptors: H.3.1 Content Analysis and Indexing; F.2.2 Nonnumerical Algorithms and Problems; I.2.6 Learning
General Terms: Algorithms, Experimentation
Additional Key Words and Phrases: Knowledge Discovery, Data Mining, Func...
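The partition idea in this abstract can be sketched as follows, assuming rows are dictionaries keyed by attribute name. A dependency X -> A holds exactly when grouping rows by X and grouping them by X together with A give the same number of groups. The helper names and the error measure (the smallest fraction of rows to remove so that the dependency holds) are assumptions chosen for this illustration, not the paper's implementation.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices by their values on attrs."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(index)
    return list(groups.values())


def fd_holds(rows, lhs, rhs):
    """lhs -> rhs holds iff adding rhs does not split any lhs group."""
    return len(partition(rows, lhs)) == len(partition(rows, list(lhs) + [rhs]))


def removal_error(rows, lhs, rhs):
    """Smallest fraction of rows to delete so that lhs -> rhs holds (assumed measure)."""
    if not rows:
        return 0.0
    kept = 0
    for group in partition(rows, lhs):
        counts = defaultdict(int)
        for index in group:
            counts[rows[index][rhs]] += 1
        # Within each lhs group, keep the rows with the most common rhs value.
        kept += max(counts.values())
    return 1 - kept / len(rows)
```

The rows outside the majority value of each group are the "exceptional" rows for that dependency, so the same pass that computes the error can also be used to list them.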
Discovering All Most Specific Sentences
- ACM Transactions on Database Systems
, 2003
Cited by 39 (2 self)
this article, we show how the problems of finding frequent sets in relations and of finding minimal keys in databases can be reduced to this formulation. Using this theory extraction formulation [Mannila 1995, 1996; Mannila and Toivonen 1997], one can formulate general results about the complexity of algorithms for these data mining tasks
On an algorithm for finding all interesting sentences (Extended Abstract)
- In Cybernetics and Systems, Volume II, The Thirteenth European Meeting on Cybernetics and Systems Research
, 1996
Cited by 31 (9 self)
Knowledge discovery in databases (KDD), also called data mining, has recently received wide attention from practitioners and researchers. One of the basic problems in KDD is the following: given a data set r, a class L of sentences defining subgroups or properties of r, and an interestingness predicate, find all sentences of L deemed interesting by the interestingness predicate. In this paper we analyze a simple and well-known levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. We also consider the verification problem of a KDD process: given r and a set of sentences T ⊆ L, determine whether T is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.
Data Mining: Machine Learning, Statistics, and Databases
- In Proceedings of the 8th International Conference on Scientific and Statistical Database Management
, 1996
Cited by 29 (2 self)
Knowledge discovery in databases and data mining aim at semiautomatic tools for the analysis of large data sets. We give an overview of the area and present some of the research issues, especially from the database angle.
1 Introduction
Knowledge discovery in databases (KDD), often called data mining, aims at the discovery of useful information from large collections of data. The discovered knowledge can be rules describing properties of the data, frequently occurring patterns, clusterings of the objects in the database, etc. Data mining emerged in the 1990s as a visible research and development area; both in industry and in science there seems to be a lack of methods for efficient analysis of large data sets. Current technology makes it fairly easy to collect data, but data analysis tends to be slow and expensive. There is a suspicion that there might be nuggets of useful information hiding in the masses of unanalyzed or underanalyzed data, and therefore semiautomatic methods fo...
Achievements of relational database schema design theory revisited
- Semantics in Databases, volume LNCS 1358
, 1998
Cited by 25 (2 self)
Database schema design is seen as to decide on formats for time-varying instances, on rules for supporting inferences and on semantic constraints. Schema design aims at both faithful formalization of the application and optimization at design time. It is guided by four heuristics: Separation of Aspects, Separation of Specializations, Inferential Completeness and Unique Flavor. A theory of schema design is to investigate these heuristics and to provide insight into how syntactic properties of schemas are related to worthwhile semantic properties, how desirable syntactic properties can be decided or achieved algorithmically, and how the syntactic properties determine costs of storage, queries and updates. Some well-known achievements of design theory for relational databases are reviewed: normal forms, view support, deciding implications of semantic constraints, acyclicity, design algorithms removing forbidden substructures.

