Levelwise Search and Borders of Theories in Knowledge Discovery
, 1997
Cited by 212 (13 self)
One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all sentences of L deemed interesting by the selection predicate. We analyze the simple levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. For this, we introduce the concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm. We also consider the verification problem of a KDD process: given r and a set of sentences $S \subseteq L$, determine whether S is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.
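As an illustration of the levelwise idea described in this abstract, here is a minimal Python sketch for one concrete instance: sentences are itemsets, and the selection predicate is "occurs in at least `min_support` rows". The function name `levelwise`, the toy data, and the choice of frequent itemsets as the sentence class are assumptions made for this example, not details taken from the paper.

```python
def levelwise(items, rows, min_support):
    """Return (interesting sets, negative border, number of predicate evaluations)."""

    def is_frequent(candidate):
        # Selection predicate: the candidate occurs in at least min_support rows.
        return sum(1 for row in rows if candidate <= row) >= min_support

    theory, negative_border, evaluations = set(), set(), 0
    level = [frozenset()]  # level 0 holds the most general sentence
    while level:
        frequent_here = []
        for candidate in level:
            evaluations += 1  # one database pass per candidate in this sketch
            if is_frequent(candidate):
                frequent_here.append(candidate)
            else:
                negative_border.add(candidate)
        theory.update(frequent_here)
        # Next level: one-item extensions whose immediate subsets are all interesting.
        next_level = set()
        for s in frequent_here:
            for item in items:
                if item in s:
                    continue
                c = s | {item}
                if all(c - {x} in theory for x in c):
                    next_level.add(c)
        level = sorted(next_level, key=sorted)
    return theory, negative_border, evaluations


# Hypothetical toy data set: four rows over the items A, B, C.
rows = [frozenset("ABC"), frozenset("AB"), frozenset("AC"), frozenset("B")]
theory, negative_border, evaluations = levelwise("ABC", rows, min_support=2)
```

In this toy run the predicate is evaluated once for each interesting set and once for each set in the negative border, which is the kind of quantity the border concept is used to bound.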
Methods and Problems in Data Mining
, 1997
Cited by 74 (2 self)
Knowledge discovery in databases and data mining aim at semiautomatic tools for the analysis of large data sets. We consider some methods used in data mining, concentrating on levelwise search for all frequently occurring patterns. We show how this technique can be used in various applications. We also discuss possibilities for compiling data mining queries into algorithms, and look at the use of sampling in data mining. We conclude by listing several open research problems in data mining and knowledge discovery.
Data mining, hypergraph transversals, and machine learning
, 1997
Cited by 65 (5 self)
Several data mining problems can be formulated as problems of finding maximally specific sentences that are interesting in a database. We first show that this problem has a close relationship with the hypergraph transversal problem. We then analyze two algorithms that have been previously used in data mining, proving upper bounds on their complexity. The first algorithm is useful when the maximally specific interesting sentences are "small". We show that this algorithm can also be used to efficiently solve a special case of the hypergraph transversal problem, improving on previous results. The second algorithm utilizes a subroutine for hypergraph transversals, and is applicable in more general situations, with complexity close to a lower bound for the problem. We also relate these problems to the model of exact learning in computational learning theory, and use the correspondence to derive some corollaries.
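To make the hypergraph transversal problem mentioned here concrete, the following Python sketch enumerates minimal transversals (minimal vertex sets that intersect every edge) by brute force. It is exponential and only meant for tiny inputs; it is not either of the two algorithms analyzed in the paper. The function name and the example hypergraphs are assumptions for illustration.

```python
from itertools import combinations


def minimal_transversals(vertices, edges):
    """Return all minimal vertex sets that hit every edge (brute force)."""
    edges = [frozenset(e) for e in edges]
    found = []
    # Candidates are tried in order of increasing size, so any hitting set
    # that contains no previously found transversal is itself minimal.
    for size in range(len(vertices) + 1):
        for cand in combinations(sorted(vertices), size):
            c = frozenset(cand)
            if any(t <= c for t in found):
                continue  # a smaller transversal is contained in c
            if all(c & e for e in edges):
                found.append(c)
    return found


# Toy link to data mining: assume the maximal interesting itemsets over
# {A, B, C} are {A, B} and {A, C}; the edges are their complements.
complements = [{"C"}, {"B"}]
minimal_uninteresting = minimal_transversals("ABC", complements)

# A second toy hypergraph: the three edges of a triangle.
triangle = minimal_transversals([1, 2, 3], [{1, 2}, {2, 3}, {1, 3}])
```

In the first example the single minimal transversal {B, C} is exactly the smallest itemset not contained in {A, B} or {A, C}, which illustrates the kind of correspondence between maximal interesting sentences and transversals that this line of work exploits.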
Discovering All Most Specific Sentences
 ACM Transactions on Database Systems
, 2003
Cited by 55 (3 self)
... this article, we show how the problems of finding frequent sets in relations and of finding minimal keys in databases can be reduced to this formulation. Using this theory extraction formulation [Mannila 1995, 1996; Mannila and Toivonen 1997], one can formulate general results about the complexity of algorithms for these data mining tasks.
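One of the two problems named here, finding minimal keys, can be sketched in a few lines of Python. An attribute set is a key if no two rows agree on all of its attributes, and it is minimal if no proper subset is a key. The function name, the relation, and its attribute names are hypothetical, and the brute-force search is only a sketch, not the method of the article.

```python
from itertools import combinations


def minimal_keys(attributes, rows):
    """Return minimal attribute tuples whose values distinguish all rows."""
    keys = []
    # Smaller attribute sets are tested first, so a superset of a known key
    # can be skipped without evaluating it against the relation.
    for size in range(len(attributes) + 1):
        for cand in combinations(attributes, size):
            if any(set(k) <= set(cand) for k in keys):
                continue
            projections = {tuple(row[a] for a in cand) for row in rows}
            if len(projections) == len(rows):
                keys.append(cand)
    return keys


# Hypothetical toy relation with three attributes.
relation = [
    {"id": 1, "name": "ann", "city": "oslo"},
    {"id": 2, "name": "bob", "city": "oslo"},
    {"id": 3, "name": "ann", "city": "rome"},
]
keys = minimal_keys(("id", "name", "city"), relation)
```

Here `id` alone is a key, and so is the pair (`name`, `city`), while neither `name` nor `city` is a key on its own.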
Discovering all Most Specific Sentences by Randomized Algorithms (Extended Abstract)
 In Intl. Conf. on Database Theory
, 1997
Cited by 55 (5 self)
Dimitrios Gunopulos (1), Heikki Mannila (2), and Sanjeev Saluja (3). (1) Max-Planck-Institut Informatik, Im Stadtwald, 66123 Saarbrücken, Germany. gunopulo@mpi-sb.mpg.de. (2) University of Helsinki, Dept. of Computer Science, FIN-00014 Helsinki, Finland. Heikki.Mannila@cs.helsinki.fi. Work supported by Alexander von Humboldt-Stiftung and the Academy of Finland. (3) Max-Planck-Institut Informatik, Im Stadtwald, 66123 Saarbrücken, Germany. saluja@mpi-sb.mpg.de.
Abstract. Data mining can in many instances be viewed as the task of computing a representation of a theory of a model or a database. In this paper we present a randomized algorithm that can be used to compute the representation of a theory in terms of the most specific sentences of that theory. In addition to randomization, the algorithm uses a generalization of the concept of hypergraph transversal. We apply the general algorithm for discovering maximal frequent sets in 0/1 data, and for computing minimal keys in relations. We prese...
On the Complexity of Generating Maximal Frequent and Minimal Infrequent Sets
, 2002
Cited by 39 (9 self)
Let A be an $m \times n$ binary matrix, $t \in \{1, \ldots, m\}$ be a threshold, and $\epsilon > 0$ be a positive parameter. We show that given a family of $O(n^{\epsilon})$ maximal t-frequent column sets for A, it is NP-complete to decide whether A has any further maximal t-frequent sets, or not, even when the number of such additional maximal t-frequent column sets may be exponentially large. In contrast, all minimal t-infrequent sets of columns of A can be enumerated in incremental quasi-polynomial time. The proof of the latter result follows from the inequality $\alpha \le (m - t + 1)\beta$, where $\alpha$ and $\beta$ are respectively the numbers of all maximal t-frequent and all minimal t-infrequent sets of columns of the matrix A. We also discuss the complexity of generating all closed t-frequent column sets for a given binary matrix.
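The two families this abstract counts can be computed directly for a small matrix. The Python sketch below enumerates all column sets of a binary matrix, which is exponential in the number of columns and therefore only a definitional illustration, not the enumeration procedure of the paper. It assumes that a column set is t-frequent when at least t rows have a 1 in every column of the set; the function name and the matrix are made up for the example.

```python
from itertools import combinations


def frequent_borders(matrix, t):
    """Return (maximal t-frequent, minimal t-infrequent) column sets."""
    n = len(matrix[0])

    def support(cols):
        # Number of rows that have a 1 in every column of the set.
        return sum(1 for row in matrix if all(row[j] for j in cols))

    subsets = [frozenset(c) for k in range(n + 1)
               for c in combinations(range(n), k)]
    frequent = {s for s in subsets if support(s) >= t}
    # Frequency can only drop when columns are added, so checking
    # single-column extensions and single-column removals is enough.
    maximal_frequent = {
        s for s in frequent
        if all(s | {j} not in frequent for j in range(n) if j not in s)
    }
    minimal_infrequent = {
        s for s in subsets
        if s not in frequent and all(s - {j} in frequent for j in s)
    }
    return maximal_frequent, minimal_infrequent


# Hypothetical 4 x 3 binary matrix with threshold t = 2.
A = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
max_freq, min_infreq = frequent_borders(A, t=2)
```

For this matrix there are two maximal 2-frequent column sets, {0, 1} and {0, 2}, and one minimal 2-infrequent set, {1, 2}.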
How Hard is it to Revise a Belief Base?
, 1996
Cited by 38 (0 self)
If a new piece of information contradicts our previously held beliefs, we have to revise our beliefs. This problem of belief revision arises in a number of areas in Computer Science and Artificial Intelligence, e.g., in updating logical databases, in hypothetical reasoning, and in machine learning. Most of the research in this area is influenced by work in philosophical logic, in particular by Gärdenfors and his colleagues, who developed the theory of belief revision. Here we will focus on the computational aspects of this theory, surveying results that address the issue of the computational complexity of belief revision.
New Results on Monotone Dualization and Generating Hypergraph Transversals
 SIAM Journal on Computing
, 2002
Cited by 37 (12 self)
We consider the problem of dualizing a monotone CNF (equivalently, computing all minimal transversals of a hypergraph), whose associated decision problem is a prominent open problem in NP-completeness. We present a number of new polynomial-time and output-polynomial-time results for significant cases, which largely advance the tractability frontier and improve on previous results. Furthermore, we show that duality of two monotone CNFs can be disproved with limited nondeterminism. More precisely, this is feasible in polynomial time with $O(\log^2 n / \log\log n)$ suitably guessed bits. This result sheds new light on the complexity of this important problem.
On Horn Envelopes and Hypergraph Transversals (Extended Abstract)
, 1993
Cited by 34 (4 self)
(... for ISAAC) Dimitris Kavvadias (1, 4), Christos H. Papadimitriou (2, 4, 5), Martha Sideri (3, 4)
Abstract: We study the problem of bounding from above and below a given set of bit vectors by the set of satisfying truth assignments of a Horn formula. We point out a rather unexpected connection between the upper bounding problem and the problem of generating all transversals of a hypergraph, and settle several related complexity questions.
1. INTRODUCTION
Recently there has been much interest in the model theory of Boolean Logic, that is, the relationship between a Boolean formula and the corresponding set of models (in this paper by model we shall mean "satisfying truth assignment"). There are at least three distinct motivations for this problem: Identifying plausible Boolean formulae that describe a set of 0-1 vectors is one form of "discovering structure" in raw data [DP, KKS1]. Besides, simplifications and approximations of a Boolean formula via its models may represent a plausibl...
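For the upper bounding problem mentioned in this abstract, a standard fact is useful: the models of a Horn formula are closed under componentwise AND (intersection), and the tightest Horn upper bound of a set of bit vectors has as its models the closure of that set under intersection. The Python sketch below computes this closure for a toy input. The function name and the vectors are assumptions for illustration, and the sketch does not construct the Horn formula itself.

```python
def intersection_closure(vectors):
    """Close a set of equal-length 0/1 tuples under componentwise AND."""
    closed = set(vectors)
    changed = True
    while changed:
        changed = False
        # Iterate over snapshots so the set can grow inside the loops.
        for u in list(closed):
            for v in list(closed):
                w = tuple(a & b for a, b in zip(u, v))
                if w not in closed:
                    closed.add(w)
                    changed = True
    return closed


# Hypothetical set of models over three variables.
models = {(1, 1, 0), (0, 1, 1)}
envelope_models = intersection_closure(models)
```

The closure adds the vector (0, 1, 0), so any Horn formula satisfied by both original vectors must also be satisfied by it.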
On an algorithm for finding all interesting sentences (Extended Abstract)
 In Cybernetics and Systems, Volume II, The Thirteenth European Meeting on Cybernetics and Systems Research
, 1996
Cited by 31 (9 self)
Knowledge discovery in databases (KDD), also called data mining, has recently received wide attention from practitioners and researchers. One of the basic problems in KDD is the following: given a data set r, a class L of sentences defining subgroups or properties of r, and an interestingness predicate, find all sentences of L deemed interesting by the interestingness predicate. In this paper we analyze a simple and well-known levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. We also consider the verification problem of a KDD process: given r and a set of sentences $T \subseteq L$, determine whether T is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.