Results 1–10 of 12
Efficiently mining long patterns from databases
1998
Cited by 446 (3 self)
We present a pattern-mining algorithm that scales roughly linearly in the number of maximal patterns embedded in a database, irrespective of the length of the longest pattern. In comparison, previous algorithms based on Apriori scale exponentially with longest-pattern length. Experiments on real data show that when the patterns are long, our algorithm is more efficient by an order of magnitude or more. Because every frequent itemset is a subset of some maximal frequent itemset, MaxMiner's output implicitly and concisely represents all frequent itemsets. MaxMiner is shown to result in two or more orders of magnitude in performance improvements over Apriori on some datasets. On other datasets where the patterns are not so long, the gains are more modest. In practice, MaxMiner is demonstrated to run in time that is roughly linear in the number of maximal frequent itemsets and the size of the database, irrespective of the size of the longest frequent itemset.
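The representational claim in this abstract, that the maximal frequent itemsets implicitly represent all frequent itemsets, can be checked with a naive enumeration sketch. This is not MaxMiner's pruned set-enumeration search; the data and helper names are illustrative:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    # Exhaustive enumeration of frequent itemsets (exponential; illustration only).
    items = sorted({i for t in transactions for i in t})
    frequent = []
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            support = sum(1 for t in transactions if set(cand) <= t)
            if support >= min_support:
                frequent.append(frozenset(cand))
    return frequent

def maximal_frequent(frequent):
    # A frequent itemset is maximal if it has no frequent proper superset.
    return [s for s in frequent if not any(s < t for t in frequent)]

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
freq = frequent_itemsets(transactions, min_support=2)
maximal = maximal_frequent(freq)
# Every frequent itemset is a subset of some maximal frequent itemset,
# so reporting only the maximal ones loses no information.
assert all(any(f <= m for m in maximal) for f in freq)
```

MaxMiner itself avoids the exhaustive enumeration above by pruning a set-enumeration tree; the sketch only demonstrates why outputting maximal sets is a concise complete representation.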
A Continuous Approach to Inductive Inference
Mathematical Programming, 1992
Cited by 42 (2 self)
In this paper we describe an interior point mathematical programming approach to inductive inference. We list several versions of this problem and study in detail the formulation based on hidden Boolean logic. We consider the problem of identifying a hidden Boolean function F : {0,1}^n -> {0,1} using outputs obtained by applying a limited number of random inputs to the hidden function. Given this input-output sample, we give a method to synthesize a Boolean function that describes the sample. We pose the Boolean Function Synthesis Problem as a particular type of Satisfiability Problem. The Satisfiability Problem is translated into an integer programming feasibility problem, which is solved with an interior point algorithm for integer programming. A similar integer programming implementation has been used in a previous study to solve randomly generated instances of the Satisfiability Problem. In this paper we introduce a new variant of this algorithm, where the Riemannian metric used...
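The SAT-to-integer-programming translation used here follows a standard pattern: each clause becomes the 0-1 linear constraint that its literal values sum to at least 1. A minimal sketch, with brute-force search standing in for the paper's interior point algorithm and a made-up example formula:

```python
from itertools import product

def literal_sum(clause, x):
    # DIMACS-style clause of signed variable indices: a positive literal v
    # contributes x[v]; a negated literal -v contributes 1 - x[v].
    return sum(x[v] if v > 0 else 1 - x[-v] for v in clause)

def ip_feasible(clauses, n):
    # Feasibility version of the integer program: find a 0-1 vector x with
    # literal_sum(clause, x) >= 1 for every clause. Brute force stands in
    # for the interior point solver.
    for bits in product((0, 1), repeat=n):
        x = dict(enumerate(bits, start=1))
        if all(literal_sum(c, x) >= 1 for c in clauses):
            return x
    return None

# (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
clauses = [[1, -2], [2, 3], [-1, -3]]
solution = ip_feasible(clauses, n=3)
assert solution is not None
```

The formula is satisfiable exactly when the integer program is feasible, which is what lets an integer programming solver double as a SAT procedure.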
Compilation for Critically Constrained Knowledge Bases
In Proc. of the 13th National Conference on Artificial Intelligence (AAAI'96), 1996
Cited by 19 (0 self)
We show that many "critically constrained" Random 3-SAT knowledge bases (KBs) can be compiled into disjunctive normal form easily by using a variant of the "Davis-Putnam" proof procedure. From these compiled KBs we can answer all queries about entailment of conjunctive normal formulas, also easily, compared to a "brute-force" approach to approximate knowledge compilation into unit clauses for the same KBs. We exploit this fact to develop an aggressive hybrid approach which attempts to compile a KB exactly until a given resource limit is reached, then falls back to approximate compilation into unit clauses. The resulting approach handles all of the critically constrained Random 3-SAT KBs with average savings of an order of magnitude over the brute-force approach. Introduction: Consider the task of reasoning from a propositional knowledge base (KB) F which is expressed as a conjunctive normal formula (CNF). We are given other, query CNFs Q1, Q2, ..., QN and asked, for each Qi, ...
An SE-tree-based Prime Implicant Generation Algorithm
IEEE Trans., 1994
Cited by 9 (1 self)
Prime implicants/implicates (PIs) have been shown to be a useful tool in several problem domains. In Model-Based Diagnosis (MBD), [de Kleer et al. 90] have used PIs to characterize diagnoses. We present a PI generation algorithm which, although based on the general SE-tree-based search framework, is effectively an improvement of a particular PI generation algorithm proposed by [Slagle et al. 70]. The improvement is achieved via a decomposition tactic which is boosted by the SE-tree-based framework. The new algorithm is also more flexible in a number of ways. We present empirical results comparing the new algorithm to the old one, as well as to current PI generation algorithms. 1 Introduction: Prime implicates/implicants (PIs) were a topic of great interest to researchers in the early days of computer science, in part because of their use in procedures for Boolean function minimization [Quine 52]. A number of algorithms were developed, including [Quine 52], [Karnaugh 53], [McCluskey 56]...
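For contrast with the SE-tree approach, the classical Quine-McCluskey style cited in the introduction can be sketched as repeated merging of terms that differ in exactly one variable, keeping unmerged terms as primes. A toy version, with terms as tuples over {0, 1, '-'} and a made-up example function:

```python
def merge(t1, t2):
    # Combine two terms differing in exactly one specified (non-'-') position.
    diff = [i for i, (a, b) in enumerate(zip(t1, t2)) if a != b]
    if len(diff) == 1 and '-' not in (t1[diff[0]], t2[diff[0]]):
        i = diff[0]
        return t1[:i] + ('-',) + t1[i + 1:]
    return None

def prime_implicants(minterms):
    # Bottom-up merging: terms that merge are not prime; repeat until stable.
    terms = {tuple(m) for m in minterms}
    primes = set()
    while terms:
        merged, used = set(), set()
        for t1 in terms:
            for t2 in terms:
                m = merge(t1, t2)
                if m:
                    merged.add(m)
                    used.add(t1)
                    used.add(t2)
        primes |= terms - used
        terms = merged
    return primes

# On-set {00, 01, 11} of a 2-variable function.
pis = prime_implicants([(0, 0), (0, 1), (1, 1)])
```

This bottom-up growth from given minterms is exactly the dependence on the original terms that the SE-tree decomposition aims to improve on.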
Composite Distributive Lattices as Annotation Domains for Mediators
Proc. of AISC'2000, 2000
Cited by 2 (1 self)
In a mediator system based on annotated logics it is a natural requirement to allow annotations from different lattices in one program on a per-predicate basis. These lattices, however, may be related through common sublattices, hence demanding predicates which are able to carry combinations of annotations, or access to components of annotations.
BOOM - a Boolean Minimizer
2001
Cited by 1 (1 self)
This report presents an algorithm for two-level Boolean minimization (BOOM) based on a new implicant generation paradigm. In contrast to all previous minimization methods, where the implicants are generated bottom-up, the proposed approach uses a top-down approach. Thus instead of increasing the dimensionality of implicants by omitting literals from their terms, the dimension of a term is gradually decreased by adding new literals. One of the drawbacks of the classical approach to prime implicant generation, dating back to the original Quine-McCluskey method, is the use of terms (be it minterms or terms of higher dimension) found in the definition of the function to be minimized as a basis for the solution. Thus the choice of terms used originally for covering the function may influence the final solution. In the proposed method, the original coverage influences the final solution only indirectly, through the number of literals used. Starting from an n-dimensional hypercube (where n is the number of input variables), new terms are generated, whereas only the on-set and off-set are consulted. Thus the original choice of the implicant terms is of small importance. Most minimization methods use two basic phases introduced by Quine-McCluskey, known as prime implicant generation and the covering problem solution. Some more modern methods, including the well-known ESPRESSO, combine these two phases, reducing the number of implicants to be processed. A combination of prime implicant generation with the solution of the covering problem is also used in the BOOM approach proposed here, because the search for new literals to be included into a term aims at maximum coverage of the output function (coverage-directed search). The implicants generated during the CD-search are then expanded to become primes. Different heuristics are used during the CD-search and when solving the covering problem.
The function to be minimized is defined by its on-set and off-set, listed in a truth table. Thus the don't-care set, which normally represents the dominant part of the truth table, need not be specified explicitly. The proposed minimization method is efficient above all for functions with several hundred input variables and with a large portion of don't-care states. The minimization method has been tested on several different kinds of problems. The MCNC standard benchmarks were solved several times in order to evaluate the minimality of the solution and the runtime. Both "easy" and "hard" MCNC benchmarks were solved and compared with the solutions obtained by ESPRESSO. In many cases the time needed to find the minimum solution on an ordinary PC was not measurable. The procedure is so fast that even for large problems with hundreds of input variables it often finds a solution in a fraction of a second. Hence if the first solution does not meet the requirements, it can be improved in an iterative manner. Larger problems (with more than 100 input variables and more than 100 terms with defined output values) were generated randomly and solved by BOOM and by ESPRESSO. BOOM was in this case up to 166 times faster. For problems with more than 300 input variables no comparison with any other minimization tool was possible, because no other system, including ESPRESSO, can solve such problems. The dimension of the problems solved by BOOM can easily be increased beyond 1000 input variables, because the runtime grows linearly with the number of inputs. On the other hand, as the runtime grows roughly with the square of the size of the care set, for problems of very high dimension the success largely depends on the number of care terms. The quality of the proposed method was also tested on other problems such as graph coloring and symmetric function minimization.
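The top-down, coverage-directed idea can be illustrated with a much-simplified sketch: start from the full n-cube (the empty term) and add literals until no off-set minterm is covered, greedily preferring literals that keep on-set coverage high. BOOM's actual heuristics, implicant expansion, and covering-problem solution are omitted; data and names are illustrative:

```python
def covers(term, minterm):
    # term: {variable index: required value}; the empty term is the whole n-cube.
    return all(minterm[v] == val for v, val in term.items())

def top_down_implicant(on_set, off_set, n):
    # Add literals one at a time until the term excludes the entire off-set,
    # greedily choosing the literal that keeps the most on-set minterms covered.
    term = {}
    while any(covers(term, m) for m in off_set):
        best = None
        for v in range(n):
            if v in term:
                continue
            for val in (0, 1):
                cand = dict(term)
                cand[v] = val
                gain = sum(covers(cand, m) for m in on_set)
                if best is None or gain > best[0]:
                    best = (gain, v, val)
        _, v, val = best
        term[v] = val
    return term

on_set = [(0, 0, 0), (0, 0, 1), (0, 1, 1)]
off_set = [(1, 0, 0), (1, 1, 1)]
term = top_down_implicant(on_set, off_set, n=3)
```

Note how the result is built only by consulting the on-set and off-set, never the particular terms used to define the function, which is the independence from the original coverage that the abstract emphasizes.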
METAPRIME, an Interactive Fault Tree Analyser
1994
Cited by 1 (0 self)
This paper introduces an analysis method for coherent as well as non-coherent fault trees that overcomes this limitation, because its computational cost is related to neither the number of basic events, nor the number of gates, nor the number of prime implicants of these trees. We present the concepts underlying the prototype tool METAPRIME, and the experimental results obtained with this tool on real-life fault trees. These results show that these concepts allow us to completely analyse in seconds fault trees that no previously available technique could ever partially analyse, for instance non-coherent fault trees with more than 10...
An Algorithm for Induction of Possibilistic
We present a new algorithm, called Optimist, which generates possibilistic set-valued rules from tables containing categorical attributes taking a finite number of values. An example of such a rule might be "IF HOUSEHOLDSIZE={Two OR Three} AND OCCUPATION={Professional OR Clerical} THEN PAYMENT_METHOD={CashCheck (Max=249) OR DebitCard (Max=175)}". The algorithm is based on an original formal framework generalising the conventional Boolean approach in two directions: (i) finite-valued variables and (ii) continuous-valued semantics. Using this formalism we approximate the multidimensional distribution induced from data by a number of possibilistic prime disjunctions (patterns) representing the widest intervals of impossible combinations of values. The Optimist algorithm described in the paper generates the most interesting prime disjunctions in one pass through the data set by means of a transformation from the DNF representing the data into the possibilistic CNF representing knowledge. It consists of generation, absorption and filtration parts. The set-valued rules built from the possibilistic patterns are optimal in the sense that they have the most general condition and the most specific conclusion. For the case of finite-valued attributes and two-valued semantics the algorithm is implemented in the Chelovek rule induction system for Windows 95.
unknown title
We present a pattern-mining algorithm that scales roughly linearly in the number of maximal patterns embedded in a database, irrespective of the length of the longest pattern. In comparison, previous algorithms based on Apriori scale exponentially with longest-pattern length. Experiments on real data show that when the patterns are long, our algorithm is more efficient by an order of magnitude or more.
The Hows, Whys, and Whens of Constraints in
Abstract. Many researchers in our community (this author included) regularly emphasize the role constraints play in improving the performance of data-mining algorithms. This emphasis has led to remarkable progress: current algorithms allow an incredibly rich and varied set of hidden patterns to be efficiently elicited from massive datasets, even under the burden of NP-hard problem definitions and disk-resident or distributed data. But this progress has come at a cost. In our single-minded drive towards maximum performance, we have often neglected, and in fact hindered, the important role of discovery in the knowledge discovery and data-mining (KDD) process. In this paper, I propose various strategies for applying constraints within algorithms for itemset and rule mining.