Results 1 – 10 of 10
Using general impressions to analyze discovered classification rules
Proc. 3rd Intl. Conf. on Knowledge Discovery & Data Mining (KDD-97), 1997
Abstract

Cited by 85 (13 self)
One of the important problems in data mining is evaluating the subjective interestingness of discovered rules. Past research has found that in many real-life applications it is easy to generate a large number of rules from the database, but most of the rules are not useful or interesting to the user. Because of the large number of rules, it is difficult for the user to analyze them manually in order to identify the interesting ones. Whether a rule is of interest to a user depends on his/her existing knowledge of the domain and his/her interests. In this paper, we propose a technique that analyzes the discovered rules against a specific type of existing knowledge, which we call general impressions, to help the user identify interesting rules. We first propose a representation language in which general impressions can be specified. We then present algorithms that analyze the discovered classification rules against a set of general impressions. The results of the analysis tell us which rules conform to the general impressions and which rules are unexpected; unexpected rules are, by definition, interesting.
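The matching step the abstract describes can be sketched as follows; the dictionary-based rule representation, the `conforms` predicate, and the toy attribute names are illustrative assumptions, not the paper's actual representation language or algorithms:

```python
# Sketch of checking discovered rules against a "general impression".
# The dict-based condition representation and the toy attribute names
# are illustrative assumptions, not the paper's representation language.

def conforms(rule_conds, rule_class, imp_conds, imp_class):
    """A rule conforms if all of its conditions are covered by the
    impression's conditions and both predict the same class."""
    covered = all(imp_conds.get(attr) == val
                  for attr, val in rule_conds.items())
    return covered and rule_class == imp_class

# Hypothetical impression: "young customers tend to buy"
impression = ({"age": "young"}, "buy")
rules = [({"age": "young"}, "buy"),       # conforms: expected
         ({"age": "young"}, "not_buy")]   # same body, other class: unexpected
unexpected = [r for r in rules if not conforms(*r, *impression)]
```

Rules whose body is covered by an impression but whose class contradicts it are exactly the "unexpected" rules the analysis surfaces.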
Fast mining of high dimensional expressive contrast patterns using zero-suppressed binary decision diagrams
In KDD, 2006
Abstract

Cited by 18 (3 self)
Patterns of contrast are a very important way of comparing multidimensional datasets. Such patterns can capture regions of high difference between two classes of data, and are useful both for human experts and for the construction of classifiers. However, mining such patterns is particularly challenging when the number of dimensions is large. This paper describes a new technique for mining several varieties of contrast pattern, based on the use of Zero-Suppressed Binary Decision Diagrams (ZBDDs), a powerful data structure for manipulating sparse data. We study the mining of both simple contrast patterns, such as emerging patterns, and more novel and complex contrasts, which we call disjunctive emerging patterns. A performance study demonstrates that our ZBDD technique is highly scalable, substantially improves on state-of-the-art mining for emerging patterns, and can be effective for discovering complex contrasts from datasets with thousands of attributes.
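The simplest contrast pattern named above, the emerging pattern, can be illustrated without any ZBDD machinery: it is an itemset whose support grows sharply from one class to the other. A minimal sketch, with toy transactions as assumed input:

```python
# Minimal sketch of an emerging pattern: an itemset whose support grows
# sharply between two classes. Transactions are sets of items; the toy
# data below is an assumption for the example, not from the paper.

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def growth_rate(itemset, pos, neg):
    s_pos, s_neg = support(itemset, pos), support(itemset, neg)
    return float("inf") if s_neg == 0 else s_pos / s_neg

pos = [{"a", "b"}, {"a", "b", "c"}, {"a"}]   # class-1 transactions
neg = [{"b"}, {"c"}, {"b", "c"}]             # class-2 transactions
# {"a"} never occurs in neg, so its growth rate is infinite: a strong contrast
rate = growth_rate({"a"}, pos, neg)
```

The ZBDD contribution in the paper is about making this kind of search tractable when there are thousands of dimensions, not about the definition itself.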
Geometric and Combinatorial Tiles in 0–1 Data
In: Proceedings of PKDD'04, Volume 3202 of LNAI, 2004
Abstract

Cited by 15 (0 self)
In this paper we introduce a simple probabilistic model, hierarchical tiles, for 0–1 data. A basic tile (X, Y, p) specifies a subset X of the rows and a subset Y of the columns of the data, i.e., a rectangle, and gives a probability p for the occurrence of 1s in the cells of X × Y. A hierarchical tile additionally has a set of exception tiles that specify the probabilities for sub-rectangles of the original rectangle. If the rows and columns are ordered and X and Y consist of consecutive elements in those orderings, then the tile is geometric; otherwise it is combinatorial. We give a simple randomized algorithm for finding good geometric tiles. Our main result shows that using spectral ordering techniques one can find good orderings that turn combinatorial tiles into geometric tiles. We give empirical results on the performance of the methods.
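As a rough illustration of the model (not the paper's algorithm), a basic tile (X, Y, p) can be scored by the Bernoulli log-likelihood of the cells it covers; the scoring function and data below are assumptions for the example:

```python
import math

# Rough illustration of a basic tile (X, Y, p) on a 0-1 matrix: p is the
# probability of a 1 inside the rectangle X x Y, so the tile can be scored
# by the Bernoulli log-likelihood of the cells it covers. The scoring
# function and the toy matrix are assumptions, not the paper's code.

def tile_log_likelihood(data, rows, cols, p):
    ll = 0.0
    for i in rows:
        for j in cols:
            ll += math.log(p) if data[i][j] == 1 else math.log(1.0 - p)
    return ll

data = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 1]]
# A tile covering the dense 2x2 block of 1s, with p = 0.9
ll = tile_log_likelihood(data, rows=[0, 1], cols=[0, 1], p=0.9)
```

Tiles whose p matches the observed density of their rectangle score highest, which is what makes "good" tiles well defined.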
An efficient implementation of a quasi-polynomial algorithm for generating hypergraph transversals
Proceedings of the 11th Annual European Symposium on Algorithms (ESA 2003), 2003
Abstract

Cited by 10 (0 self)
Given a finite set V and a hypergraph H ⊆ 2^V, the hypergraph transversal problem calls for enumerating all minimal hitting sets (transversals) of H. This problem plays an important role in practical applications, as many other problems have been shown to be polynomially equivalent to it. Fredman and Khachiyan (1996) gave an incremental quasi-polynomial-time algorithm for solving the hypergraph transversal problem. In this paper, we present an efficient implementation of this algorithm. While we show that our implementation achieves the same theoretical worst-case bound, practical experience shows that it can be substantially faster. We also show that a slight modification of the original algorithm can be used to give a stronger bound on the running time. More generally, we consider a monotone property π over a bounded n-dimensional integral box. As an important application of the above hypergraph transversal problem, pioneered by Bioch and Ibaraki (1995), we consider the problems of incrementally generating, simultaneously, both the family of all minimal subsets satisfying π and the family of all maximal subsets not satisfying π, for properties given by a polynomial-time satisfiability oracle. Problems of this type arise in many practical applications. It is known that this joint generation problem can be solved in incremental quasi-polynomial time via a polynomial-time reduction to a generalization of the hypergraph transversal problem on integer boxes. In this paper we present an efficient implementation of this procedure, and report experimental results evaluating our implementation for a number of interesting monotone properties π.
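The object being enumerated can be shown with a brute-force sketch; the Fredman–Khachiyan algorithm the paper implements achieves quasi-polynomial incremental time, whereas this exponential version only demonstrates the definition of a minimal transversal:

```python
from itertools import combinations

# Brute-force illustration of the hypergraph transversal problem:
# enumerate all minimal hitting sets of H over vertex set V. This takes
# exponential time and only demonstrates the definition; the paper
# implements the Fredman-Khachiyan quasi-polynomial algorithm instead.

def minimal_transversals(V, H):
    def hits(S):
        return all(S & e for e in H)   # S intersects every hyperedge
    hitting = [set(c) for r in range(len(V) + 1)
               for c in combinations(sorted(V), r) if hits(set(c))]
    # keep only the inclusion-minimal hitting sets
    return [S for S in hitting if not any(T < S for T in hitting)]

V, H = {1, 2, 3}, [{1, 2}, {2, 3}]
transversals = minimal_transversals(V, H)   # [{2}, {1, 3}]
```

Enumerating these sets incrementally, without materializing all 2^|V| candidates, is precisely what makes the problem hard and the quasi-polynomial bound remarkable.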
An intersection inequality for discrete distributions and related generation problems
In Automata, Languages and Programming: 30th International Colloquium (ICALP 2003), Lecture Notes in Computer Science (LNCS) 2719, 2003
Abstract

Cited by 8 (3 self)
Given two finite sets of points X, Y in R^n that can be separated by a nonnegative linear function, and such that the componentwise minimum of any two distinct points in X is dominated by some point in Y, we show that |X| ≤ n|Y|. As a consequence of this result, we obtain quasi-polynomial-time algorithms for generating all maximal integer feasible solutions of a given monotone system of separable inequalities, for generating all p-inefficient points of a given discrete probability distribution, and for generating all maximal empty hyper-rectangles for a given set of points in R^n. This provides a substantial improvement over the previously known exponential algorithms for these generation problems, which are related to integer and stochastic programming and to data mining. Furthermore, we give an incremental polynomial-time generation algorithm for monotone systems with a fixed number of separable inequalities which, for the very special case of one inequality, implies that for discrete probability distributions with independent coordinates, both p-efficient and p-inefficient points can be separately generated in incremental polynomial time.
Complexity-guided case discovery for case-based reasoning
In Proceedings of the 20th National Conference on Artificial Intelligence, 2005
Abstract

Cited by 3 (1 self)
The distribution of cases in the case base is critical to the performance of a Case-Based Reasoning system. The case author is given little support in positioning new cases during the development stage of a case base. In this paper we argue that classification boundaries represent important regions of the problem space, and we use them to identify locations where new cases should be acquired. We introduce two complexity-guided algorithms which use a local complexity measure and boundary identification techniques to actively discover cases close to boundaries. The ability of these algorithms to discover new cases that significantly improve the accuracy of case bases is demonstrated on five public-domain classification datasets.
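A hedged sketch of the underlying idea: pairs of nearby cases with different labels mark a classification boundary, and their midpoints are candidate locations for new cases. The distance threshold and midpoint heuristic below are illustrative assumptions, not the paper's local complexity measure:

```python
# Illustrative sketch: nearby cases with different labels mark a class
# boundary; midpoints of such pairs are candidate positions for new cases.
# The Euclidean threshold and midpoint heuristic are assumptions, not the
# paper's complexity measure or algorithms.

def boundary_candidates(cases, radius):
    """cases: list of (feature_tuple, label) pairs."""
    out = []
    for i, (x, cx) in enumerate(cases):
        for y, cy in cases[i + 1:]:
            dist = sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5
            if cx != cy and dist < radius:
                out.append(tuple((a + b) / 2 for a, b in zip(x, y)))
    return out

cases = [((0.0, 0.0), "A"), ((1.0, 0.0), "B"), ((5.0, 5.0), "A")]
mids = boundary_candidates(cases, radius=2.0)   # [(0.5, 0.0)]
```

Acquiring cases near these midpoints concentrates authoring effort where the classifier is least certain, which is the intuition the abstract argues for.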
Discovering Large Empty Maximal Hyper-Rectangles in Multi-Dimensional Space
1997
Abstract

Cited by 3 (1 self)
Given a collection of points in a multi-dimensional space, we consider the problem of finding the set of all possible Maximal Hyper-Rectangles (MHRs), defined to be hyper-rectangles that are empty and have at least one point bounding each of their surfaces. It is easy to see that there is an enormous number of such MHRs in a given instance, and most of the time applications only require finding the "largest" MHR or "sufficiently large" MHRs. Our proposed algorithm solves all of the above problems by setting a criterion to measure sufficiently large MHRs, so that only those large MHRs are reported. The algorithm runs much faster when the criterion is "reasonably tight", as pruning is done naturally in the algorithm.

1 Introduction. Problems occurring in multi-dimensional spaces have not been studied as extensively as their one- and two-dimensional (1D and 2D) counterparts. One such example is the problem of finding the largest empty rectangle in a 2D space containing points o...
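In two dimensions the problem the abstract describes can be solved by brute force over candidate coordinates, which illustrates the definition (practical algorithms, including the one in the paper, prune far more aggressively); all names below are illustrative:

```python
from itertools import combinations

# Brute-force 2D illustration of maximal empty rectangles: candidate
# edges come from the point coordinates plus the bounding box, and the
# largest axis-aligned rectangle whose interior contains no point is
# reported. Function and variable names are assumptions for the example.

def largest_empty_rectangle(points, xmin, ymin, xmax, ymax):
    xs = sorted({xmin, xmax, *(p[0] for p in points)})
    ys = sorted({ymin, ymax, *(p[1] for p in points)})
    best, best_area = None, 0
    for x1, x2 in combinations(xs, 2):
        for y1, y2 in combinations(ys, 2):
            if any(x1 < px < x2 and y1 < py < y2 for px, py in points):
                continue   # a point lies strictly inside: not empty
            area = (x2 - x1) * (y2 - y1)
            if area > best_area:
                best, best_area = (x1, y1, x2, y2), area
    return best, best_area

rect, area = largest_empty_rectangle([(2, 2)], 0, 0, 4, 3)   # (0, 0, 4, 2), 8
```

The combinatorial explosion in higher dimensions is exactly why the paper's pruning criterion ("sufficiently large" MHRs) matters.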
Algorithms for Dualization over Products of Partially Ordered Sets
Abstract

Cited by 1 (0 self)
Let P = P1 × · · · × Pn be the product of n partially ordered sets. Given a subset A ⊆ P, we consider the problem DUAL(P, A, B) of extending a given partial list B of maximal independent elements of A in P. We give quasi-polynomial-time algorithms for solving problem DUAL(P, A, B) when each poset Pi belongs to one of the following classes: (i) semi-lattices of bounded width; (ii) forests, that is, posets whose underlying graphs are acyclic, with either bounded in-degrees or bounded out-degrees; or (iii) lattices defined by a set of real closed intervals.
Improving GA Search Reliability Using Maximal Hyper-Rectangle Analysis
Abstract
In genetic algorithms it is not easy to evaluate confidence in whether a GA run may have missed an entire area of good points, and whether the global optimum was found. We accept this, but hope to add some degree of confidence in our results by showing that no large gaps were left unvisited in the search space. This can be achieved to some extent by inserting new individuals into large empty spaces. However, it is not easy to find the largest empty spaces, particularly in multi-dimensional problems. For a GA, though, it is not necessary to find the exact largest empty spaces; a sufficiently large empty space is good enough for inserting new individuals. In this paper, we present a method to find a sufficiently large empty Hyper-Rectangle for new-individual insertion in a GA while keeping the computational complexity polynomial. Its merit is demonstrated in several domains.