Using General Impressions to Analyze Discovered Classification Rules
1997
Cited by 79 (13 self)
One of the important problems in data mining is the evaluation of subjective interestingness of the discovered rules. Past research has found that in many real-life applications it is easy to generate a large number of rules from the database, but most of the rules are not useful or interesting to the user. Due to the large number of rules, it is difficult for the user to analyze them manually in order to identify those interesting ones. Whether a rule is of interest to a user depends on his/her existing knowledge of the domain, and his/her interests. In this paper, we propose a technique that analyzes the discovered rules against a specific type of existing knowledge, which we call general impressions, to help the user identify interesting rules. We first propose a representation language to allow general impressions to be specified. We then present some algorithms to analyze the discovered classification rules against a set of general impressions. The results of the analysis tell us ...
Fast mining of high dimensional expressive contrast patterns using zero-suppressed binary decision diagrams
In KDD, 2006
Cited by 13 (3 self)
Patterns of contrast are a very important way of comparing multidimensional datasets. Such patterns are able to capture regions of high difference between two classes of data, and are useful for human experts and the construction of classifiers. However, mining such patterns is particularly challenging when the number of dimensions is large. This paper describes a new technique for mining several varieties of contrast pattern, based on the use of Zero-Suppressed Binary Decision Diagrams (ZBDDs), a powerful data structure for manipulating sparse data. We study the mining of both simple contrast patterns, such as emerging patterns, and more novel and complex contrasts, which we call disjunctive emerging patterns. A performance study demonstrates our ZBDD technique is highly scalable, substantially improves on state of the art mining for emerging patterns and can be effective for discovering complex contrasts from datasets with thousands of attributes.
Geometric and Combinatorial Tiles in 0-1 Data
In: Proceedings PKDD’04. Volume 3202 of LNAI, 2004
Cited by 10 (0 self)
In this paper we introduce a simple probabilistic model, hierarchical tiles, for 0-1 data. A basic tile (X,Y,p) specifies a subset X of the rows and a subset Y of the columns of the data, i.e., a rectangle, and gives a probability p for the occurrence of 1s in the cells of X × Y. A hierarchical tile has additionally a set of exception tiles that specify the probabilities for subrectangles of the original rectangle. If the rows and columns are ordered and X and Y consist of consecutive elements in those orderings, then the tile is geometric; otherwise it is combinatorial. We give a simple randomized algorithm for finding good geometric tiles. Our main result shows that using spectral ordering techniques one can find good orderings that turn combinatorial tiles into geometric tiles. We give empirical results on the performance of the methods.
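As a rough illustration of the basic tile (X,Y,p) described in this abstract, the Python sketch below computes the fraction of 1s in a rectangle (the maximum-likelihood choice of p for a single tile when cells are treated as independent) and checks whether an index set is consecutive, which is what makes a tile geometric under a given ordering. The function names and the list-of-lists data layout are assumptions made for this example, not the authors' code, and hierarchical exception tiles are not modeled.

```python
import math

def tile_density(data, rows, cols):
    """Fraction of 1s in the cells rows x cols of a 0-1 matrix (list of lists)."""
    cells = [data[r][c] for r in rows for c in cols]
    return sum(cells) / len(cells)

def tile_log_likelihood(data, rows, cols, p):
    """Log-likelihood of the tile's cells if each is 1 independently with probability p.

    Assumes 0 < p < 1 so that both logarithms are defined.
    """
    cells = [data[r][c] for r in rows for c in cols]
    ones = sum(cells)
    zeros = len(cells) - ones
    return ones * math.log(p) + zeros * math.log(1 - p)

def is_consecutive(indices):
    """True if the indices form one unbroken run, as required for a geometric tile."""
    idx = sorted(indices)
    return idx == list(range(idx[0], idx[0] + len(idx)))

# Hypothetical 3 x 3 data set; the tile covers rows 0-1 and columns 0-1.
data = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
rows, cols = [0, 1], [0, 1]
p = tile_density(data, rows, cols)  # 3 of the 4 covered cells are 1
```

Here `rows` and `cols` are plain lists of indices; a tile whose index lists fail `is_consecutive` would be combinatorial in the abstract's terminology.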
An efficient implementation of a quasi-polynomial algorithm for generating hypergraph transversals
Proceedings 11th Annual European Symposium on Algorithms (ESA 2003), 2003
Cited by 9 (0 self)
Given a finite set V, and a hypergraph H ⊆ 2^V, the hypergraph transversal problem calls for enumerating all minimal hitting sets (transversals) for H. This problem plays an important role in practical applications as many other problems were shown to be polynomially equivalent to it. Fredman and Khachiyan (1996) gave an incremental quasi-polynomial time algorithm for solving the hypergraph transversal problem. In this paper, we present an efficient implementation of this algorithm. While we show that our implementation achieves the same theoretical worst case bound, practical experience with this implementation shows that it can be substantially faster. We also show that a slight modification of the original algorithm can be used to give a stronger bound on the running time. More generally, we consider a monotone property π over a bounded n-dimensional integral box. As an important application of the above hypergraph transversal problem, pioneered by Bioch and Ibaraki (1995), we consider the problems of incrementally generating simultaneously both families of all minimal subsets satisfying π and all maximal subsets not satisfying π, for properties given by a polynomial-time satisfiability oracle. Problems of this type arise in many practical applications. It is known that the above joint generation problem can be solved in incremental quasi-polynomial time performing a polynomial-time reduction to a generalization of the hypergraph transversal problem on integer boxes. In this paper we present an efficient implementation of this procedure, and present experimental results to evaluate our implementation for a number of interesting monotone properties π.
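To make the problem statement in this abstract concrete, the Python sketch below enumerates all minimal hitting sets of a small hypergraph by brute force. It is only meant to illustrate the definition: it runs in exponential time and is not the Fredman and Khachiyan algorithm or the implementation the paper describes. The function name and the example hypergraph are assumptions made for this illustration.

```python
from itertools import combinations

def minimal_transversals(vertices, hyperedges):
    """Enumerate all minimal hitting sets of a hypergraph by brute force."""
    edges = [frozenset(e) for e in hyperedges]
    found = []
    # Visit candidates by increasing size, so every minimal hitting set is
    # recorded before any of its proper supersets is examined.
    for size in range(len(vertices) + 1):
        for candidate in combinations(sorted(vertices), size):
            cand = frozenset(candidate)
            if any(t <= cand for t in found):
                continue  # contains a smaller hitting set, so it is not minimal
            if all(cand & e for e in edges):
                found.append(cand)  # intersects every hyperedge
    return found

# Hypothetical example: a path-like hypergraph on four vertices.
V = {1, 2, 3, 4}
H = [{1, 2}, {2, 3}, {3, 4}]
transversals = minimal_transversals(V, H)
```

For this example the minimal transversals are {1, 3}, {2, 3} and {2, 4}; every other hitting set contains one of them.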
An intersection inequality for discrete distributions and related generation problems
In Automata, Languages and Programming, 30-th International Colloquium, ICALP 2003, Lecture Notes in Computer Science (LNCS) 2719, 2003
Cited by 6 (3 self)
Given two finite sets of points X, Y in R^n which can be separated by a nonnegative linear function, and such that the componentwise minimum of any two distinct points in X is dominated by some point in Y, we show that |X| ≤ n|Y|. As a consequence of this result, we obtain quasi-polynomial time algorithms for generating all maximal integer feasible solutions for a given monotone system of separable inequalities, for generating all p-inefficient points of a given discrete probability distribution, and for generating all maximal empty hyper-rectangles for a given set of points in R^n. This provides a substantial improvement over previously known exponential algorithms for these generation problems related to Integer and Stochastic Programming, and Data Mining. Furthermore, we give an incremental polynomial time generation algorithm for monotone systems with fixed number of separable inequalities, which, for the very special case of one inequality, implies that for discrete probability distributions with independent coordinates, both p-efficient and p-inefficient points can be separately generated in incremental polynomial time.
Discovering Large Empty Maximal Hyper-Rectangle in Multi-Dimensional Space
1997
Cited by 2 (1 self)
Given a collection of points in a multi-dimensional space, we consider the problem of finding the set of all possible Maximal Hyper-Rectangles (MHRs), defined to be hyper-rectangles that are empty and have at least one point bounding each of their surfaces. It is easy to see that there is an enormous number of such MHRs in a given instance, and most of the time, applications require finding only the "largest" MHR or "sufficiently large" MHRs. Our proposed algorithm solves all the above problems by setting a criterion to measure sufficiently large MHRs so that only those large MHRs will be reported. The algorithm runs much faster when the criterion set is "reasonably tight" as pruning is done naturally in the algorithm.
1 Introduction
Problems occurring in multi-dimensional space have not been studied as extensively as their one- or two-dimensional (1-D and 2-D) counterparts. One such example is the problem of finding the largest empty rectangle in a 2-D space containing points o...
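The MHR definition given in this entry's abstract can be sketched as a simple membership test in Python. This only checks whether one given axis-aligned box is an MHR; it is not the paper's discovery algorithm. The function names are hypothetical, and two points are assumptions of this sketch rather than statements from the paper: a bounding point must lie strictly within the face's extent in the other dimensions, and the boundary of the surrounding space is not treated as bounding a face.

```python
def is_empty(lower, upper, points):
    """True if no point lies strictly inside the box with corners lower and upper."""
    return not any(
        all(lo < x < hi for lo, x, hi in zip(lower, p, upper))
        for p in points
    )

def face_has_point(lower, upper, points, dim, bound):
    """True if some point lies on the face where coordinate dim equals bound.

    Assumption: the point must be strictly inside the face's extent in the
    other dimensions, so that it blocks the face from moving outward.
    """
    for p in points:
        if p[dim] != bound:
            continue
        if all(lower[i] < p[i] < upper[i] for i in range(len(p)) if i != dim):
            return True
    return False

def is_mhr(lower, upper, points):
    """True if the box is empty and each of its 2n faces is bounded by a point."""
    if not is_empty(lower, upper, points):
        return False
    return all(
        face_has_point(lower, upper, points, dim, bound)
        for dim in range(len(lower))
        for bound in (lower[dim], upper[dim])
    )

# Hypothetical 2-D example: one point in the middle of each side of a square.
points = [(0, 2), (4, 2), (2, 0), (2, 4)]
```

With these points, the box from (0, 0) to (4, 4) passes the test, while the box from (0, 0) to (3, 4) fails because no point lies on its face at x = 3.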
N.: Complexity-guided case discovery for case based reasoning
In Proceedings of the 20th National Conference on Artificial Intelligence, 2005
Cited by 2 (1 self)
The distribution of cases in the case base is critical to the performance of a Case Based Reasoning system. The case author is given little support in the positioning of new cases during the development stage of a case base. In this paper we argue that classification boundaries represent important regions of the problem space. They are used to identify locations where new cases should be acquired. We introduce two complexity-guided algorithms which use a local complexity measure and boundary identification techniques to actively discover cases close to boundaries. The ability of these algorithms to discover new cases that significantly improve the accuracy of case bases is demonstrated on five public domain classification datasets.
Algorithms for Dualization over Products of Partially Ordered Sets
Cited by 1 (0 self)
Let P = P1 × ··· × Pn be the product of n partially ordered sets. Given a subset A ⊆ P, we consider problem DUAL(P, A, B) of extending a given partial list B of maximal independent elements of A in P. We give quasi-polynomial time algorithms for solving problem DUAL(P, A, B) when each poset Pi belongs to one of the following classes: (i) semi-lattices of bounded width, (ii) forests, that is, posets with acyclic underlying graphs, with either bounded in-degrees or out-degrees, or (iii) lattices defined by a set of real closed intervals.

