Results 1–10 of 16
Using general impressions to analyze discovered classification rules
 Proc. 3rd Intl. Conf. on Knowledge Discovery & Data Mining (KDD-97), 1997
"... One of the important problems in data mining is the evaluation of subjective interestingness of the discovered rules. Past research has found that in many reallife applications it is easy to generate a large number of rules from the database, but most of the rules are not useful or interesting to t ..."
Abstract

Cited by 92 (13 self)
 Add to MetaCart
One of the important problems in data mining is the evaluation of the subjective interestingness of discovered rules. Past research has found that in many real-life applications it is easy to generate a large number of rules from the database, but most of the rules are not useful or interesting to the user. Due to the large number of rules, it is difficult for the user to analyze them manually in order to identify the interesting ones. Whether a rule is of interest to a user depends on his/her existing knowledge of the domain and his/her interests. In this paper, we propose a technique that analyzes the discovered rules against a specific type of existing knowledge, which we call general impressions, to help the user identify interesting rules. We first propose a representation language to allow general impressions to be specified. We then present some algorithms to analyze the discovered classification rules against a set of general impressions. The results of the analysis tell us which rules conform to the general impressions and which rules are unexpected. Unexpected rules are by definition interesting.
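As an illustration of the analysis this abstract describes, the sketch below classifies rules as conforming or unexpected against one kind of general impression. The representation (attribute sets paired with a class) and all names are simplified stand-ins, not the paper's actual language or algorithms.

```python
# Hypothetical sketch, not the paper's algorithm: a "general impression"
# here is a set of attributes believed to imply a class. A discovered rule
# conforms when some impression covers its condition attributes and agrees
# on the class; anything else is flagged as unexpected (hence interesting).

def analyze(rules, impressions):
    conforming, unexpected = [], []
    for attrs, cls in rules:
        if any(attrs <= gi_attrs and cls == gi_cls
               for gi_attrs, gi_cls in impressions):
            conforming.append((attrs, cls))
        else:
            unexpected.append((attrs, cls))
    return conforming, unexpected

rules = [({"saving", "job"}, "approved"),   # covered by the impression
         ({"age"}, "approved")]             # not covered: unexpected
impressions = [({"saving", "job", "income"}, "approved")]
conforming, unexpected = analyze(rules, impressions)
```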
Fast mining of high dimensional expressive contrast patterns using zero-suppressed binary decision diagrams
 In KDD, 2006
"... Patterns of contrast are a very important way of comparing multidimensional datasets. Such patterns are able to capture regions of high difference between two classes of data, and are useful for human experts and the construction of classifiers. However, mining such patterns is particularly challeng ..."
Abstract

Cited by 18 (3 self)
 Add to MetaCart
(Show Context)
Patterns of contrast are a very important way of comparing multidimensional datasets. Such patterns are able to capture regions of high difference between two classes of data, and are useful for human experts and the construction of classifiers. However, mining such patterns is particularly challenging when the number of dimensions is large. This paper describes a new technique for mining several varieties of contrast pattern, based on the use of Zero-Suppressed Binary Decision Diagrams (ZBDDs), a powerful data structure for manipulating sparse data. We study the mining of both simple contrast patterns, such as emerging patterns, and more novel and complex contrasts, which we call disjunctive emerging patterns. A performance study demonstrates our ZBDD technique is highly scalable, substantially improves on state-of-the-art mining for emerging patterns, and can be effective for discovering complex contrasts from datasets with thousands of attributes.
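For reference, the contrast being mined can be pinned down without any ZBDD machinery: an emerging pattern is an itemset whose support jumps sharply between the two classes. The naive scan below, with made-up data, computes that growth rate; it is illustrative only and omits everything that makes the paper's approach scale.

```python
# Naive support scan, for the definition only; the paper's contribution is
# doing this at scale with Zero-Suppressed BDDs, which this sketch omits.

def support(itemset, transactions):
    return sum(itemset <= t for t in transactions) / len(transactions)

def growth_rate(itemset, class_a, class_b):
    sa = support(itemset, class_a)
    sb = support(itemset, class_b)
    return float("inf") if sa == 0 else sb / sa

class_a = [{"a", "b"}, {"a"}, {"c"}, {"b", "c"}]            # support 1/4
class_b = [{"a", "b"}, {"a", "b", "c"}, {"a", "b"}, {"c"}]  # support 3/4
gr = growth_rate({"a", "b"}, class_a, class_b)
```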
Geometric and Combinatorial Tiles in 0–1 Data
 In: Proceedings PKDD’04, Volume 3202 of LNAI, 2004
"... In this paper we introduce a simple probabilistic model, hierarchical tiles, for 01 data. A basic tile (X,Y,p) specifies a subset X of the rows and a subset Y of the columns of the data, i.e., a rectangle, and gives a probability p for the occurrence of 1s in the cells of X x Y. A hierarchical tile ..."
Abstract

Cited by 18 (0 self)
 Add to MetaCart
(Show Context)
In this paper we introduce a simple probabilistic model, hierarchical tiles, for 0–1 data. A basic tile (X, Y, p) specifies a subset X of the rows and a subset Y of the columns of the data, i.e., a rectangle, and gives a probability p for the occurrence of 1s in the cells of X × Y. A hierarchical tile additionally has a set of exception tiles that specify the probabilities for subrectangles of the original rectangle. If the rows and columns are ordered and X and Y consist of consecutive elements in those orderings, then the tile is geometric; otherwise it is combinatorial. We give a simple randomized algorithm for finding good geometric tiles. Our main result shows that using spectral ordering techniques one can find good orderings that turn combinatorial tiles into geometric tiles. We give empirical results on the performance of the methods.
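The basic-tile definition above translates directly into code: the sketch below just computes the empirical density p of 1s in the submatrix X × Y. The data matrix and index sets are invented for illustration.

```python
# A basic tile (X, Y, p): here p is estimated as the fraction of 1s in
# the cells of X x Y, per the definition in the abstract.

def tile_density(data, rows, cols):
    cells = [data[i][j] for i in rows for j in cols]
    return sum(cells) / len(cells)

data = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 1]]
p = tile_density(data, rows={0, 1}, cols={0, 1})  # the dense 2x2 block
```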
Mining for empty rectangles in large data sets
 In Proceedings of the 8th ICDT, 2001
"... ..."
(Show Context)
An efficient implementation of a quasi-polynomial algorithm for generating hypergraph transversals
 Proceedings of the 11th Annual European Symposium on Algorithms (ESA 2003), 2003
"... Given a finite set V, and a hypergraph H ⊆ 2 V, the hypergraph transversal problem calls for enumerating all minimal hitting sets (transversals) for H. This problem plays an important role in practical applications as many other problems were shown to be polynomially equivalent to it. Fredman and Kh ..."
Abstract

Cited by 11 (1 self)
 Add to MetaCart
(Show Context)
Given a finite set V and a hypergraph H ⊆ 2^V, the hypergraph transversal problem calls for enumerating all minimal hitting sets (transversals) for H. This problem plays an important role in practical applications, as many other problems have been shown to be polynomially equivalent to it. Fredman and Khachiyan (1996) gave an incremental quasi-polynomial time algorithm for solving the hypergraph transversal problem. In this paper, we present an efficient implementation of this algorithm. While we show that our implementation achieves the same theoretical worst-case bound, practical experience with this implementation shows that it can be substantially faster. We also show that a slight modification of the original algorithm can be used to give a stronger bound on the running time. More generally, we consider a monotone property π over a bounded n-dimensional integral box. As an important application of the above hypergraph transversal problem, pioneered by Bioch and Ibaraki (1995), we consider the problems of incrementally generating simultaneously both families of all minimal subsets satisfying π and all maximal subsets not satisfying π, for properties given by a polynomial-time satisfiability oracle. Problems of this type arise in many practical applications. It is known that the above joint generation problem can be solved in incremental quasi-polynomial time by performing a polynomial-time reduction to a generalization of the hypergraph transversal problem on integer boxes. In this paper we present an efficient implementation of this procedure, and present experimental results to evaluate our implementation for a number of interesting monotone properties π.
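To make the enumerated object concrete, here is an exponential brute-force enumerator of minimal transversals for a tiny hypergraph. It bears no resemblance to the Fredman-Khachiyan quasi-polynomial machinery the paper implements; it is purely a reference sketch of the definition.

```python
# Brute force over all vertex subsets: keep the hitting sets, then filter
# out any that have a proper hitting subset. Exponential; definitions only.
from itertools import combinations

def minimal_transversals(vertices, hyperedges):
    def hits(s):
        # s is a transversal iff it intersects every hyperedge
        return all(s & e for e in hyperedges)
    candidates = [set(c)
                  for r in range(len(vertices) + 1)
                  for c in combinations(sorted(vertices), r)
                  if hits(set(c))]
    return [s for s in candidates if not any(t < s for t in candidates)]

H = [{1, 2}, {2, 3}]
transversals = minimal_transversals({1, 2, 3}, H)  # {2} and {1, 3}
```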
An intersection inequality for discrete distributions and related generation problems
 2003
"... Abstract. Given two finite sets of points X,Y in Rn which can be separated by a nonnegative linear function, and such that the componentwise minimum of any two distinct points in X is dominated by some point in Y, we show that X  ≤ nY. As a consequence of this result, we obtain quasipolynomi ..."
Abstract

Cited by 9 (4 self)
 Add to MetaCart
(Show Context)
Given two finite sets of points X, Y in R^n which can be separated by a nonnegative linear function, and such that the componentwise minimum of any two distinct points in X is dominated by some point in Y, we show that |X| ≤ n|Y|. As a consequence of this result, we obtain quasi-polynomial time algorithms for generating all maximal integer feasible solutions for a given monotone system of separable inequalities, for generating all p-inefficient points of a given discrete probability distribution, and for generating all maximal empty hyper-rectangles for a given set of points in R^n. This provides a substantial improvement over previously known exponential algorithms for these generation problems related to Integer and Stochastic Programming, and Data Mining. Furthermore, we give an incremental polynomial time generation algorithm for monotone systems with a fixed number of separable inequalities, which, for the very special case of one inequality, implies that for discrete probability distributions with independent coordinates, both p-efficient and p-inefficient points can be separately generated in incremental polynomial time.
Complexity-guided case discovery for case-based reasoning
 In: The Twentieth National Conference on Artificial Intelligence, 2005
"... The distribution of cases in the case base is critical to the performance of a Case Based Reasoning system. The case author is given little support in the positioning of new cases during the development stage of a case base. In this paper we argue that classification boundaries represent important ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
The distribution of cases in the case base is critical to the performance of a Case-Based Reasoning system. The case author is given little support in the positioning of new cases during the development stage of a case base. In this paper we argue that classification boundaries represent important regions of the problem space. They are used to identify locations where new cases should be acquired. We introduce two complexity-guided algorithms which use a local complexity measure and boundary identification techniques to actively discover cases close to boundaries. The ability of these algorithms to discover new cases that significantly improve the accuracy of case bases is demonstrated on five public-domain classification datasets.
Discovering Large Empty Maximal Hyper-Rectangle in Multi-Dimensional Space
 1997
"... Given a collection of points in a multidimensional space, we consider the problem of finding the set of all possible Maximal HyperRectangle (MHR), defined to be hyperrectangles that are empty and have at least a point bounding each of its surfaces. It is easy to see that there are enormous nu ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
(Show Context)
Given a collection of points in a multi-dimensional space, we consider the problem of finding the set of all possible Maximal Hyper-Rectangles (MHRs), defined to be hyper-rectangles that are empty and have at least a point bounding each of their surfaces. It is easy to see that there is an enormous number of such MHRs in a given instance, and most of the time applications require only the "largest" MHR or "sufficiently large" MHRs. Our proposed algorithm solves all of the above problems by setting a criterion to measure sufficiently large MHRs, so that only those large MHRs will be reported. The algorithm runs much faster when the criterion set is "reasonably tight", as pruning is done naturally in the algorithm.
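A naive 2-D version fixes the problem statement: among axis-aligned rectangles inside a bounding box, find the largest one containing no given point strictly inside it. The brute force below (invented data, candidate sides restricted to point coordinates and the boundary) only illustrates the definitions; the paper's algorithm handles higher dimensions and prunes with a size criterion instead.

```python
# Enumerate candidate rectangles whose sides lie on point coordinates or
# the bounding box, keep the empty ones, and return the largest. This is
# a definitional sketch, not the paper's pruning algorithm.
from itertools import product

def largest_empty_rect(points, width, height):
    xs = sorted({0, width} | {x for x, _ in points})
    ys = sorted({0, height} | {y for _, y in points})
    best, best_area = None, 0
    for x1, x2 in product(xs, repeat=2):
        for y1, y2 in product(ys, repeat=2):
            if x1 >= x2 or y1 >= y2:
                continue
            if any(x1 < px < x2 and y1 < py < y2 for px, py in points):
                continue  # a point lies strictly inside: not empty
            if (x2 - x1) * (y2 - y1) > best_area:
                best, best_area = (x1, y1, x2, y2), (x2 - x1) * (y2 - y1)
    return best, best_area

rect, area = largest_empty_rect([(5, 5)], width=10, height=10)
```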
Minimum Variance Associations: Discovering Relationships in Numerical Data
 In Proc. of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2008
"... Abstract. The paper presents minimum variance patterns: a new class of itemsets and rules for numerical data, which capture arbitrary continuous relationships between numerical attributes without the need for discretization. The approach is based on finding polynomials over sets of attributes whose ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
(Show Context)
The paper presents minimum variance patterns: a new class of itemsets and rules for numerical data, which capture arbitrary continuous relationships between numerical attributes without the need for discretization. The approach is based on finding polynomials over sets of attributes whose variance, in a given dataset, is close to zero. Sets of attributes for which such functions exist are considered interesting. Further, two types of rules are introduced, which help extract understandable relationships from such itemsets. Efficient algorithms for mining minimum variance patterns are presented and verified experimentally.
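The interestingness test this abstract describes reduces to an ordinary variance computation once a candidate polynomial is fixed. The sketch below shows that core check on invented data, restricted to a single hand-picked linear polynomial; the paper's actual contribution, searching for such polynomials efficiently, is omitted.

```python
# Evaluate a fixed candidate polynomial f(x, y) = x + y - 10 on each
# record; near-zero variance means the relation x + y ~ 10 holds in the
# data, making {x, y} an interesting attribute set under this criterion.

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

records = [(2, 8), (3, 7), (6, 4), (9, 1)]  # x + y = 10 throughout
f_vals = [x + y - 10 for x, y in records]
is_pattern = variance(f_vals) < 1e-9
```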
An efficient implementation of a joint generation algorithm
 Proceedings of the Third International Workshop on Experimental and Efficient Algorithms (WEA), volume 3059 of Lecture Notes in Computer Science
"... Abstract. Let C be an ndimensional integral box, and π be a monotone property defined over the elements of C. We consider the problems of incrementally generating jointly the families Fπ and Gπ of all minimal subsets satisfying property π and all maximal subsets not satisfying property π, when π ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
(Show Context)
Let C be an n-dimensional integral box, and π be a monotone property defined over the elements of C. We consider the problems of incrementally generating jointly the families Fπ and Gπ of all minimal subsets satisfying property π and all maximal subsets not satisfying property π, when π is given by a polynomial-time satisfiability oracle. Problems of this type arise in many practical applications. It is known that the above joint generation problem can be solved in incremental quasi-polynomial time. In this paper, we present an efficient implementation of this procedure. We present experimental results to evaluate our implementation for a number of interesting monotone properties π.