Results 1 - 4 of 4
Evaluation of Interestingness Measures for Ranking Discovered Knowledge
Lecture Notes in Computer Science, 2001
Cited by 28 (0 self)
When mining a large database, the number of patterns discovered can easily exceed the capabilities of a human user to identify interesting results. To address this problem, various techniques have been suggested to reduce and/or order the patterns prior to presenting them to the user. In this paper, our focus is on ranking summaries generated from a single dataset, where attributes can be generalized in many different ways and to many levels of granularity according to taxonomic hierarchies. We theoretically and empirically evaluate thirteen diversity measures used as heuristic measures of interestingness for ranking summaries generated from databases. The thirteen diversity measures have previously been utilized in various disciplines, such as information theory, statistics, ecology, and economics. We describe five principles that any measure must satisfy to be considered useful for ranking summaries. Theoretical results show that only four of the thirteen diversity measures satisfy all of the principles. We then analyze the distribution of the index values generated by each of the thirteen diversity measures. Empirical results, obtained using synthetic data, show that the distribution of index values generated tends to be highly skewed about the mean, median, and middle index values. The objective of this work is to gain some insight into the behaviour that can be expected from each of the measures in practice.
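To make the idea of ranking summaries by a diversity measure concrete, here is a minimal sketch. The abstract does not name the thirteen measures, so Shannon entropy and the Simpson index are used only as familiar examples from information theory and ecology; whether they are among the measures evaluated is an assumption. The summary names and tuple counts are invented for illustration, and the sort direction (which end counts as "more interesting") is likewise an assumption.

```python
import math
from typing import Callable, Dict, List


def shannon_entropy(counts: List[int]) -> float:
    """Shannon entropy (in bits) of a summary's tuple-count distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def simpson_index(counts: List[int]) -> float:
    """Simpson concentration: sum of squared proportions (higher = more concentrated)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


# Hypothetical summaries: each name maps to the counts of its generalized tuples.
summaries: Dict[str, List[int]] = {
    "by_region": [25, 25, 25, 25],
    "by_city": [70, 10, 10, 10],
    "by_country": [97, 1, 1, 1],
}


def rank_by(measure: Callable[[List[int]], float],
            data: Dict[str, List[int]],
            reverse: bool = False) -> List[str]:
    """Order summary names by the index value the measure assigns to each."""
    return sorted(data, key=lambda name: measure(data[name]), reverse=reverse)


# Assumption: lower entropy (a more concentrated summary) is listed first.
entropy_ranking = rank_by(shannon_entropy, summaries)
simpson_ranking = rank_by(simpson_index, summaries, reverse=True)
```

On this toy data both measures produce the same order, but that need not hold in general, which is one reason to compare how the index values of different measures are distributed.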
Measuring the interestingness of discovered knowledge: A principled approach
Intell. Data Anal.
The Lorenz Dominance Order As a Measure Of . . .
Proceedings of the 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’02), 2002
Ranking summaries generated from databases is useful within the context of descriptive data mining tasks where a single data set can be generalized in many different ways and to many levels of granularity. Our approach to generating summaries is based upon a data structure, associated with an attribute, called a domain generalization graph (DGG). A DGG for an attribute is a directed graph where each node represents a domain of values created by partitioning the original domain for the attribute, and each edge represents a generalization relation between these domains. Given a set of DGGs associated with a set of attributes, a generalization space can be defined as all possible combinations of domains, where one domain is selected from each DGG for each combination. This generalization space therefore describes all possible summaries consistent with the DGGs that can be generated from the selected attributes. When the number of attributes to be generalized is large or the DGGs associated with the attributes are complex, the generalization space can be very large, resulting in the generation of many summaries. The number of summaries can easily exceed the capabilities of a domain expert to identify interesting results. In this paper, we show that the Lorenz dominance order can be used to rank the summaries prior to presentation to the domain expert. In most cases the Lorenz dominance order defines a partial order on the summaries; in some cases it defines a total order. The rank order of the summaries represents an objective evaluation of their relative interestingness and provides the domain expert with a starting point for further subjective evaluation of the summaries.
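The two ideas in this abstract can be sketched in a few lines. The first function enumerates a generalization space as the Cartesian product of the nodes of each DGG (the edges, which encode the generalization relation, are ignored here). The second compares two summaries with the textbook Lorenz curve over their tuple counts. The attribute names, node names, and counts are hypothetical, and the abstract does not say how the paper builds the curve or which direction it treats as more interesting, so this is an illustration under those assumptions rather than the authors' procedure. All counts are assumed to be positive.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical DGGs, reduced to the list of domain (node) names per attribute.
dgg_nodes: Dict[str, List[str]] = {
    "location": ["city", "region", "country", "ANY"],
    "date": ["day", "month", "year", "ANY"],
}

Point = Tuple[Fraction, Fraction]


def generalization_space(nodes: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """All combinations that pick exactly one domain from each attribute's DGG."""
    attrs = sorted(nodes)
    return [dict(zip(attrs, combo)) for combo in product(*(nodes[a] for a in attrs))]


def lorenz_curve(counts: Sequence[int]) -> List[Point]:
    """Breakpoints (population share, cumulative count share), counts sorted ascending."""
    ordered = sorted(counts)
    n, total = len(ordered), sum(ordered)
    points: List[Point] = [(Fraction(0), Fraction(0))]
    running = 0
    for i, c in enumerate(ordered, start=1):
        running += c
        points.append((Fraction(i, n), Fraction(running, total)))
    return points


def curve_value(points: List[Point], x: Fraction) -> Fraction:
    """Linear interpolation of the piecewise-linear curve at x in [0, 1]."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x must lie in [0, 1]")


def lorenz_dominates(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if a's curve is never below b's and is strictly above it somewhere."""
    pa, pb = lorenz_curve(a), lorenz_curve(b)
    # Both curves are piecewise linear, so checking the union of breakpoints suffices.
    xs = sorted({x for x, _ in pa} | {x for x, _ in pb})
    diffs = [curve_value(pa, x) - curve_value(pb, x) for x in xs]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)
```

With two attributes of four domains each, the space has 16 candidate summaries. For the counts `[1, 3, 3, 3]` and `[2, 2, 2, 4]` the curves cross, so neither dominates the other, which illustrates why the order is in general only partial.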
User Focused Database Summarization Approach
Mining information from very large databases poses numerous challenges. In fact, systems that can mine such voluminous databases are increasingly desirable. In this context, we propose a generic approach to database summarization that takes into account the user’s topic of interest. The innovation of our work lies in generating a set of database summaries at different levels of granularity in order to satisfy the user’s expectations.