Results 1 - 6 of 6
Evaluation of Interestingness Measures for Ranking Discovered Knowledge
Lecture Notes in Computer Science, 2001
Cited by 32 (0 self)
When mining a large database, the number of patterns discovered can easily exceed the capabilities of a human user to identify interesting results. To address this problem, various techniques have been suggested to reduce and/or order the patterns prior to presenting them to the user. In this paper, our focus is on ranking summaries generated from a single dataset, where attributes can be generalized in many different ways and to many levels of granularity according to taxonomic hierarchies. We theoretically and empirically evaluate thirteen diversity measures used as heuristic measures of interestingness for ranking summaries generated from databases. The thirteen diversity measures have previously been utilized in various disciplines, such as information theory, statistics, ecology, and economics. We describe five principles that any measure must satisfy to be considered useful for ranking summaries. Theoretical results show that only four of the thirteen diversity measures satisfy all of the principles. We then analyze the distribution of the index values generated by each of the thirteen diversity measures. Empirical results, obtained using synthetic data, show that the distribution of index values generated tends to be highly skewed about the mean, median, and middle index values. The objective of this work is to gain some insight into the behaviour that can be expected from each of the measures in practice.
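As a rough illustration of what ranking summaries by a diversity measure can look like, the sketch below scores a few summaries by their tuple counts. The abstract does not name the thirteen measures, so Shannon entropy (information theory) and Simpson's index (ecology) are used here purely as well-known stand-ins; the summary names, the counts, and the `rank_by` helper are hypothetical.

```python
import math

def proportions(counts):
    """Convert the tuple counts of one summary into proportions summing to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def shannon_entropy(counts):
    """Shannon entropy (in bits) of the count distribution."""
    return -sum(p * math.log2(p) for p in proportions(counts) if p > 0)

def simpson_index(counts):
    """Simpson's concentration index: the sum of squared proportions."""
    return sum(p * p for p in proportions(counts))

def rank_by(measure, summaries, reverse=False):
    """Order summary names by the index value the measure assigns to them."""
    return sorted(summaries, key=lambda name: measure(summaries[name]),
                  reverse=reverse)

# Hypothetical summaries: each list holds the tuple counts of one summary.
summaries = {
    "uniform": [25, 25, 25, 25],
    "skewed": [85, 5, 5, 5],
    "two_rows": [50, 50],
}

if __name__ == "__main__":
    for name in rank_by(simpson_index, summaries, reverse=True):
        print(name, round(simpson_index(summaries[name]), 4))
```

Whether a high or a low index value should count as more interesting is a modelling choice that the abstract does not settle, so the sort direction is left as a parameter.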
Measuring the interestingness of discovered knowledge: A principled approach
Intell. Data Anal
Data, Information and Knowledge for Medical Scenario Construction
In Proceedings of the Intelligent Data Analysis in Medicine and Pharmacology Workshop, 2003
Cited by 2 (0 self)
The automatic recognition of typical pattern sequences (scenarios), as they are developing, is of crucial importance for computer-aided patient supervision. However, the construction of such scenarios directly from medical expertise is unrealistic in practice. Starting from the monitored data and clinical information available, our objective is to extract typical abstracted pattern sequences and then construct scenarios validated by clinical experts as representative of a class of situations to recognize. In this paper, we present a methodology for data abstraction, based on the management of data, information and knowledge, for the extraction of specific events and eventually the construction of such scenarios.
Effective Algorithm in Mining Interestingness Clustering
Abstract: This paper presents a general framework for clustering association rules that explores the interestingness of rules. It investigates a general framework for clustering unconstrained association rules over unconstrained domains, with the aim of automating much of the laborious manual effort normally involved in exploring and understanding interestingness. We also introduce an algorithm and investigate how this approach can be incorporated into the mining process. The output of the new mining process is significantly reduced, almost by half, which makes postprocessing easier; in addition, postprocessing can often achieve similar results with a shorter runtime.
The Lorenz Dominance Order As a Measure Of . . .
Proceedings of the 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’02), 2002
Ranking summaries generated from databases is useful within the context of descriptive data mining tasks where a single data set can be generalized in many different ways and to many levels of granularity. Our approach to generating summaries is based upon a data structure, associated with an attribute, called a domain generalization graph (DGG). A DGG for an attribute is a directed graph where each node represents a domain of values created by partitioning the original domain for the attribute, and each edge represents a generalization relation between these domains. Given a set of DGGs associated with a set of attributes, a generalization space can be defined as all possible combinations of domains, where one domain is selected from each DGG for each combination. This generalization space describes, then, all possible summaries consistent with the DGGs that can be generated from the selected attributes. When the number of attributes to be generalized is large or the DGGs associated with the attributes are complex, the generalization space can be very large, resulting in the generation of many summaries. The number of summaries can easily exceed the capabilities of a domain expert to identify interesting results. In this paper, we show that the Lorenz dominance order can be used to rank the summaries prior to presentation to the domain expert. The Lorenz dominance order defines a partial order on the summaries, in most cases, and in some cases, defines a total order. The rank order of the summaries represents an objective evaluation of their relative interestingness and provides the domain expert with a starting point for further subjective evaluation of the summaries.
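The two ideas in this abstract can be sketched in a few lines, under stated assumptions. The generalization space is modelled as the Cartesian product of the node sets of each attribute's DGG (edges are ignored here), and Lorenz dominance is checked by comparing the piecewise-linear Lorenz curves of two summaries' tuple counts at every breakpoint. The attribute names, node labels, counts, and all function names are hypothetical, and the comparison is a textbook formulation rather than the authors' implementation.

```python
from fractions import Fraction
from itertools import product

def generalization_space(dggs):
    """All combinations that pick one domain (node) from each attribute's DGG."""
    attrs = sorted(dggs)
    return [dict(zip(attrs, combo))
            for combo in product(*(dggs[a] for a in attrs))]

def lorenz_curve(counts):
    """Breakpoints (x, y) of the Lorenz curve for a summary's tuple counts."""
    total, n = sum(counts), len(counts)
    points, running = [(Fraction(0), Fraction(0))], 0
    for i, c in enumerate(sorted(counts), start=1):
        running += c
        points.append((Fraction(i, n), Fraction(running, total)))
    return points

def curve_at(points, x):
    """Linearly interpolate the curve at x, for x in [0, 1]."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x must lie in [0, 1]")

def lorenz_compare(a, b):
    """Return 'a' or 'b' for the summary whose curve lies on or above the
    other everywhere, 'equal' if the curves coincide, None if they cross."""
    pa, pb = lorenz_curve(a), lorenz_curve(b)
    # Both curves are piecewise linear, so checking every breakpoint suffices.
    xs = sorted({x for x, _ in pa} | {x for x, _ in pb})
    diffs = [curve_at(pa, x) - curve_at(pb, x) for x in xs]
    if all(d == 0 for d in diffs):
        return "equal"
    if all(d >= 0 for d in diffs):
        return "a"
    if all(d <= 0 for d in diffs):
        return "b"
    return None  # curves cross: the two summaries are incomparable

# Hypothetical DGG node sets for two attributes.
dggs = {"age": ["raw", "decade", "any"],
        "city": ["raw", "province", "country", "any"]}

if __name__ == "__main__":
    print(len(generalization_space(dggs)), "candidate summaries")
    print(lorenz_compare([25, 25, 25, 25], [85, 5, 5, 5]))
    print(lorenz_compare([1, 1, 4], [0, 3, 3]))
```

The `None` result corresponds to the partial-order case mentioned in the abstract: when two Lorenz curves cross, neither summary dominates the other. Exact `Fraction` arithmetic is used so that equality checks on curve values are not disturbed by floating-point rounding.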
User Focused Database Summarization Approach
Mining information from very large databases poses numerous challenges. In fact, systems that can mine such voluminous databases are increasingly desirable. In this context, we propose a generic approach to database summarization that takes into account the user’s topic of interest. The innovation in our work is the generation of a set of database summaries at different levels of granularity in order to satisfy the user’s expectations.