Knowledge discovery and interestingness measures: A survey (1999)

by R. Hilderman, H. Hamilton

Results 11-20 of 25

Clustering web images using association rules, interestingness measures, and hypergraph partitions

by Hassan H. Malik - In ICWE ’06: Proceedings of the 6th International Conference on Web Engineering, 2006
Cited by 4 (1 self)
This paper presents a new approach to clustering web images. Images are first processed to extract signal features such as color in HSV format and quantized orientation. Web pages referring to these images are processed to extract textual features (keywords), and feature reduction techniques such as stemming, stop word elimination, and Zipf’s law are applied. All visual and textual features are used to generate association rules. Hypergraphs are generated from these rules, with features used as vertices and discovered associations as hyperedges. Twenty-two objective “interestingness” measures are evaluated on their ability to prune non-interesting rules and to assign weights to hyperedges. A hypergraph partitioning algorithm is then used to generate clusters of features, and a simple scoring function is used to assign images to clusters. A tree-distance-based evaluation measure is used to evaluate the quality of image clustering with respect to manually generated ground truth. Our experiments indicate that combining textual and content-based features results in better clustering than signal-only or text-only approaches. Online steps run in real time, which makes this approach practical for web images. Furthermore, we demonstrate that statistical interestingness measures such as Correlation Coefficient, Laplace, Kappa, and J-Measure result in better clustering than traditional association rule interestingness measures such as Support and Confidence.
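To make the named measures concrete, the following is a minimal sketch that computes Support, Confidence, Laplace, the Correlation Coefficient (phi), Kappa, and the J-Measure for a rule A → B from transaction counts. It uses common textbook definitions and assumes 0 < n_a, n_b < n; the function name `measures` and the use of log base 2 for the J-Measure are our choices, not taken from the paper, which evaluates twenty-two measures in total.

```python
import math

def measures(n, n_a, n_b, n_ab):
    """Objective interestingness measures for a rule A -> B.

    n: total transactions; n_a, n_b: transactions containing A, B;
    n_ab: transactions containing both. Assumes 0 < n_a, n_b < n.
    """
    p_a, p_b, p_ab = n_a / n, n_b / n, n_ab / n
    p_nanb = 1 - p_a - p_b + p_ab   # P(not A, not B)
    p_anb = p_a - p_ab              # P(A, not B)

    def term(joint, cond, marg):
        # joint * log2(cond / marg), using the convention 0 * log 0 = 0
        return joint * math.log2(cond / marg) if joint > 0 else 0.0

    conf = p_ab / p_a
    chance = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    return {
        "support": p_ab,
        "confidence": conf,
        "laplace": (n_ab + 1) / (n_a + 2),      # Laplace correction, k = 2
        "phi": (p_ab - p_a * p_b)
               / math.sqrt(p_a * p_b * (1 - p_a) * (1 - p_b)),
        "kappa": (p_ab + p_nanb - chance) / (1 - chance),
        "j_measure": term(p_ab, conf, p_b)
                     + term(p_anb, p_anb / p_a, 1 - p_b),
    }
```

For example, `measures(100, 40, 50, 30)` yields a support of 0.3 and a confidence of 0.75. Ranking or thresholding rules on any of the returned scores is one way such measures can prune rules or weight hyperedges.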

Association Rules and Predictive Models for e-Banking Services

by Vasilis Aggelis, Dimitris Christodoulakis - In Proc. of the 1st Balkan Conference in Informatics, Thessaloniki, 2003
Cited by 3 (1 self)
Abstract. The introduction of data mining methods in banking, although slower than in other fields mainly because of the nature and sensitivity of bank data, can already be considered of great assistance to banks in prediction, forecasting, and decision making. One particular method is the search for association rules between the products and services a bank offers. Results are generally impressive, since in many cases strong relations are established that are not easily observed at first glance. These rules serve as additional tools for the continuous improvement of bank services and products, and they also help in approaching new customers. In addition, developing and continuously training prediction models is a very significant task, especially for banking organizations. Models capable of accurately predicting future facts enhance decision making and the fulfillment of the bank's goals, especially when they are applied to specific bank units. E-banking can be considered such a unit, influenced from a number of different sides. The scope of this paper is to demonstrate the application of data mining methods to e-banking: association rules concerning e-banking are discovered using different techniques, and a prediction model is established based on e-banking parameters such as the transaction volume conducted through this alternative channel, in relation to other crucial parameters such as the number of active users.
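The abstract does not say which rule-mining techniques were used. As a minimal illustration of association rules between bank products, the sketch below brute-forces all single-item rules X → Y over per-customer product sets and keeps those above hypothetical support and confidence thresholds. The function `pair_rules` and the product names are illustrative assumptions, not the authors' method or data.

```python
from itertools import permutations

def pair_rules(baskets, min_support, min_confidence):
    """Brute-force mining of single-item rules X -> Y over product sets.

    baskets: list of sets, one per customer, of products/services held.
    Returns (X, Y, support, confidence) tuples, highest confidence first.
    """
    n = len(baskets)
    items = sorted(set().union(*baskets))
    count = {i: sum(i in b for b in baskets) for i in items}
    rules = []
    for x, y in permutations(items, 2):
        both = sum(x in b and y in b for b in baskets)
        support = both / n
        if support < min_support:
            continue  # too rare to be a reliable rule
        confidence = both / count[x]  # estimate of P(Y | X)
        if confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return sorted(rules, key=lambda r: -r[3])

# Hypothetical customer holdings
customers = [
    {"savings", "e-banking", "card"},
    {"savings", "e-banking"},
    {"card"},
    {"savings", "e-banking", "loan"},
]
for rule in pair_rules(customers, min_support=0.5, min_confidence=0.8):
    print(rule)
```

Enumerating all pairs is only practical for a small product catalogue; algorithms such as Apriori prune the search when larger itemsets are needed.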

Discovering interesting association rules by clustering

by Yanchang Zhao, Chengqi Zhang, Shichao Zhang - In Proceedings of the 17th Australian Joint Conference on Artificial Intelligence, 2004
Cited by 2 (0 self)
Abstract. There are a great many metrics available for measuring the interestingness of rules. In this paper, we design a distinct approach for identifying association rules that maximizes the interestingness in an applied context. More specifically, the interestingness of association rules is defined as the dissimilarity between corresponding clusters. In addition, the interestingness assists in filtering out those rules that may be uninteresting in applications. Experiments show the effectiveness of our algorithm.
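The abstract does not say how the clusters or their dissimilarity are defined. One hypothetical reading is sketched below: each item has a feature vector, a rule's antecedent and consequent each form a group of items, and interestingness is the Euclidean distance between the two groups' centroids. Under this reading, rules linking dissimilar items score higher and rules below a threshold are filtered out. The functions `cluster_interestingness` and `filter_rules`, the item features, and the threshold are all assumptions, not the authors' definitions.

```python
import math

def centroid(vectors):
    """Component-wise mean of equal-length numeric vectors."""
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]

def cluster_interestingness(rule, features):
    """Hypothetical score: distance between the centroid of the
    antecedent's items and the centroid of the consequent's items."""
    antecedent, consequent = rule
    a = centroid([features[i] for i in antecedent])
    c = centroid([features[i] for i in consequent])
    return math.dist(a, c)

def filter_rules(rules, features, threshold):
    """Drop rules whose two sides are too similar to be surprising."""
    return [r for r in rules
            if cluster_interestingness(r, features) >= threshold]

# Hypothetical item features and rules
features = {"bread": (1.0, 0.0), "butter": (1.0, 1.0), "nails": (5.0, 5.0)}
r1 = (("bread",), ("butter",))
r2 = (("bread", "butter"), ("nails",))
print(filter_rules([r1, r2], features, threshold=2.0))
```

`math.dist` requires Python 3.8 or later.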

Mining Frequent Arrangements of Temporal Intervals

by Panagiotis Papapetrou, George Kollios, Stan Sclaroff, Dimitrios Gunopulos - Under consideration for publication in Knowledge and Information Systems, 2008
Cited by 2 (1 self)
The problem of discovering frequent arrangements of temporal intervals is studied. It is assumed that the database consists of sequences of events, where an event occurs during a time interval. The goal is to mine temporal arrangements of event intervals that appear frequently in the database. The motivation for this work is the observation that, in practice, most events are not instantaneous but occur over a period of time, and different events may occur concurrently. Thus, many practical applications require mining such temporal correlations between intervals, including the linguistic analysis of annotated data from American Sign Language as well as network and biological data. Three efficient methods to find frequent arrangements of temporal intervals are described: the first two are tree-based and use breadth-first and depth-first search to mine the set of frequent arrangements, whereas the third is prefix-based. These methods apply efficient pruning techniques that include a set of constraints that add user-controlled focus to the mining process. Moreover, based on the extracted patterns, a standard method for mining association rules is employed that applies different interestingness measures to evaluate the significance of the discovered …
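As a simplified illustration of "arrangements of temporal intervals", the sketch below classifies each pair of intervals in a sequence with an Allen-style relation and counts, per sequence, which (label, relation, label) arrangements occur. It keeps those found in at least `min_support` sequences. This covers only two-interval arrangements and assumes every interval has start < end. The paper's tree-based and prefix-based methods mine larger arrangements and are not reproduced here; the function names and relation labels are our own.

```python
from collections import Counter
from itertools import combinations

def relation(a, b):
    """Allen-style relation of interval a = (start, end) to b.

    Assumes start < end for both intervals and that a is ordered first:
    a starts no later than b, and if they start together a ends no later.
    """
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 == s2 and e1 == e2:
        return "equal"
    if s1 == s2:
        return "starts"        # same start, a ends first
    if e1 == e2:
        return "finished-by"   # b starts later, same end
    if e1 > e2:
        return "contains"
    return "overlaps"

def frequent_arrangements(sequences, min_support):
    """sequences: list of lists of (label, start, end) event intervals.
    Returns {(label1, relation, label2): number of supporting sequences}."""
    support = Counter()
    for seq in sequences:
        events = sorted(seq, key=lambda e: (e[1], e[2], e[0]))
        seen = set()  # count each arrangement once per sequence
        for (l1, s1, e1), (l2, s2, e2) in combinations(events, 2):
            seen.add((l1, relation((s1, e1), (s2, e2)), l2))
        support.update(seen)
    return {arr: c for arr, c in support.items() if c >= min_support}

# Hypothetical event-interval sequences
sequences = [
    [("A", 0, 5), ("B", 3, 8)],
    [("A", 1, 4), ("B", 2, 6), ("C", 10, 12)],
    [("B", 0, 2), ("A", 5, 7)],
]
print(frequent_arrangements(sequences, min_support=2))
```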

Learning Interestingness of Streaming Classification Rules

by Tolga Aydın, Halil Altay Güvenir - In Proceedings of the 19th International Symposium on Computer and Information Sciences (ISCIS 2004)
Cited by 1 (1 self)
Abstract. In domains where information is gathered at regular periods, inducing classification rules generally produces so many rules that selecting the interesting ones among all discovered rules becomes an important task. At each period, new classification rules are induced from the newly gathered information. These rules therefore stream through time and are called streaming classification rules. In this paper, an interactive rule interestingness-learning algorithm (IRIL) is developed to automatically label classification rules as either “interesting” or “uninteresting” with limited user interaction. Within the IRIL framework, we also develop VFP (Voting Feature Projections), a feature-projection-based incremental classification learning algorithm. The concept description learned by VFP constitutes a novel approach to the interestingness analysis of streaming classification rules.
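The general idea of classifying by voting over feature projections can be sketched as below. Each feature of a rule description votes for a label using the class frequencies of that feature's value, and the model is updated one user-labeled rule at a time. This is a simplified, hypothetical illustration restricted to categorical features; it is not the VFP algorithm from the paper. The class name and the rule features ("long"/"short" length, "high"/"low" support) are invented for the example.

```python
from collections import Counter, defaultdict

class FeatureProjectionVoter:
    """Simplified feature-projection voting classifier (hypothetical
    sketch, not the paper's VFP). Each feature votes independently using
    the class frequencies observed for its projected value."""

    def __init__(self):
        self.class_counts = Counter()
        self.proj = defaultdict(Counter)  # (feature index, value) -> class counts

    def update(self, x, label):
        # Incremental: learn from one labeled rule description at a time
        self.class_counts[label] += 1
        for f, v in enumerate(x):
            self.proj[(f, v)][label] += 1

    def predict(self, x):
        votes = Counter()
        for f, v in enumerate(x):
            counts = self.proj.get((f, v))
            if not counts:
                continue  # unseen value: this feature abstains
            # Normalize by class size so a frequent class does not dominate
            normed = {c: counts[c] / self.class_counts[c] for c in counts}
            total = sum(normed.values())
            for c, w in normed.items():
                votes[c] += w / total  # each feature casts one unit of vote
        return votes.most_common(1)[0][0] if votes else None

# Hypothetical rule descriptions labeled by a user: (length, support level)
voter = FeatureProjectionVoter()
for x, label in [(("long", "high"), "interesting"),
                 (("long", "low"), "interesting"),
                 (("short", "low"), "uninteresting"),
                 (("short", "low"), "uninteresting")]:
    voter.update(x, label)
print(voter.predict(("short", "low")))
```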

An Automated Report Generation Tool for the Data Understanding

by Juha Vesanto, Jaakko Hollmén - In Proceedings of the First International Workshop on Hybrid Intelligent Systems (HIS’01), 2001
Cited by 1 (1 self)
To prepare and model data successfully, the data miner needs to be aware of the properties of the data manifold. In this paper, the outline of a tool for automatically generating data survey reports for this purpose is described. The report combines linguistic descriptions (rules) and statistical measures with visualizations. Together these provide both quantitative and qualitative information and help the user form a mental model of the data. The main focus is on describing the cluster structure and the contents of the clusters. The data is clustered using a novel algorithm based on the Self-Organizing Map. The rules describing the clusters are selected using a significance measure based on the confidence of their characterizing and discriminating properties.
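One way to read "confidence of characterizing and discriminating properties" is sketched below for an interval rule `low <= x <= high` describing one cluster. The characterizing confidence is the fraction of the cluster's points covered by the rule, and the discriminating confidence is the fraction of covered points that belong to the cluster. The abstract does not give the exact significance formula, so combining the two as a product is an assumption, as is the function name `rule_significance`.

```python
def rule_significance(values, in_cluster, low, high):
    """Score the rule 'low <= x <= high' as a description of one cluster.

    values: the variable's value for each data point.
    in_cluster: parallel booleans, True if the point is in the cluster.
    Returns (characterizing, discriminating, combined).
    """
    covered = [low <= v <= high for v in values]
    both = sum(c and m for c, m in zip(covered, in_cluster))
    n_cluster = sum(in_cluster)
    n_covered = sum(covered)
    # P(rule holds | point in cluster)
    characterizing = both / n_cluster if n_cluster else 0.0
    # P(point in cluster | rule holds)
    discriminating = both / n_covered if n_covered else 0.0
    # Assumed combination: product of the two confidences
    return characterizing, discriminating, characterizing * discriminating

# Hypothetical data: one variable, cluster membership per point
values = [1, 2, 3, 8, 9, 10]
in_cluster = [True, True, True, False, False, True]
print(rule_significance(values, in_cluster, low=0, high=3))
```

Candidate intervals for each variable could then be ranked by the combined score, and the top few kept as the cluster's linguistic description.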

Mining Patterns Using Relaxations of User Defined Constraints

by Cláudia Antunes, Arlindo L. Oliveira - In Proc. of the Workshop on Knowledge Discovery in Inductive Databases, 2004
Cited by 1 (0 self)
Abstract. The main drawbacks of sequential pattern mining have been its lack of focus on user expectations and the high number of discovered patterns. However, the commonly accepted solution, the use of constraints, brings the mining process close to a hypothesis-testing task. In this paper, we propose a new methodology to mine sequential patterns that keeps the focus on user expectations without compromising the discovery of unknown patterns. Our methodology is based on constraint relaxations, which are used to filter accepted patterns during the mining process. We propose a hierarchy of relaxations, applied to constraints expressed as context-free languages.
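To show how a relaxation accepts more patterns than the constraint it relaxes, here is a toy example. The strict constraint is the context-free language aⁿbⁿ (n ≥ 1). It is paired with two hypothetical relaxations of increasing permissiveness: one accepts any sequence that can still be extended into a valid word, and the other only checks that every item belongs to the grammar's alphabet. The grammar, the relaxation names, and their definitions are illustrative assumptions; the paper defines its own hierarchy of relaxations.

```python
def strict(seq):
    """Toy context-free constraint: the sequence must be a^n b^n, n >= 1."""
    seq = list(seq)
    n = len(seq) // 2
    return n >= 1 and len(seq) == 2 * n and seq == ["a"] * n + ["b"] * n

def prefix_relaxation(seq):
    """Accept sequences that can still be extended into some a^n b^n."""
    seq = list(seq)
    i = 0
    while i < len(seq) and seq[i] == "a":
        i += 1
    rest = seq[i:]
    # After the a's, only b's may follow, and no more b's than a's
    return all(x == "b" for x in rest) and len(rest) <= i

def alphabet_relaxation(seq, alphabet=frozenset("ab")):
    """Most permissive: accept any sequence over the grammar's alphabet."""
    return all(item in alphabet for item in seq)

# Hypothetical candidate patterns, filtered at each level
patterns = [["a", "a", "b", "b"], ["a", "a", "b"], ["b", "a"], ["a", "c"]]
for check in (strict, prefix_relaxation, alphabet_relaxation):
    print(check.__name__, [p for p in patterns if check(p)])
```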

Introducing A Rule Importance Measure

by Jiye Li, Nick Cercone
Cited by 1 (0 self)
Abstract. Association rule algorithms often generate an excessive number of rules, many of which are not significant. It is difficult to determine which rules are more useful, interesting, and important. We introduce a rough set based Rule Importance Measure to select the most important rules. We use the ROSETTA software to generate multiple reducts. The Apriori association rule algorithm is then applied to generate rule sets for each data set based on each reduct. Some rules are generated more frequently than others across the total rule sets, and we consider such rules more important. We define rule importance as the frequency with which an association rule is generated across all the rule sets. Rule importance differs from both rule interestingness measures and rule quality measures in the tasks they are applied to, the processes in which they are applied, and the contents they measure. Experimental results on an artificial data set, UCI machine learning data sets, and an actual geriatric care medical data set show that our method reduces the computational cost of rule generation and provides an effective measure of how important a rule is.
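The definition of rule importance above, the frequency of a rule across all generated rule sets, can be computed directly once the per-reduct rule sets exist. The sketch below assumes those rule sets have already been produced (the ROSETTA reduct generation and Apriori steps are not reproduced) and reports frequency as the fraction of rule sets that contain each rule. The function name and the string encoding of rules are illustrative choices.

```python
from collections import Counter

def rule_importance(rule_sets):
    """Importance of each rule = fraction of rule sets that contain it.

    rule_sets: one collection of hashable rules per reduct, e.g. strings
    like "A->B" or (antecedent, consequent) tuples of frozensets.
    """
    counts = Counter()
    for rules in rule_sets:
        counts.update(set(rules))  # count each rule at most once per set
    return {rule: c / len(rule_sets) for rule, c in counts.items()}

# Hypothetical rule sets, one per reduct
rule_sets = [{"A->B", "C->D"}, {"A->B"}, {"A->B", "E->F"}, {"C->D"}]
ranked = sorted(rule_importance(rule_sets).items(), key=lambda kv: -kv[1])
print(ranked)
```

Rules generated under most reducts rise to the top of the ranking, which matches the abstract's notion of a more important rule.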

Information-Theoretical and Combinatorial Methods in Data-Mining

by Szymon Jaroszewicz, 2003
Ph.D. dissertation, December 2003. Szymon Jaroszewicz (M.Sc., Technical University of Szczecin; Ph.D., University of Massachusetts Boston), directed by Professor Dan A. Simovici. Various applications of information-theoretical and combinatorial methods in data mining are presented.

A Comparison of Hardware and Software in Sequence Rule Evolution

by Magnus Hetland, Pal Saetrom, 2003
Sequence rule mining is an important problem in the field of data mining.