Finding Interesting Rules from Large Sets of Discovered Association Rules, 1994
Cited by 185 (9 self)
Association rules, introduced by Agrawal, Imielinski, and Swami, are rules of the form "for 90% of the rows of the relation, if the row has value 1 in the columns in set W, then it has 1 also in column B". Efficient methods exist for discovering association rules from large collections of data. The number of discovered rules can, however, be so large that browsing the rule set and finding interesting rules from it can be quite difficult for the user. We show how a simple formalism of rule templates makes it possible to easily describe the structure of interesting rules. We also give examples of visualization of rules, and show how a visualization tool interfaces with rule templates.
1 Introduction
Data mining (knowledge discovery in databases) is a field of increasing interest combining databases, artificial intelligence, and machine learning. The purpose of data mining is to facilitate understanding large amounts of data by discovering interesting regularities or exceptions (see e...
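The rule form quoted in the abstract above ("for 90% of the rows ... if the row has value 1 in the columns in set W, then it has 1 also in column B") can be made concrete with a small sketch. The function name `rule_stats`, the column names, and the toy relation are all hypothetical and are not taken from the paper; the code only computes the usual support and confidence of a single rule W => B and does not implement the paper's rule templates.

```python
from typing import Dict, List, Set

Row = Dict[str, int]  # one row of a 0/1 relation, keyed by column name


def rule_stats(rows: List[Row], w: Set[str], b: str) -> Dict[str, float]:
    """Return support and confidence of the rule W => B over a 0/1 relation."""
    # Rows where every column in W has value 1.
    matching_w = [r for r in rows if all(r.get(c, 0) == 1 for c in w)]
    # Of those, rows where column B is also 1.
    matching_wb = [r for r in matching_w if r.get(b, 0) == 1]
    support = len(matching_wb) / len(rows) if rows else 0.0
    confidence = len(matching_wb) / len(matching_w) if matching_w else 0.0
    return {"support": support, "confidence": confidence}


# Hypothetical toy relation, not data from the paper.
ROWS = [
    {"bread": 1, "butter": 1, "milk": 1},
    {"bread": 1, "butter": 1, "milk": 1},
    {"bread": 1, "butter": 1, "milk": 0},
    {"bread": 1, "butter": 0, "milk": 1},
    {"bread": 0, "butter": 0, "milk": 1},
]

stats = rule_stats(ROWS, {"bread", "butter"}, "milk")
print(stats)
```

Here three rows have 1 in both `bread` and `butter`, and two of those also have 1 in `milk`, so the rule holds for two thirds of the rows that match W.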
Systems for Knowledge Discovery in Databases - IEEE Transactions On Knowledge And Data Engineering, 1993
Cited by 88 (8 self)
The automated discovery of knowledge in databases is becoming increasingly important as the world's wealth of data continues to grow exponentially. Knowledge-discovery systems face challenging problems from real-world databases which tend to be dynamic, incomplete, redundant, noisy, sparse, and very large. This paper addresses these problems and describes some techniques for handling them. A model of an idealized knowledge-discovery system is presented as a reference for studying and designing new systems. This model is used in the comparison of three systems: CoverStory, EXPLORA, and the Knowledge Discovery Workbench. The deficiencies of existing systems relative to the model reveal several open problems for future research.
Detecting group differences: Mining contrast sets - Data Mining and Knowledge Discovery, 2001
Cited by 61 (3 self)
A fundamental task in data analysis is understanding the differences between several contrasting groups. These groups can represent different classes of objects, such as male or female students, or the same group over time, e.g. freshman students in 1993 through 1998. We present the problem of mining contrast sets: conjunctions of attributes and values that differ meaningfully in their distribution across groups. We provide a search algorithm for mining contrast sets with pruning rules that drastically reduce the computational complexity. Once the contrast sets are found, we post-process the results to present a subset that are surprising to the user given what we have already shown. We explicitly control the probability of Type I error (false positives) and guarantee a maximum error rate for the entire analysis by using Bonferroni corrections.
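As a rough illustration of the Bonferroni idea mentioned in the abstract above, the sketch below applies the plain textbook correction: with m tests and an overall error rate alpha, each test is judged at level alpha / m. This is an assumption-laden sketch, not necessarily the exact variant the authors use; the function name `bonferroni_reject` and the p-values are hypothetical.

```python
from typing import Dict, List


def bonferroni_reject(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Return names of hypotheses rejected under a plain Bonferroni correction."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # each individual test is judged at level alpha / m
    return sorted(name for name, p in p_values.items() if p <= threshold)


# Hypothetical p-values for four candidate contrast sets (illustrative only).
P_VALUES = {
    "sex=female & major=math": 0.001,
    "year=1993": 0.012,
    "major=physics": 0.020,
    "sex=male": 0.300,
}

rejected = bonferroni_reject(P_VALUES, alpha=0.05)
print(rejected)
```

With four tests the per-test threshold is 0.05 / 4 = 0.0125, so a p-value of 0.020 that would pass an uncorrected 0.05 cutoff is no longer counted as significant.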
A Survey of Temporal Knowledge Discovery Paradigms and Methods - IEEE Transactions on Knowledge and Data Engineering, 2002
Cited by 55 (6 self)
With the increase in the size of data sets, data mining has recently become an important research topic and is receiving substantial interest from both academia and industry. At the same time, interest in temporal databases has been increasing and a growing number of both prototype and implemented systems are using an enhanced temporal understanding to explain aspects of behavior associated with the implicit time-varying nature of the universe. This paper investigates the confluence of these two areas, surveys the work to date, and explores the issues involved and the outstanding problems in temporal data mining.
Index Terms: Temporal data mining, time sequence mining, trend analysis, temporal rules, semantics of mined rules.
Detecting Change in Categorical Data: Mining Contrast Sets - In Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining, 1999
Cited by 45 (5 self)
A fundamental task in data analysis is understanding the differences between several contrasting groups. These groups can represent different classes of objects, such as male or female students, or the same group over time, e.g. freshman students in 1993 versus 1998. We present the problem of mining contrast-sets: conjunctions of attributes and values that differ meaningfully in their distribution across groups. We provide an algorithm for mining contrast-sets as well as several pruning rules to reduce the computational complexity. Once the deviations are found, we post-process the results to present a subset that are surprising to the user given what we have already shown. We explicitly control the probability of Type I error (false positives) and guarantee a maximum error rate for the entire analysis by using Bonferroni corrections.
1 Introduction
A common question in exploratory research is: "How do several contrasting groups differ?" Learning about group differences is a central ...
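The notion of a contrast set described above, a conjunction of attribute-value pairs whose distribution differs across groups, can be sketched by computing the fraction of each group that satisfies the conjunction. The function name `group_supports`, the attribute names, and the records are hypothetical; the sketch covers only the support computation and omits the search, pruning, and significance testing the abstract refers to.

```python
from typing import Dict, List

Record = Dict[str, str]


def group_supports(records: List[Record], group_attr: str,
                   contrast_set: Dict[str, str]) -> Dict[str, float]:
    """Fraction of each group's records that satisfy every attribute=value pair."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for rec in records:
        g = rec[group_attr]
        totals[g] = totals.get(g, 0) + 1
        # A record matches only if all pairs in the conjunction hold.
        if all(rec.get(a) == v for a, v in contrast_set.items()):
            hits[g] = hits.get(g, 0) + 1
    return {g: hits.get(g, 0) / n for g, n in totals.items()}


# Hypothetical student records grouped by year (illustrative only).
RECORDS = [
    {"year": "1993", "major": "cs", "housing": "campus"},
    {"year": "1993", "major": "cs", "housing": "off"},
    {"year": "1993", "major": "math", "housing": "campus"},
    {"year": "1993", "major": "math", "housing": "off"},
    {"year": "1998", "major": "cs", "housing": "campus"},
    {"year": "1998", "major": "cs", "housing": "campus"},
    {"year": "1998", "major": "cs", "housing": "campus"},
    {"year": "1998", "major": "math", "housing": "campus"},
]

supports = group_supports(RECORDS, "year", {"major": "cs", "housing": "campus"})
spread = max(supports.values()) - min(supports.values())
print(supports, spread)
```

A large spread between groups marks the conjunction as a candidate; whether the difference is meaningful would still have to be decided by a statistical test.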
Data Mining: Statistics and More?, 1998
Cited by 23 (4 self)
In this article we examine some of the major differences in emphasis between statistics and data mining. In Section 3 we look at some of the major tools, and Section 4 concludes.
GA-MINER: Parallel Data Mining with Hierarchical Genetic Algorithms - Final Report, 1995
Cited by 16 (1 self)
Many organisations now routinely gather vast and ever-increasing amounts of data in the ordinary course of their business. While much of this information is collected for day-to-day operational reasons, many businesses are now realising that this data has much additional value for improving operational processes. Large databases can form the basis of decision support systems, often based around a data warehouse. Such systems may then be used for a variety of applications such as trend spotting, pattern recognition, behavioral modeling and customer worth assessment. Against this backdrop, the term data mining is used to refer to the process of searching through a large volume of data to discover interesting and useful information. The authors have traditionally sought to divide data mining into three types or levels: undirected or pure data mining, where the system is left almost entirely unconstrained to discover patterns in the data free of prejudices from the user; directed data mi...
Data Mining: Research Trends, Challenges, and Applications - in Rough Sets and Data Mining: Analysis of Imprecise Data, 1997
Cited by 14 (7 self)
Data mining is an interdisciplinary research area spanning several disciplines such as database systems, machine learning, intelligent information systems, statistics, and expert systems. Data mining has evolved into an important and active area of research because of theoretical challenges and practical applications associated with the problem of discovering (or extracting) interesting and previously unknown knowledge from very large real-world databases. Many aspects of data mining have been investigated in several related fields. A unique but important aspect of the problem lies in the need to extend these studies to include the nature of the contents of real-world databases. In this chapter, we discuss the theory and foundational issues in data mining, describe data mining methods and algorithms, and review data mining applications. Since a major focus of this book is on rough sets and its applications to database mining, one full section is devoted to summari...
Reducing Redundancy in Characteristic Rule Discovery by Using IP-Techniques - In Intelligent Data Analysis Journal, 2000
Cited by 12 (1 self)
The discovery of characteristic rules is a well-known data mining technique and has led to several successful applications. Unfortunately, typically a (very) large number of rules is discovered during the mining stage. This makes monitoring and control of these rules extremely costly and difficult. Therefore, a selection of the most promising rules is desirable. In this paper, we propose an integer programming model to solve the problem of selecting the most promising subset of characteristic rules. The proposed technique allows control of a user-defined level of overall quality of the model in combination with a maximum reduction of the redundancy extant in the original ruleset. We use real-world data to evaluate the performance of the proposed technique against the well-known RuleCover heuristic.
1 Introduction
Data mining is the automated search for hidden, previously unknown and potentially useful information from large databases. Moreover, data mining is a crucial pha...
Rule Discovery in Telecommunication Alarm Data - Journal of Network and Systems Management, 1999
Cited by 11 (2 self)
Fault management is an important but difficult area of telecommunication...

