Results 1 - 10 of 30
Data Mining: An Overview from Database Perspective
- IEEE Transactions on Knowledge and Data Engineering, 1996
Cited by 314 (23 self)
Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with the potential for major revenue. Researchers in many different fields have shown great interest in data mining. Several emerging applications in information-providing services, such as data warehousing and on-line services over the Internet, also call for various data mining techniques to better understand user behavior, improve the service provided, and increase business opportunities. In response to this demand, this article provides a survey, from a database researcher's point of view, of recently developed data mining techniques. A classification of the available data mining techniques is provided, and a comparative study of these techniques is presented.
A General Incremental Technique for Maintaining Discovered Association Rules
- In Proceedings of the Fifth International Conference on Database Systems for Advanced Applications, 1997
Cited by 79 (5 self)
A more general incremental updating technique is developed for maintaining the association rules discovered in a database when transactions are inserted, deleted, or modified. A previously proposed algorithm, FUP, can only handle the maintenance problem in the case of insertion. The proposed algorithm, FUP2, makes use of the previous mining result to cut down the cost of finding the new rules in an updated database. In the insertion-only case, FUP2 is equivalent to FUP. In the deletion-only case, FUP2 is a complementary algorithm to FUP and is very efficient when the deleted transactions are a small part of the database, which is the most applicable case. In the general case, FUP2 can efficiently update the discovered rules when new transactions are added to a transaction database and obsolete transactions are removed from it. The proposed algorithm has been implemented and its performance is studied and compared with the best algorithms for mining...
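The idea of reusing a previous mining result can be illustrated with a minimal sketch. This is not FUP2 itself: it only adjusts the stored support counts of itemsets that were already known, whereas the abstract describes an algorithm that also finds new rules. The function names `update_counts` and `still_frequent` and the sample data are hypothetical.

```python
def update_counts(old_counts, inserted, deleted):
    """Adjust stored support counts by scanning only the changed transactions."""
    new_counts = {}
    for itemset, count in old_counts.items():
        gained = sum(1 for t in inserted if itemset <= t)  # itemset is contained in t
        lost = sum(1 for t in deleted if itemset <= t)
        new_counts[itemset] = count + gained - lost
    return new_counts


def still_frequent(new_counts, db_size, min_support):
    """Keep the itemsets whose count meets the threshold for the updated database size."""
    return {s: c for s, c in new_counts.items() if c >= min_support * db_size}


# Hypothetical previous result over a database of 4 transactions.
old_counts = {frozenset({"a"}): 3, frozenset({"a", "b"}): 2}
inserted = [frozenset({"a", "b"})]
deleted = [frozenset({"a"})]

new_counts = update_counts(old_counts, inserted, deleted)
new_size = 4 + len(inserted) - len(deleted)
kept = still_frequent(new_counts, new_size, min_support=0.75)
```

The unchanged part of the database is never rescanned here; only the inserted and deleted transactions are read, which is where the saving comes from when the changes are small relative to the database.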
Discovery of Relational Association Rules
- Relational data mining, 2000
Cited by 25 (0 self)
Within KDD, the discovery of frequent patterns has been studied in a variety of settings. In its simplest form, known from association rule mining, the task is to discover all frequent item sets, i.e., all combinations of items that are found in a sufficient number of examples.
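The simplest form of the task described above can be sketched as a brute-force search. This is an illustrative sketch only, assuming transactions are Python sets and the threshold is an absolute count; the function name `frequent_itemsets` and the sample baskets are hypothetical, and practical miners prune candidates far more aggressively.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least min_count transactions."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_count:
                result[frozenset(candidate)] = count
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so no larger one can be frequent.
            break
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(baskets, min_count=2)
```

With a threshold of 2, the pair bread and milk qualifies because it appears in two baskets, while butter and milk appears in only one and is dropped.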
Effective Data Mining Using Neural Networks
- IEEE Transactions on Knowledge and Data Engineering, 1996
Cited by 20 (2 self)
Classification is one of the data mining problems that has recently received great attention in the database community. This paper presents an approach to discovering symbolic classification rules using neural networks. Neural networks have not been considered well suited to data mining because the way they arrive at classifications is not explicitly stated as symbolic rules that humans can verify or interpret. With the proposed approach, concise symbolic rules with high accuracy can be extracted from a neural network. The network is first trained to achieve the required accuracy rate. Redundant connections of the network are then removed by a network pruning algorithm. The activation values of the hidden units in the network are analyzed, and classification rules are generated using the result of this analysis. The effectiveness of the proposed approach is demonstrated by experimental results on a set of standard data mining test problems.
X2R: A Fast Rule Generator
- In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 1995
Cited by 18 (4 self)
Although they can learn from raw data, many concept learning algorithms require that the training data contain only discrete data. However, real-world problems contain, more often than not, both numeric and discrete data. So before these algorithms can be applied, data discretization (quantization) is needed. This paper introduces X2R, a simple and fast algorithm that can be applied to both numeric and discrete data and generates rules from datasets such as Season-Classification and Golf-Playing that contain continuous and/or discrete data. The empirical results demonstrate that X2R can effectively generate rules from the raw data and performs better than some of its peers in terms of rule quality and time complexity. 1 Introduction Concept learning is the task of learning concepts from raw data. Real-world problems normally contain both numeric and discrete data. Many concept learning algorithms can only handle discrete data. Before running these algorithms, discretization is nec...
On Preprocessing Data for Effective Classification
- ACM SIGMOD'96 Workshop on Research Issues on Data Mining and Knowledge Discovery, 1996
Cited by 6 (1 self)
Classification is probably the most well-studied data mining problem. Most recent work in the database community focuses on searching for efficient classification algorithms. This paper addresses the importance of preprocessing data before applying classification algorithms. In particular, the technique of feature selection, which eliminates attributes that contribute little to classification, is discussed. A feature selection algorithm based on conflict analysis is described, and the related implementation issues are analyzed. Results of experiments conducted to highlight the effectiveness of feature selection and related issues are also presented. 1 Introduction Classification, one of the data mining problems [AgIS93], is the process of finding the common properties among different entities and classifying those entities into classes. It has been widely studied by researchers in the AI field [WeKu91]. Recently, researchers in the database community re-examined the problem in the context of large data...
Discovery Of Multiple-Level Rules From Large Databases
- 1996
Cited by 6 (0 self)
With the widespread computerization of business, government, and science, the efficient and effective discovery of interesting information from large databases has become essential. Data mining, or Knowledge Discovery in Databases (KDD), has emerged as a solution to the data analysis problems faced by many organizations. Previous studies on data mining have focused on the discovery of knowledge at a single conceptual level, either at the primitive level or at a rather high conceptual level. However, it is often desirable to discover knowledge at multiple conceptual levels, which provides a spectrum of understanding, from general to specific, of the underlying data. In this thesis, we first introduce the conceptual hierarchy, a hierarchical organization of the data in the databases. Two algorithms for dynamic adjustment of conceptual hierarchies are developed, as well as another algorithm for automatic generation of conceptual hierarchies for numerical attributes. In addition, a set of ...
Data Mining using Learning Classifier Systems
- In L. Bull (Ed.), Applications of Learning Classifier Systems, 2004
Efficient evaluation of queries with mining predicates
- In Proc. of the 18th Int'l Conference on Data Engineering (ICDE), 2002
Cited by 6 (1 self)
Modern relational database systems are beginning to support ad hoc queries on mining models. In this paper, we explore novel techniques for optimizing queries that apply mining models to relational data. For such queries, we use the internal structure of the mining model to automatically derive traditional database predicates. We present algorithms for deriving such predicates for some popular discrete mining models: decision trees, naive Bayes, and clustering. Our experiments on Microsoft SQL Server 2000 demonstrate that these derived predicates can significantly reduce the cost of evaluating such queries.
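For the decision-tree case, the notion of deriving an ordinary predicate from a model's structure can be sketched as follows. This is a toy illustration and not the paper's algorithm: the tree encoding (a leaf is a class-label string, an inner node is a `(column, threshold, left, right)` tuple), the column names, and the functions `paths_to_class` and `to_predicate` are all assumptions made for the example.

```python
def paths_to_class(node, target, conditions=()):
    """Yield one list of conditions per root-to-leaf path that predicts target."""
    if isinstance(node, str):  # leaf: the predicted class label
        if node == target:
            yield list(conditions)
        return
    column, threshold, left, right = node
    yield from paths_to_class(left, target, conditions + (f"{column} <= {threshold}",))
    yield from paths_to_class(right, target, conditions + (f"{column} > {threshold}",))


def to_predicate(tree, target):
    """Build a WHERE-style predicate: an OR over paths, each an AND of split conditions."""
    disjuncts = [" AND ".join(p) or "1 = 1" for p in paths_to_class(tree, target)]
    if not disjuncts:
        return "1 = 0"  # no leaf predicts the target class
    return " OR ".join(f"({d})" for d in disjuncts)


# Hypothetical tree: age <= 30 predicts "low"; otherwise split on income.
tree = ("age", 30, "low", ("income", 50000, "low", "high"))
predicate = to_predicate(tree, "high")
```

A query asking for rows the model would label "high" could then filter on the derived predicate with ordinary comparisons, so that indexes on the underlying columns become usable before the model is applied.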
Database Clustering And Data Warehousing
- ICS Workshop on Software Engineering and Database Systems, 1998
Cited by 5 (4 self)
Due to the complexity of real-world applications, the number of databases and the volume of data have increased tremendously. Discovering qualitative and quantitative patterns from databases in such a distributed information-providing environment has been recognized as a challenging task. In response to such a demand, data mining and data warehousing techniques are emerging to extract the previously unknown and potentially useful knowledge to provide better decision support. This paper presents a mechanism called Markov Model Mediators (MMMs) to facilitate the understanding of the data warehouse schemas/views and the improvement of the query processing performance by analyzing and discovering the summarized knowledge at the database level. Simulation results show that the data mining process leads to a better federation of data warehouses and reduces the cost of query processing. To illustrate these benefits, our approach has been implemented and a simple example and several experime...

