Results 1 - 10
of
24
Data Mining: An Overview from Database Perspective
- IEEE Transactions on Knowledge and Data Engineering
, 1996
"... Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity of major revenues. Researchers in many different fields have sh ..."
Abstract
-
Cited by 314 (23 self)
- Add to MetaCart
Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity of major revenues. Researchers in many different fields have shown great interest in data mining. Several emerging applications in information providing services, such as data warehousing and on-line services over the Internet, also call for various data mining techniques to better understand user behavior, to improve the service provided, and to increase the business opportunities. In response to such a demand, this article is to provide a survey, from a database researcher's point of view, on the data mining techniques developed recently. A classification of the available data mining techniques is provided and a comparative study of such techniques is presented.
Database Mining: A Performance Perspective
- IEEE Transactions on Knowledge and Data Engineering
, 1993
"... We present our perspective of database mining as the confluence of machine learning techniques and the performance emphasis of database technology. We describe three classes of database mining problems involving classification, associations, and sequences, and argue that these problems can be unifor ..."
Abstract
-
Cited by 247 (12 self)
- Add to MetaCart
We present our perspective of database mining as the confluence of machine learning techniques and the performance emphasis of database technology. We describe three classes of database mining problems involving classification, associations, and sequences, and argue that these problems can be uniformly viewed as requiring discovery of rules embedded in massive data. We describe a model and some basic operations for the process of rule discovery. We show how the database mining problems we consider map to this model and how they can be solved by using the basic operations we propose. We give an example of an algorithm for classification obtained by combining the basic rule discovery operations. This algorithm not only is efficient in discovering classification rules but also has accuracy comparable to ID3, one of the current best classifiers. Index Terms. database mining, knowledge discovery, classification, associations, sequences, decision trees Current address: Computer Science De...
An effective hash-based algorithm for mining association rules
, 1995
"... In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. The mining of association rules can be mapped into the problem of discovering large itemsets where a large itemset is a group of items which appear in a sufficient number of transac ..."
Abstract
-
Cited by 195 (2 self)
- Add to MetaCart
In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. The mining of association rules can be mapped into the problem of discovering large itemsets where a large itemset is a group of items which appear in a sufficient number of transactions. The problem of discovering large itemsets can be solved by constructing a candidate set of itemsets first and then, identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally this is done iteratively for each large k-itemset in increasing order of k where a large k-itemset is a large itemset with k items. To determine large itemsets from a huge number of candidate large itemsets in early iterations is usually the dominating factor for the overall data mining performance. To address this issue, we propose an effective hash-based algorithm for the candidate set generation. Explicitly, the number of candidate 2-itemsets generated by the proposed algorithm is, in orders of magnitude, smaller than that by previous methods, thus resolving the performance bottleneck. Note that the generation of smaller candidate sets enables us to effectively trim the transaction database size at a much earlier stage of the iterations, thereby reducing the computational cost for later iterations significantly. Extensive simulation study is conducted to evaluate performance of the proposed algorithm. 1
Description Logics in Data Management
, 1995
"... Description logics and reasoners, which are descendants of the kl-one language, have been studied in depth in Artificial Intelligence. After a brief introduction, we survey in this paper their application to the problems of information management, using the framework of an abstract information serve ..."
Abstract
-
Cited by 174 (12 self)
- Add to MetaCart
Description logics and reasoners, which are descendants of the kl-one language, have been studied in depth in Artificial Intelligence. After a brief introduction, we survey in this paper their application to the problems of information management, using the framework of an abstract information server equipped with several operations -- each involving one or more languages. Specifically, we indicate how one can achieve enhanced access to data and knowledge by using descriptions in languages for schema design and integration, queries, answers, updates, rules, and constraints.
Efficient data mining for path traversal patterns
- IEEE Transactions on Knowledge and Data Engineering
, 1998
"... Abstract—In this paper, we explore a new data mining capability that involves mining path traversal patterns in a distributed information-providing environment where documents or objects are linked together to facilitate interactive access. Our solution procedure consists of two steps. First, we der ..."
Abstract
-
Cited by 128 (10 self)
- Add to MetaCart
Abstract—In this paper, we explore a new data mining capability that involves mining path traversal patterns in a distributed information-providing environment where documents or objects are linked together to facilitate interactive access. Our solution procedure consists of two steps. First, we derive an algorithm to convert the original sequence of log data into a set of maximal forward references. By doing so, we can filter out the effect of some backward references, which are mainly made for ease of traveling and concentrate on mining meaningful user access sequences. Second, we derive algorithms to determine the frequent traversal patterns¦i.e., large reference sequences¦from the maximal forward references obtained. Two algorithms are devised for determining large reference sequences; one is based on some hashing and pruning techniques, and the other is further improved with the option of determining large reference sequences in batch so as to reduce the number of database scans required. Performance of these two methods is comparatively analyzed. It is shown that the option of selective scan is very advantageous and can lead to prominent performance improvement. Sensitivity analysis on various parameters is conducted. Index Terms—Data mining, traversal patterns, distributed information system, World Wide Web, performance analysis.
Data Mining for Path Traversal Patterns in a Web Environment
, 1996
"... In this paper, we explore a new data mining capability which involves mining path traversal patterns in a distributed information providing environment like world-wide-web. First, we convert the original sequence of log data into a set of maximal forward references and filter out the effect of some ..."
Abstract
-
Cited by 98 (1 self)
- Add to MetaCart
In this paper, we explore a new data mining capability which involves mining path traversal patterns in a distributed information providing environment like world-wide-web. First, we convert the original sequence of log data into a set of maximal forward references and filter out the effect of some backward references which are mainly made for ease of traveling. Second, we derive algorithms to determine the frequent traversal patterns, i.e., large reference sequences, from the maximal forward references obtained. Two algorithms are devised for determining large reference sequences: one is based on some hashing and pruning techniques, and the other is further improved with the option of determining large reference sequences in batch so as to reduce the number of database scans required. Performance of these two methods is comparatively analyzed.
Visualization Techniques for Mining Large Databases: A Comparison
- IEEE Transactions on Knowledge and Data Engineering
, 1996
"... Visual data mining techniques have proven to be of high value in exploratory data analysis and they also have a high potential for mining large databases. In this article, we describe and evaluate a new visualization-based approach to mining large databases. The basic idea of our visual data mining ..."
Abstract
-
Cited by 65 (1 self)
- Add to MetaCart
Visual data mining techniques have proven to be of high value in exploratory data analysis and they also have a high potential for mining large databases. In this article, we describe and evaluate a new visualization-based approach to mining large databases. The basic idea of our visual data mining techniques is to represent as many data items as possible on the screen at the same time by mapping each data value to a pixel of the screen and arranging the pixels adequately. The major goal of this article is to evaluate our visual data mining techniques and to compare them to other well-known visualization techniques for multidimensional data: the parallel coordinate and stick figure visualization techniques. For the evaluation of visual data mining techniques, in the first place the perception of properties of the data counts, and only in the second place the CPU time and the number of secondary storage accesses are important. In addition to testing the visualization techniques using re...
Loading Data into Description Reasoners
, 1993
"... Knowledge-base management systems (KBMS) based on description logics are being used in a variety of situations where access is needed to large amounts of data stored in existing relational databases. We present the architecture and algorithms of a system that converts most of the inferences made by ..."
Abstract
-
Cited by 63 (4 self)
- Add to MetaCart
Knowledge-base management systems (KBMS) based on description logics are being used in a variety of situations where access is needed to large amounts of data stored in existing relational databases. We present the architecture and algorithms of a system that converts most of the inferences made by the KBMS into a collection of SQL queries, thereby relying on the optimization facilities of existing DBMS to gain e#ciency, while maintaining an object-centered view of the world with a substantive semantics and significantly di#erent reasoning facilities than those provided by Relational DBMS and their deductive extensions. We address a number of optimization issues that arise in the translation process due to the fact that SQL queries with di#erent syntax (but identical semantics) are not treated uniformly by current database management systems.
Using a Hash-Based Method with Transaction Trimming and Database Scan Reduction for Mining Association Rules
- IEEE Transactions on Knowledge and Data Engineering
, 1997
"... In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. Mining association rules means that given a database of sales transactions, to discover all associations among items such that the presence of some items in a transaction will imply ..."
Abstract
-
Cited by 58 (8 self)
- Add to MetaCart
In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. Mining association rules means that given a database of sales transactions, to discover all associations among items such that the presence of some items in a transaction will imply the presence of other items in the same transaction. The mining of association rules can be mapped into the problem of discovering large itemsets where a large itemset is a group of items which appear in a sufficient number of transactions. The problem of discovering large itemsets can be solved by constructing a candidate set of itemsets first and then, identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally this is done iteratively for each large k-itemset in increasing order of k where a large k-itemset is a large itemset with k items. To determine large itemsets from a huge number of candidate large itemsets in early iterations is usual...
Explaining Reasoning In Description Logics
, 1996
"... Knowledge-based systems, like other software systems, need to be debugged while being developed. In addition, systems providing "expert advice" need to be able to justify their conclusions. Traditionally, developers have been supported during debugging by tools which offer a trace of the operations ..."
Abstract
-
Cited by 55 (17 self)
- Add to MetaCart
Knowledge-based systems, like other software systems, need to be debugged while being developed. In addition, systems providing "expert advice" need to be able to justify their conclusions. Traditionally, developers have been supported during debugging by tools which offer a trace of the operations performed by the system (e.g., a sequence of rule firings in a rule-based expert system) or, more generally by an explanation facility for the reasoner. Description Logics, formal systems developed to reason with taxonomies or classification hierarchies, form the basis of several recent knowledge-based systems but do not currently offer such facilities. In this thesis, we explore four major issues in explaining the conclusions of procedurally implemented deductive systems, concentrating on a specific solution for a class of description logics. First, we consider ho...

