Results 1 - 10
of
444
Automatic Subspace Clustering of High Dimensional Data
- Data Mining and Knowledge Discovery
, 2005
"... Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the or ..."
Abstract
-
Cited by 724 (12 self)
- Add to MetaCart
(Show Context)
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
Integrating classification and association rule mining
- In Proc of KDD
, 1998
"... Classification rule mining aims to discover a small set of rules in the database that forms an accurate classifier. Association rule mining finds all the rules existing in the database that satisfy some minimum support and minimum confidence constraints. For association rule mining, the target of di ..."
Abstract
-
Cited by 578 (21 self)
- Add to MetaCart
(Show Context)
Classification rule mining aims to discover a small set of rules in the database that forms an accurate classifier. Association rule mining finds all the rules existing in the database that satisfy some minimum support and minimum confidence constraints. For association rule mining, the target of discovery is not pre-determined, while for classification rule mining there is one and only one predetermined target. In this paper, we propose to integrate these two mining techniques. The integration is done by focusing on mining a special subset of association rules, called class association rules (CARs). An efficient algorithm is also given for building a classifier based on the set of discovered CARs. Experimental results show that the classifier built this way is, in general, more accurate than that produced by the state-of-the-art classification system C4.5. In addition, this integration helps to solve a number of problems that exist in the current classification systems.
Data Mining: An Overview from Database Perspective
- IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 1996
"... Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity of major revenues. Researchers in many different fields have sh ..."
Abstract
-
Cited by 532 (26 self)
- Add to MetaCart
Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity of major revenues. Researchers in many different fields have shown great interest in data mining. Several emerging applications in information providing services, such as data warehousing and on-line services over the Internet, also call for various data mining techniques to better understand user behavior, to improve the service provided, and to increase the business opportunities. In response to such a demand, this article is to provide a survey, from a database researcher's point of view, on the data mining techniques developed recently. A classification of the available data mining techniques is provided and a comparative study of such techniques is presented.
New Algorithms for Fast Discovery of Association Rules
- In 3rd Intl. Conf. on Knowledge Discovery and Data Mining
, 1997
"... Association rule discovery has emerged as an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent itemsets, and then forming conditional implication rules among them. In this paper we present efficient algorithms for the discovery ..."
Abstract
-
Cited by 397 (26 self)
- Add to MetaCart
Association rule discovery has emerged as an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent itemsets, and then forming conditional implication rules among them. In this paper we present efficient algorithms for the discovery of frequent itemsets, which forms the compute intensive phase of the task. The algorithms utilize the structural properties of frequent itemsets to facilitate fast discovery. The related database items are grouped together into clusters representing the potential maximal frequent itemsets in the database. Each cluster induces a sub-lattice of the itemset lattice. Efficient lattice traversal techniques are presented, which quickly identify all the true maximal frequent itemsets, and all their subsets if desired. We also present the effect of using different database layout schemes combined with the proposed clustering and traversal techniques. The proposed algorithms scan a (pre-processed) d...
Exploratory Mining and Pruning Optimizations of Constrained Associations Rules
, 1998
"... From the standpoint of supporting human-centered discovery of knowledge, the present-day model of mining association rules suffers from the following serious shortcom- ings: (i) lack of user exploration and control, (ii) lack of focus, and (iii) rigid notion of relationships. In effect, this model f ..."
Abstract
-
Cited by 313 (44 self)
- Add to MetaCart
(Show Context)
From the standpoint of supporting human-centered discovery of knowledge, the present-day model of mining association rules suffers from the following serious shortcom- ings: (i) lack of user exploration and control, (ii) lack of focus, and (iii) rigid notion of relationships. In effect, this model functions as a black-box, admitting little user interaction in between. We propose, in this paper, an architecture that opens up the black-box, and supports constraintbased, human-centered exploratory mining of associations. The foundation of this architecture is a rich set of con- straint constructs, including domain, class, and $QL-style aggregate constraints, which enable users to clearly specify what associations are to be mined. We propose constrained association queries as a means of specifying the constraints to be satisfied by the antecedent and consequent of a mined association.
Levelwise Search and Borders of Theories in Knowledge Discovery
, 1997
"... One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all sentences of L deemed interesting by the selection predicate. We analyze the simple levelwise algorithm fo ..."
Abstract
-
Cited by 263 (15 self)
- Add to MetaCart
One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all sentences of L deemed interesting by the selection predicate. We analyze the simple levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. For this, we introduce the concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm. We also consider the verification problem of a KDD process: given r and a set of sentences S ` L, determine whether S is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.
A tree projection algorithm for generation of frequent itemsets
- Journal of Parallel and Distributed Computing
, 2000
"... In this paper we propose algorithms for generation of frequent itemsets by successive construction of the nodes of a lexicographic tree of itemsets. We discuss di erent strategies in generation and traversal of the lexicographic tree such as breadth- rst search, depth- rst search or a combination of ..."
Abstract
-
Cited by 192 (2 self)
- Add to MetaCart
(Show Context)
In this paper we propose algorithms for generation of frequent itemsets by successive construction of the nodes of a lexicographic tree of itemsets. We discuss di erent strategies in generation and traversal of the lexicographic tree such as breadth- rst search, depth- rst search or a combination of the two. These techniques provide di erent trade-o s in terms of the I/O, memory and computational time requirements. We use the hierarchical structure of the lexicographic tree to successively project transactions at each node of the lexicographic tree, and use matrix counting on this reduced set of transactions for nding frequent itemsets. We tested our algorithm on both real and synthetic data. We provide an implementation of the tree projection method which is up to one order of magnitude faster than other recent techniques in the literature. The algorithm has a well structured data access pattern which provides data locality and reuse of data for multiple levels of the cache. We also discuss methods for parallelization of the
Maintaining Data Privacy in Association Rule Mining
- In Proceedings of the 28th VLDB Conference, Hong Kong
, 2002
"... Data mining services require accurate input data for their results to be meaningful, but privacy concerns may influence users to provide spurious information. We investigate here, with respect to mining association rules, whether users can be encouraged to provide correct information by ensuri ..."
Abstract
-
Cited by 171 (5 self)
- Add to MetaCart
(Show Context)
Data mining services require accurate input data for their results to be meaningful, but privacy concerns may influence users to provide spurious information. We investigate here, with respect to mining association rules, whether users can be encouraged to provide correct information by ensuring that the mining process cannot, with any reasonable degree of certainty, violate their privacy. We present a scheme, based on probabilistic distortion of user data, that can simultaneously provide a high degree of privacy to the user and retain a high level of accuracy in the mining results. The performance of the scheme is validated against representative real and syn- thetic datasets.
Mining Access Patterns Efficiently from Web Logs
- Proc. 2000 Paci c-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD'00
, 2000
"... With the explosive growth of data available on the World Wide Web, discovery and analysis of useful information from the World Wide Web becomes a practical necessity.Web access pattern, which is the sequence of accesses pursued by users frequently, is a kind of interesting and useful knowledge in pr ..."
Abstract
-
Cited by 155 (3 self)
- Add to MetaCart
With the explosive growth of data available on the World Wide Web, discovery and analysis of useful information from the World Wide Web becomes a practical necessity.Web access pattern, which is the sequence of accesses pursued by users frequently, is a kind of interesting and useful knowledge in practice. In this paper, we study the problem of mining access patterns from Web logs efficiently. A novel data structure, called Web access pattern tree, or WAP-tree in short, is developed for efficient mining of access patterns from pieces of logs. The Web access pattern tree stores highly compressed, critical information for access pattern mining and facilitates the developmentofnovel algorithms for mining access patterns in large set of log pieces. Our algorithm can find access patterns from Web logs quite efficiently. The experimental and performance studies show that our method is in general an order of magnitude faster than conventional methods.
Algorithms for Association Rule Mining -- A General Survey and Comparison
, 2000
"... Today there are several efficient algorithms that cope with the popular and computationally expensive task of association rule mining. Actually, these algorithms are more or less described on their own. In this paper we explain the fundamentals of association rule mining and moreover derive a genera ..."
Abstract
-
Cited by 155 (5 self)
- Add to MetaCart
Today there are several efficient algorithms that cope with the popular and computationally expensive task of association rule mining. Actually, these algorithms are more or less described on their own. In this paper we explain the fundamentals of association rule mining and moreover derive a general framework. Based on this we describe today's approaches in context by pointing out common aspects and differences. After that we thoroughly investigate their strengths and weaknesses and carry out several runtime experiments. It turns out that the runtime behavior of the algorithms is much more similar as to be expected.