Discovering Frequent Closed Itemsets for Association Rules
, 1999
Cited by 417 (13 self)
In this paper, we address the problem of finding frequent itemsets in a database. Using the closed itemset lattice framework, we show that this problem can be reduced to the problem of finding frequent closed itemsets. Based on this result, we can construct efficient data mining algorithms by limiting the search space to the closed itemset lattice rather than the subset lattice. Moreover, we show that the set of all frequent closed itemsets suffices to determine a reduced set of association rules, thus addressing another important data mining problem: limiting the number of rules produced without information loss. We propose a new algorithm, called A-Close, that uses a closure mechanism to find frequent closed itemsets. We conducted experiments comparing our approach to the commonly used frequent itemset search approach. These experiments showed that our approach is very valuable for dense and/or correlated data, which represent an important part of existing databases.
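To make the reduction concrete, here is a minimal Python sketch, assuming transactions are given as `frozenset`s of item labels. The helper names (`support`, `closure`, `frequent_closed_itemsets`, `support_from_closed`) are hypothetical, and the brute-force enumeration over the subset lattice is for illustration only; it is not A-Close. The closure of an itemset is taken as the set of items shared by every transaction containing it, the frequent itemsets equal to their own closure are kept, and the support of any frequent itemset is then recovered as the largest support among its closed supersets.

```python
from itertools import combinations


def support(itemset, transactions):
    # Number of transactions that contain every item of `itemset`.
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset, transactions):
    # Galois closure: the items shared by all transactions containing `itemset`.
    # Assumes support(itemset) >= 1 and that transactions are frozensets.
    return frozenset.intersection(*[t for t in transactions if itemset <= t])


def frequent_closed_itemsets(transactions, minsup):
    # Brute force over the subset lattice (toy data only): keep each frequent
    # non-empty itemset that equals its own closure, mapped to its support.
    items = sorted(frozenset().union(*transactions))
    closed = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            s = support(x, transactions)
            if s >= minsup and closure(x, transactions) == x:
                closed[x] = s
    return closed


def support_from_closed(itemset, closed):
    # supp(X) == supp(closure(X)), and closure(X) is the closed superset of X
    # with the largest support; 0 means X is not frequent.
    sups = [s for c, s in closed.items() if itemset <= c]
    return max(sups) if sups else 0


db = [frozenset(t) for t in ("abc", "abc", "abd", "ab", "cd")]
closed = frequent_closed_itemsets(db, minsup=2)
```

On this toy database with a minimum support of 2, the eight frequent itemsets collapse to four closed ones ({c}, {d}, {a, b} and {a, b, c}), yet every frequent itemset's support can still be read off them.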
STING: A statistical information grid approach to spatial data mining
, 1997
Cited by 284 (10 self)
Spatial data mining, i.e., the discovery of interesting characteristics and patterns that may implicitly exist in spatial databases, is a challenging task due to the huge amounts of spatial data and to the new conceptual nature of the problems, which must account for spatial distance. Clustering and region-oriented queries are common problems in this domain. Several approaches have been presented in recent years, all of which require at least one scan of all individual objects (points). Consequently, the computational cost of answering each query grows at least linearly with the number of objects. In this paper, we propose a hierarchical, statistical-information-grid-based approach for spatial data mining to reduce this cost further. The idea is to capture statistical information associated with spatial cells in such a manner that whole classes of queries and clustering problems can be answered without recourse to the individual objects. In theory, and as confirmed by empirical studies, this approach outperforms the best previous method by at least an order of magnitude, especially when the data set is very large.
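The central idea, summarizing each grid cell by a few statistics and deriving a parent cell's statistics from its children's, can be sketched in Python. This sketch rests on assumptions not stated in the abstract: points carry one numeric attribute and lie in the unit square, each level halves the grid resolution, and only count, mean, standard deviation, minimum and maximum are kept. All names (`CellStats`, `merge`, `build_bottom`, `parent_level`, `region_count`) are hypothetical. The pooled formulas let a coarser level be built, and some queries answered, without revisiting the points.

```python
import math
from dataclasses import dataclass


@dataclass
class CellStats:
    n: int        # number of objects in the cell
    mean: float   # mean of the attribute value
    std: float    # population standard deviation
    lo: float     # minimum value
    hi: float     # maximum value


EMPTY = CellStats(0, 0.0, 0.0, math.inf, -math.inf)


def stats_from_values(values):
    # Statistics of a bottom-level cell, computed once from its objects.
    n = len(values)
    if n == 0:
        return EMPTY
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / n
    return CellStats(n, m, math.sqrt(var), min(values), max(values))


def merge(children):
    # Parent statistics derived from the children's statistics alone.
    n = sum(c.n for c in children)
    if n == 0:
        return EMPTY
    m = sum(c.n * c.mean for c in children) / n
    second = sum(c.n * (c.std ** 2 + c.mean ** 2) for c in children) / n  # E[x^2]
    var = max(second - m * m, 0.0)
    return CellStats(n, m, math.sqrt(var),
                     min(c.lo for c in children), max(c.hi for c in children))


def build_bottom(points, k):
    # points: (x, y, value) with 0 <= x, y < 1; the bottom grid has k x k cells.
    buckets = {(i, j): [] for i in range(k) for j in range(k)}
    for x, y, v in points:
        buckets[(int(x * k), int(y * k))].append(v)
    return {key: stats_from_values(vals) for key, vals in buckets.items()}


def parent_level(cells, k):
    # Each parent cell (i, j) covers the 2 x 2 children starting at (2i, 2j).
    return {(i, j): merge([cells[(2 * i + di, 2 * j + dj)]
                           for di in (0, 1) for dj in (0, 1)])
            for i in range(k // 2) for j in range(k // 2)}


def region_count(cells, keys):
    # Count objects in a region made of whole cells, without touching the objects.
    return sum(cells[key].n for key in keys)
```

Merging all parent cells reproduces the count, mean, standard deviation and extremes computed directly from the points, up to floating-point rounding.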
CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling
, 1999
Cited by 272 (23 self)
Clustering in data mining is a discovery process that groups a set of data such that the intracluster similarity is maximized and the intercluster similarity is minimized. Existing clustering algorithms, such as K-means, PAM, CLARANS, DBSCAN, CURE, and ROCK, are designed to find clusters that fit some static models. These algorithms can break down if the choice of parameters in the static model is incorrect with respect to the data set being clustered, or if the model is not adequate to capture the characteristics of clusters. Furthermore, most of these algorithms break down when the data consists of clusters of diverse shapes, densities, and sizes. In this paper, we present a novel hierarchical clustering algorithm called CHAMELEON that measures the similarity of two clusters based on a dynamic model. In the clustering process, two clusters are merged only if the interconnectivity and closeness (proximity) between the two clusters are high relative to the internal intercon...
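The "high relative to the internal" criterion can be illustrated on a small weighted graph. This sketch is hedged: the abstract is truncated, so the definitions used here follow the usual description of CHAMELEON rather than the text above. Relative interconnectivity RI is taken as the edge-cut weight between two clusters divided by the mean of their internal min-bisection cut weights, and relative closeness RC as the mean weight of the connecting edges divided by the size-weighted mean of the clusters' internal bisection edge weights. The min bisection is found by brute force, feasible only for tiny clusters; the graph layout and function names are hypothetical.

```python
from itertools import combinations


def cut_edges(graph, a, b):
    # Weights of edges with one endpoint in `a` and the other in `b`.
    # `graph` maps an undirected edge (u, v) to its weight.
    return [w for (u, v), w in graph.items()
            if (u in a and v in b) or (u in b and v in a)]


def min_bisection_edges(graph, cluster):
    # Brute force: split the cluster into two halves with the smallest total
    # cut weight and return that cut's edge weights (tiny clusters only).
    nodes = sorted(cluster)
    best = None
    for half in combinations(nodes, len(nodes) // 2):
        a = set(half)
        edges = cut_edges(graph, a, set(nodes) - a)
        if best is None or sum(edges) < sum(best):
            best = edges
    return best


def mean(xs):
    return sum(xs) / len(xs) if xs else 0.0


def relative_scores(graph, ci, cj):
    # Returns (RI, RC); assumes each cluster is connected, has >= 2 nodes,
    # and all edge weights are positive.
    between = cut_edges(graph, ci, cj)
    inner_i = min_bisection_edges(graph, ci)
    inner_j = min_bisection_edges(graph, cj)
    ri = sum(between) / ((sum(inner_i) + sum(inner_j)) / 2)
    ni, nj = len(ci), len(cj)
    rc = mean(between) / (ni / (ni + nj) * mean(inner_i)
                          + nj / (ni + nj) * mean(inner_j))
    return ri, rc
```

For two unit-weight triangles joined by a single edge of weight 0.1, this yields RI = 0.05 and RC = 0.1. Both are far below 1, so a rule that merges only when both scores are high would keep the triangles apart.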
3D shape histograms for similarity search and classification in spatial databases
 SSD'99
, 1999
Cited by 176 (11 self)
Classification is one of the basic tasks of data mining in modern database applications, including molecular biology, astronomy, mechanical engineering, medical imaging and meteorology. The underlying models have to consider spatial properties such as shape or extension as well as thematic attributes. We introduce 3D shape histograms as an intuitive and powerful similarity model for 3D objects. Particular flexibility is provided by using quadratic form distance functions in order to account for errors of measurement, sampling, and numerical rounding, all of which may result in small displacements and rotations of shapes. For query processing, a general filter-refinement architecture is employed that efficiently supports similarity search based on quadratic forms. An experimental evaluation in the context of molecular biology demonstrates both the high classification accuracy of more than 90% and the good performance of the approach.
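A quadratic form distance weights the bin-by-bin differences of two histograms by a similarity matrix A, d_A(x, y) = sqrt((x - y)^T A (x - y)), so that mass shifted to a neighbouring bin costs less than mass shifted far away. The Python sketch below illustrates this. The one-dimensional bins and the choice A[i][j] = exp(-sigma * |i - j|) are assumptions for illustration (the abstract does not say how A is built for 3D shape histograms), and the function names are hypothetical.

```python
import math


def quadratic_form_distance(x, y, a):
    # d_A(x, y) = sqrt((x - y)^T A (x - y)) for histograms x, y and matrix A.
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    q = sum(d[i] * a[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(max(q, 0.0))  # clamp tiny negative rounding errors


def bin_similarity(num_bins, sigma):
    # Hypothetical similarity matrix: bins with nearby indices count as similar.
    # exp(-sigma * |i - j|) gives a positive definite matrix for sigma > 0.
    return [[math.exp(-sigma * abs(i - j)) for j in range(num_bins)]
            for i in range(num_bins)]


h, near, far = [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]
a = bin_similarity(4, sigma=1.0)
d_near = quadratic_form_distance(h, near, a)
d_far = quadratic_form_distance(h, far, a)
```

With the identity matrix both shifted histograms are equally far from `h` (plain Euclidean distance), whereas with the similarity matrix the one-bin shift comes out closer than the three-bin shift.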
Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs
 In Proceedings of the Advances in Digital Libraries Conference (ADL'98)
, 1998
Cited by 175 (8 self)
As a confluence of data mining and WWW technologies, it is now possible to perform data mining on web log records collected from Internet web page access histories. The behaviour of web page readers is imprinted in the web server log files. Analyzing and exploring regularities in this behaviour can improve system performance, enhance the quality and delivery of Internet information services to the end user, and identify populations of potential customers for electronic commerce. Thus, by observing people using collections of data, data mining can make a considerable contribution to the work of digital library designers.
Efficient Mining Of Association Rules Using Closed Itemset Lattices
 Information Systems
, 1999
Cited by 158 (11 self)
Discovering association rules is one of the most important tasks in data mining. Many efficient algorithms have been proposed in the literature. The most notable are Apriori, Mannila's algorithm, Partition, Sampling and DIC, which are all based on the Apriori mining method: pruning the subset lattice (itemset lattice). In this paper we propose an efficient algorithm, called Close, based on a new mining method: pruning the closed set lattice (closed itemset lattice). This lattice, which is a suborder of the subset lattice, is closely related to Wille's concept lattice in formal concept analysis. Experiments comparing Close to an optimized version of Apriori showed that Close is very efficient for mining dense and/or correlated data such as census-style data, and performs reasonably well for market-basket-style data.
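The level-wise idea of pruning the closed itemset lattice can be sketched as follows. This is a simplified Python illustration in the spirit of Close, not the authors' implementation. Generators grow one item at a time as in Apriori, and each frequent generator contributes its closure. A candidate generator is discarded when it is contained in the closure of one of its immediate subsets, because its closure would then equal that subset's closure. Transactions are assumed to be `frozenset`s, and all function names are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    # Number of transactions containing every item of `itemset`.
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset, transactions):
    # Items common to all transactions containing `itemset` (support >= 1 assumed).
    return frozenset.intersection(*[t for t in transactions if itemset <= t])


def close_sketch(transactions, minsup):
    # Maps each frequent closed itemset to its support.
    items = sorted(frozenset().union(*transactions))
    # Level 1: every frequent single item is a generator.
    level = {}
    for item in items:
        g = frozenset([item])
        if support(g, transactions) >= minsup:
            level[g] = closure(g, transactions)
    closed = {}
    while level:
        for g, c in level.items():
            closed[c] = support(g, transactions)  # supp(g) == supp(closure(g))
        next_level = {}
        for g1, g2 in combinations(list(level), 2):
            cand = g1 | g2
            if len(cand) != len(g1) + 1:
                continue
            subsets = [cand - {x} for x in cand]
            # Apriori-style pruning: every immediate subset must be a generator.
            if any(s not in level for s in subsets):
                continue
            # Closure-based pruning: if cand lies inside the closure of a subset,
            # its closure equals that subset's closure, which is already known.
            if any(cand <= level[s] for s in subsets):
                continue
            if support(cand, transactions) >= minsup:
                next_level[cand] = closure(cand, transactions)
        level = next_level
    return closed


db = [frozenset(t) for t in ("abc", "abc", "abd", "ab", "cd")]
result = close_sketch(db, minsup=2)
```

Here the candidate {a, b} is never counted: it lies inside the closure of {a}, so its closure is already known.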
A Survey of Methods for Scaling Up Inductive Algorithms
 Data Mining and Knowledge Discovery
, 1999
Cited by 107 (11 self)
One of the defining challenges for the KDD research community is to enable inductive learning algorithms to mine very large databases. This paper summarizes, categorizes, and compares existing work on scaling up inductive algorithms. We concentrate on algorithms that build decision trees and rule sets, in order to provide focus and specific details; the issues and techniques generalize to other types of data mining. We begin with a discussion of important issues related to scaling up. We highlight similarities among scaling techniques by categorizing them into three main approaches. For each approach, we then describe, compare, and contrast the different constituent techniques, drawing on specific examples from published papers. Finally, we use the preceding analysis to suggest how to proceed when dealing with a large problem, and where to focus future research.
Keywords: scaling up, inductive learning, decision trees, rule learning
1. Introduction
The knowledge discovery and data...
Clustering Based On Association Rule Hypergraphs
Cited by 99 (16 self)
Clustering in data mining is a discovery process that groups a set of data such that the intracluster similarity is maximized and the intercluster similarity is minimized. These discovered clusters are used to explain the characteristics of the data distribution. In this paper we propose a new methodology for clustering related items using association rules, and for clustering related transactions using clusters of items. Our approach is linearly scalable with respect to the number of transactions. The frequent itemsets used to derive association rules are also used to group items into hypergraph edges, and a hypergraph partitioning algorithm is used to find the clusters. Our experiments indicate that clustering using association rule hypergraphs holds great promise in several application domains. Our experiments with stock-market data and congressional voting data show that this clustering scheme is able to successfully group items that belong to the same group. Clustering of items can ...
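The first step, turning frequent itemsets into weighted hyperedges over items, can be sketched in Python. Several pieces are assumptions. The hyperedge weight here is the average confidence of the rules {i} -> e \ {i}. In place of a real hypergraph partitioning algorithm, the sketch simply takes connected components of the hyperedges whose weight reaches a threshold, a plainly swapped-in and much cruder technique. Function names are hypothetical, and transactions are `frozenset`s of item labels.

```python
from itertools import combinations


def support(itemset, transactions):
    # Number of transactions containing every item of `itemset`.
    return sum(1 for t in transactions if itemset <= t)


def hyperedges(transactions, minsup, max_size=3):
    # Each frequent itemset of size 2..max_size becomes a hyperedge. Weight
    # (an assumption): average confidence of the rules {i} -> e - {i}.
    items = sorted(frozenset().union(*transactions))
    edges = {}
    for k in range(2, max_size + 1):
        for combo in combinations(items, k):
            e = frozenset(combo)
            s = support(e, transactions)
            if s >= minsup:
                conf = [s / support(frozenset([i]), transactions) for i in e]
                edges[e] = sum(conf) / len(conf)
    return edges


def item_clusters(edges, min_weight):
    # Stand-in for hypergraph partitioning: connected components (union-find)
    # over hyperedges whose weight reaches `min_weight`.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for e, w in edges.items():
        if w >= min_weight:
            first, *rest = sorted(e)
            for other in rest:
                parent[find(other)] = find(first)
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())


db = [frozenset(t) for t in ("AB", "AB", "ABC", "CD", "CD", "DE")]
edges = hyperedges(db, minsup=2)
clusters = item_clusters(edges, min_weight=0.5)
```

Items that rarely co-occur (E, or C with A) never share a frequent hyperedge, so they are not pulled into the same group.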
Segmentation problems
, 2004
Cited by 86 (5 self)
We study a novel genre of optimization problems, which we call segmentation problems, motivated in part by certain aspects of clustering and data mining. For any classical optimization problem, the corresponding segmentation problem seeks to partition a set of cost vectors into several segments, so that the overall cost is optimized. We focus on two natural and interesting (but MAX SNP-complete) problems in this class, the HYPERCUBE SEGMENTATION PROBLEM and the CATALOG SEGMENTATION PROBLEM, and present approximation algorithms for them. We also present a general greedy scheme, which can be specialized to approximate any segmentation problem.
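To illustrate the general definition (choose one solution per segment and let each cost vector be served by the best of them), here is a brute-force Python sketch of a catalog-style instance. The formulation is an assumption for illustration, since the abstract does not define the problem: each customer is a set of desired items, each of k catalogs holds exactly r items, and a customer's value is the overlap with their best catalog. Exhaustive search is exponential, unlike the approximation algorithms the paper presents, and the function names are hypothetical.

```python
from itertools import combinations


def segment_value(customers, catalogs):
    # Each customer (a set of desired items) is served by the catalog that
    # overlaps it most; the objective is the total overlap.
    return sum(max(len(c & cat) for cat in catalogs) for c in customers)


def best_catalogs(customers, items, r, k):
    # Exhaustive search over all choices of k distinct r-item catalogs.
    options = [frozenset(c) for c in combinations(sorted(items), r)]
    best = max(combinations(options, k),
               key=lambda cats: segment_value(customers, cats))
    return best, segment_value(customers, best)


customers = [set("ab"), set("ab"), set("cd"), set("cde")]
catalogs, value = best_catalogs(customers, set("abcde"), r=2, k=2)
```

Choosing the catalogs {a, b} and {c, d} implicitly segments the customers into two groups, which is the sense in which solving the segmentation problem also clusters the cost vectors.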
Constructing Knowledge From Multivariate Spatiotemporal Data: Integrating Geographic Visualization (GVis) with Knowledge Discovery in Database (KDD) Methods
 International Journal of Geographical Information Science
, 1999
Cited by 85 (20 self)
In this paper, we develop an approach to the process of constructing knowledge through structured exploration of large spatiotemporal data sets. We begin by introducing our problem context and defining both Geographic Visualization (GVis) and Knowledge Discovery in Databases (KDD), the source domains for the methods being integrated. Next, we review and compare recent GVis and KDD developments and consider the potential for their integration, emphasizing that an iterative process with user interaction is central to uncovering interesting and meaningful patterns in each. We then introduce an approach to the design of an integrated GVis-KDD environment directed at exploration and discovery in the context of spatiotemporal environmental data. The approach emphasizes a matching of GVis and KDD meta-operations. Following a description of the GVis and KDD methods that are linked in our prototype system, we present a demonstration of the prototype applied to a typical spatiotemporal datas...