Results 1-10 of 20
Privacy-Preserving Data Mining
2000
Cited by 608 (3 self)
A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We consider the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed. The resulting data records look very different from the original records and the distribution of data values is also very different from the original distribution. While it is not possible to accurately estimate original values in individual data records, we propose a novel reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data.
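The perturb-then-reconstruct idea in this abstract can be illustrated with a small sketch. It is not the authors' procedure: the uniform noise model, the equal-width binning, the iterative Bayesian update, and the names `perturb` and `reconstruct` are all assumptions chosen for illustration.

```python
import random


def perturb(values, spread, rng):
    """Add independent uniform noise in [-spread, spread] to each value (assumed noise model)."""
    return [v + rng.uniform(-spread, spread) for v in values]


def reconstruct(perturbed, spread, lo, hi, bins=20, iters=30):
    """Estimate the distribution of the ORIGINAL values over equal-width bins in [lo, hi].

    Each pass redistributes every perturbed value over the bins that could have
    produced it, weighted by the current estimate (an iterative Bayesian update).
    """
    width = (hi - lo) / bins
    mids = [lo + (k + 0.5) * width for k in range(bins)]
    est = [1.0 / bins] * bins  # start from a uniform guess
    for _ in range(iters):
        new = [0.0] * bins
        for w in perturbed:
            # Under uniform noise, a bin can explain w only if its midpoint lies within `spread`.
            weights = [est[k] if abs(w - mids[k]) <= spread else 0.0 for k in range(bins)]
            total = sum(weights)
            if total == 0.0:
                continue  # no bin can explain this value; skip it
            for k in range(bins):
                new[k] += weights[k] / total
        norm = sum(new)
        if norm == 0.0:
            break
        est = [x / norm for x in new]
    return mids, est
```

Only the perturbed values and the noise width are used by `reconstruct`; individual original values are never recovered, only an estimate of how they are distributed.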
Automatic Subspace Clustering of High Dimensional Data
Data Mining and Knowledge Discovery, 2005
Cited by 561 (12 self)
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
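A rough sketch of how dense regions in subspaces might be found bottom-up is given below. The abstract does not spell out the mechanics, so the grid formulation is an assumption: each dimension is cut into `xi` equal intervals, and a grid unit counts as dense when it holds more than a fraction `tau` of the points. The function names are hypothetical, pruning is done per subspace rather than per unit, and the step that turns dense units into minimized DNF descriptions is omitted.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, dims, xi, tau, lo=0.0, hi=1.0):
    """Return the dense grid units of the subspace `dims` as tuples of interval indices."""
    width = (hi - lo) / xi
    counts = Counter()
    for p in points:
        cell = tuple(min(int((p[d] - lo) / width), xi - 1) for d in dims)
        counts[cell] += 1
    threshold = tau * len(points)
    return {cell for cell, c in counts.items() if c > threshold}


def dense_subspaces(points, xi=4, tau=0.3):
    """Map each subspace (tuple of dimension indices) to its dense units, built level by level."""
    n_dims = len(points[0])
    level = {}
    for d in range(n_dims):
        units = dense_units(points, (d,), xi, tau)
        if units:
            level[(d,)] = units
    result = dict(level)
    k = 1
    while level:
        k += 1
        # Join pairs of (k-1)-dimensional subspaces into k-dimensional candidates.
        candidates = {tuple(sorted(set(a) | set(b))) for a, b in combinations(level, 2)}
        candidates = {c for c in candidates if len(c) == k}
        nxt = {}
        for sub in candidates:
            # Keep a candidate only if every (k-1)-dimensional projection had dense units.
            if all(tuple(x for x in sub if x != d) in level for d in sub):
                units = dense_units(points, sub, xi, tau)
                if units:
                    nxt[sub] = units
        result.update(nxt)
        level = nxt
    return result
```

Because units are counted with a `Counter`, the result does not depend on the order in which points are supplied, which matches the order-insensitivity property the abstract describes.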
SPRINT: A scalable parallel classifier for data mining
1996
Cited by 250 (7 self)
Classification is an important data mining problem. Although classification is a well-studied problem, most of the current classification algorithms require that all or a portion of the entire dataset remain permanently in memory. This limits their suitability for mining over large databases. We present a new decision-tree-based classification algorithm, called SPRINT, that removes all of the memory restrictions, and is fast and scalable. The algorithm has also been designed to be easily parallelized, allowing many processors to work together to build a single consistent model. This parallelization, also presented here, exhibits excellent scalability as well. The combination of these characteristics makes the proposed algorithm an ideal tool for data mining.
DMQL: A Data Mining Query Language for Relational Databases
1996
Cited by 126 (6 self)
The emerging data mining tools and systems lead naturally to the demand of a powerful data mining query language, on top of which many interactive and flexible graphical user interfaces can be developed. This motivates us to design a data mining query language, DMQL, for mining different kinds of knowledge in relational databases. Portions of the proposed DMQL language have been implemented in our DBMiner system for interactive mining of multiple-level knowledge in relational databases.
Machine-Learning Research: Four Current Directions
Cited by 114 (1 self)
Machine Learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (a) improving classification accuracy by learning ensembles of classifiers, (b) methods for scaling up supervised learning algorithms, (c) reinforcement learning, and (d) learning complex stochastic models.
The Quest Data Mining System
In Proc. of the 2nd Int'l Conference on Knowledge Discovery in Databases and Data Mining, 1996
Cited by 78 (2 self)
This paper is a capsule summary of the current functionality and architecture of the Quest data mining System. Our overall approach has been to identify basic data mining operations that cut across applications and develop fast, scalable algorithms for their execution (Agrawal, Imielinski, & Swami 1993a). We wanted our algorithms to:
Partial classification using association rules
Proc. 3rd Int. Conf. on KDD, 1997
Cited by 65 (2 self)
Many real-life problems require a partial classification of the data. We use the term "partial classification" to describe the discovery of models that show characteristics of the data classes, but may not cover all classes and all examples of any given class. Complete classification may be infeasible or undesirable when there are a very large number of class attributes, most attribute values are missing, or the class distribution is highly skewed and the user is interested in understanding the low-frequency class. We show how association rules can be used for partial classification in such domains, and present two case studies: reducing telecommunications order failures and detecting redundant medical tests.
DBMiner: A System for Mining Knowledge in Large Relational Databases
In Proc. 1996 Int'l Conf. on Data Mining and Knowledge Discovery (KDD'96), 1996
Cited by 53 (12 self)
A data mining system, DBMiner, has been developed for interactive mining of multiple-level knowledge in large relational databases. The system implements a wide spectrum of data mining functions, including generalization, characterization, association, classification, and prediction. By incorporating several interesting data mining techniques, including attribute-oriented induction, statistical analysis, progressive deepening for mining multiple-level knowledge, and meta-rule guided mining, the system provides a user-friendly, interactive data mining environment with good performance.
Logic regression
Journal of Computational and Graphical Statistics, 2003
Cited by 36 (11 self)
Logic regression is an adaptive regression methodology that attempts to construct predictors as Boolean combinations of binary covariates. In many regression problems a model is developed that relates the main effects (the predictors or transformations thereof) to the response, while interactions are usually kept simple (two- to three-way interactions at most). Often, especially when all predictors are binary, the interaction between many predictors may be what causes the differences in response. This issue arises, for example, in the analysis of SNP microarray data or in some data mining problems. In the proposed methodology, given a set of binary predictors we create new predictors such as “X1, X2, X3, and X4 are true,” or “X5 or X6 but not X7 are true.” In more specific terms: we try to fit regression models of the form $g(E[Y]) = b_0 + b_1 L_1 + \cdots + b_n L_n$, where $L_j$ is any Boolean expression of the predictors. The $L_j$ and $b_j$ are estimated simultaneously using a simulated annealing algorithm. This article discusses how to fit logic regression models, how to carry out model selection for these models, and gives some examples.
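To make the model form concrete, the sketch below evaluates the right-hand side b0 + b1 L1 + ... + bn Ln for the two example Boolean predictors quoted in the abstract. The coefficient values, the dictionary representation of a record, and the function names are hypothetical. Reading "X5 or X6 but not X7" as "(X5 or X6) and not X7" is one interpretation, and the simulated annealing search that estimates the Lj and bj is not shown.

```python
def L1(x):
    """'X1, X2, X3, and X4 are true'."""
    return x[1] and x[2] and x[3] and x[4]


def L2(x):
    """'X5 or X6 but not X7 are true', read here as (X5 or X6) and not X7."""
    return (x[5] or x[6]) and not x[7]


def linear_predictor(x, b0, coefs, terms):
    """Return b0 + sum_j b_j * L_j(x), with each Boolean term counted as 0 or 1."""
    return b0 + sum(b * int(bool(term(x))) for b, term in zip(coefs, terms))


# One record of binary covariates, keyed by predictor index (hypothetical data).
record = {1: True, 2: True, 3: True, 4: True, 5: True, 6: False, 7: False}
value = linear_predictor(record, b0=1.0, coefs=[2.0, -0.5], terms=[L1, L2])
```

With an identity link g, `value` is the fitted mean for that record; another link (for example, logit) would be applied on top of the same linear predictor.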
Parallel Classification for Data Mining on Shared-Memory Multiprocessors
1998
Cited by 26 (2 self)
We present parallel algorithms for building decision-tree classifiers on shared-memory multiprocessor (SMP) systems. The proposed algorithms span the gamut of data and task parallelism. The data parallelism is based on attribute scheduling among processors. This basic scheme is extended with task pipelining and dynamic load balancing to yield faster implementations. The task parallel approach uses dynamic subtree partitioning among processors. Our performance evaluation shows that the construction of a decision-tree classifier can be effectively parallelized on an SMP machine with good speedup.