Results 1 - 10 of 233
Scalable, Distributed Data Mining Using An Agent Based Architecture
Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, California, 1997
Cited by 54 (7 self)
Abstract: Algorithm scalability and the distributed nature of both data and computation deserve serious attention in the context of data mining. This paper presents PADMA (PArallel Data Mining Agents), a parallel agent-based system that makes an effort to address these issues. PADMA contains modules for (1) parallel data access operations, (2) parallel hierarchical clustering, and (3) web-based data visualization. This paper describes the general architecture of PADMA and experimental results.
Hillol Kargupta, Ilker Hamzaoglu
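PADMA's own code is not shown here; as a rough, hypothetical illustration of the parallel-data-access idea (the agent and facilitator functions below are invented for this sketch, not PADMA's actual API), local agents can compute partial summaries over their data partitions concurrently while a facilitator merges the results:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: each "agent" owns one data partition, computes a
# local summary (per-key count and sum), and a facilitator merges the
# partial results into a global summary.
partitions = [
    [("spam", 1), ("ham", 3), ("spam", 2)],
    [("ham", 4), ("spam", 5)],
]

def agent_summary(rows):
    # Local, embarrassingly parallel work: per-key (count, total).
    summary = {}
    for key, value in rows:
        count, total = summary.get(key, (0, 0))
        summary[key] = (count + 1, total + value)
    return summary

def merge(summaries):
    # Facilitator step: combine the agents' partial summaries.
    merged = {}
    for s in summaries:
        for key, (count, total) in s.items():
            c, t = merged.get(key, (0, 0))
            merged[key] = (c + count, t + total)
    return merged

with ThreadPoolExecutor() as pool:
    global_summary = merge(pool.map(agent_summary, partitions))
```

Because the per-partition work is independent, the same pattern extends to processes or machines; only the small summaries cross the agent boundary.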
On the privacy preserving properties of random data perturbation techniques
In ICDM, 2003
Cited by 188 (6 self)
Abstract: Privacy is becoming an increasingly important issue in many data mining applications. This has triggered the development of many privacy-preserving data mining techniques. A large fraction of them use randomized data distortion techniques to mask the data for preserving the privacy of sensitive data. This methodology attempts to hide the sensitive data by randomly modifying the data values, often using additive noise. This paper questions the utility of the random value distortion technique in privacy preservation. It notes that random objects (particularly random matrices) have “predictable” structures in the spectral domain, and it develops a random matrix-based spectral filtering technique to retrieve original data from a dataset distorted by adding random values. The paper presents the theoretical foundation of this filtering method and extensive experimental results demonstrating that in many cases random data distortion preserves very little data privacy.
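The paper's exact spectral filter is not reproduced here, but a toy sketch conveys the attack setting it studies (all sizes and the rank-2 structure below are illustrative assumptions, and the PCA-style projection is a simplification of the paper's random-matrix filter): when the private data has low-rank structure and the published version only adds i.i.d. noise, projecting onto the top principal components strips much of that noise.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy setting (illustrative, not the paper's exact filter): the private
# data matrix is low-rank; the released version adds i.i.d. Gaussian noise.
n, d, rank = 500, 20, 2
true = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, d))
noisy = true + rng.standard_normal((n, d))          # additive distortion

# "Spectral filtering" sketch: keep only the dominant singular directions
# of the released data; the noise spreads thinly across all directions.
u, s, vt = np.linalg.svd(noisy - noisy.mean(0), full_matrices=False)
denoised = noisy.mean(0) + (u[:, :rank] * s[:rank]) @ vt[:rank]

err_noisy = np.linalg.norm(noisy - true)
err_denoised = np.linalg.norm(denoised - true)
# err_denoised comes out well below err_noisy: the additive noise
# preserved much less privacy than its magnitude suggests.
```

The design point is the abstract's: random matrices have predictable spectra, so an attacker can separate signal subspace from noise floor.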
Computational processes in evolution and the gene expression messy genetic algorithm
Annual Meeting of the Society for Industrial and Applied Mathematics (SIAM), 1996
Abstract: This paper makes an effort to project the theoretical lessons of the SEARCH (Search Envisioned As Relation and Class Hierarchizing) framework introduced elsewhere (Kargupta, 1995; Kargupta & Goldberg, 1996) onto natural evolution (Kargupta, 1996c) and to introduce the gene expression messy genetic algorithm.
Rapid, Accurate Optimization of Difficult Problems Using Fast Messy Genetic Algorithms
Proceedings of the Fifth International Conference on Genetic Algorithms, 1993
Cited by 118 (24 self)
Abstract: Researchers have long sought genetic algorithms (GAs) that can solve difficult search, optimization, and machine learning problems quickly. Despite years of work on simple GAs and their variants, it is still unknown how difficult a problem simple GAs can solve, how quickly they can solve it, and with what reliability. More radical design departures than these have been taken, however, and the messy GA (mGA) approach has attempted to solve problems of bounded difficulty quickly and reliably by taking the notion of building-block linkage quite seriously. Early efforts were apparently successful in achieving polynomial convergence on some difficult problems, but the initialization bottleneck that required a large initial population was thought to be the primary obstacle to faster mGA performance. This paper replaces the partially enumerative initialization and selective primordial phase of the original messy GA with probabilistically complete initialization and a primordial phase that per...
Professional publications: Refereed Journals
[1] K. Liu, C. Giannella, and H. Kargupta, “An attacker’s view of exact and approximate distance preserving perturbations for privacy preserving data mining,”
The Gene Expression Messy Genetic Algorithm
In Proceedings of the IEEE International Conference on Evolutionary Computation, 1996
Cited by 102 (8 self)
Abstract: This paper introduces the gene expression messy genetic algorithm (GEMGA), a new generation of messy GAs that directly search for relations among the members of the search space. The GEMGA is an O(Ak( 2 q k)) sample complexity algorithm for the class of order-k delineable problems [6] (problems that can be solved by considering no higher than order-k relations). The GEMGA is designed based on an alternate perspective of natural evolution proposed by the SEARCH framework [6] that emphasizes the role of gene expression. The GEMGA uses the transcription operator to search for relations. This paper also presents the test results of the GEMGA for large multimodal order-k delineable problems.
Collective Data Mining: A New Perspective Toward Distributed Data Analysis
Advances in Distributed and Parallel Knowledge Discovery, 1999
Cited by 104 (15 self)
Abstract: This paper introduces the collective data mining (CDM) framework, a new approach toward distributed data mining (DDM) from heterogeneous sites. It points out that naive approaches to distributed data analysis in a heterogeneous environment may result in ambiguous or incorrect global data models. It also notes that any function can be expressed in a distributed fashion using a set of appropriate basis functions, and that orthogonal basis functions can be effectively used for developing a general DDM framework that guarantees correct local analysis and correct aggregation of local data models with minimal data communication. This paper develops the foundation of CDM, discusses decision tree learning and polynomial regression in CDM for discrete and continuous variables, and describes BODHI, a CDM-based experimental system for distributed knowledge discovery.
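The orthogonal-basis idea behind CDM can be seen in a minimal numerical sketch (the cosine basis, sizes, and the even/odd split of coefficients between two "sites" are illustrative assumptions, not the BODHI system's actual machinery): because the basis vectors are orthonormal, each site can estimate its own subset of coefficients independently as inner products, and the global model is exact after simply concatenating them.

```python
import numpy as np

# Orthonormal DCT-II-style basis over n sample points (first 8 functions).
n = 128
x = np.arange(n)
basis = np.array([np.cos(np.pi * k * (x + 0.5) / n) for k in range(8)])
basis[0] /= np.sqrt(n)          # k = 0 vector has squared norm n
basis[1:] /= np.sqrt(n / 2)     # k >= 1 vectors have squared norm n/2

# A "global" function expressed in that basis.
coeffs_true = np.array([2.0, -1.0, 0.5, 0.0, 0.25, 0.0, 0.0, 0.0])
f = coeffs_true @ basis

# Site A estimates even-indexed coefficients, site B odd-indexed ones;
# each coefficient is just an inner product with one basis function, so
# only 4 numbers per site cross the network.
est_a = {k: f @ basis[k] for k in range(0, 8, 2)}
est_b = {k: f @ basis[k] for k in range(1, 8, 2)}
global_coeffs = np.array([{**est_a, **est_b}[k] for k in range(8)])
f_rebuilt = global_coeffs @ basis   # aggregation recovers f exactly
```

Orthogonality is what makes the local estimates non-interfering, which is the abstract's "correct local analysis and correct aggregation with minimal data communication" claim in miniature.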
Random projection-based multiplicative data perturbation for privacy preserving distributed data mining
IEEE Transactions on Knowledge and Data Engineering, 2006
Cited by 94 (6 self)
Abstract: This paper explores the possibility of using multiplicative random projection matrices for privacy preserving distributed data mining. It specifically considers the problem of computing statistical aggregates like the inner product matrix, correlation coefficient matrix, and Euclidean distance matrix from distributed privacy-sensitive data possibly owned by multiple parties. This class of problems is directly related to many other data mining problems such as clustering, principal component analysis, and classification. This paper makes primary contributions on two different grounds. First, it explores Independent Component Analysis as a possible tool for breaching privacy in deterministic multiplicative perturbation-based models such as random orthogonal transformation and random rotation. Then, it proposes an approximate random projection-based technique to improve the level of privacy protection while still preserving certain statistical characteristics of the data. The paper presents extensive theoretical analysis and experimental results. Experiments demonstrate that the proposed technique is effective and can be successfully used for different types of privacy-preserving data mining applications.
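The core property the abstract relies on can be sketched in a few lines (dimensions and scaling are illustrative, not the paper's recommended parameters): projecting data through a scaled random Gaussian matrix approximately preserves Euclidean distances, so a miner can work on the projected records without seeing the originals.

```python
import numpy as np

rng = np.random.default_rng(42)
# Illustrative sizes: project d-dimensional records down to k dimensions.
d, k = 1000, 400
x = rng.standard_normal(d)
y = rng.standard_normal(d)

# Multiplicative perturbation: a random Gaussian matrix, scaled so that
# E[R @ R.T] = I and squared distances are preserved in expectation.
R = rng.standard_normal((d, k)) / np.sqrt(k)
x_p, y_p = x @ R, y @ R         # only the perturbed records are released

true_dist = np.linalg.norm(x - y)
priv_dist = np.linalg.norm(x_p - y_p)
# priv_dist concentrates around true_dist (Johnson-Lindenstrauss-style),
# which is what keeps clustering and classification results usable.
```

Since R is private and k < d, the mapping is not invertible from the released data alone; the paper's analysis concerns exactly how much an attacker (e.g., via ICA) can still recover.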
Bayesian Optimization Algorithm: From Single Level to Hierarchy
2002
Cited by 101 (19 self)
Abstract: There are four primary goals of this dissertation. First, design a competent optimization algorithm capable of learning and exploiting an appropriate problem decomposition by sampling and evaluating candidate solutions. Second, extend the proposed algorithm to enable the use of hierarchical decomposition as opposed to decomposition on only a single level. Third, design a class of difficult hierarchical problems that can be used to test algorithms that attempt to exploit hierarchical decomposition. Fourth, test the developed algorithms on the designed class of problems and several real-world applications. The dissertation proposes the Bayesian optimization algorithm (BOA), which uses Bayesian networks to model the promising solutions found so far and to sample new candidate solutions. BOA is theoretically and empirically shown to be capable of both learning a proper decomposition of the problem and exploiting the learned decomposition to ensure robust and scalable search for the optimum across a wide range of problems. The dissertation then identifies important features that must be incorporated into the basic BOA to solve problems that are not decomposable on a single level but that can still be solved by decomposition over multiple levels of difficulty.
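The model-then-sample loop the abstract describes can be illustrated with a heavily simplified relative of BOA: where BOA learns a Bayesian network over the bits of the selected solutions, the sketch below (a univariate, UMDA-style estimation-of-distribution algorithm, explicitly not BOA itself) keeps only independent per-bit probabilities. Problem, sizes, and rates are illustrative.

```python
import random

random.seed(1)
# Simplified stand-in for BOA's loop: sample from a probabilistic model
# of good solutions, select the best, re-estimate the model. Here the
# model is just per-bit marginals; BOA would fit a Bayesian network to
# capture dependencies (linkage) between bits.
n_bits, pop_size, n_select, n_gens = 30, 100, 50, 60

def onemax(bits):
    # Toy fitness: number of ones (illustrative test problem).
    return sum(bits)

p = [0.5] * n_bits              # model: probability each bit is 1
best = 0
for _ in range(n_gens):
    pop = [[1 if random.random() < p[i] else 0 for i in range(n_bits)]
           for _ in range(pop_size)]
    pop.sort(key=onemax, reverse=True)
    best = max(best, onemax(pop[0]))
    selected = pop[:n_select]   # truncation selection
    # Re-estimate the model from the selected solutions.
    p = [sum(ind[i] for ind in selected) / n_select for i in range(n_bits)]
```

On a separable problem like this, marginals suffice; the dissertation's point is that on problems with interacting variables the model must capture those dependencies, which is what the Bayesian network (and, later, its hierarchical extension) provides.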
Black-box Optimization: Implications of SEARCH
In communication. Submitted to the International Journal of Foundations of Computer Science, 1996
Cited by 2 (0 self)
Abstract: The SEARCH (Search Envisioned As Relation & Class Hierarchizing) framework developed elsewhere (Kargupta, 1995; Kargupta & Goldberg, 1996a; Kargupta & Goldberg, 1996b) offered an alternate perspective toward black-box optimization: optimization in the presence of little domain knowledge.