Results 1 - 6 of 6
Collective Data Mining: A New Perspective Toward Distributed Data Analysis
- Advances in Distributed and Parallel Knowledge Discovery, 1999
Cited by 75 (12 self)
This paper introduces the collective data mining (CDM) framework, a new approach toward distributed data mining (DDM) from heterogeneous sites. It points out that naive approaches to distributed data analysis in a heterogeneous environment may result in ambiguous or incorrect global data models. It also notes that any function can be expressed in a distributed fashion using a set of appropriate basis functions, and that orthogonal basis functions can be effectively used to develop a general DDM framework that guarantees correct local analysis and correct aggregation of local data models with minimal data communication. This paper develops the foundation of CDM, discusses decision tree learning and polynomial regression in CDM for discrete and continuous variables, and describes BODHI, a CDM-based experimental system for distributed knowledge discovery. 1 Introduction Distributed data mining (DDM) is a fast-growing area that deals with the problem of finding data patterns in a...
Distributed Data Mining: Algorithms, Systems, and Applications
- 2002
Cited by 43 (4 self)
This paper presents a brief overview of DDM algorithms, systems, applications, and emerging research directions. The paper is organized as follows. We first present related research on DDM and illustrate data distribution scenarios. Then DDM algorithms are reviewed. Subsequently, the architectural issues in DDM systems and future directions are discussed.
Distributed Multivariate Regression Using Wavelet-based Collective Data Mining
- Journal of Parallel and Distributed Computing, 1999
Cited by 22 (7 self)
This paper presents a method for distributed multivariate regression using wavelet-based Collective Data Mining (CDM). The method seamlessly blends machine learning and information theory with the statistical methods employed in multivariate regression to provide an effective data mining technique for use in a distributed data and computation environment. Evaluation of the method in terms of model accuracy as a function of the appropriateness of the selected wavelet function, the relative number of non-linear cross-terms, and the sample size demonstrates that accurate multivariate regression models can be generated from distributed, heterogeneous data sets with minimal data communication overhead compared to that required to aggregate a centralized data set. Application of this method to Linear Discriminant Analysis, which is closely related to multivariate regression, produced classification results on the Iris data set that are comparable to those obtained with centralized data analysis. 1 Intr...
A Fourier analysis based approach to learning decision trees in a distributed environment
- In Proceedings of the SIAM International Conference on Data Mining, 2001
Cited by 3 (1 self)
Spurred by advances in communication technologies, mobile computing and distributed databases have become widespread. Such a computing environment involves data that is stored at geographically dispersed locations, and so-called “slim” computing devices such as palmtops and wearable computers. The decentralized nature of data storage and this new paradigm in computing give rise to several issues, such as security, communication overhead, computational load demands, and scalability, that are not adequately addressed by traditional centralized data mining techniques. It is essential that algorithms designed for distributed data mining scenarios mitigate some of these issues. This paper attempts to adapt one centralized data mining technique, decision tree learning, to such an environment. It presents a scalable algorithm that can be used to build decision trees from a distributed, heterogeneous database while minimizing communication overhead. This paper also shows how a decision tree may be represented in terms of its Fourier spectrum. It uses this Fourier-spectrum-based technique to aggregate decision trees built at the various distributed sites, simplifying the model built during the data mining stage, and notes some additional advantages of the Fourier spectrum approach.
Knowledge Discovery From Distributed And Textual Data
- Hong Kong University of Science and Technology, 1999
References
- 2008
Advances in computing and communication over wired and wireless networks have resulted in many pervasive distributed computing environments. Many of these environments deal with different distributed sources of voluminous data, multiple compute nodes, and a distributed user community. Analyzing and monitoring these distributed data sources require a data mining technology designed for distributed applications. The field of distributed data mining (DDM) deals with this problem: mining distributed data while paying careful attention to the distributed resources. The goal of this paper is to maintain and distribute a bibliography of DDM-related publications. We hope that DDM researchers and practitioners find this service useful. We welcome any help from the community in maintaining the bibliography.

