Distributed data mining in peer-to-peer networks
IEEE Internet Computing special issue on Distributed Data Mining, 2006
Cited by 55 (10 self)
Distributed data mining deals with the problem of data analysis in environments with distributed data, computing nodes, and users. Peer-to-peer (P2P) computing is emerging as a new distributed computing paradigm for many novel applications that involve exchange of information among a large number of peers with little centralized coordination. P2P file sharing, P2P electronic commerce, and P2P monitoring based on a network of sensors are some examples. This paper offers an overview of distributed data mining applications and algorithms for P2P environments. It describes both exact and approximate distributed data mining algorithms that work in a decentralized manner.
In-network outlier detection in wireless sensor networks
In ICDCS, 2006
Cited by 43 (3 self)
To address the problem of unsupervised outlier detection in wireless sensor networks, we develop an algorithm that (1) is flexible with respect to the outlier definition, (2) works in-network with a communication load proportional to the outcome, and (3) reveals its outcome to all sensors. We examine the algorithm’s performance using simulation with real sensor data streams. Our results demonstrate that the algorithm is accurate and imposes a reasonable communication load and level of power consumption.
A local algorithm for ad hoc majority voting via charge fusion
In Proceedings of the 18th annual conference on distributed computing, 2004
Cited by 24 (12 self)
We present a local distributed algorithm for a general Majority Voting problem: different and time-variable voting powers and vote splits, arbitrary and dynamic interconnection topologies and link delays, and any fixed majority threshold. The algorithm combines a novel, efficient anytime spanning forest algorithm, which may also have applications elsewhere, with a “charge fusion” algorithm that roots trees at nodes with excess “charge” (derived from a node’s voting power and vote split), and subsequently transfers charges along tree links to oppositely charged roots for fusion. At any instant, every node has an ad hoc belief regarding the outcome. Once all changes have ceased, the correct majority decision is reached by all nodes, within a time that in many cases is independent of the graph size. The algorithm’s correctness and salient properties are proved, and experiments with up to a million nodes provide further validation and actual numbers. To our knowledge, this is the first locality-sensitive solution to the Majority Vote problem for arbitrary, dynamically changing communication graphs.
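To make the target of the majority voting problem concrete, the following is a minimal centralized sketch of the decision that every node is expected to agree on once changes cease. It is not the charge fusion algorithm itself. The function name `majority_decision`, the reading of a "vote split" as the fraction of a node's voting power cast in favor, and the strict inequality at the threshold are all assumptions made for illustration.

```python
def majority_decision(powers, yes_fractions, threshold=0.5):
    """Centralized reference outcome for weighted majority voting.

    powers        -- voting power of each node (assumed non-negative)
    yes_fractions -- assumed meaning of "vote split": the share of each
                     node's power that is cast in favor, in [0, 1]
    threshold     -- fixed majority threshold as a fraction of total power
    """
    total_power = sum(powers)
    yes_power = sum(p * y for p, y in zip(powers, yes_fractions))
    # Assumption: the proposal passes only if "yes" power strictly
    # exceeds the threshold share of the total power.
    return yes_power > threshold * total_power


# Three nodes with powers 1, 2, 3; the first and third vote fully "yes".
print(majority_decision([1, 2, 3], [1, 0, 1]))         # simple majority
print(majority_decision([1, 2, 3], [1, 0, 1], 0.75))   # stricter threshold
```

A distributed algorithm for this problem has to reach the same answer without any node ever seeing the full `powers` and `yes_fractions` lists.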
Veracity radius - capturing the locality of distributed computations
ACM PODC, 2006
Cited by 23 (8 self)
This paper focuses on local computations of distributed aggregation problems on fixed graphs. We define a new metric on problem instances, Veracity Radius (VR), which captures the inherent possibility to compute them locally. We prove that VR yields a tight lower bound on output-stabilization time, i.e., the time until all nodes fix their outputs, as well as a lower bound on quiescence time. We present an efficient aggregation algorithm, I-LEAG, which reaches both output stabilization and quiescence within a time that is proportional to the VR of the problem instance, and is also efficient in terms of per-node communication and memory. We empirically show that the VR metric also effectively captures the performance of previously suggested efficient aggregation protocols, and that I-LEAG significantly outperforms these protocols in several respects.
A local facility location algorithm for sensor networks
In DCOSS ’05, 2005
Cited by 20 (7 self)
In this paper we address a well-known facility location problem (FLP) in a sensor network environment. The problem deals with finding the optimal way to provide service to a (possibly) very large number of clients. We show that a variation of the problem can be solved using a local algorithm. Local algorithms are extremely useful in a sensor network scenario. This is because they allow the communication range of the sensor to be restricted to the minimum, they can operate in routerless networks, and they allow complex problems to be solved on the basis of very little information, gathered from nearby sensors. The local facility location algorithm we describe is entirely asynchronous, seamlessly supports failures and changes in the data during calculation, poses modest memory and computational requirements, and can provide an anytime solution which is guaranteed to converge to the exact same one that would be computed by a centralized algorithm given the entire data.
A Generic Local Algorithm for Mining Data Streams in Large Distributed Systems
2006
Cited by 20 (8 self)
In a large network of computers or wireless sensors, each of the components (henceforth, peers) has some data about the global state of the system. Much of the system’s functionality, such as message routing, information retrieval, and load sharing, relies on modeling the global state. We refer to the outcome of the function (e.g., the load experienced by each peer) as the model of the system. Since the state of the system is constantly changing, it is necessary to keep the models up-to-date. Computing global data mining models (e.g., decision trees, k-means clustering) in large distributed systems may be very costly due to the scale of the system and due to communication cost, which may be high. The cost further increases in a dynamic scenario when the data changes rapidly. In this paper we describe a two-step approach for dealing with these costs. First, we describe a highly efficient local algorithm which can be used to monitor a wide class of data mining models. Then, we use this algorithm as a feedback loop for the monitoring of complex functions of the data such as its k-means clustering. The theoretical claims are corroborated with a thorough experimental analysis.
A Scalable Local Algorithm for Distributed Multivariate Regression
2008
Cited by 13 (6 self)
This paper offers a local distributed algorithm for multivariate regression in large peer-to-peer environments. The algorithm can be used for distributed inferencing, data compaction, data modeling and classification tasks in many emerging peer-to-peer applications for bioinformatics, astronomy, social networking, sensor networks and web mining. Computing a global regression model from data available at the different peer nodes using a traditional centralized algorithm for regression can be very costly and impractical because of the large number of data sources, the asynchronous nature of the peer-to-peer networks, and the dynamic nature of the data/network. This paper proposes a two-step approach to deal with this problem. First, it offers an efficient local distributed algorithm that monitors the “quality” of the current regression model. If the model is outdated, it uses this algorithm as a feedback mechanism for rebuilding the model. The local nature of the monitoring algorithm guarantees low monitoring cost. Experimental results presented in this paper strongly support the theoretical claims.
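The monitor-then-rebuild idea in this abstract can be illustrated with a deliberately simplified, centralized sketch. It is not the paper's local distributed algorithm: it uses a single feature instead of a multivariate model, takes mean squared error as the "quality" measure, and treats the tolerance `epsilon` as a free parameter. All names (`fit_line`, `mean_squared_error`, `monitor_and_rebuild`) are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (single feature, for brevity).

    Assumes xs contains at least two distinct values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def mean_squared_error(model, xs, ys):
    """Assumed quality measure: average squared residual of the model."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def monitor_and_rebuild(model, xs, ys, epsilon):
    """Keep the model while its error is within epsilon; otherwise refit.

    Returns the (possibly new) model and a flag saying whether it was rebuilt.
    """
    if mean_squared_error(model, xs, ys) <= epsilon:
        return model, False
    return fit_line(xs, ys), True


xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]           # data generated by y = 2x + 1
stale_model = (0.0, 0.0)                      # a model that no longer fits
print(monitor_and_rebuild(stale_model, xs, ys, epsilon=0.1))
```

The point of the sketch is the control flow: the cheap check runs continuously, and the expensive fit runs only when the check fails.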
A Local Facility Location Algorithm for Large-Scale Distributed Systems
Journal of Grid Computing, 2007
Cited by 11 (2 self)
In the facility location problem (FLP) we are given a set of facilities and a set of clients, each of which is to be served by one facility. The goal is to decide which subset of facilities to open, such that the clients will be served at a minimal cost. In this paper we investigate the FLP in a setting where the cost depends on data known only to peer nodes. This setting typifies modern distributed systems: peer-to-peer file sharing networks, grid systems, and wireless sensor networks. All of them need to perform network organization, data placement, collective power management, and other tasks of this kind. We propose a local and efficient algorithm that solves FLP in these settings. The algorithm presented here is extremely scalable, entirely decentralized, requires no routing capabilities, and is resilient to failures and changes in the data throughout its execution.
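The problem statement above can be made concrete with a small brute-force sketch of the centralized objective. It is only a reference for what "minimal cost" means on a toy instance, not the local algorithm the paper proposes. The cost model is an assumption taken from the standard uncapacitated formulation: each facility has an opening cost, and each client pays the service cost of its cheapest open facility. The names `total_cost` and `solve_flp_brute_force` are hypothetical.

```python
from itertools import combinations


def total_cost(open_facilities, opening_cost, service_cost):
    """Opening cost of the chosen facilities plus each client's cheapest service.

    service_cost[c][f] is the assumed cost of serving client c from facility f.
    """
    opening = sum(opening_cost[f] for f in open_facilities)
    service = sum(min(row[f] for f in open_facilities) for row in service_cost)
    return opening + service


def solve_flp_brute_force(opening_cost, service_cost):
    """Try every non-empty subset of facilities; exponential, toy sizes only."""
    facilities = range(len(opening_cost))
    best_set, best_cost = None, float("inf")
    for k in range(1, len(opening_cost) + 1):
        for subset in combinations(facilities, k):
            cost = total_cost(subset, opening_cost, service_cost)
            if cost < best_cost:
                best_set, best_cost = subset, cost
    return best_set, best_cost


# Two facilities, two clients; each client is cheap to serve from one facility.
print(solve_flp_brute_force([3, 3], [[1, 9], [9, 1]]))
```

Exhaustive search is used here only because it is obviously correct on tiny inputs; it needs all cost data in one place, which is exactly what the distributed setting rules out.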
Distributed Decision Tree Induction in Peer-to-Peer Systems
2008
Cited by 9 (8 self)
This paper offers a scalable and robust distributed algorithm for decision tree induction in large Peer-to-Peer (P2P) environments. Computing a decision tree in such large distributed systems using standard centralized algorithms can be very communication-expensive and impractical because of the synchronization requirements. The problem becomes even more challenging in the distributed stream monitoring scenario where the decision tree needs to be updated in response to changes in the data distribution. This paper presents an alternate solution that works in a completely asynchronous manner in distributed environments and offers low communication overhead, a necessity for scalability. It also seamlessly handles changes in data and peer failures. The paper presents extensive experimental results to corroborate the theoretical claims.
Privacy-preserving association rule mining in large-scale distributed systems
In Proceedings of Cluster Computing and the Grid (CCGrid, 2004
Cited by 9 (0 self)
Data privacy is a major concern that threatens the widespread deployment of data grids in domains such as healthcare and finance. We propose a unique approach for obtaining knowledge, by way of a data mining model, from a data grid, while ensuring that the data is cryptographically safe. This is made possible by an innovative, yet natural generalization of the accepted trusted third party model and a new privacy-preserving data mining algorithm that is suitable for grid-scale systems. The algorithm is asynchronous, involves no global communication patterns, and dynamically adjusts to changes in the data or to the failure and recovery of resources. To the best of our knowledge, this is the first privacy-preserving mining algorithm to possess these features. Simulations of thousands of resources prove that our algorithm quickly converges to the correct result while using reasonable communication. The simulations also prove that the effect of the privacy parameter on both the convergence time and the number of messages is logarithmic.