Tributaries and deltas: Efficient and robust aggregation in sensor network streams
- In SIGMOD, 2005
Cited by 71 (2 self)
Existing energy-efficient approaches to in-network aggregation in sensor networks can be classified into two categories, tree-based and multi-path-based, with each having unique strengths and weaknesses. In this paper, we introduce Tributary-Delta, a novel approach that combines the advantages of the tree and multi-path approaches by running them simultaneously in different regions of the network. We present schemes for adjusting the regions in response to changes in network conditions, and show how many useful aggregates can be readily computed within this new framework. We then show how a difficult aggregate for this context, finding frequent items, can be efficiently computed within the framework. To this end, we devise the first algorithm for frequent items (and for quantiles) that provably minimizes the worst-case total communication for non-regular trees. In addition, we give a multi-path algorithm for frequent items that is considerably more accurate than previous approaches. These algorithms form the basis for our efficient Tributary-Delta frequent items algorithm. Through extensive simulation with real-world and synthetic data, we show the significant advantages of our techniques. For example, in computing Count under realistic loss rates, our techniques reduce answer error by up to a factor of 3 compared to any previous technique.
Distributed data mining in peer-to-peer networks
- IEEE Internet Computing special issue on Distributed Data Mining, 2006
Cited by 27 (10 self)
Distributed data mining deals with the problem of data analysis in environments with distributed data, computing nodes, and users. Peer-to-peer (P2P) computing is emerging as a new distributed computing paradigm for many novel applications that involve exchange of information among a large number of peers with little centralized coordination. P2P file sharing, P2P electronic commerce, and P2P monitoring based on a network of sensors are some examples. This paper offers an overview of distributed data mining applications and algorithms for P2P environments. It describes both exact and approximate distributed data mining algorithms that work in a decentralized manner.
Delay aware querying with Seaweed
- In VLDB, 2006
Cited by 22 (1 self)
Large, highly distributed data sets are poorly supported by current query technologies. Applications such as endsystem-based network management are characterized by data stored on large numbers of endsystems, with frequent local updates and relatively infrequent global one-shot queries. The challenges are scale (10^3 to 10^9 endsystems) and endsystem unavailability. In such large systems, a significant fraction of endsystems, and their data, will be unavailable at any given time. Existing methods to provide high data availability despite endsystem unavailability involve centralizing, redistributing or replicating the data. At large scale these methods are not scalable. We advocate a design that trades query delay for completeness, incrementally returning results as endsystems become available. We also introduce the idea of completeness prediction, which provides the user with explicit feedback about this delay/completeness trade-off. Completeness prediction is based on replication of compact data summaries and availability models. This metadata is orders of magnitude smaller than the data. Seaweed is a scalable query infrastructure supporting online aggregation and completeness prediction. Seaweed is built on a distributed hash table (DHT) but, unlike previous DHT-based approaches, it does not redistribute data across the network. It exploits the DHT infrastructure for failure-resilient metadata replication, query dissemination, and result aggregation. We analytically compare Seaweed’s scalability against other approaches and present an evaluation of the Seaweed prototype running on a large-scale network simulator driven by real-world traces.
In-network outlier detection in wireless sensor networks
- In ICDCS, 2006
Cited by 14 (3 self)
To address the problem of unsupervised outlier detection in wireless sensor networks, we develop an algorithm that (1) is flexible with respect to the outlier definition, (2) works in-network with a communication load proportional to the outcome, and (3) reveals its outcome to all sensors. We examine the algorithm’s performance using simulation with real sensor data streams. Our results demonstrate that the algorithm is accurate and imposes a reasonable communication load and level of power consumption.
Network Imprecision: A new consistency metric for scalable monitoring
- In OSDI, 2008
Cited by 13 (2 self)
This paper introduces a new consistency metric, Network Imprecision (NI), to address a central challenge in large-scale monitoring systems: safeguarding correctness despite node and network failures. To implement NI, an overlay that monitors a set of attributes also monitors its own state so that queries return not only attribute values but also information about the stability of the overlay: the number of nodes whose recent updates may be missing and the number of nodes whose inputs may be double counted due to overlay reconfigurations. When NI indicates that the network is stable, query results reflect the true state of the system, but when the network is unstable, NI puts applications on notice that query results should not be trusted, allowing them to take corrective action such as filtering out inconsistent results. To implement NI’s introspection scalably, our prototype introduces a key optimization, dual-tree prefix aggregation, which exploits overlay symmetry to reduce overheads by more than an order of magnitude. Evaluation of three monitoring applications demonstrates that NI flags inaccurate results while incurring low overheads, and monitoring applications that use NI to select good information can reduce their inaccuracy by nearly a factor of five.
Processing window queries in wireless sensor networks
- In Proc. IEEE ICDE, 2006
Cited by 10 (2 self)
Existing query processing techniques for sensor networks rely on a network infrastructure for query propagation and data collection. However, such an infrastructure is very susceptible to the network topology transients that are common in sensor networks. In this paper, we propose an infrastructure-free window query processing technique for sensor networks, called itinerary-based window query execution (IWQE), in which query propagation and data collection are combined into a single stage and executed along a well-designed itinerary inside a query window. We study the parameters for setting up an itinerary (e.g., width and route) and incorporate into IWQE three data collection schemes based on different performance trade-offs. Finally, we demonstrate, by extensive simulations, the superior energy-time efficiency, robustness, and accuracy of IWQE over the current state-of-the-art techniques in supporting window queries under various network conditions.
Counting at large: Efficient cardinality estimation in internet-scale data networks
- In Proc. IEEE ICDE, 2006
Cited by 9 (0 self)
Counting in general, and estimating the cardinality of (multi-) sets in particular, is highly desirable for a large variety of applications, representing a foundational block for the efficient deployment and access of emerging internet-scale information systems. Examples of such applications range from optimizing query access plans in internet-scale databases, to evaluating the significance (rank/score) of various data items in information retrieval applications. The key constraints that any acceptable solution must satisfy are: (i) efficiency: the number of nodes that need be contacted for counting purposes must be small in order to enjoy small latency and bandwidth requirements; (ii) scalability, seemingly contradicting the efficiency goal: arbitrarily large numbers of nodes may need to add elements to a (multi-) set, which dictates the need for a highly distributed solution, avoiding server-based scalability, bottleneck, and availability problems; (iii) access and storage load balancing: counting and related overhead chores should be distributed fairly to the nodes of the network; (iv) accuracy: tunable, robust (in the presence of dynamics and failures) and highly accurate cardinality estimation; (v) simplicity and ease of integration: special, solution-specific indexing structures should be avoided. In this paper, we first contribute a highly distributed, scalable, efficient, and accurate (multi-) set cardinality estimator. Subsequently, we show how to use our solution to build and maintain histograms, which have been a basic building block for query optimization for centralized databases, facilitating their porting into the realm of internet-scale data networks.
Wide-Scale Data Stream Management
Cited by 9 (1 self)
This paper describes Mortar, a distributed stream processing platform for building very large queries across federated systems (enterprises, grids, datacenters, testbeds). Nodes in such systems can be queried for distributed debugging, application control and provisioning, anomaly detection, and measurement. We address the primary challenges of managing continuous queries that have thousands of wide-area sources that may periodically be down, disconnected, or overloaded, e.g., multiple data centers filled with cheap PCs, Internet testbeds such as PlanetLab, or country-wide sensor installations. Mortar presents a clean-slate design for best-effort in-network processing. For each query, it builds multiple, static overlays and leverages the union of overlay paths to provide resilient query installation and data routing. Further, a unique data management scheme mitigates the impact of clock skew on distributed stream processing, reducing result latency by a factor of 8, and allows users to specify custom in-network operators that transparently benefit from multipath routing. When compared to a contemporary distributed snapshot querying substrate, Mortar uses 78% less bandwidth while providing increased query resolution, responsiveness, and accuracy during failures.
Piergiovanni. Looking for a Definition of Dynamic Distributed Systems
- To appear in Proceedings of the 9th International Conference on Parallel Computing Technologies (PaCT’07), Springer Verlag LNCS
Cited by 8 (5 self)
This paper is a position paper on the nature of dynamic systems. While there is an agreement on the definition of what a static distributed system is, there is no agreed definition on what a dynamic distributed system is. This paper is a first step in that direction. To that end, it emphasizes two orthogonal dimensions that are present in any dynamic distributed system, namely the varying and possibly very large number of entities that currently define the system, and the fact that each of these entities knows only a few other entities (its neighbors) and possibly will never be able to know the whole system it is a member of. To illustrate the kind of issues one has to cope with in dynamic systems, the paper considers, as a benchmark problem, a simple data aggregation problem. It shows the type of dynamic systems in which that problem can be solved and the ones in which it cannot be solved. The aim of the paper is to give the reader an idea of the subtleties and difficulties encountered when one wants to understand the nature of dynamic distributed systems.

