Algorithms for Distributed Functional Monitoring, 2008
Cited by 40 (12 self)
We study what we call functional monitoring problems. We have k players each tracking their inputs, say player i tracking a multiset A_i(t) up until time t, and communicating with a central coordinator. The coordinator's task is to monitor a given function f computed over the union of the inputs ∪_i A_i(t), continuously at all times t. The goal is to minimize the number of bits communicated between the players and the coordinator. A simple example is when f is the sum, and the coordinator is required to alert when the sum of a distributed set of values exceeds a given threshold τ. Of interest is the approximate version where the coordinator outputs 1 if f ≥ τ and 0 if f ≤ (1 − ε)τ. This defines the (k, f, τ, ε) distributed functional monitoring problem. Functional monitoring problems are fundamental in distributed systems, in particular sensor networks, where we must minimize communication; they also connect to problems in communication complexity, communication theory, and signal processing. Yet few formal bounds are known for functional monitoring. We give upper and lower bounds for the (k, f, τ, ε) problem for some of the basic f's. In particular, we study frequency moments (F0, F1, F2). For F0 and F1, we obtain continuous monitoring algorithms with costs almost the same as their one-shot computation algorithms. However, for F2 the monitoring problem seems much harder. We give a carefully constructed multi-round algorithm that uses "sketch summaries" at multiple levels of detail and solves the (k, F2, τ, ε) problem with communication $\tilde{O}(k^2/\varepsilon + (\sqrt{k}/\varepsilon)^3)$. Since frequency moment estimation is central to other problems, our results have immediate applications to histograms, wavelet computations, and others. Our algorithmic techniques are likely to be useful for other functional monitoring problems as well.
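As a rough illustration of the approximate threshold problem in its simplest case, where f is the sum of unit arrivals (F1), the Python sketch below simulates one naive batching scheme. It is not the algorithm from the paper: the function name `run_threshold_monitor`, the batch size ⌊ετ/k⌋, and the random assignment of arrivals to sites are all assumptions made for this example.

```python
import random


def run_threshold_monitor(k, tau, eps, n_items, seed=0):
    """Simulate a naive (k, sum, tau, eps) threshold monitor.

    Illustrative assumption, not the paper's protocol: each site stays
    silent until it has seen `batch` new items, then sends one message.
    """
    rng = random.Random(seed)
    # At most k * (batch - 1) items are ever unreported, which is < eps * tau.
    batch = max(1, int(eps * tau / k))
    unreported = [0] * k   # per-site arrivals not yet sent to the coordinator
    reported = 0           # the coordinator's view of the global sum
    messages = 0
    trace = []             # (true sum, coordinator output) after each arrival

    for true_sum in range(1, n_items + 1):
        site = rng.randrange(k)          # the arrival lands at a random site
        unreported[site] += 1
        if unreported[site] == batch:    # site flushes a full batch
            reported += batch
            unreported[site] = 0
            messages += 1
        # Output 1 once the reported sum passes (1 - eps) * tau.
        alert = 1 if reported > (1 - eps) * tau else 0
        trace.append((true_sum, alert))
    return trace, messages


if __name__ == "__main__":
    trace, messages = run_threshold_monitor(k=10, tau=1000, eps=0.1, n_items=1200)
    first_alert = next(t for t, alert in trace if alert == 1)
    print("first alert at true sum", first_alert, "using", messages, "messages")
```

Because the reported sum never exceeds the true sum and lags it by less than ετ, the output is 1 whenever the true sum is at least τ and 0 whenever it is at most (1 − ε)τ. Between those two values either answer is allowed, which is what lets the sites stay silent most of the time.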
Issues in Evaluation of Stream Learning Algorithms
Cited by 24 (5 self)
Learning from data streams is a research area of increasing importance. Several stream learning algorithms have now been developed. Most of them learn decision models that continuously evolve over time, run in resource-aware environments, and detect and react to changes in the environment generating the data. One important issue, not yet adequately addressed, is the design of experimental work to evaluate and compare decision models that evolve over time. There are no gold standards for assessing performance in non-stationary environments. This paper proposes a general framework for assessing predictive stream learning algorithms. We defend the use of predictive sequential methods for error estimation: the prequential error. The prequential error allows us to monitor the evolution of the performance of models that evolve over time. Nevertheless, it is known to be a pessimistic estimator in comparison to holdout estimates. To obtain more reliable estimators we need some forgetting mechanism. Two viable alternatives are sliding windows and fading factors. We observe that the prequential error converges to a holdout estimator when estimated over a sliding window or using fading factors. We present illustrative examples of the use of prequential error estimators, using fading factors, for the tasks of: i) assessing performance of a learning algorithm; ii) comparing learning algorithms; iii) hypothesis testing using the McNemar test; and iv) change detection using the Page-Hinkley test. In these tasks, the prequential error estimated using fading factors provides reliable estimates. In comparison to sliding windows, fading factors are faster and memoryless, a requirement for streaming applications. This paper is a contribution to the discussion of good practices in performance assessment when learning dynamic models that evolve over time.
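The two forgetting mechanisms named in the abstract can be sketched in a few lines of Python. The abstract does not give the update rules, so the recurrences below (a faded loss sum divided by a faded count, and a plain mean over the last w losses) are an assumed, commonly used formulation; the function names and the parameters `alpha` and `w` are hypothetical.

```python
from collections import deque


def prequential_fading(losses, alpha=0.99):
    """Prequential error with a fading factor (assumed formulation).

    Keeps two scalars: a faded sum of losses and a faded count, so
    memory use is constant. alpha = 1.0 gives the plain running mean.
    """
    faded_sum = 0.0
    faded_count = 0.0
    estimates = []
    for loss in losses:
        faded_sum = loss + alpha * faded_sum
        faded_count = 1.0 + alpha * faded_count
        estimates.append(faded_sum / faded_count)
    return estimates


def prequential_window(losses, w):
    """Prequential error over a sliding window of the last w losses."""
    window = deque(maxlen=w)
    total = 0.0
    estimates = []
    for loss in losses:
        if len(window) == w:
            total -= window[0]   # the oldest loss is about to be evicted
        window.append(loss)
        total += loss
        estimates.append(total / len(window))
    return estimates


if __name__ == "__main__":
    # A model that is wrong on the first 50 examples, then always right.
    losses = [1.0] * 50 + [0.0] * 200
    print("no forgetting:", prequential_fading(losses, alpha=1.0)[-1])
    print("fading factor:", prequential_fading(losses, alpha=0.9)[-1])
    print("sliding window:", prequential_window(losses, w=20)[-1])
```

The example stream shows the pessimism the abstract mentions: without forgetting, early mistakes keep inflating the estimate long after the model has improved. The fading-factor version stores only two numbers, whereas the window version must store the last w losses.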
Categories and Subject Descriptors H.2.8 [Database Management]: Database applications— data mining; I.2.6 [Artificial Intelligence]: Learning—
Optimal tracking of distributed heavy hitters and quantiles. In PODS, 2009
Cited by 18 (8 self)
We consider the problem of tracking heavy hitters and quantiles in the distributed streaming model. The heavy hitters and quantiles are two important statistics for characterizing a data distribution. Let A be a multiset of elements, drawn from the universe U = {1, ..., u}. For a given 0 ≤ φ ≤ 1, the φ-heavy hitters are those elements of A whose frequency in A is at least φ|A|; the φ-quantile of A is an element x of U such that at most φ|A| elements of A are smaller than x and at most (1 − φ)|A| elements of A are greater than x. Suppose the elements of A are received at k remote sites over time, and each of the sites has a two-way communication channel to a designated coordinator, whose goal is to track the set of φ-heavy hitters and the φ-quantile of A approximately at all times with minimum communication. We give tracking algorithms with worst-case communication cost O(k/ε · log n) for both problems, where n is the total number of items in A, and ε is the approximation error. This substantially improves upon the previously known algorithms. We also give matching lower bounds on the communication costs for both problems, showing that our algorithms are optimal. We also consider a more general version of the problem where we simultaneously track the φ-quantiles for all 0 ≤ φ ≤ 1.
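To make the two definitions concrete, the following Python sketch evaluates them exactly on a small multiset held in one place. It only checks the definitions; it is not the distributed tracking algorithm from the paper, and the function names are hypothetical.

```python
from collections import Counter


def heavy_hitters(items, phi):
    """Elements whose frequency in `items` is at least phi * |A|."""
    n = len(items)
    return {x for x, count in Counter(items).items() if count >= phi * n}


def is_phi_quantile(items, x, phi):
    """Check the definition: at most phi*|A| elements are smaller than x
    and at most (1 - phi)*|A| elements are greater than x."""
    n = len(items)
    smaller = sum(1 for a in items if a < x)
    greater = sum(1 for a in items if a > x)
    return smaller <= phi * n and greater <= (1 - phi) * n


def phi_quantile(items, phi):
    """Smallest element of the (non-empty) multiset that satisfies the definition."""
    return next(x for x in sorted(set(items)) if is_phi_quantile(items, x, phi))


if __name__ == "__main__":
    A = [1, 1, 1, 2, 2, 3, 4, 5]
    print(heavy_hitters(A, phi=0.25))   # elements occurring at least 2 times
    print(phi_quantile(A, phi=0.5))     # a median of A
```

The definition can admit more than one valid φ-quantile; this sketch simply returns the smallest one that occurs in the multiset.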
Shape sensitive geometric monitoring. In Proc. ACM Symposium on Principles of Database Systems, 2008
Cited by 14 (5 self)
A fundamental problem in distributed computation is the distributed evaluation of functions. The goal is to determine the value of a function over a set of distributed inputs in a communication-efficient manner. Specifically, we assume that each node holds a time-varying input vector, and we are interested in determining, at any given time, whether the value of an arbitrary function on the average of these vectors crosses a predetermined threshold. In this paper, we introduce a new method for monitoring distributed data, which we term shape sensitive geometric monitoring. It is based on a geometric interpretation of the problem, which enables us to define local constraints on the data received at the nodes. It is guaranteed that as long as none of these constraints has been violated, the value of the function does not cross the threshold. We generalize previous work on geometric monitoring and solve two problems that seriously hampered its performance. First, as opposed to the constraints used so far, which depend only on the current values of the local input vectors, here we incorporate their temporal behavior into the constraints. Second, the new constraints are tailored to the geometric properties of the specific function being monitored, while the previous constraints were generic. Experimental results on real-world data reveal that using the new geometric constraints reduces communication by up to three orders of magnitude in comparison to existing approaches, and considerably narrows the gap between existing results and a newly defined lower bound on the communication complexity.
Compressing kinetic data from sensor networks, 2009
Cited by 4 (3 self)
We introduce a framework for storing and processing kinetic data observed by sensor networks. These sensor networks generate vast quantities of data, which motivates a significant need for data compression. We are given a set of sensors, each of which continuously monitors some region of space. We are interested in the kinetic data generated by a finite set of objects moving through space, as observed by these sensors. Our model relies purely on sensor observations; it allows points to move freely and requires no advance notification of motion plans. Sensor outputs are represented as random processes, where nearby sensors may be statistically dependent. We model the local nature of sensor networks by assuming that two sensor outputs are statistically dependent only if the two sensors are among the k nearest neighbors of each other. We present an algorithm for the lossless compression of the data produced by the network. We show that, under the statistical dependence and locality assumptions of our framework, asymptotically this compression algorithm encodes the data to within a constant factor of the information-theoretic lower bound optimum dictated by the joint entropy of the system.
Prediction-based Geometric Monitoring over Distributed Data Streams. In ACM SIGMOD, 2012
Cited by 4 (3 self)
Many modern streaming applications, such as online analysis of financial, network, sensor and other forms of data, are inherently distributed in nature. An important query type that is the focal point in such application scenarios concerns actuation queries, where the proper action is dictated by a trigger condition placed upon the current value of a monitored function. Recent work [18, 20, 21] studies the problem of tracking sophisticated (nonlinear) functions in a distributed manner. The main concept behind the geometric monitoring approach proposed there is for each distributed site to perform the function monitoring over an appropriate subset of the input domain. In the current work, we examine whether the distributed monitoring mechanism can become more efficient, in terms of the number of communicated messages, by extending the geometric monitoring framework to utilize prediction models.
Continuous Adaptive Outlier Detection on Distributed Data Streams
Cited by 3 (0 self)
In many applications, stream data are too voluminous to be collected centrally and are often transmitted over a distributed network. In this paper, we focus on outlier detection over distributed data streams in real time. First, we formalize the problem of outlier detection using the kernel density estimation technique. Then, we adopt a fading strategy to keep pace with the transient and evolving nature of stream data, and a micro-cluster technique to handle the data partition and the "one-pass" scan. Furthermore, our extensive experiments with synthetic and real data show that the proposed algorithm is efficient and effective compared with existing outlier detection algorithms, and more suitable for data streams.
Multi-scale real-time grid monitoring with job stream mining. In 9th IEEE International Symposium on Cluster Computing and the Grid (CCGrid), 2009
Cited by 2 (1 self)
The ever increasing scale and complexity of large computational systems call for sophisticated management tools, paving the way toward Autonomic Computing. A first step toward Autonomic Grids is presented in this paper; the interactions between the grid middleware and the stream of computational queries are modeled using statistical learning. The approach is implemented and validated in the context of the EGEE grid. The GSTRAP system, embedding the STRAP Data Streaming algorithm, provides manageable and understandable views of the computational workload based on gLite reporting services. An online monitoring module shows the instant distribution of the jobs in real time and its dynamics, enabling anomaly detection. An offline monitoring module provides the administrator with a consolidated view of the workload, enabling the visual inspection of its long-term trends.
The Continuous Distributed Monitoring Model, 2013
Cited by 2 (0 self)
In the model of continuous distributed monitoring, a number of observers each see a stream of observations. Their goal is to work together to compute a function of the union of their observations. This can be as simple as counting the total number of observations, or more complex nonlinear functions such as tracking the entropy of the induced distribution. Assuming that it is too costly to simply centralize all the observations, it becomes quite challenging to design solutions which provide a good approximation to the current answer, while bounding the communication cost of the observers, and their other resources such as their space usage. This survey introduces this model, and describes a selection of results in this setting, from the simple counting problem to a variety of other functions that have been studied.
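For the simple counting problem mentioned above, one standard idea is to let each observer report only when its local count has grown by a factor of more than (1 + ε) since its last report. The Python sketch below simulates that idea; whether the survey presents exactly this scheme is an assumption, and the function name `run_distributed_counter` and its parameters are hypothetical.

```python
import random


def run_distributed_counter(arrivals, k, eps):
    """Simulate approximate distributed counting with k observers.

    `arrivals` is a sequence of observer indices in [0, k), one per item.
    Assumed scheme: observer i sends its count only when it exceeds
    (1 + eps) times the count it last sent.
    """
    local = [0] * k        # true local counts
    last_sent = [0] * k    # counts the coordinator currently knows
    messages = 0
    trace = []             # (true total, coordinator estimate) per arrival

    for true_total, site in enumerate(arrivals, start=1):
        local[site] += 1
        if local[site] > (1 + eps) * last_sent[site]:
            last_sent[site] = local[site]   # send the new local count
            messages += 1
        trace.append((true_total, sum(last_sent)))
    return trace, messages


if __name__ == "__main__":
    rng = random.Random(0)
    arrivals = [rng.randrange(4) for _ in range(4000)]
    trace, messages = run_distributed_counter(arrivals, k=4, eps=0.1)
    true_total, estimate = trace[-1]
    print("true:", true_total, "estimate:", estimate, "messages:", messages)
```

Each local count is at most (1 + ε) times its last reported value, so the coordinator's estimate never overshoots the true total and undershoots it by at most a factor of (1 + ε). The number of messages per observer grows roughly logarithmically with its local count, not linearly.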
Towards an Algorithmic Theory of Compressed Sensing, Rutgers Univ., 2005, Tech. Rep.
Cited by 1 (0 self)