Results 1–10 of 158
Combining geometry and combinatorics: a unified approach to sparse signal recovery
2008
Cited by 157 (14 self)
There are two main algorithmic approaches to sparse signal recovery: geometric and combinatorial. The geometric approach starts with a geometric constraint on the measurement matrix Φ and then uses linear programming to decode information about x from Φx. The combinatorial approach constructs Φ and a combinatorial decoding algorithm to match. We present a unified approach to these two classes of sparse signal recovery algorithms. The unifying elements are the adjacency matrices of high-quality unbalanced expanders. We generalize the notion of the Restricted Isometry Property (RIP), crucial to compressed sensing results for signal recovery, from the Euclidean norm to the ℓp norm for p ≈ 1, and then show that unbalanced expanders are essentially equivalent to RIP-p matrices. From known deterministic constructions for such matrices, we obtain new deterministic measurement matrix constructions and algorithms for signal recovery which, compared to previous deterministic algorithms, are superior in either the number of measurements or in noise tolerance.
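For concreteness, one standard way to state the ℓp generalization of the RIP mentioned above (the constants and scaling here are illustrative, not necessarily the paper's exact normalization) is: a matrix Φ satisfies RIP-p for sparsity k if, for every k-sparse vector x,

```latex
(1 - \delta)\,\lVert x \rVert_p \;\le\; \lVert \Phi x \rVert_p \;\le\; (1 + \delta)\,\lVert x \rVert_p .
```

For p = 2 this recovers the usual RIP; the paper's equivalence result ties the p ≈ 1 case to the adjacency matrices of unbalanced expanders.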
Efficient Computation of Frequent and Top-k Elements in Data Streams
In ICDT, 2005
Cited by 71 (7 self)
We propose an approximate integrated approach for solving both problems of finding the most popular k elements, and finding frequent elements in a data stream coming from a large domain. Our solution is space efficient and reports both frequent and top-k elements with tight guarantees on errors. For general data distributions, our top-k algorithm returns k elements that have roughly the highest frequencies, and it uses limited space for calculating frequent elements. For realistic Zipfian data, the space requirement of the proposed algorithm for solving the exact frequent elements problem decreases dramatically with the parameter of the distribution; and for top-k queries, the analysis ensures that only the top-k elements, in the correct order, are reported. The experiments, using real and synthetic data sets, show space reductions with no loss in accuracy. Having proved the effectiveness of the proposed approach through both analysis and experiments, we extend it to answer continuous queries about frequent and top-k elements. Although the problems of incremental reporting of frequent and top-k elements are useful in many applications, to the best of our knowledge, no solution has been proposed.
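The counter-based summary this paper proposes is its Space-Saving algorithm; the minimal sketch below (class and method names are illustrative, not the paper's) shows the core eviction rule that keeps only m counters while tracking each counter's possible overestimation:

```python
# Sketch of a Space-Saving-style stream summary: at most m monitored
# elements, each with (count, overestimation error). When a new element
# arrives and the summary is full, it replaces the element with the
# minimum count and inherits that count as its recorded error.
class StreamSummary:
    def __init__(self, m):
        self.m = m
        self.counters = {}  # element -> (count, error)

    def offer(self, x):
        if x in self.counters:
            cnt, err = self.counters[x]
            self.counters[x] = (cnt + 1, err)
        elif len(self.counters) < self.m:
            self.counters[x] = (1, 0)
        else:
            # Evict the minimum counter; the newcomer inherits its count.
            victim = min(self.counters, key=lambda e: self.counters[e][0])
            min_cnt, _ = self.counters.pop(victim)
            self.counters[x] = (min_cnt + 1, min_cnt)

    def frequent(self, threshold):
        # Superset of all elements whose true frequency exceeds threshold.
        return [e for e, (cnt, _) in self.counters.items() if cnt > threshold]
```

Because an element's true count is always between `count - error` and `count`, the summary can report which answers are guaranteed versus merely probable.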
Resource Sharing in Continuous Sliding-Window Aggregates
2004
Cited by 63 (1 self)
We consider the problem of resource sharing when processing large numbers of continuous queries. We specifically address sliding-window aggregates over data streams, an important class of continuous operators for which sharing has not been addressed. We present a suite of sharing techniques that cover a wide range of possible scenarios: different classes of aggregation functions (algebraic, distributive, holistic), different window types (time-based, tuple-based, suffix, historical), and different input models (single stream, multiple substreams). We provide precise theoretical performance guarantees for our techniques, and show their practical effectiveness through experimental study.
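To illustrate the general idea of sharing across overlapping window queries (this is a generic pane-based sketch, not necessarily the paper's specific technique): pre-aggregate the stream into fixed-size panes once, then let every query compose its window from the shared partials.

```python
# Illustrative sketch: many sliding-window SUM queries share one pass of
# pane-level pre-aggregation instead of each re-scanning raw tuples.
def pane_partials(stream, pane_size):
    # One shared pass: sum each non-overlapping pane of the stream.
    return [sum(stream[i:i + pane_size])
            for i in range(0, len(stream), pane_size)]

def window_sum(partials, window_panes):
    # Per-query final step: combine the most recent `window_panes` panes.
    return sum(partials[-window_panes:])

stream = [3, 1, 4, 1, 5, 9, 2, 6]
partials = pane_partials(stream, pane_size=2)   # [4, 5, 14, 8]
# Two queries with different window lengths reuse the same partials:
assert window_sum(partials, window_panes=2) == 14 + 8
assert window_sum(partials, window_panes=3) == 5 + 14 + 8
```

This works directly for algebraic and distributive aggregates; holistic aggregates (e.g. quantiles) need the richer techniques the paper develops.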
Less hashing, same performance: Building a better Bloom filter
In Proc. of the 14th Annual European Symposium on Algorithms (ESA), 2006
Cited by 61 (7 self)
A standard technique from the hashing literature is to use two hash functions h1(x) and h2(x) to simulate additional hash functions of the form gi(x) = h1(x) + ih2(x). We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for …
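A minimal sketch of this double-hashing trick: the k probe positions are all derived from two base hashes via gi(x) = h1(x) + i·h2(x) mod m. The hash choice here (blake2b with two distinct salts) is illustrative, not the paper's.

```python
import hashlib

def _hash(x, salt):
    # Illustrative base hash: salted blake2b truncated to 64 bits.
    return int.from_bytes(
        hashlib.blake2b(x.encode(), salt=salt).digest()[:8], "big")

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _probes(self, x):
        # Only two real hash computations; the rest are linear combinations.
        h1, h2 = _hash(x, b"h1"), _hash(x, b"h2")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, x):
        for p in self._probes(x):
            self.bits[p] = True

    def __contains__(self, x):
        # No false negatives; false positives at the usual Bloom rate.
        return all(self.bits[p] for p in self._probes(x))
```

Compared with computing k independent hashes per lookup, this halves (or better) the hashing cost while, per the paper's result, leaving the asymptotic false-positive probability unchanged.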
Explicit constructions for compressed sensing of sparse signals
In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms, 2008
Cited by 54 (3 self)
Over recent years, a new approach for obtaining a succinct approximate representation of n-dimensional …
Estimating entropy and entropy norm on data streams
In Proceedings of the 23rd International Symposium on Theoretical Aspects of Computer Science (STACS), 2006
Cited by 43 (3 self)
We consider the problem of computing information-theoretic functions such as entropy on a data stream, using sublinear space. Our first result deals with a measure we call the “entropy norm” of an input stream: it is closely related to entropy but is structurally similar to the well-studied notion of frequency moments. We give a polylogarithmic-space one-pass algorithm for estimating this norm under certain conditions on the input stream. We also prove a lower bound that rules out such an algorithm if these conditions do not hold. Our second group of results is for estimating the empirical entropy of an input stream. We first present a sublinear-space one-pass algorithm for this problem. For a stream of m items and a given real parameter α, our algorithm uses space Õ(m^{2α}) and provides an approximation factor of 1/α in the worst case and (1 + ε) in “most” cases. We then present a two-pass polylogarithmic-space (1 + ε)-approximation algorithm. All our algorithms are quite simple.
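To make the two quantities concrete, here is an exact offline computation of both, taking the entropy norm as Σᵢ mᵢ·log(mᵢ) over element frequencies mᵢ (a common definition in this line of work; the paper's contribution is approximating these in sublinear space on a single pass, which this sketch does not attempt):

```python
import math
from collections import Counter

def entropy_norm(stream):
    # Sum of m_i * log2(m_i) over the frequency m_i of each distinct element.
    return sum(c * math.log2(c) for c in Counter(stream).values())

def empirical_entropy(stream):
    # Shannon entropy of the empirical distribution m_i / m.
    m = len(stream)
    return sum((c / m) * math.log2(m / c) for c in Counter(stream).values())
```

For a stream of length m the two are linked by H = log2(m) − F_H/m, which is why an estimate of the entropy norm yields an entropy estimate under the paper's conditions.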
ProgME: Towards Programmable Network MEasurement
2007
Cited by 43 (7 self)
Traffic measurements provide critical input for a wide range of network management applications, including traffic engineering, accounting, and security analysis. Existing measurement tools collect traffic statistics based on some predetermined, inflexible concept of “flows”. They do not have sufficient built-in intelligence to understand the application requirements or adapt to the traffic conditions. Consequently, they have limited scalability with respect to the large number of flows and the heterogeneity of monitoring applications. We present ProgME, a Programmable MEasurement architecture based on the novel concept of a flowset: an arbitrary set of flows defined according to application requirements and/or traffic conditions. Through a simple flowset composition language, ProgME can incorporate application requirements, adapt itself to circumvent the scalability challenges posed by the large number of flows, and achieve better application-perceived accuracy. ProgME can analyze and adapt to traffic statistics in real time. Using a sequential hypothesis test, ProgME can achieve fast and scalable heavy-hitter identification.
An integrated efficient solution for computing frequent and top-k elements in data streams
ACM Trans. Database Syst., 2006
Cited by 37 (6 self)
We propose an approximate integrated approach for solving both problems of finding the most popular k elements, and finding frequent elements in a data stream coming from a large domain. Our solution is space efficient and reports both frequent and top-k elements with tight guarantees on errors. For general data distributions, our top-k algorithm returns k elements that have roughly the highest frequencies, and it uses limited space for calculating frequent elements. For realistic Zipfian data, the space requirement of the proposed algorithm for solving the exact frequent elements problem decreases dramatically with the parameter of the distribution; and for top-k queries, the analysis ensures that only the top-k elements, in the correct order, are reported. The experiments, using real and synthetic data sets, show space reductions with hardly any loss in accuracy. Having proved the effectiveness of the proposed approach through both analysis and experiments, we extend it to answer continuous queries about frequent and top-k elements. Although the problems of incremental reporting of frequent and top-k elements are useful in many applications, to the best of our knowledge, no solution has been proposed.
Every Microsecond Counts: Tracking Fine-Grain Latencies with a Lossy Difference Aggregator
Cited by 33 (11 self)
Many network applications have stringent end-to-end latency requirements, including VoIP and interactive video conferencing, automated trading, and high-performance computing, where even microsecond variations may be intolerable. The resulting fine-grain measurement demands cannot be met effectively by existing technologies, such as SNMP, NetFlow, or active probing. We propose instrumenting routers with a hash-based primitive that we call a Lossy Difference Aggregator (LDA) to measure latencies down to tens of microseconds and losses as infrequent as one in a million. Such measurement can be viewed abstractly as what we refer to as a coordinated streaming problem, which is fundamentally harder than standard streaming problems due to the need to coordinate values between nodes. We describe a compact data structure that efficiently computes the average and standard deviation of latency and loss rate in a coordinated streaming environment. Our theoretical results translate to an efficient hardware implementation at 40 Gbps using less than 1% of a typical 65 nm 400 MHz networking ASIC. When compared to Poisson-spaced active probing with similar overheads, our LDA mechanism delivers orders of magnitude smaller relative error; active probing requires 50–60 times as much bandwidth to deliver similar levels of accuracy.
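A simplified sketch of the bucket-matching idea behind an LDA (the paper's refinements, such as multiple sampling banks to tolerate higher loss rates, are omitted): sender and receiver each hash packets into buckets holding a timestamp sum and a count; any bucket whose counts agree on both sides saw no loss, so the difference of its timestamp sums is the total latency of those packets.

```python
# Simplified Lossy Difference Aggregator sketch: per-bucket (timestamp
# sum, packet count) maintained independently at sender and receiver.
class LDA:
    def __init__(self, num_buckets):
        self.b = num_buckets
        self.sums = [0.0] * num_buckets
        self.counts = [0] * num_buckets

    def record(self, packet_id, timestamp):
        i = hash(packet_id) % self.b
        self.sums[i] += timestamp
        self.counts[i] += 1

def average_latency(sender, receiver):
    # Use only buckets with matching counts (loss-free buckets); a lost
    # packet would corrupt its bucket's sum, so mismatched buckets are
    # discarded. No per-packet state crosses the wire, just the buckets.
    total_delay, total_pkts = 0.0, 0
    for i in range(sender.b):
        if sender.counts[i] == receiver.counts[i] and sender.counts[i] > 0:
            total_delay += receiver.sums[i] - sender.sums[i]
            total_pkts += sender.counts[i]
    return total_delay / total_pkts if total_pkts else None
```

The "coordinated" difficulty is visible here: the estimate only works because both ends hash the same packets to the same buckets with synchronized clocks.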
The power of slicing in internet flow measurement
In IMC ’05, 2005
Cited by 32 (11 self)
Network service providers use high-speed flow measurement solutions in routers to track dominant applications, compute traffic matrices, and perform other such operational tasks. These solutions typically need to operate within the constraints of the three precious router resources: CPU, memory, and bandwidth. Cisco’s NetFlow, a widely deployed flow measurement solution, uses a configurable static sampling rate to control these resources. In this paper, we propose Flow Slices, a solution inspired by previous enhancements to NetFlow such as Smart Sampling [8] and Adaptive NetFlow (ANF) [10]. Flow Slices, in contrast to NetFlow, controls the three resource bottlenecks at the router using separate “tuning knobs”: it uses packet sampling to control CPU usage, flow sampling to control memory usage, and multi-factor smart sampling to control reporting bandwidth. The resulting solution has smaller resource requirements than current proposals (up to 80% less memory usage than ANF), enables more accurate traffic analysis results (up to 10% less error than ANF), and better balances the error in estimates of byte, packet, and flow counts (flow count estimates up to 8 times more accurate than after Smart Sampling). We provide theoretical analyses of the unbiasedness and variances of the estimators based on Flow Slices, and experimental comparisons with other flow measurement solutions such as ANF.
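The unbiasedness claim for sampling-based estimators rests on inverse-probability weighting. The generic illustration below (not the paper's Flow Slices estimator) shows the principle: sample each packet with probability p and credit 1/p of its bytes, so the estimator's expectation equals the true byte count.

```python
import random

def sampled_byte_estimate(packet_sizes, p, rng):
    # Horvitz-Thompson-style estimate: each sampled packet is credited
    # with size/p, making the estimate unbiased for the true total.
    est = 0.0
    for size in packet_sizes:
        if rng.random() < p:
            est += size / p
    return est

rng = random.Random(42)
true_total = sum(range(100, 200))
# Averaging many independent estimates shows the mean approaches the truth.
runs = [sampled_byte_estimate(range(100, 200), 0.25, rng)
        for _ in range(2000)]
mean = sum(runs) / len(runs)
```

The per-run variance of such estimators (and how to trade it against CPU, memory, and reporting bandwidth via separate sampling stages) is exactly what the paper's analysis quantifies.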