Results 1–10 of 11
Approximation-tolerant model-based compressive sensing
in ACM Symp. Discrete Algorithms, 2014
Abstract

Cited by 10 (3 self)
The goal of sparse recovery is to recover a k-sparse signal x ∈ R^n from (possibly noisy) linear measurements of the form y = Ax, where A ∈ R^{m×n} describes the measurement process. Standard results in compressive sensing show that it is possible to recover the signal x from m = O(k log(n/k)) measurements, and that this bound is tight. The framework of model-based compressive sensing [BCDH10] overcomes the lower bound and reduces the number of measurements further to O(k) by limiting the supports of x to a subset M of the (n choose k) possible supports. This has led to many measurement-efficient algorithms for a wide variety of signal models, including block-sparsity and tree-sparsity. Unfortunately, extending the framework to other, more general models has been stymied by the following obstacle: for the framework to apply, one needs an algorithm that, given a signal x, solves the following optimization problem exactly: arg min_{Ω∈M} ‖x_{[n]\Ω}‖_2 (here x_{[n]\Ω} denotes the projection of x on coordinates not in Ω). However, an approximation algorithm for this optimization task is not sufficient. Since many problems of this form are not known to have exact polynomial-time algorithms, this requirement poses an obstacle for extending the framework to a richer class of models. In this paper, we remove this obstacle and show how to extend the model-based compressive sensing framework so that it requires only approximate solutions to the aforementioned optimization problems. Interestingly, our extension requires the existence of approximation algorithms for both the maximization and the minimization variants of the optimization problem. Further, we apply our framework to the Constrained Earth Mover's Distance (CEMD) model introduced in [SHI13], obtaining a sparse recovery scheme that uses significantly fewer than O(k log(n/k)) measurements. This is the first nontrivial theoretical bound for this model, since the validation of the approach presented in [SHI13] was purely empirical.
The result is obtained by designing a novel approximation algorithm for the maximization version of the problem and proving approximation guarantees for the minimization algorithm described in [SHI13].
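The model projection at the heart of this framework is easiest to see in the unstructured case, where M contains all (n choose k) supports and the arg min is solved exactly by keeping the k largest-magnitude entries. A minimal sketch of that special case in plain Python (the helper name is hypothetical; structured models replace this step with a projection onto a restricted support family, which is exactly the step the paper allows to be approximate):

```python
def project_k_sparse(x, k):
    """Exact model projection for the plain k-sparse model:
    arg min over k-subsets Omega of the energy of x outside Omega
    is achieved by keeping the k largest-magnitude entries."""
    keep = set(sorted(range(len(x)), key=lambda i: -abs(x[i]))[:k])
    return [xi if i in keep else 0.0 for i, xi in enumerate(x)]

print(project_k_sparse([3.0, -5.0, 1.0, 0.5], 2))  # → [3.0, -5.0, 0.0, 0.0]
```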
On model-based RIP-1 matrices
In Proceedings of the 40th International Colloquium on Automata, Languages and Programming (ICALP), 2013
Model-Based Sketching and Recovery with Expanders
Abstract

Cited by 8 (2 self)
Abstract. It is well known that sparse signals can be succinctly represented by certain low-dimensional linear sketches, with applications in compressive sensing, data streaming and graph sketching, among others. Recently, structured sparsity has emerged as a promising new tool for reducing sketch size and improving recovery. By structured sparsity, we mean that the sparse coefficients exhibit further correlations as determined by a model. Existing work on sketching structured sparse signals requires dense sketching matrices that satisfy the 2-norm restricted isometry property. On the other hand, sparse sketching matrices, usually derived from expanders, are computationally much more efficient and easier to store and apply in recovery. In this paper, we focus on model-based expanders, that is, expanders that capture a given structured sparsity model, and show that they exist for a larger class of models than previously considered. We present the first polynomial-time algorithm for recovering structured sparse signals from low-dimensional linear sketches obtained via sparse matrices. The algorithm is guaranteed to yield signals with bounded recovery error and is quite easy to implement and customize for structured sparse models that are endowed with a "projection" operator. As a result, we characterize a broad class of structured sparsity models that have the polynomial-time projection property. We also provide numerical experiments to illustrate the theoretical results in action.
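The computational contrast with dense RIP matrices is that a sparse sketch built from a d-left-regular bipartite graph touches only d rows per signal coordinate. A rough illustration in plain Python (random construction with hypothetical helper names; expansion holds with high probability for suitable m and d but is not verified here):

```python
import random

def sparse_sketch(m, n, d, seed=0):
    """Columns of a random d-left-regular bipartite graph: signal
    coordinate j is added into the d sketch rows listed in cols[j]."""
    rng = random.Random(seed)
    return [rng.sample(range(m), d) for _ in range(n)]

def apply_sketch(cols, x, m):
    """Compute the sketch y = Ax, where A is the 0/1 adjacency
    matrix described by cols; each column has exactly d ones."""
    y = [0.0] * m
    for j, rows in enumerate(cols):
        for r in rows:
            y[r] += x[j]
    return y
```

Applying the sketch costs O(n·d) additions rather than O(n·m) multiplications for a dense matrix, which is the efficiency the abstract refers to.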
The Constrained Earth Mover Distance Model, with Applications to Compressive Sensing
Abstract

Cited by 6 (4 self)
Sparse signal representations have emerged as powerful tools in signal processing theory and applications, and serve as the basis of the now-popular field of compressive sensing (CS). However, several practical signal ensembles exhibit additional, richer structure beyond mere sparsity. Our particular focus in this paper is on signals and images where, owing to physical constraints, the positions of the nonzero coefficients do not change significantly as a function of spatial (or temporal) location. Such signal and image classes are often encountered in seismic exploration, astronomical sensing, and biological imaging. Our contributions are threefold: (i) We propose a simple, deterministic model based on the Earth Mover Distance that effectively captures the structure of the sparse nonzeros of signals belonging to such classes. (ii) We formulate an approach for approximating any arbitrary signal by a signal belonging to our model. The key idea in our approach is a min-cost max-flow graph optimization problem that can be solved efficiently in polynomial time. (iii) We develop a CS algorithm for efficiently reconstructing signals belonging to our model, and numerically demonstrate its benefits over state-of-the-art CS approaches.
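In one dimension, the EMD between two equal-size support sets reduces to matching sorted positions in order, which is what a min-cost flow computes in this special case. A toy sketch of that reduced problem (hypothetical helper name; the paper's actual projection solves a richer min-cost max-flow problem coupling all columns of an image):

```python
def support_emd_1d(s, t):
    """Min-cost perfect matching between two equal-size sets of
    integer positions on a line: on a line metric the optimal
    matching simply pairs the positions in sorted order."""
    assert len(s) == len(t)
    return sum(abs(a - b) for a, b in zip(sorted(s), sorted(t)))

print(support_emd_1d([1, 4], [2, 6]))  # → 3  (move 1→2 costs 1, 4→6 costs 2)
```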
Sketching Earth-Mover Distance on Graph Metrics
Abstract

Cited by 1 (0 self)
Abstract. We develop linear sketches for estimating the Earth-Mover distance between two point sets, i.e., the cost of the minimum-weight matching between the points according to some metric. While Euclidean distance and edit distance are natural measures for vectors and strings respectively, Earth-Mover distance is a well-studied measure that is natural in the context of visual or metric data. Our work considers the case where the points are located at the nodes of an implicit graph, and defines the distance between two points as the length of the shortest path between them. We first improve and simplify an existing result by Brody et al. [4] for the case where the graph is a cycle. We then generalize our results to arbitrary graph metrics. Our approach is to recast the problem of estimating Earth-Mover distance as an ℓ1 regression problem. The resulting linear sketches also yield space-efficient data stream algorithms in the usual way.
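For the cycle case, the connection to ℓ1 is concrete: the EMD between two distributions on an n-cycle with unit-length edges equals the minimum over an offset c of the ℓ1 norm of the shifted prefix sums of their difference, and a median offset is optimal. A small sketch under that standard reduction (exact, not the streaming sketch itself):

```python
def emd_cycle(p, q):
    """EMD between two distributions on an n-cycle with unit edges:
    min over c of sum_i |F_i - c|, where F is the prefix sum of
    p - q; the minimizing c is a median of the F_i."""
    prefix, acc = [], 0.0
    for a, b in zip(p, q):
        acc += a - b
        prefix.append(acc)
    c = sorted(prefix)[len(prefix) // 2]  # a median minimizes the l1 sum
    return sum(abs(f - c) for f in prefix)

print(emd_cycle([1, 0, 0, 0], [0, 0, 1, 0]))  # → 2.0 (shortest path around a 4-cycle)
```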
Compressive Parameter Estimation with EMD
, 2014
Abstract
In recent years, sparsity and compressive sensing have attracted significant attention in parameter estimation tasks, including frequency estimation, delay estimation, and localization. Parametric dictionaries collect signals for a sampling of the parameter space and can yield sparse representations for the signals of interest when the sampling is sufficiently dense. While this dense sampling can lead to high coherence in the dictionary, it is possible to leverage structured sparsity models to prevent highly coherent dictionary elements from appearing simultaneously in a signal representation, alleviating these coherence issues. However, the resulting approaches depend heavily on a careful setting of the maximum allowable coherence; furthermore, their guarantees apply to the coefficient vector recovery and do not translate in general to the parameter estimation task. We propose a new algorithm based on optimal sparse approximation measured by earth mover's distance (EMD). Theoretically, we show that EMD provides a better metric for the performance of parametric dictionary-based parameter estimation, and that K-median clustering algorithms have the potential ...
EP-MEANS: An Efficient Nonparametric Clustering of Empirical Probability Distributions
Abstract
Given a collection of m continuous-valued, one-dimensional empirical probability distributions {P1, . . . , Pm}, how can we cluster these distributions efficiently with a nonparametric approach? Such problems arise in many real-world settings where keeping the moments of the distribution is not appropriate, because either some of the moments are not defined or the distributions are heavy-tailed or bimodal. Examples include mining distributions of inter-arrival times and phone-call lengths. We present an efficient algorithm with a nonparametric model for clustering empirical, one-dimensional, continuous probability distributions. Our algorithm, called EP-means, is based on the Earth Mover's Distance and k-means clustering. We illustrate the utility of EP-means on various data sets and applications. In particular, we demonstrate that EP-means effectively and efficiently clusters probability distributions of mixed and arbitrary shapes, recovering ground-truth clusters exactly in cases where existing methods perform at baseline accuracy. We also demonstrate that EP-means outperforms moment-based classification techniques and discovers useful patterns in a variety of real-world applications.
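For equal-size one-dimensional samples, the Earth Mover's Distance has a closed form: sort both samples and average the gaps between corresponding order statistics. A minimal sketch of that distance, which a k-means-style clustering loop would then call as its dissimilarity measure (the function name is our own, not the paper's):

```python
def emd_1d(xs, ys):
    """W1 (Earth Mover's) distance between two equal-size 1-D
    empirical distributions: mean absolute gap between
    corresponding order statistics of the sorted samples."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

print(emd_1d([0.0, 1.0], [1.0, 2.0]))  # → 1.0 (every unit of mass moves by 1)
```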
Statement of Research
, 2012
Abstract
My goal in research is to discover theoretical insights that can guide practitioners in the creation of useful systems. To this end, I try to focus on relatively simple algorithms that are feasible to implement and have small big-Oh constants; when finding lower bounds, I look for ones that give guidance in the creation of efficient algorithms. To calibrate my understanding of the relation between theory and practice, I implement about half my algorithms and analyze their empirical performance. With this in mind, I have decided to focus my research on the interdisciplinary area of sparse recovery, which includes aspects of compressive sensing and streaming algorithms. The goal of sparse recovery is to acquire and process "sparse" data from a small number of samples. The topic offers the opportunity to develop theoretical and mathematical techniques that apply to big-data problems in diverse areas such as data stream analysis and signal processing. On the theory side, sparse recovery involves fascinating techniques from algorithms, statistics, probability theory, and information theory. On the application side, sparse recovery lets me interact with researchers who understand the practical constraints involved in creating real systems. It is a great area to develop theoretical results that can make an impact in practice. My first paper in the area shows fundamental limitations on the "standard" compressive sensing framework, by showing that the number of samples required by the seminal work of Candès, Romberg, and Tao is in fact optimal. This led me to investigate two methods for circumventing these lower bounds: making the sampling process adaptive and incorporating additional structural assumptions on the signals. I have shown that adaptivity enables a significant, and in some cases exponential, reduction in the number of samples required for sparse recovery.
I have also shown the first linear-time algorithm to exploit one of the most common additional structural assumptions. My research on algorithms that are highly efficient in both number of samples and processing time has culminated in algorithms to compute the Fourier transform efficiently when its output is sparse. These algorithms are faster than the ubiquitous Fast Fourier Transform for moderately sparse data, both in theory and in practice.
Clustering on Sliding Windows in Polylogarithmic Space
, 2015
Abstract
In PODS 2003, Babcock, Datar, Motwani and O'Callaghan [4] gave the first streaming solution for the k-median problem on sliding windows using O((k/τ^4) W^{2τ} log^2 W) space, with an O(2^{O(1/τ)}) approximation factor, where W is the window size and τ ∈ (0, 1/2) is a user-specified parameter. They left as an open question whether it is possible to improve this to polylogarithmic space. Despite much progress on clustering and sliding windows, this question has remained open for more than a decade. In this paper, we partially answer the main open question posed by Babcock, Datar, Motwani and O'Callaghan. We present an algorithm yielding an exponential improvement in space compared to the previous result given in Babcock et al. In particular, we give the first polylogarithmic-space (α, β)-approximation for metric k-median clustering in the sliding window model, where α and β are constants, under the assumption, also made by Babcock et al., that the optimal k-median cost on any given window is bounded by a polynomial in the window size. We justify this assumption by showing that when the cost is exponential in the window size, no sublinear-space approximation is possible. Our main technical contribution is a simple but elegant extension of smooth functions as introduced by Braverman and Ostrovsky [7], which allows us to apply well-known techniques for solving problems in the sliding window model to functions that are not smooth, such as the k-median cost.
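The objective these sliding-window algorithms approximate is the k-median cost of the points currently in the window. For intuition, a brute-force evaluation on a tiny window of points on the line (exponential in k in general, which is why the streaming algorithms settle for approximation in small space):

```python
from itertools import combinations

def k_median_cost(points, k):
    """Exact k-median cost for 1-D points: minimize, over all
    k-subsets of candidate centers, the total distance from each
    point to its nearest chosen center."""
    return min(
        sum(min(abs(p - c) for c in centers) for p in points)
        for centers in combinations(sorted(set(points)), k)
    )

print(k_median_cost([0, 1, 10, 11], 2))  # → 2 (one center per tight pair)
```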
Clustering Problems on Sliding Windows
, 2015
Abstract
We explore clustering problems in the streaming sliding window model in both general metric spaces and Euclidean space. We present the first polylogarithmic-space O(1)-approximation to the metric k-median and metric k-means problems in the sliding window model, answering the main open problem posed by Babcock, Datar, Motwani and O'Callaghan [5], which has remained unanswered for over a decade. Our algorithm uses O(k^3 log^6 n) space and poly(k, log n) update time. This is an exponential improvement on the space required by the technique due to Babcock et al. We introduce a data structure that extends smooth histograms, as introduced by Braverman and Ostrovsky [8], to operate on a broader class of functions. In particular, we show that using only polylogarithmic space we can maintain a summary of the current window from which we can construct an O(1)-approximate clustering solution. Merge-and-reduce is a generic method in computational geometry for adapting offline algorithms to the insertion-only streaming model. Several well-known coreset constructions are maintainable in the insertion-only streaming model using this method, including well-known coreset techniques for the k-median and k-means problems in both low- and high-dimensional Euclidean spaces [29, 13]. Previous work [25] has adapted coreset techniques to the insertion-deletion model, but translating them to the ...