Robust submodular observation selection, 2008
Cited by 17 (3 self)
In many applications, one has to actively select among a set of expensive observations before making an informed decision. For example, in environmental monitoring, we want to select locations to measure in order to most effectively predict spatial phenomena. Often, we want to select observations which are robust against a number of possible objective functions. Examples include minimizing the maximum posterior variance in Gaussian Process regression, robust experimental design, and sensor placement for outbreak detection. In this paper, we present the Submodular Saturation algorithm, a simple and efficient algorithm with strong theoretical approximation guarantees for cases where the possible objective functions exhibit submodularity, an intuitive diminishing returns property. Moreover, we prove that better approximation algorithms do not exist unless NP-complete problems admit efficient algorithms. We show how our algorithm can be extended to handle complex cost functions (incorporating non-unit observation cost or communication and path costs). We also show how the algorithm can be used to near-optimally trade off expected-case (e.g., the Mean Square Prediction Error in Gaussian Process regression) and worst-case (e.g., maximum predictive variance) performance. We show that many important machine learning problems fit our robust submodular observation selection formalism, and provide extensive empirical evaluation on several real-world problems. For Gaussian Process regression, our algorithm compares favorably with state-of-the-art heuristics described in the geostatistics literature, while being simpler, faster and providing theoretical guarantees. For robust experimental design, our algorithm performs favorably compared to SDP-based algorithms.
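The diminishing returns property named in this abstract can be illustrated on a toy coverage objective. The sketch below is only an illustration: the sensor names, the `COVERAGE` table, and the helper functions are hypothetical and do not come from the paper. It checks by brute force that adding a sensor to a smaller set never helps less than adding it to a larger superset.

```python
from itertools import combinations

# Hypothetical toy data: each candidate sensor covers a set of locations.
COVERAGE = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
    "d": {1, 6},
}


def coverage_value(selected):
    """F(A): number of distinct locations covered by the sensors in A."""
    covered = set()
    for sensor in selected:
        covered |= COVERAGE[sensor]
    return len(covered)


def marginal_gain(selected, candidate):
    """F(A + candidate) - F(A)."""
    return coverage_value(set(selected) | {candidate}) - coverage_value(selected)


def has_diminishing_returns():
    """Brute-force check: gain(A, s) >= gain(B, s) for all A <= B, s not in B."""
    items = list(COVERAGE)
    subsets = [set(c) for r in range(len(items) + 1)
               for c in combinations(items, r)]
    for big in subsets:
        for small in subsets:
            if not small <= big:
                continue
            for sensor in items:
                if sensor in big:
                    continue
                if marginal_gain(small, sensor) < marginal_gain(big, sensor):
                    return False
    return True
```

Here sensor "b" adds two new locations to an empty placement but nothing once "a" and "c" are already placed, which is the behavior the abstract calls diminishing returns.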
Selecting Observations against Adversarial Objectives, 2007
Cited by 16 (7 self)
In many applications, one has to actively select among a set of expensive observations before making an informed decision. Often, we want to select observations which perform well when evaluated with an objective function chosen by an adversary. Examples include minimizing the maximum posterior variance in Gaussian Process regression, robust experimental design, and sensor placement for outbreak detection. In this paper, we present the Submodular Saturation algorithm, a simple and efficient algorithm with strong theoretical approximation guarantees for the case where the possible objective functions exhibit submodularity, an intuitive diminishing returns property. Moreover, we prove that better approximation algorithms do not exist unless NP-complete problems admit efficient algorithms. We evaluate our algorithm on several real-world problems. For Gaussian Process regression, our algorithm compares favorably with state-of-the-art heuristics described in the geostatistics literature, while being simpler, faster and providing theoretical guarantees. For robust experimental design, our algorithm performs favorably compared to SDP-based algorithms.
Submodular meets Spectral: Greedy Algorithms for Sparse Approximation and Dictionary Selection, 2011. http://arxiv.org/abs/1102.3975
Cited by 4 (0 self)
We study the problem of selecting a subset of k random variables from a large set, in order to obtain the best linear prediction of another variable of interest. This problem can be viewed in the context of both feature selection and sparse approximation. We analyze the performance of widely used greedy heuristics, using insights from the maximization of submodular functions and spectral analysis. We introduce the submodularity ratio as a key quantity to help understand why greedy algorithms perform well even when the variables are highly correlated. Using our techniques, we obtain the strongest known approximation guarantees for this problem, both in terms of the submodularity ratio and the smallest k-sparse eigenvalue of the covariance matrix. We also analyze greedy algorithms for the dictionary selection problem, and significantly improve the previously known guarantees. Our theoretical analysis is complemented by experiments on real-world and synthetic data sets; the experiments show that the submodularity ratio is a stronger predictor of the performance of greedy algorithms than other spectral parameters.
Sensor Selection for Minimizing Worst-Case Prediction Error
Cited by 1 (0 self)
In this paper, we study the problem of choosing the "best" subset of k sensors to sample from among a sensor deployment of n > k sensors, in order to predict aggregate functions over all the sensor values. The sensor data being measured are assumed to be spatially correlated, in the sense that the values at two sensors can differ by at most a monotonically increasing, concave function of their distance. The goal in our work is then to select sensors so as to minimize the error, assuming that the actual values at unsampled sensors are worst-case subject to the constraints imposed by their distances from sampled sensors. Even for the mean, maximum, and minimum, the problem is NP-hard; we present approximation algorithms to select near-optimal subsets of k sensors that minimize the worst-case prediction error. In general, we show that for any aggregate function with certain concavity, symmetry and monotonicity conditions, the sensor selection problem can be modeled as a k-median clustering problem, and solved using efficient approximation algorithms designed for k-median clustering. Our theoretical results are complemented by experiments on two real-world sensor data sets; our experiments confirm that our algorithms lead to prediction errors that are usually less than the (normalized) standard deviation of the test data, using only around 10% of the sensors.
Distributed Greedy Sensor Scheduling for Model-based Reconstruction of Space-Time Continuous Physical Phenomena
A novel distributed sensor scheduling method for large-scale sensor networks observing space-time continuous physical phenomena is introduced. In a first step, the model of the distributed phenomenon is spatially and temporally decomposed, leading to a linear probabilistic finite-dimensional model. Based on this representation, the information gain of sensor measurements is evaluated by means of the so-called covariance reduction function. For this reward function, it is shown that the performance of the greedy sensor scheduling is at least half that of the optimal scheduling considering long-term effects. This finding is the key to distributed sensor scheduling, where a central processing unit or fusion center is unnecessary, and thus scalability as well as reliability is ensured. Hence, greedy scheduling in combination with a proposed hierarchical communication scheme requires only local sensor information and communication.
Budgeted Nonparametric Learning from Data Streams
We consider the problem of extracting informative exemplars from a data stream. Examples of this problem include exemplar-based clustering and nonparametric inference such as Gaussian process regression on massive data sets. We show that these problems require maximization of a submodular function that captures the informativeness of a set of exemplars, over a data stream. We develop an efficient algorithm, Stream-Greedy, which is guaranteed to obtain a constant fraction of the value achieved by the optimal solution to this NP-hard optimization problem. We extensively evaluate our algorithm on large real-world data sets.
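To make the streaming setting concrete, here is a simplified swap-based selection rule in the spirit of what the abstract describes. It is a sketch under stated assumptions, not the Stream-Greedy algorithm as published: the function names, the `min_gain` threshold, the 1-D reference sample, and the negative-distance objective are all hypothetical choices made for illustration. The rule keeps at most k exemplars and, for each arriving item, performs the best single swap only if it improves the objective by more than `min_gain`.

```python
def stream_swap_select(stream, objective, k, min_gain=0.0):
    """Keep k exemplars; swap in a new item only if the objective improves."""
    selected = []
    for item in stream:
        if len(selected) < k:
            selected.append(item)  # fill the budget first
            continue
        # Try replacing each current exemplar with the new item.
        trials = [selected[:i] + selected[i + 1:] + [item] for i in range(k)]
        best_trial = max(trials, key=objective)
        if objective(best_trial) - objective(selected) > min_gain:
            selected = best_trial
    return selected


# Hypothetical 1-D reference sample used to score a set of exemplars.
REFERENCE = [0, 1, 2, 10, 11, 12]


def neg_clustering_loss(exemplars):
    """Negative total distance from each reference point to its nearest exemplar."""
    return -sum(min(abs(r - e) for e in exemplars) for r in REFERENCE)


EXEMPLARS = stream_swap_select([0, 1, 2, 11, 5], neg_clustering_loss, k=2)
```

With two clusters in the reference sample, the sketch ends up holding one exemplar near each cluster and ignores the late outlier 5, since swapping it in would increase the clustering loss.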
In-Situ Soil Moisture Sensing: Optimal Sensor Placement and Field Estimation
We study the problem of optimal sensor placement in the context of soil moisture sensing. The goal of sensor placement is to select a subset of locations to collect (point) observations, so as to minimize an error measure of the resulting estimate for the unobserved locations. Prior work on sensor placement has often relied on the assumption that the underlying spatial random process is Gaussian. We show that soil moisture in general does not follow a Gaussian distribution; rather it exhibits a multimodal behavior. On the other hand, it possesses unique features that can be exploited. Specifically, there exists a coarse-grained monotonic ordering of locations in their soil moisture level over time, a feature much more stable than the soil moisture process itself at these locations. This motivates a clustered sensor placement scheme, where locations are classified into clusters based on this ordering. Extensive numerical experiments are performed using a large set of 3-dimensional soil moisture data generated by a state-of-the-art soil moisture simulator. We conclude that the coarse-grained ordering of locations is a far more stable feature inherent in the soil moisture data, and placement algorithms using this feature outperform those solely relying on the Gaussian assumption.
Online Distributed Sensor Selection, 2010
A key problem in sensor networks is to decide which sensors to query when, in order to obtain the most useful information (e.g., for performing accurate prediction), subject to constraints (e.g., on power and bandwidth). In many applications the utility function is not known a priori, must be learned from data, and can even change over time. Furthermore, for large sensor networks, solving a centralized optimization problem to select sensors is not feasible, and thus we seek a fully distributed solution. In this paper, we present Distributed Online Greedy (DOG), an efficient, distributed algorithm for repeatedly selecting sensors online, only receiving feedback about the utility of the selected sensors. We prove very strong theoretical no-regret guarantees that apply whenever the (unknown) utility function satisfies a natural diminishing returns property called submodularity. Our algorithm has extremely low communication requirements, and scales well to large sensor deployments. We extend DOG to allow observation-dependent sensor selection. We empirically demonstrate the effectiveness of our algorithm on several real-world sensing tasks.
Robust Sensor Placements at Informative and Communication-Efficient Locations
When monitoring spatial phenomena with wireless sensor networks, selecting the best sensor placements is a fundamental task. Not only should the sensors be informative, but they should also be able to communicate efficiently. In this article, we present a data-driven approach that addresses the three central aspects of this problem: measuring the predictive quality of a set of sensor locations (regardless of whether sensors were ever placed at these locations), predicting the communication cost involved with these placements, and designing an algorithm with provable quality guarantees that optimizes the NP-hard trade-off. Specifically, we use data from a pilot deployment to build nonparametric probabilistic models called Gaussian Processes (GPs) both for the spatial phenomena of interest and for the spatial variability of link qualities, which allows us to estimate predictive power and communication cost of unsensed locations. Surprisingly, uncertainty in the representation of link qualities plays an important role in estimating communication costs. Using these models, we present a novel, polynomial-time, data-driven algorithm, PSPIEL, which selects Sensor Placements at Informative and communication-Efficient Locations. Our approach exploits two important properties of this problem: submodularity, formalizing the intuition that adding a node to a small deployment can help more than adding a node to a large deployment; and locality, under which nodes that are far from each other provide almost independent information. Exploiting these properties, we prove strong

