Results 1 – 10 of 18
Near-optimal sensor placements in Gaussian processes
In ICML, 2005
Abstract

Cited by 195 (27 self)
When monitoring spatial phenomena, which can often be modeled as Gaussian processes (GPs), choosing sensor locations is a fundamental task. There are several common strategies to address this task, for example, geometry or disk models, placing sensors at the points of highest entropy (variance) in the GP model, and A-, D-, or E-optimal design. In this paper, we tackle the combinatorial optimization problem of maximizing the mutual information between the chosen locations and the locations which are not selected. We prove that the problem of finding the configuration that maximizes mutual information is NP-complete. To address this issue, we describe a polynomial-time approximation that is within (1 − 1/e) of the optimum, exploiting the submodularity of mutual information. We also show how submodularity can be used to obtain online bounds and to design branch-and-bound search procedures. We then extend our algorithm to exploit lazy evaluations and local structure in the GP, yielding significant speedups. We also extend our approach to find placements which are robust against node failures and uncertainties in the model. These extensions again come with rigorous theoretical approximation guarantees, exploiting the submodularity of the objective function. We demonstrate the advantages of optimizing mutual information in an extensive empirical study on two real-world data sets.
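The greedy criterion behind this approach can be illustrated with a short sketch. This is not the authors' implementation; it is a minimal Python illustration of the mutual-information gain ratio σ²(y|A) / σ²(y|Ā) under an assumed known covariance matrix K, without the lazy-evaluation and local-structure speedups the abstract mentions. All function names are illustrative.

```python
import numpy as np

def conditional_variance(K, i, S):
    """Var(x_i | x_S) under a zero-mean Gaussian with covariance K."""
    if not S:
        return K[i, i]
    S = list(S)
    K_SS = K[np.ix_(S, S)]
    k_iS = K[i, S]
    return K[i, i] - k_iS @ np.linalg.solve(K_SS, k_iS)

def greedy_mi_placement(K, k):
    """Greedily pick k sensor indices, at each step choosing the location y
    maximizing the gain ratio sigma^2(y|A) / sigma^2(y|Abar), where A is the
    current placement and Abar is everything else."""
    n = K.shape[0]
    A = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for y in range(n):
            if y in A:
                continue
            Abar = [j for j in range(n) if j != y and j not in A]
            gain = conditional_variance(K, y, A) / conditional_variance(K, y, Abar)
            if gain > best_gain:
                best, best_gain = y, gain
        A.append(best)
    return A
```

For example, building K from a squared-exponential kernel on a 1-D grid of candidate locations and calling `greedy_mi_placement(K, 3)` returns three distinct grid indices spread to be informative about the unselected locations.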
Near-optimal nonmyopic value of information in graphical models
In Annual Conference on Uncertainty in Artificial Intelligence
Abstract

Cited by 96 (18 self)
A fundamental issue in real-world systems, such as sensor networks, is the selection of observations which most effectively reduce uncertainty. More specifically, we address the long-standing problem of nonmyopically selecting the most informative subset of variables in a graphical model. We present the first efficient randomized algorithm providing a constant-factor (1 − 1/e − ε) approximation guarantee for any ε > 0 with high confidence. The algorithm leverages the theory of submodular functions, in combination with a polynomial bound on sample complexity. We furthermore prove that no polynomial-time algorithm can provide a constant-factor approximation better than (1 − 1/e) unless P = NP. Finally, we provide extensive evidence of the effectiveness of our method on two complex real-world datasets.
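The (1 − 1/e) guarantee rests on the diminishing-returns property of monotone submodular functions, which the classic greedy algorithm exploits. A minimal, self-contained illustration using set cover, a standard submodular objective (not the paper's graphical-model setting):

```python
def coverage(sets, chosen):
    """Set-cover value: number of distinct elements covered by the chosen subsets.
    Coverage is monotone submodular, so greedy achieves (1 - 1/e) of the optimum."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)

def greedy_max_cover(sets, k):
    """Pick k subsets, each time taking the one with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        gains = {i: coverage(sets, chosen + [i]) - coverage(sets, chosen)
                 for i in range(len(sets)) if i not in chosen}
        chosen.append(max(gains, key=gains.get))
    return chosen
```

With `sets = [{1,2,3}, {3,4}, {4,5,6}, {1,6}]` and `k = 2`, greedy first takes `{1,2,3}` (gain 3) and then `{4,5,6}` (gain 3), covering all six elements.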
Bayesian Treed Gaussian Process Models with an Application to Computer Modeling
Journal of the American Statistical Association, 2007
Abstract

Cited by 49 (15 self)
This paper explores nonparametric and semiparametric nonstationary modeling methodologies that couple stationary Gaussian processes and (limiting) linear models with treed partitioning. Partitioning is a simple but effective method for dealing with nonstationarity. Mixing between full Gaussian processes and simple linear models can yield a more parsimonious spatial model while significantly reducing computational effort. The methodological developments and statistical-computing details that make this approach efficient are described in detail. Illustrations of our model are given for both synthetic and real datasets. Key words: recursive partitioning, nonstationary spatial model, nonparametric regression, Bayesian model averaging
Active Learning For Identifying Function Threshold Boundaries
Abstract

Cited by 11 (4 self)
We present an efficient algorithm to actively select queries for learning the boundaries separating a function domain into regions where the function is above and below a given threshold. We develop experiment-selection methods based on entropy, misclassification rates, variance, and their combinations, and show how they perform on a number of data sets. We then show how these algorithms are used to determine simultaneously valid 1 − α confidence intervals for seven cosmological parameters. Experimentation shows that the algorithm reduces the computation necessary for the parameter-estimation problem by an order of magnitude.
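One common family of criteria for this task trades off predictive uncertainty against distance to the threshold. The sketch below is a hypothetical illustration of such a selection rule (a "straddle"-style score, 1.96σ − |μ − t|) on a toy 1-D Gaussian-process model; it is not necessarily the paper's exact algorithm, and all names and parameters are illustrative.

```python
import numpy as np

def gp_posterior(X, y, Xs, ls=0.2, noise=1e-6):
    """Posterior mean/std at query points Xs for a unit-variance
    squared-exponential GP fit to 1-D training data (X, y)."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :])**2 / (2 * ls**2))
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(Xs, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mu, np.sqrt(np.maximum(var, 0.0))

def straddle_query(X, y, candidates, t):
    """Pick the candidate with the highest straddle score: large where the
    model is uncertain AND the mean is close to the threshold t."""
    mu, sd = gp_posterior(X, y, candidates)
    return candidates[np.argmax(1.96 * sd - np.abs(mu - t))]
```

On a toy problem with observations only at the domain endpoints, the rule queries an interior point where the posterior is both uncertain and plausibly near the threshold, rather than re-sampling near known data.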
Process Driven Spatial and Network Aggregation for Pandemic Response
In Proc. SIAM DM 2006 Workshop on Spatial Data Mining, 2006
Abstract

Cited by 3 (1 self)
Phase transitions in measures of cluster connectedness may be used to identify critical points in the propagation of an epidemic. These critical points reflect order-of-magnitude shifts in network properties and thus define appropriate regions for aggregation in the evolving socio-temporal portrait. Analysis of pre- and post-transitional images at these critical points can define principal corridors of propagation and establish the appropriate local scale (aggregate level) for resource-allocation strategies. Semi-supervised learning techniques based on Gaussian random fields enable prediction of infectious spread to unlabeled entities, and projections of disease propagation can inform allocation strategies for intelligent targeting of response resources to the most vulnerable locations in the unlabeled network.
Actively Learning Level-Sets of Composite Functions
Abstract

Cited by 2 (0 self)
Scientists frequently have multiple types of experiments and data sets on which they can test the validity of their parameterized models and locate plausible regions for the model parameters. By examining multiple data sets, scientists can obtain inferences which are typically much more informative than the deductions derived from each of the data sources independently. Several standard data-combination techniques result in target functions which are a weighted sum of the observed data sources. Thus, computing constraints on the plausible regions of the model parameter space can be formulated as finding a level set of a target function which is the sum of observable functions. We propose an active learning algorithm for this problem which selects both a sample (from the parameter space) and an observable function upon which to compute the next sample. Empirical tests on synthetic functions and on real data for an eight-parameter cosmological model show that our algorithm significantly reduces the number of samples required to identify the desired level-set.
Adaptive design of supercomputer experiments
2006
Abstract

Cited by 2 (1 self)
Computer experiments are often performed to allow modeling of a response surface of a physical experiment that can be too costly or difficult to run except via a simulator. Running the experiment over a dense grid can be prohibitively expensive, yet running over a sparse design chosen in advance can yield insufficient information in parts of the space, particularly when the surface is nonstationary. We propose an approach which automatically explores the space while simultaneously fitting the response surface, using predictive uncertainty to guide subsequent experimental runs. The newly developed Bayesian treed Gaussian process is used as the surrogate model, and a fully Bayesian approach allows explicit nonstationary measures of uncertainty. Our adaptive sequential design framework has been developed to cope with an asynchronous, random, agent-based supercomputing environment. We take a hybrid approach which melds optimal strategies from the statistics literature with flexible strategies from the active learning literature. The merits of this approach are borne out in several examples, including the motivating example of a computational fluid dynamics simulation of a rocket booster. Key words: nonstationary spatial model, treed partitioning, sequential design, active learning
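The uncertainty-guided sequential loop described here can be sketched with a simple stand-in surrogate. The code below substitutes a plain stationary GP for the Bayesian treed GP and uses a maximum-predictive-variance (ALM-style) acquisition, without the asynchronous agent-based machinery; function names and parameters are illustrative.

```python
import numpy as np

def posterior_sd(X, Xs, ls=0.15, noise=1e-6):
    """Predictive std of a unit-variance squared-exponential GP at points Xs.
    The mean is omitted, since the variance alone drives the acquisition."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :])**2 / (2 * ls**2))
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(Xs, X)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return np.sqrt(np.maximum(var, 0.0))

def adaptive_design(simulator, candidates, n_init=3, budget=10):
    """Sequential design: start from a coarse grid, then repeatedly run the
    simulator wherever predictive uncertainty is currently largest."""
    X = np.linspace(candidates.min(), candidates.max(), n_init)
    y = [simulator(x) for x in X]
    while len(X) < budget:
        x_next = candidates[np.argmax(posterior_sd(X, candidates))]
        X = np.append(X, x_next)
        y.append(simulator(x_next))
    return X, np.array(y)
```

Because already-sampled locations have near-zero predictive variance, each iteration fills the largest remaining gap in the design rather than re-running known inputs.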
Active Learning for Interactive Visualization
Abstract
Many automatic visualization methods have been proposed. However, an automatically generated visualization may differ from how a user wants to arrange the objects in the visualization space. By allowing users to relocate objects in the embedding space of the visualization, they can adjust the visualization to their preference. We propose an active learning framework for interactive visualization which selects objects for the user to relocate, so that they can obtain their desired visualization with as few relocations as possible. The framework is based on an information-theoretic criterion, which favors objects that reduce the uncertainty of the visualization. We present a concrete application of the proposed framework to the Laplacian eigenmap visualization method. We demonstrate experimentally that the proposed framework yields the desired visualization with fewer user interactions than existing methods.
Near-Optimal Sensor Placement for Linear Inverse Problems
Abstract
Abstract—A classic problem is the estimation of a set of parameters from measurements collected by a few sensors. The number of sensors is often limited by physical or economic constraints, and their placement is of fundamental importance to obtaining accurate estimates. Unfortunately, the selection of optimal sensor locations is intrinsically combinatorial, and the available approximation algorithms are not guaranteed to generate good solutions in all cases of interest. We propose FrameSense, a greedy algorithm for the selection of optimal sensor locations. The core cost function of the algorithm is the frame potential, a scalar property of a matrix that measures the orthogonality of its rows. Notably, FrameSense is the first algorithm that is near-optimal in terms of mean square error, meaning that its solution is always guaranteed to be close to the optimal one. Moreover, we show with an extensive set of numerical experiments that FrameSense achieves state-of-the-art performance while having the lowest computational cost when compared to other greedy methods. Index Terms—Sensor placement, inverse problem, frame potential, greedy algorithm.
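The frame potential of a matrix A with rows a_i is FP(A) = Σ_{i,j} ⟨a_i, a_j⟩², i.e. the squared Frobenius norm of A·Aᵀ; rows that are nearly parallel inflate it. A minimal worst-out greedy in the spirit of FrameSense, assuming the rows of A are the candidate sensor measurement vectors; this simplified sketch omits the paper's normalization and its approximation guarantees.

```python
import numpy as np

def frame_potential(A):
    """FP(A) = sum over row pairs of <a_i, a_j>^2 = ||A A^T||_F^2."""
    G = A @ A.T
    return float(np.sum(G**2))

def framesense(A, L):
    """Worst-out greedy: drop one candidate row at a time so that the frame
    potential of the remaining rows is minimized, until L rows are left."""
    keep = list(range(A.shape[0]))
    while len(keep) > L:
        fps = {i: frame_potential(A[[j for j in keep if j != i]]) for i in keep}
        keep.remove(min(fps, key=fps.get))
    return keep
```

Given candidate rows that include a duplicated direction, the greedy discards redundancy first: the surviving rows are the mutually orthogonal ones, which minimize the frame potential for the chosen size.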
CMU-ML-07-121: Actively Learning Level-Sets of Composite Functions
2007
Abstract
Scientists frequently have multiple types of experiments and data sets on which they can test the validity of their models and the plausible or optimal regions for the model parameters. Identifying these parameter regions reduces to finding a level set of a function defined as a composite of the evaluations of each experiment or data set at a parameter setting. An active learning algorithm for this problem must, at each iteration, select a parameter setting to be tested and decide which experiment type to use for the test. We propose an active learning algorithm for identifying level sets of composite functions. Empirical tests on synthetic functions and on real data for a 7-D cosmological model show that it significantly reduces the number of samples required to identify the desired regions.