## Near-optimal nonmyopic value of information in graphical models

Venue: Annual Conference on Uncertainty in Artificial Intelligence (UAI)

Citations: 90 (17 self)

### BibTeX

@INPROCEEDINGS{Krause_near-optimalnonmyopic,
  author = {Andreas Krause and Carlos Guestrin},
  title = {Near-optimal nonmyopic value of information in graphical models},
  booktitle = {Annual Conference on Uncertainty in Artificial Intelligence},
  year = {2005}
}

### Abstract

A fundamental issue in real-world systems, such as sensor networks, is the selection of observations which most effectively reduce uncertainty. More specifically, we address the long-standing problem of nonmyopically selecting the most informative subset of variables in a graphical model. We present the first efficient randomized algorithm providing a constant-factor (1 − 1/e − ε) approximation guarantee for any ε > 0 with high confidence. The algorithm leverages the theory of submodular functions, in combination with a polynomial bound on sample complexity. We furthermore prove that no polynomial-time algorithm can provide a constant-factor approximation better than (1 − 1/e) unless P = NP. Finally, we provide extensive evidence of the effectiveness of our method on two complex real-world datasets.
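
The (1 − 1/e) guarantee in the abstract comes from greedy maximization of a monotone submodular function. A minimal sketch of that greedy rule, using a toy set-cover objective in place of the paper's information-gain objective (the set system and budget `k` below are invented for illustration):

```python
# A minimal sketch of the greedy algorithm behind the (1 - 1/e) guarantee
# for monotone submodular maximization. A toy set-cover objective stands in
# for the paper's information-gain objective; the set system and budget k
# below are invented for illustration.

def greedy_max(ground_set, F, k):
    """Select k elements, each time adding the element with largest marginal gain."""
    selected = []
    for _ in range(k):
        best = max((x for x in ground_set if x not in selected),
                   key=lambda x: F(selected + [x]) - F(selected))
        selected.append(best)
    return selected

# Toy monotone submodular function: number of universe elements covered.
sets = {
    'a': {1, 2, 3},
    'b': {3, 4},
    'c': {5, 6, 7, 8},
    'd': {1, 5},
}

def coverage(chosen):
    covered = set()
    for name in chosen:
        covered |= sets[name]
    return len(covered)

picked = greedy_max(list(sets), coverage, 2)
print(picked, coverage(picked))  # ['c', 'a'] 7
```

The same loop applies verbatim with `F` replaced by an (estimated) information-gain oracle, which is what Algorithm 1 in the paper does.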

### Citations

8563 |
Elements of Information Theory
- Cover, Thomas
- 1991
Citation Context ...d criterion for measuring uncertainty is the entropy of a distribution P : {x1, . . . , xd} → [0, 1], H(P) = −∑_k P(x_k) log P(x_k), measuring the number of bits required to encode {x1, . . . , xd} [2]. If A is a set of discrete random variables A = {X1, . . . , Xn}, then their entropy H(A) is defined as the entropy of their joint distribution. The conditional entropy H(A | B) for two subsets A, B ... |
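
The entropy formula quoted in this context is direct to compute; a small sketch for distributions given as plain probability lists:

```python
# A direct transcription of the entropy formula quoted in the context above,
# H(P) = -sum_k P(x_k) log2 P(x_k), for distributions given as plain lists.
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(pk * math.log2(pk) for pk in p if pk > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: a fair coin
print(entropy([0.25] * 4))   # 2.0 bits: uniform over four outcomes
```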

3529 | Optimization by simulated annealing
- Gelatt, Vecchi
- 1983
Citation Context ... ∪ X*; W′ := W′ \ X*; if F(G) > F(A2) then A2 := G; return argmax_{A ∈ {A1, A2}} F(A); end. Algorithm 2: Approximation algorithm for the budgeted case. ...approximation algorithms such as simulated annealing [12], which are in certain cases guaranteed to converge to the optimal solution with high probability, but for which one usually does not know whether they have already achieved the optimum or not. Remark... |

1491 | Probability inequalities for sums of bounded random variables
- Hoeffding
- 1963
Citation Context ...es we need to guarantee that the sample mean ∆_N does not differ from ∆ by more than ε/L with a confidence of at least 1 − δ/(Ln). First we note that 0 ≤ ∆ ≤ H(X) ≤ log |dom(X)|. Hoeffding’s inequality [10] states that Pr[|∆_N − ∆| ≥ ε/L] ≤ 2 exp(−2N(ε / (L log |dom(X)|))²). This quantity is bounded by δ/(Ln) if N ≥ ½ (L log |dom(X)| / ε)² log(2Ln/δ). □ Proof of Theorem 11. Let ε, δ > 0. We will approxi... |

628 | A threshold of ln n for approximating set cover
- Feige
- 1998
Citation Context ...roof of Theorem 9. We will reduce MAX-COVER to maximizing the information gain. MAX-COVER is the problem of finding a collection of L sets such that their union contains the maximum number of elements. In [5], it is shown that MAX-COVER cannot be approximated by a factor better than (1 − 1/e) unless P = NP. Our reduction generates a Turing machine, with a polynomial runtime guarantee, which computes infor... |

398 | Elements of... An analysis of approximations for maximizing submodular set functions—I
- Nemhauser, Wolsey, et al.
- 1978
Citation Context ...ificantly worsen our approximation guarantee. Fortunately, if we can compute the marginal increases F(A ∪ X) − F(A) with an absolute error of at most ε, an argument very similar to the one presented in [16] shows that the greedy algorithm will then provide a solution Â such that F(Â) ≥ (1 − 1/e)OPT − 2Lε with high confidence. The following Theorem summarizes our analysis of Algorithm 1, using Algorit... |

335 | Model-driven data acquisition in sensor networks
- Deshpande, Guestrin, et al.
- 2004
Citation Context ...across a building as shown in Fig. 1(a). Our goal in this example is to become most certain about the temperature distribution, whilst minimizing energy expenditure, a critically constrained resource [3]. Unfortunately, as we show in [15], the problem of selecting the most informative subset of observations is NP^PP-complete, even when the underlying random variables are discrete and their joint pr... |

174 | Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies, Journal of Machine Learning Research 9
- Krause, Singh, et al.
- 2008
Citation Context ...part are most uncertain about each other. Since sensors usually provide most information about the area surrounding them, these border placements “waste” part of their sensing capacity, as noticed in [8]. A more direct measure of value of information is the information gain I(B; A), which is defined as I(B; A) = H(B) − H(B | A), i.e., the expected reduction in uncertainty over the variables in B g... |

119 | The budgeted maximum coverage problem
- Khuller, Moss, et al.
- 1999
Citation Context ... general algorithm has been developed for maximizing non-decreasing submodular functions under a budget constraint with general (additive) costs. Their result builds on an algorithm of Khuller et al. [11], who investigated the budgeted MAX-COVER problem (a specific example of a submodular function). Using a partial enumeration technique, the same performance guarantee (1 − 1/e) can be provided, as in ... |

65 |
A note on maximizing a submodular set function subject to knapsack constraint
- Sviridenko
Citation Context ...y practical problems, different observations have different costs. Building on recent constant-factor approximation algorithms for maximizing submodular functions where elements have different costs [18, 14], we extend our approach to problems where possible observations have different costs. Finally, we provide extensive empirical validation of our method on real-world data sets, demonstrating the advan... |

59 | An approximate nonmyopic computation for value of information
- Heckerman, Horvitz, et al.
- 1993
Citation Context ...ertainty about unobserved variables. Although there is a vast literature on myopic optimization for value of information (cf. [20, 4, 1]), there has been little prior work on nonmyopic analysis. In [9], a method is proposed to compute the maximum expected utility for specific sets of observations. While their work considers more general objective functions than information gain, they provide only l... |

53 | An Exact Algorithm for Maximum Entropy Sampling
- Ko, Jon, et al.
- 1995
Citation Context ...argmax_{A : c(A) ≤ L} H(A). (2.3) It is no surprise that this problem has been tackled with heuristic approaches, since even the unit-cost case has been shown to be NP-hard for multivariate Gaussian distributions [13], and a related formulation has been shown to be NP^PP-hard even for discrete distributions that can be represented by polytree graphical models [15]. A major limitation of this approach is that joi... |

36 | Optimal nonmyopic value of information in graphical models‐efficient algorithms and theoretical limits
- Krause, Guestrin
- 2005
Citation Context ... 1(a). Our goal in this example is to become most certain about the temperature distribution, whilst minimizing energy expenditure, a critically constrained resource [3]. Unfortunately, as we show in [15], the problem of selecting the most informative subset of observations is NP^PP-complete, even when the underlying random variables are discrete and their joint probability distribution can be repre... |

29 | Myopic value of information in influence diagrams
- Dittmer, Jensen
- 1997
Citation Context ...nt probability distribution can be represented as a polytree graphical model (even though inference is efficient in these models). To address this complexity issue, it has been common practice (cf. [20, 4, 1, 17]) to myopically (greedily) select the most uncertain variable as the next observation, or, equivalently, the set of observations with maximum joint entropy. Unfortunately, these greedy approaches are ... |

26 | A Note on the Budgeted Maximization of Submodular Functions
- Krause, Guestrin
- 2005
Citation Context ...y practical problems, different observations have different costs. Building on recent constant-factor approximation algorithms for maximizing submodular functions where elements have different costs [18, 14], we extend our approach to problems where possible observations have different costs. Finally, we provide extensive empirical validation of our method on real-world data sets, demonstrating the advan... |

24 | Polymatroidal dependence structure of a set of random variables
- Fujishige
- 1978
Citation Context ...A) = H(X | A), submodularity is simply a consequence of the information never hurts principle: F(A ∪ X) − F(A) = H(X | A) ≥ H(X | A′) = F(A′ ∪ X) − F(A′). Submodularity of entropy has been established before [6]. Contrary to the differential entropy, which can be negative, in the discrete case the entropy H is guaranteed to be nondecreasing, i.e., F(A ∪ X) − F(A) = H(X | A) ≥ 0 for all sets A ⊆ V. Further... |

22 | Selective evidence gathering for diagnostic belief networks
- Gaag, Wessels
- 1993
Citation Context ...nt probability distribution can be represented as a polytree graphical model (even though inference is efficient in these models). To address this complexity issue, it has been common practice (cf. [20, 4, 1, 17]) to myopically (greedily) select the most uncertain variable as the next observation, or, equivalently, the set of observations with maximum joint entropy. Unfortunately, these greedy approaches are ... |

18 | Gaussian processes for active data mining of spatial aggregates
- Ramakrishnan, Bailey-Kellogg, et al.
- 2005
Citation Context ...nt probability distribution can be represented as a polytree graphical model (even though inference is efficient in these models). To address this complexity issue, it has been common practice (cf. [20, 4, 1, 17]) to myopically (greedily) select the most uncertain variable as the next observation, or, equivalently, the set of observations with maximum joint entropy. Unfortunately, these greedy approaches are ... |

15 | Learning diagnostic policies from examples by systematic search
- Bayer‐Zubek

1 | The maximization of submodular functions
- Goldengorin, Tijssen, et al.
- 1999
Citation Context ...lar functions is NP-hard, by reduction from the max-cover problem, for example. There are branch-and-bound algorithms for maximizing submodular functions, such as the dichotomy algorithm described in [7], but they do not provide guarantees in terms of required running time. Typical problem sizes in practical applications are too large for exact algorithms, necessitating the development of approximati... |

1 | PATH and Caltrans. Freeway performance measurement system. http://pems.eecs.berkeley.edu
- Berkeley
Citation Context ...at the bounds presented in Theorem 10 are very loose for practical applications. 6.2 Highway traffic data In our second set of experiments, we analyzed highway traffic from the San Francisco Bay area [19]. Each detector station, 77 in total, computed aggregated measurements over 5 minutes, reporting the total number of vehicle miles traveled divided by the total vehicle hours for its region. As for th... |