## Learning adaptive value of information for structured prediction (2013)

Venue: NIPS

Citations: 3 (0 self)

### Citations

3482 | Conditional random fields: Probabilistic models for segmenting and labeling sequence data
- Lafferty, McCallum, et al.
- 2001
Citation Context ... intractable even for simple graphical models like Naive Bayes [10]. Moreover, joint models of input and output are typically quite inferior in accuracy to discriminative models of output given input [11, 4, 20, 1]. • Richly parametrized, conditional value function. The central component of our method is an approximate value function that utilizes a set of meta-features to estimate future changes in value of ...

660 | Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms.
- Collins
- 2002
Citation Context ... intractable even for simple graphical models like Naive Bayes [10]. Moreover, joint models of input and output are typically quite inferior in accuracy to discriminative models of output given input [11, 4, 20, 1]. • Richly parametrized, conditional value function. The central component of our method is an approximate value function that utilizes a set of meta-features to estimate future changes in value of ...

603 | Max-Margin Markov Networks
- Taskar, Guestrin, et al.
- 2003
Citation Context ... intractable even for simple graphical models like Naive Bayes [10]. Moreover, joint models of input and output are typically quite inferior in accuracy to discriminative models of output given input [11, 4, 20, 1]. • Richly parametrized, conditional value function. The central component of our method is an approximate value function that utilizes a set of meta-features to estimate future changes in value of ...

542 | Pegasos: Primal estimated sub-gradient solver for SVM
- Shalev-Shwartz, Singer, et al.
- 2007
Citation Context ...nctions and is also convex. Our choice of distribution for z will determine how the predictor h is calibrated. In our experiments, we sampled z’s uniformly at random. To learn w, we use Pegasos-style [18] stochastic sub-gradient descent; we approximate the expectation in (10) by resampling z every time we pick up a new example $(x_j, y_j)$. We set λ and a stopping-time criterion through cross-validation o...
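
The Pegasos-style update mentioned in the context above can be illustrated with a minimal sketch. This is the standard Pegasos step for a binary hinge loss, used here as a stand-in: the paper's actual objective (10) is not shown in this excerpt, and the name `pegasos_step` and its parameters are hypothetical. The optional projection step of Pegasos is omitted, and the resampling of z per example is not modeled.

```python
def pegasos_step(w, x, y, lam, t):
    """One Pegasos sub-gradient step for the binary hinge loss (illustrative only).

    w   : current weight vector (list of floats)
    x   : feature vector of the sampled example (list of floats)
    y   : label, +1 or -1
    lam : L2 regularization strength (lambda > 0)
    t   : 1-based iteration counter
    """
    eta = 1.0 / (lam * t)  # Pegasos step size: 1 / (lambda * t)
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # Shrinkage that comes from the gradient of the L2 regularizer.
    shrunk = [(1.0 - eta * lam) * wi for wi in w]
    if margin < 1.0:
        # Hinge loss is active: also move toward y * x.
        return [wi + eta * y * xi for wi, xi in zip(shrunk, x)]
    return shrunk


# Usage: two steps on a single toy example.
w = pegasos_step([0.0, 0.0], [1.0, 2.0], 1, 0.5, 1)
w = pegasos_step(w, [1.0, 2.0], 1, 0.5, 2)
```

In the setting the excerpt describes, the feature vector would presumably depend on a freshly sampled z for each example; that dependence is left out of this sketch.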

461 | Least-squares policy iteration
- Lagoudakis, Parr
Citation Context ... Learning an approximate policy with long-range meta-features. In this work, we focus on a straightforward method for learning an approximate policy: a batch version of least-squares policy iteration [12] based on Q-learning [22]. We parametrize the policy using a linear function of metafeatures φ computed from the current state s = (x, z): $\pi_\beta(s) = \arg\max_a \beta^\top \phi(x, z, a)$. The metafeatures (which we ab...
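
The linear policy quoted in the context above, an argmax over actions of β applied to the metafeatures φ(x, z, a), can be sketched as follows. This is only an illustration of the argmax rule: `greedy_action`, `toy_phi`, and the toy values are hypothetical and are not the paper's metafeatures or learned weights.

```python
def greedy_action(beta, phi, x, z, actions):
    """Return the action a that maximizes the dot product beta . phi(x, z, a)."""
    def score(a):
        return sum(b * f for b, f in zip(beta, phi(x, z, a)))
    return max(actions, key=score)


def toy_phi(x, z, a):
    """Hypothetical metafeatures: [feature a not yet computed, cost of feature a]."""
    return [1.0 if z[a] == 0 else 0.0, float(x[a])]


# Usage: x holds made-up per-feature costs, z marks which features are already computed.
beta = [1.0, -0.5]
x = (2.0, 1.0, 4.0)
z = (1, 0, 0)
best = greedy_action(beta, toy_phi, x, z, [0, 1, 2])
```

With these toy values the scores are -1.0, 0.5, and -1.0, so the policy picks action 1, the cheap feature that has not been computed yet.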

246 | Hidden Markov support vector machines
- Altun, Tsochantaridis, et al.
- 2003

231 | Learning structured prediction models: a large margin approach.
- Taskar, Chatalbashev, et al.
- 2005
Citation Context ...x ∈ X to outputs y ∈ Y(x), where |x| = L and y is a L-vector of K-valued variables, i.e. $\mathcal{Y}(x) = \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_\ell$ and each $\mathcal{Y}_i = \{1, \dots, K\}$. We follow the standard max-margin structured learning approach [19] and consider linear predictive models of the form $w^\top f(x, y)$. However, we introduce an additional explicit feature extraction state vector z: $h(x, z) = \arg\max_{y \in \mathcal{Y}(x)} w^\top f(x, y, z)$ (1). Above, f(x, y, z) i...

217 | On a measure of the information provided by an experiment
- Lindley
- 1956
Citation Context ...bilistic model of the input and output variables and a utility function measuring payoffs. The expected value of information measures the increase in expected utility after observing a given variable [13, 8]. Unfortunately, the problem of computing optimal conditional observation plans is computationally intractable even for simple graphical models like Naive Bayes [10]. Moreover, joint models of input a...

183 | Information Value Theory
- Howard
- 1966
Citation Context ...bilistic model of the input and output variables and a utility function measuring payoffs. The expected value of information measures the increase in expected utility after observing a given variable [13, 8]. Unfortunately, the problem of computing optimal conditional observation plans is computationally intractable even for simple graphical models like Naive Bayes [10]. Moreover, joint models of input a...

125 | Beyond pixels: Exploring new representations and applications for motion analysis
- Liu
- 2009
Citation Context ...s the predictions using a linear-chain structured sequential prediction model. There are four types of features used by MODEC+S, the final and most expensive of which is a coarse-to-fine optical flow [14]; we incrementally compute poses and features to minimize the total runtime. For more details on the dataset/features, see [?]. We present cross validation results averaged over 40 80/20 train/test sp...

44 | Parsing human motion with stretchable models
- Sapp, Weiss, et al.
Citation Context ...ideo, optical flow is a very informative feature that often requires many seconds of computation time per frame, whereas inference for an entire sequence typically requires only fractions of a second [17]; in natural language parsing, feature computation may take up to 80% of the time [?]. In this work, we show that large gains in the speed/accuracy trade-off can be obtained by departing from the trad...

43 | Structured prediction cascades.
- Weiss, Taskar
- 2010
Citation Context ... structured prediction nor the batch setting. Most work that addresses learning the accuracy/efficiency trade-off in a structured setting applies primarily to inference, not feature extraction. E.g., [23] extend the idea of a classifier cascade to the structured prediction setting, with the objective defined in terms of obtaining accurate inference in models with large state spaces after coarse-to-fin...

28 | Optimal value of information in graphical models.
- Krause, Guestrin
- 2009
Citation Context ...fter observing a given variable [13, 8]. Unfortunately, the problem of computing optimal conditional observation plans is computationally intractable even for simple graphical models like Naive Bayes [10]. Moreover, joint models of input and output are typically quite inferior in accuracy to discriminative models of output given input [11, 4, 20, 1]. • Richly parametrized, conditional value function...

26 | Active classification based on value of classifier.
- Gao, Koller
- 2011
Citation Context ... models, or [7], who define a feature acquisition model similar in spirit to ours, but with a different reward function and modeling a variable trade-off rather than a fixed budget. We also note that [5] propose explicitly modeling the value of evaluating a classifier, but their approach uses ensembles of pre-trained models (rather than the adaptive model we propose). And while the goals of these wor...

22 | MODEC: Multimodal decomposable models for human pose estimation.
- Sapp, Taskar
- 2013
Citation Context ..., our goal is to predict the joint locations of human limbs in video clips extracted from Hollywood movies. Our testbed is the MODEC+S model proposed in [?]; the MODEC+S model uses the MODEC model of [16] to generate 32 proposed poses per frame of a video sequence, and then combines the predictions using a linear-chain structured sequential prediction model. There are four types of features used by MO...

19 | Classifier cascade for minimizing feature evaluation cost
- Chen, Xu, et al.
- 2012
Citation Context ...iers that use different subsets of features at different stages of processing. More recently, feature computation cost has been explicitly incorporated specifically into the learning procedure (e.g., [7, 15, 3, 6].) The most related recent work of this type is [21], who define a reward function for multi-class classification with a series of increasingly complex models, or [7], who define a feature acquisition...

19 | SpeedBoost: Anytime prediction with uniform near-optimality
- Grubb, Bagnell
- 2012
Citation Context ...iers that use different subsets of features at different stages of processing. More recently, feature computation cost has been explicitly incorporated specifically into the learning procedure (e.g., [7, 15, 3, 6].) The most related recent work of this type is [21], who define a reward function for multi-class classification with a series of increasingly complex models, or [7], who define a feature acquisition...

18 | Designing efficient cascaded classifiers: tradeoff between accuracy and cost
- Raykar, Krishnapuram, et al.
- 2010
Citation Context ...iers that use different subsets of features at different stages of processing. More recently, feature computation cost has been explicitly incorporated specifically into the learning procedure (e.g., [7, 15, 3, 6].) The most related recent work of this type is [21], who define a reward function for multi-class classification with a series of increasingly complex models, or [7], who define a feature acquisition...

12 | Imitation learning by coaching.
- He, Daume, et al.
- 2012

11 | Dynamic feature selection for dependency parsing.
- He, Daume, et al.
- 2013
Citation Context ...ime per frame, whereas inference for an entire sequence typically requires only fractions of a second [16]; in natural language parsing, feature computation may take up to 80% of the computation time [7]. In this work, we show that large gains in the speed/accuracy trade-off can be obtained by departing from the traditional method of “one-size-fits-all” model and feature selection, in which a static ...

10 | Supervised sequential classification under budget constraints.
- Trapeznikov, Saligrama
- 2013
Citation Context ...es of processing. More recently, feature computation cost has been explicitly incorporated specifically into the learning procedure (e.g., [7, 15, 3, 6].) The most related recent work of this type is [21], who define a reward function for multi-class classification with a series of increasingly complex models, or [7], who define a feature acquisition model similar in spirit to ours, but with a differe...

6 | Learned prioritization for trading off accuracy and speed.
- Jiang, Teichert, et al.
- 2012

3 | Dynamic structured model selection
- Weiss, Sapp, et al.
- 2013
Citation Context ... of the predictive model in order to make their pruning decisions, and do not allow future feature computations to rectify past mistakes, as in the case of our work. Most related is the prior work of [22], in which one of an ensemble of structured models is selected on a per-example basis. This idea is essentially a coarse sub-case of the framework presented in this work, without the adaptive predicti...

1 | Dynamic structured model selection
- Anonymous
- 2013