Results 1 -
7 of
7
Multi-Task Active Learning with Output Constraints
"... Many problems in information extraction, text mining, natural language processing and other fields exhibit the same property: multiple prediction tasks are related in the sense that their outputs (labels) satisfy certain constraints. In this paper, we propose an active learning framework exploiting ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Many problems in information extraction, text mining, natural language processing and other fields exhibit the same property: multiple prediction tasks are related in the sense that their outputs (labels) satisfy certain constraints. In this paper, we propose an active learning framework exploiting such relations among tasks. Intuitively, with task outputs coupled by constraints, active learning can utilize not only the uncertainty of the prediction in a single task but also the inconsistency of predictions across tasks. We formalize this idea as a crosstask value of information criteria, in which the reward of a labeling assignment is propagated and measured over all relevant tasks reachable through constraints. A specific example of our framework leads to the cross entropy measure on the predictions of coupled tasks, which generalizes the entropy in the classical singletask uncertain sampling. We conduct experiments on two real-world problems: web information extraction and document classification. Empirical results demonstrate the effectiveness of our framework in actively collecting labeled examples for multiple related tasks. 1
Active Classification based on Value of Classifier
"... Modern classification tasks usually involve many class labels and can be informed by a broad range of features. Many of these tasks are tackled by constructing a set of classifiers, which are then applied at test time and then pieced together in a fixed procedure determined in advance or at training ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Modern classification tasks usually involve many class labels and can be informed by a broad range of features. Many of these tasks are tackled by constructing a set of classifiers, which are then applied at test time and then pieced together in a fixed procedure determined in advance or at training time. We present an active classification process at the test time, where each classifier in a large ensemble is viewed as a potential observation that might inform our classification process. Observations are then selected dynamically based on previous observations, using a value-theoretic computation that balances an estimate of the expected classification gain from each observation as well as its computational cost. The expected classification gain is computed using a probabilistic model that uses the outcome from previous observations. This active classification process is applied at test time for each individual test instance, resulting in an efficient instance-specific decision path. We demonstrate the benefit of the active scheme on various real-world datasets, and show that it can achieve comparable or even higher classification accuracy at a fraction of the computational costs of traditional methods. 1
A Utility-Theoretic Approach to Privacy in Online Services
"... Online offerings such as web search, news portals, and e-commerce applications face the challenge of providing high-quality service to a large, heterogeneous user base. Recent efforts have highlighted the potential to improve performance by introducing methods to personalize services based on specia ..."
Abstract
- Add to MetaCart
Online offerings such as web search, news portals, and e-commerce applications face the challenge of providing high-quality service to a large, heterogeneous user base. Recent efforts have highlighted the potential to improve performance by introducing methods to personalize services based on special knowledge about users and their context. For example, a user’s demographics, location, and past search and browsing may be useful in enhancing the results offered in response to web search queries. However, reasonable concerns about privacy by both users, providers, and government agencies acting on behalf of citizens, may limit access by services to such information. We introduce and explore an economics of privacy in personalization, where people can opt to share personal information, in a standing or on-demand manner, in return for expected enhancements in the quality of an online service. We focus on the example of web search and formulate realistic objective functions for search efficacy and privacy. We demonstrate how we can find a provably near-optimal optimization of the utility-privacy tradeoff in an efficient manner. We evaluate our methodology on data drawn from a log of the search activity of volunteer participants. We separately assess users preferences about privacy and utility via a large-scale survey, aimed at eliciting preferences about peoples willingness to trade the sharing of personal data in returns for gains in search efficiency. We show that a significant level of personalization can be achieved using a relatively small amount of information about users. 1.
Sparse Measurement Systems: Applications, Analysis, Algorithms and Design
"... This thesis deals with ‘large-scale ’ detection problems that arise in many real world applications such as sensor networks, mapping with mobile robots and group testing for biological screening and drug discovery. These are problems where the values of a large number of inputs need to be inferred f ..."
Abstract
- Add to MetaCart
This thesis deals with ‘large-scale ’ detection problems that arise in many real world applications such as sensor networks, mapping with mobile robots and group testing for biological screening and drug discovery. These are problems where the values of a large number of inputs need to be inferred from noisy observations and where the transformation from input to measurement occurs because of a physical process. In particular, we focus on sparse measurement systems. We use the term sparse measurement system to refer to applications where every observation is a (stochastic) function of a small number of inputs. Here, small is relative to the total input size. Such a system can conveniently be represented by a (sparse) structured graphical model. We study the fundamental limits of performance of these sparse measurement systems through an information theoretic lens and analyze robustness to noise, model mismatch and parameter uncertainty. We also look at these problems from an algorithmic point of view and develop practical algorithms, aided by the representation of the system as a graphical model. We analyze how the computational cost of detection with sparse measurements changes with
Robust Sensor Placements at Informative and Communication-Efficient Locations
"... When monitoring spatial phenomena with wireless sensor networks, selecting the best sensor placements is a fundamental task. Not only should the sensors be informative, but they should also be able to communicate efficiently. In this article, we present a data-driven approach that addresses the thre ..."
Abstract
- Add to MetaCart
When monitoring spatial phenomena with wireless sensor networks, selecting the best sensor placements is a fundamental task. Not only should the sensors be informative, but they should also be able to communicate efficiently. In this article, we present a data-driven approach that addresses the three central aspects of this problem: measuring the predictive quality of a set of sensor locations (regardless of whether sensors were ever placed at these locations), predicting the communication cost involved with these placements, and designing an algorithm with provable quality guarantees that optimizes the NP-hard trade-off. Specifically, we use data from a pilot deployment to build nonparametric probabilistic models called Gaussian Processes (GPs) both for the spatial phenomena of interest and for the spatial variability of link qualities, which allows us to estimate predictive power and communication cost of unsensed locations. Surprisingly, uncertainty in the representation of link qualities plays an important role in estimating communication costs. Using these models, we present a novel, polynomial-time, data-driven algorithm, PSPIEL, which selects Sensor Placements at Informative and communication-Efficient Locations. Our approach exploits two important properties of this problem: submodularity, formalizing the intuition that adding a node to a small deployment can help more than adding a node to a large deployment; and locality, under which nodes that are far from each other provide almost independent information. Exploiting these properties, we prove strong
Near-Optimal Bayesian . . .
"... We tackle the fundamental problem of Bayesian active learning with noise, where we need to adaptively select from a number of expensive tests in order to identify an unknown hypothesis sampled from a known prior distribution. In the case of noise–free observations, a greedy algorithm called generali ..."
Abstract
- Add to MetaCart
We tackle the fundamental problem of Bayesian active learning with noise, where we need to adaptively select from a number of expensive tests in order to identify an unknown hypothesis sampled from a known prior distribution. In the case of noise–free observations, a greedy algorithm called generalized binary search (GBS) is known to perform near–optimally. We show that if the observations are noisy, perhaps surprisingly, GBS can perform very poorly. We develop EC 2, a novel, greedy active learning algorithm and prove that it is competitive with the optimal policy, thus obtaining the first competitiveness guarantees for Bayesian active learning with noisy observations. Our bounds rely on a recently discovered diminishing returns property called adaptive submodularity, generalizing the classical notion of submodular set functions to adaptive policies. Our results hold even if the tests have non–uniform cost and their noise is correlated. We also propose EFFECX-TIVE, a particularly fast approximation of EC 2, and evaluate it on a Bayesian experimental design problem involving human subjects, intended to tease apart competing economic theories of how people make decisions under uncertainty.
clustering with applications to petroleum exploration
"... Dynamic exploration designs for graphical models using clustering with applications to petroleum exploration by ..."
Abstract
- Add to MetaCart
Dynamic exploration designs for graphical models using clustering with applications to petroleum exploration by

