Results 1–10 of 19
Distributed submodular maximization: Identifying representative elements in massive data
In Neural Information Processing Systems (NIPS), 2013

Cited by 16 (6 self)
Many large-scale machine learning problems (such as clustering, nonparametric learning, and kernel machines) require selecting, out of a massive data set, a manageable yet representative subset. Such problems can often be reduced to maximizing a submodular set function subject to cardinality constraints. Classical approaches require centralized access to the full data set, but for truly large-scale problems, rendering the data centrally is often impractical. In this paper, we consider the problem of submodular function maximization in a distributed fashion. We develop GREEDI, a simple two-stage protocol that is easily implemented using MapReduce-style computations. We theoretically analyze our approach and show that, under certain natural conditions, performance close to the (impractical) centralized approach can be achieved. In our extensive experiments, we demonstrate the effectiveness of our approach on several applications, including sparse Gaussian process inference and exemplar-based clustering, on tens of millions of data points using Hadoop.
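The two-stage idea is easy to picture in code. Below is a minimal single-machine sketch of a GREEDI-style protocol on a toy set-coverage objective; the data, names, and chunking scheme are illustrative assumptions, not the paper's implementation (which runs stage 1 in parallel via MapReduce):

```python
def greedy(ground, f, k):
    """Standard greedy: repeatedly add the element with the largest marginal gain."""
    S = []
    for _ in range(k):
        rest = [e for e in ground if e not in S]
        if not rest:
            break
        S.append(max(rest, key=lambda e: f(S + [e]) - f(S)))
    return S

def greedi(ground, f, k, m):
    """Two-stage GREEDI-style protocol, simulated on one machine:
    stage 1 runs greedy on each of m chunks (the parallelizable "map" step);
    stage 2 runs greedy over the union of the m partial solutions."""
    chunks = [ground[i::m] for i in range(m)]
    partial = [greedy(chunk, f, k) for chunk in chunks]   # stage 1
    merged = [e for sol in partial for e in sol]
    return greedy(merged, f, k)                           # stage 2

# Toy submodular objective (hypothetical data): coverage of integer sets.
sets = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 4, 5}, 4: {5, 6}}
cover = lambda S: len(set().union(*[sets[e] for e in S]))
solution = greedi(list(sets), cover, k=2, m=2)
```

Both stages reuse the same greedy subroutine; the paper's analysis shows when the second pass over the merged partial solutions recovers most of the centralized greedy's value.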
Streaming Submodular Maximization: Massive Data Summarization on the Fly
2014

Cited by 7 (3 self)
How can one summarize a massive data set “on the fly”, i.e., without even having seen it in its entirety? In this paper, we address the problem of extracting representative elements from a large stream of data: we would like to select a subset of, say, k data points from the stream that are most representative according to some objective function. Many natural notions of “representativeness” satisfy submodularity, an intuitive notion of diminishing returns. Thus, such problems can be reduced to maximizing a submodular set function subject to a cardinality constraint. Classical approaches to submodular maximization require full access to the data set. We develop the first efficient streaming algorithm with a constant-factor (1/2 − ε) approximation guarantee to the optimum solution, requiring only a single pass through the data and memory independent of the data size. In our experiments, we extensively evaluate the effectiveness of our approach on several applications, including training large-scale kernel methods and exemplar-based clustering, on millions of data points. We observe that our streaming method, while achieving practically the same utility value, runs about 100 times faster than previous work.
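The single-pass guarantee rests on a thresholding idea that is simple to sketch. The fragment below keeps an element only if its marginal gain clears a threshold derived from a guess v of the optimal value; this is a hedged sketch on a toy coverage objective, not the paper's full algorithm, which maintains many guesses of OPT in parallel and returns the best resulting set:

```python
def stream_threshold(stream, f, k, v):
    """One pass, O(k) memory: keep an element only if its marginal gain
    clears (v/2 - f(S)) / (k - |S|), where v is a guess of the optimal
    value. With a correct guess, the final set achieves at least v/2."""
    S = []
    for e in stream:
        if len(S) == k:
            break
        if f(S + [e]) - f(S) >= (v / 2 - f(S)) / (k - len(S)):
            S.append(e)
    return S

# Toy submodular objective (hypothetical data): coverage of integer sets.
sets = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 4, 5}, 4: {5, 6}}
cover = lambda S: len(set().union(*[sets[e] for e in S]))
summary = stream_threshold(iter(sets), cover, k=2, v=5)  # v = true OPT here
```

Note that the stream is consumed exactly once and only the current set S is stored, which is what makes the memory footprint independent of the data size.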
Machine teaching: an inverse problem to machine learning and an approach toward optimal education
In The Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI “Blue Sky” Senior Member Presentation Track), 2015

Cited by 5 (4 self)
I draw the reader’s attention to machine teaching, the problem of finding an optimal training set given a machine learning algorithm and a target model. In addition to generating fascinating mathematical questions for computer scientists to ponder, machine teaching holds the promise of enhancing education and personnel training. The Socratic dialogue style aims to stimulate critical thinking.
Near-Optimally Teaching the Crowd to Classify

Cited by 4 (0 self)
How should we present training examples to learners to teach them classification rules? This is a natural problem when training workers for crowdsourcing labeling tasks, and it is also motivated by challenges in data-driven online education. We propose a natural stochastic model of the learners, modeling them as randomly switching among hypotheses based on observed feedback. We then develop STRICT, an efficient algorithm for selecting examples to teach to workers. Our solution greedily maximizes a submodular surrogate objective function in order to select examples to show to the learners. We prove that our strategy is competitive with the optimal teaching policy. Moreover, for the special case of linear separators, we prove that an exponential reduction in error probability can be achieved. Our experiments on simulated workers, as well as three real image annotation tasks on Amazon Mechanical Turk, show the effectiveness of our teaching algorithm.
Dynamic State Estimation in Distributed Aircraft Electric Control Systems via Adaptive Submodularity

Cited by 2 (0 self)
We consider the problem of estimating the discrete state of an aircraft electric system under a distributed control architecture through active sensing. The main idea is to use a set of controllable switches to reconfigure the system in order to gather more information about the unknown state. By adaptively making a sequence of reconfiguration decisions with uncertain outcomes, then correlating measurements and prior information to make the next decision, we aim to reduce the uncertainty. A greedy strategy is developed that maximizes the one-step expected uncertainty reduction. By exploiting recent results on adaptive submodularity, we give theoretical guarantees on the worst-case performance of the greedy strategy. We apply the proposed method in a fault detection scenario where the discrete state captures possible faults in various circuit components. In addition, simple abstraction rules are proposed to alleviate state-space explosion and to scale up the strategy. Finally, the efficiency of the proposed method is demonstrated empirically on different circuits.
Near-Optimal Active Learning of Multi-Output Gaussian Processes

Cited by 1 (0 self)
This paper addresses the problem of active learning of a multi-output Gaussian process (MOGP) model representing multiple types of coexisting correlated environmental phenomena. In contrast to existing works, our active learning problem involves selecting not just the most informative sampling locations to be observed but also the types of measurements at each selected location for minimizing the predictive uncertainty (i.e., posterior joint entropy) of a target phenomenon of interest given a sampling budget. Unfortunately, such an entropy criterion scales poorly in the number of candidate sampling locations and selected observations when optimized. To resolve this issue, we first exploit a structure common to sparse MOGP models to derive a novel active learning criterion. Then, we exploit a relaxed form of the submodularity property of our new criterion to devise a polynomial-time approximation algorithm that guarantees a constant-factor approximation of the performance achieved by the optimal set of selected observations. Empirical evaluation on real-world datasets shows that our proposed approach outperforms existing algorithms for active learning of MOGP and single-output GP models.
Distributed Submodular Cover: Succinctly Summarizing Massive Data

Cited by 1 (1 self)
How can one find a subset, ideally as small as possible, that well represents a massive dataset? That is, its utility, measured according to a suitable utility function, should be comparable to that of the whole dataset. In this paper, we formalize this challenge as a submodular cover problem. Here, the utility is assumed to exhibit submodularity, a natural diminishing-returns condition prevalent in many data summarization applications. The classical greedy algorithm is known to provide solutions with logarithmic approximation guarantees compared to the optimum solution. However, this sequential, centralized approach is impractical for truly large-scale problems. In this work, we develop the first distributed algorithm, DISCOVER, for submodular set cover that is easily implementable using MapReduce-style computations. We theoretically analyze our approach and present approximation guarantees for the solutions returned by DISCOVER. We also study a natural trade-off between the communication cost and the number of rounds required to obtain such a solution. In our extensive experiments, we demonstrate the effectiveness of our approach on several applications, including active set selection, exemplar-based clustering, and vertex cover, on tens of millions of data points using Spark.
Efficient feature group sequencing for anytime linear prediction. arXiv:1409.5495, 2014

Cited by 1 (1 self)
We propose a regularized linear learning algorithm to sequence groups of features, where each group incurs test-time cost or computation. Specifically, we develop a simple extension to Orthogonal Matching Pursuit (OMP) that respects the structure of groups of features with variable costs, and we prove that it achieves near-optimal anytime linear prediction at each budget threshold where a new group is selected. Our algorithm and analysis extend to generalized linear models with multi-dimensional responses. We demonstrate the scalability of the resulting approach on large real-world datasets with many feature groups associated with test-time computational costs. Our method improves over Group Lasso and Group OMP in the anytime performance of linear predictions, measured in timeliness [7], an anytime prediction performance metric, while providing rigorous performance guarantees.
Exploiting Submodular Value Functions for Faster Dynamic Sensor Selection

Cited by 1 (0 self)
A key challenge in the design of multi-sensor systems is the efficient allocation of scarce resources such as bandwidth, CPU cycles, and energy, leading to the dynamic sensor selection problem, in which a subset of the available sensors must be selected at each timestep. While partially observable Markov decision processes (POMDPs) provide a natural decision-theoretic model for this problem, the computational cost of POMDP planning grows exponentially in the number of sensors, making it feasible only for small problems. We propose a new POMDP planning method that uses greedy maximization to greatly improve scalability in the number of sensors. We show that, under certain conditions, the value function of a dynamic sensor selection POMDP is submodular, and we use this result to bound the error introduced by performing greedy maximization. Experimental results on a real-world dataset from a multi-camera tracking system in a shopping mall show that our method achieves similar performance to existing methods but incurs only a fraction of the computational cost, leading to much better scalability in the number of cameras.
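Greedy maximization of a submodular objective can itself be accelerated by exploiting diminishing returns. As an illustration of the kind of speedup submodularity enables, here is Minoux's standard lazy-greedy trick on a toy coverage objective (not the paper's POMDP method; all data and names are illustrative):

```python
import heapq

def lazy_greedy(ground, f, k):
    """Minoux's lazy greedy: submodularity guarantees marginal gains can
    only shrink as S grows, so stale gains stored in a max-heap remain
    valid upper bounds, and most re-evaluations of f can be skipped."""
    S = []
    base = f([])
    heap = [(-(f([e]) - base), e) for e in ground]  # (negated stale gain, element)
    heapq.heapify(heap)
    while len(S) < k and heap:
        neg_stale, e = heapq.heappop(heap)
        gain = f(S + [e]) - f(S)  # refresh this element's true marginal gain
        if not heap or gain >= -heap[0][0]:
            S.append(e)  # still beats every other upper bound: safe to take
        else:
            heapq.heappush(heap, (-gain, e))  # re-insert with updated gain
    return S

# Toy submodular objective (hypothetical data): coverage of integer sets.
sets = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 4, 5}, 4: {5, 6}}
cover = lambda S: len(set().union(*[sets[e] for e in S]))
chosen = lazy_greedy(list(sets), cover, k=2)
```

The selected set matches plain greedy's value, but when many elements have small stale gains, only the heap's top few are ever re-evaluated per iteration.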
Parallel task routing for crowdsourcing
In HCOMP, 2014

Cited by 1 (1 self)
An ideal crowdsourcing or citizen-science system would route tasks to the most appropriate workers, but the best assignment is unclear because workers have varying skill, tasks have varying difficulty, and assigning several workers to a single task may significantly improve output quality. This paper defines a space of task routing problems, proves that even the simplest is NP-hard, and develops several approximation algorithms for parallel routing problems. We show that an intuitive class of requesters' utility functions is submodular, which lets us provide iterative methods for dynamically allocating batches of tasks that make near-optimal use of available workers in each round. Experiments with live oDesk workers show that our task routing algorithm uses only 48% of the human labor compared to the commonly used round-robin strategy. Further, we provide versions of our task routing algorithm that enable it to scale to large numbers of workers and questions and to handle workers with variable response times, while still providing significant benefit over common baselines.