Toward an architecture for never-ending language learning
In AAAI, 2010
Cited by 36 (5 self)
We consider here the problem of building a never-ending language learner; that is, an intelligent computer agent that runs forever and that each day must (1) extract, or read, information from the web to populate a growing structured knowledge base, and (2) learn to perform this task better than on the previous day. In particular, we propose an approach and a set of design principles for such an agent, describe a partial implementation of such a system that has already learned to extract a knowledge base containing over 242,000 beliefs with an estimated precision of 74% after running for 67 days, and discuss lessons learned from this preliminary attempt to build a never-ending learning agent.
iCoseg: Interactive co-segmentation with intelligent scribble guidance
In CVPR, 2010
Cited by 10 (1 self)
Figure caption fragment (start missing in the source): (b) shows cutouts using these scribbles. A naïve interactive co-segmentation setup would force a user to examine all cutouts for mistakes, and then iteratively scribble on the worst segmentation to obtain better results. Cutouts needing correction are shown with red borders. (c) shows the region prompted for more scribbles by iCoseg, thus avoiding exhaustive examination of all cutouts by users.
This paper presents an algorithm for Interactive Cosegmentation of a foreground object from a group of related images. While previous approaches focus on unsupervised co-segmentation, we use successful ideas from the interactive object-cutout literature. We develop an algorithm that allows users to decide what foreground is, and then guide the output of the co-segmentation algorithm towards it via scribbles. Interestingly, keeping a user in the loop leads to simpler and highly parallelizable energy functions, allowing us to work with significantly more images per group. However, unlike the interactive single image counterpart, a
Answering queries using humans, algorithms, and databases
In CIDR, 2011
Cited by 6 (2 self)
For some problems, human assistance is needed in addition to automated (algorithmic) computation. In sharp contrast to existing data management approaches, where human input is either ad hoc or never used, we describe the design of the first declarative language involving human-computable functions, standard relational operators, as well as algorithmic computation. We consider the challenges involved in optimizing queries posed in this language, in particular, the tradeoffs between uncertainty, cost and performance, as well as the combination of human and algorithmic evidence. We believe that the vision laid out in this paper can act as a roadmap for a new area of data management research where human computation is routinely used in data analytics.
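To make the cost/uncertainty tradeoff concrete, here is a minimal sketch of one plan such an optimizer might consider: run a cheap algorithmic scorer first and pay for a human answer only when the scorer is unsure. The names `hybrid_filter`, `algo`, and `ask_human` and the `threshold` parameter are hypothetical illustrations, not the query language or optimizer described in the paper.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def hybrid_filter(
    items: Iterable[T],
    algo: Callable[[T], float],      # hypothetical scorer: estimated P(item satisfies predicate)
    ask_human: Callable[[T], bool],  # hypothetical human oracle; each call costs money/time
    threshold: float = 0.9,
) -> Tuple[List[T], int]:
    """Keep items that satisfy a predicate, asking a human only when the algorithm is unsure."""
    kept: List[T] = []
    human_calls = 0
    for item in items:
        p = algo(item)
        if p >= threshold:            # algorithm confident the item passes
            kept.append(item)
        elif p <= 1 - threshold:      # algorithm confident the item fails
            continue
        else:                         # uncertain: pay for one human answer
            human_calls += 1
            if ask_human(item):
                kept.append(item)
    return kept, human_calls
```

Raising `threshold` routes more items to humans (higher cost, fewer algorithmic mistakes); lowering it trusts the algorithm more and spends less on human calls.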
Why Label When You Can Search? Alternatives to Active Learning for Applying Human Resources to Build Classification Models Under Extreme Class Imbalance
Cited by 4 (0 self)
This paper analyses alternative techniques for deploying low-cost human resources for data acquisition for classifier induction in domains exhibiting extreme class imbalance—where traditional labeling strategies, such as active learning, can be ineffective. Consider the problem of building classifiers to help brands control the content adjacent to their on-line advertisements. Although frequent enough to worry advertisers, objectionable categories are rare in the distribution of impressions encountered by most on-line advertisers—so rare that traditional sampling techniques do not find enough positive examples to train effective models. An alternative way to deploy human resources for training-data acquisition is to have them “guide” the learning by searching explicitly for training examples of each class. We show that under extreme skew, even basic techniques for guided learning completely dominate smart (active) strategies for applying human resources to select cases for labeling. Therefore, it is critical to consider the relative cost of search versus labeling, and we demonstrate the tradeoffs for different relative costs. We show that in cost/skew settings where the choice between search and active labeling is equivocal, a hybrid strategy can combine the benefits.
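The skew argument can be checked with back-of-the-envelope arithmetic. The sketch below assumes uniform random sampling (not an active-learning strategy) and a hypothetical guided search that returns one positive example per fixed `search_cost`; both cost models are simplifying assumptions, not the paper's experimental setup.

```python
def expected_positives(num_labeled: int, prevalence: float) -> float:
    """Expected number of positives among num_labeled instances drawn uniformly at random."""
    return num_labeled * prevalence


def labeling_cost_per_positive(label_cost: float, prevalence: float) -> float:
    """Random labeling needs about 1 / prevalence labels to turn up one positive."""
    return label_cost / prevalence


def cheaper_way_to_get_positives(label_cost: float, search_cost: float, prevalence: float) -> str:
    """Compare cost per positive: random labeling vs. a hypothetical fixed-cost guided search."""
    if search_cost < labeling_cost_per_positive(label_cost, prevalence):
        return "search"
    return "label"
```

At a prevalence of 1 in 1,000 and unit label cost, random labeling spends about 1,000 units per positive, so even a search costing 50 units per example is 20 times cheaper; at balanced prevalence the comparison flips toward labeling.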
Mixed-Initiative Clustering
2010
Cited by 4 (0 self)
Mixed-initiative clustering is a task where a user and a machine work collaboratively to analyze a large set of documents. We hypothesize that a user and a machine can both learn better clustering models through enriched communication and interactive learning from each other. The first contribution of this thesis is a framework for mixed-initiative clustering. The framework consists of machine learning and teaching phases, and user learning and teaching phases connected in an interactive loop which allows bi-directional communication. The bi-directional communication languages define types of information exchanged in an interface. Coordination between the two communication languages and the adaptation capability of the machine’s clustering model are the key to building a mixed-initiative clustering system. The second contribution comes from successfully building several systems using our proposed framework. Two systems are built with incrementally enriched communication languages – one enables user feedback on features for
Link-based Active Learning
Cited by 2 (0 self)
Supervised and semi-supervised data mining techniques require labeled data. However, labeling examples is costly for many real-world applications. To address this problem, active learning techniques have been developed to guide the labeling process in an effort to minimize the amount of labeled data without sacrificing much of the quality of the learned models. Yet most active learning methods to date have remained relatively agnostic to the rich structure offered by network data, often ignoring the relationships between the nodes of a network. On the other hand, the relational learning community has shown that the relationships can be very informative for various prediction tasks. In this paper, we propose different ways of adapting existing active learning work to network data while utilizing links to select better examples to label.
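As one illustration of using links when choosing what to label, the sketch below scores each unlabeled node by its prediction entropy plus a bonus for disagreeing with its neighbors' predicted labels. The scoring rule, the `weight` parameter, and all function names are hypothetical; the paper proposes its own adaptations, which this does not reproduce.

```python
import math
from typing import Dict, List, Set


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a Bernoulli(p) prediction; highest at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def neighbor_disagreement(node: str, graph: Dict[str, List[str]], probs: Dict[str, float]) -> float:
    """Fraction of neighbors whose predicted label differs from this node's predicted label."""
    neighbors = graph.get(node, [])
    if not neighbors:
        return 0.0
    mine = probs[node] >= 0.5
    return sum((probs[n] >= 0.5) != mine for n in neighbors) / len(neighbors)


def select_query(
    graph: Dict[str, List[str]],  # undirected adjacency lists; every node appears as a key
    probs: Dict[str, float],      # current model's P(positive) for every node
    labeled: Set[str],
    weight: float = 0.5,          # hypothetical trade-off between the two signals
) -> str:
    """Pick the unlabeled node with the highest entropy-plus-link-disagreement score."""
    candidates = [v for v in graph if v not in labeled]
    return max(
        candidates,
        key=lambda v: binary_entropy(probs[v]) + weight * neighbor_disagreement(v, graph, probs),
    )
```

Setting `weight=0` recovers plain uncertainty sampling, which makes it easy to compare the link-aware variant against a link-agnostic baseline.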
Combining Generative and Discriminative Models for Semantic Segmentation of CT Scans via Active Learning
In Székely, G., Hahn, H.K., eds.: Information Processing in Medical Imaging. Volume 6801 of LNCS, 2011
Cited by 2 (1 self)
This paper presents a new supervised learning framework for the efficient recognition and segmentation of anatomical structures in 3D computed tomography (CT), with as little training data as possible. Training supervised classifiers to recognize organs within CT scans requires a large number of manually delineated exemplar 3D images, which are very expensive to obtain. In this study, we borrow ideas from the field of active learning to optimally select a minimum subset of such images that yields accurate anatomy segmentation. The main contribution of this work is in designing a combined generative-discriminative model which: i) drives optimal selection of training data; and ii) increases segmentation accuracy. The optimal training set is constructed by finding unlabeled scans which maximize the disagreement between our two complementary probabilistic models, as measured by a modified version of the Jensen-Shannon divergence. Our algorithm is assessed on a database of 196 labeled clinical CT scans with high variability in resolution, anatomy, pathologies, etc. Quantitative evaluation shows that, compared with randomly selecting the scans to annotate, our method decreases the number of training images by up to 45%. Moreover, our generative model of body shape substantially increases segmentation accuracy when compared to either using the discriminative model alone or a generic smoothness prior (e.g. via a Markov Random Field).
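The disagreement criterion can be sketched with the standard Jensen-Shannon divergence; the paper uses a modified version, which is not reproduced here. The sketch assumes each model outputs a per-voxel class distribution and that a scan's disagreement is the mean over its voxels; `select_scan` and this data layout are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Distribution = Sequence[float]   # class probabilities for one voxel
ScanPredictions = List[Distribution]


def kl_divergence(p: Distribution, q: Distribution) -> float:
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Distribution, q: Distribution) -> float:
    """Standard Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def scan_disagreement(generative: ScanPredictions, discriminative: ScanPredictions) -> float:
    """Mean per-voxel JS divergence between the two models' predictions for one scan."""
    total = sum(js_divergence(p, q) for p, q in zip(generative, discriminative))
    return total / len(generative)


def select_scan(unlabeled: Dict[str, Tuple[ScanPredictions, ScanPredictions]]) -> str:
    """Pick the unlabeled scan on which the two models disagree most."""
    return max(unlabeled, key=lambda scan_id: scan_disagreement(*unlabeled[scan_id]))
```

Because the mixture `m` is positive wherever `p` is, the KL terms never divide by zero, so the divergence is well defined even for one-hot predictions.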
Human-assisted graph search: It’s okay to ask questions
Stanford Infolab
Cited by 2 (0 self)
We consider the problem of human-assisted graph search: given a directed acyclic graph with some (unknown) target node(s), find the target node(s) by asking an omniscient human questions of the form “Is there a target node that is reachable from the current node?” This general problem has applications in many domains that can utilize human intelligence, including curation of hierarchies, debugging workflows, image segmentation and categorization, interactive search and filter synthesis. To our knowledge, this work provides the first formal algorithmic study of the optimization of human computation for this problem. We study various dimensions of the problem space, providing algorithms and complexity results. We also compare the performance of our algorithm against other algorithms on the problem of webpage categorization over a real taxonomy. Our framework and algorithms can be used in the design of an optimizer for crowdsourcing platforms such as Mechanical Turk.
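A simple baseline for the single-target case can be sketched as candidate elimination: a "yes" at node v keeps only candidates reachable from v, a "no" removes them, and each question is chosen to split the remaining candidates as evenly as possible. This greedy rule, the assumption that a node counts as reachable from itself, and the names `find_target` and `ask` are illustrative assumptions rather than the paper's algorithms.

```python
from typing import Callable, Dict, List, Set, Tuple


def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """All nodes reachable from start, including start itself (iterative DFS)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def find_target(
    graph: Dict[str, List[str]],   # DAG adjacency lists; every node appears as a key
    ask: Callable[[str], bool],    # hypothetical oracle: "is the target reachable from v?"
) -> Tuple[str, int]:
    """Locate a single target node; returns (target, number of questions asked)."""
    reach = {v: reachable(graph, v) for v in graph}
    candidates = set(graph)
    questions = 0
    while len(candidates) > 1:
        # Choose the node whose "yes" set splits the remaining candidates most evenly.
        best = min(graph, key=lambda v: abs(2 * len(reach[v] & candidates) - len(candidates)))
        questions += 1
        if ask(best):
            candidates &= reach[best]
        else:
            candidates -= reach[best]
    return candidates.pop(), questions
```

In a DAG, a candidate that reaches no other candidate always gives a split that shrinks the set, so the loop terminates; on a chain the greedy choice behaves like binary search.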
Who should label what? Instance allocation in multiple expert active learning
In Proc. of the SIAM International Conference on Data Mining (SDM), 2011
Cited by 2 (0 self)
The active learning (AL) framework is an increasingly popular strategy for reducing the amount of human labeling effort required to induce a predictive model. Most work in AL has assumed that a single, infallible oracle provides labels requested by the learner at a fixed cost. However, real-world applications suitable for AL often include multiple domain experts who provide labels of varying cost and quality. We explore this multiple expert active learning (MEAL) scenario and develop a novel algorithm for instance allocation that exploits the meta-cognitive abilities of novice (cheap) experts in order to make the best use of the experienced (expensive) annotators. We demonstrate that this strategy outperforms strong baseline approaches to MEAL on both a sentiment analysis dataset and two datasets from our motivating application of biomedical citation screening. Furthermore, we provide evidence that novice labelers are often aware of which instances they are likely to mislabel.
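The observation that novices know which instances they are likely to mislabel suggests a simple allocation rule, sketched here under hypothetical assumptions: every instance receives a novice label with a self-reported confidence, and a fixed expert budget is spent on the least confident ones. This illustrates the idea only; it is not the MEAL algorithm from the paper, and `NoviceLabel` and `allocate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class NoviceLabel:
    instance_id: str
    label: int
    confidence: float  # hypothetical self-reported confidence in [0, 1]


def allocate(
    novice_labels: List[NoviceLabel], expert_budget: int
) -> Tuple[List[str], Dict[str, int]]:
    """Send the least confident novice labels to the expert; accept the rest as-is.

    expert_budget is assumed to be a non-negative number of expert queries.
    """
    ranked = sorted(novice_labels, key=lambda nl: nl.confidence)  # least confident first
    to_expert = [nl.instance_id for nl in ranked[:expert_budget]]
    accepted = {nl.instance_id: nl.label for nl in ranked[expert_budget:]}
    return to_expert, accepted
```

A budget of zero accepts every novice label; increasing it spends expensive expert time only where novices themselves signal doubt.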
Inactive Learning? Difficulties Employing Active Learning in Practice
Cited by 1 (1 self)
Despite the tremendous level of adoption of machine learning techniques in real-world settings, and the large volume of research on active learning, active learning techniques have been slow to gain substantial traction in practical applications. This reluctance to adopt active learning runs contrary to its promise of reduced model-development costs and increased performance on a model-development budget. This essay presents several important and under-discussed challenges to using active learning well in practice. We hope this paper can serve as a call to arms for researchers in active learning—an encouragement to focus even more attention on how practitioners might actually use active learning.

