Active learning literature survey
- 2010
Cited by 49 (1 self)
The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to choose the data from which it learns. An active learner may ask queries in the form of unlabeled instances to be labeled by an oracle (e.g., a human annotator). Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant but labels are difficult, time-consuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for active learning, a summary of several problem setting variants, and a discussion
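As a toy illustration of the querying idea in this abstract, the sketch below learns a one-dimensional threshold by always asking the oracle about the pool item whose label the learner is least sure of. The integer pool, the `oracle` function, and the hidden threshold of 37 are assumptions made for the example and do not come from the survey.

```python
def oracle(x, hidden_threshold=37):
    """Stand-in for a human annotator (hypothetical): True means positive."""
    return x >= hidden_threshold


def active_learn_threshold(pool, ask):
    """Find the smallest positive item in `pool` with few label queries.

    Assumes labels are monotone: every item at or above some unknown
    threshold is positive and every item below it is negative.
    """
    items = sorted(pool)
    lo, hi = -1, len(items)   # index of last known negative, first known positive
    queries = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2  # item in the middle of the still-uncertain range
        queries += 1
        if ask(items[mid]):
            hi = mid
        else:
            lo = mid
    boundary = items[hi] if hi < len(items) else None
    return boundary, queries


boundary, queries = active_learn_threshold(range(101), oracle)
print(boundary, queries)
```

Labeling every item in this pool would cost 101 queries, whereas the loop above needs a number of queries that grows only with the logarithm of the pool size. Real learners rarely have such a clean structure, so this should be read as an intuition aid only.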
Extreme video retrieval: joint maximization of human and computer performance
- In ACM Multimedia, 2006
Cited by 16 (2 self)
We present an efficient system for video search that maximizes the use of human bandwidth, while at the same time exploiting the machine’s ability to learn in real time from user-selected relevant video clips. The system exploits the human capability for rapidly scanning imagery, augmenting it with an active learning loop, which attempts to always present the most relevant material based on the current information. Two versions of the human interface were evaluated, one with variable page sizes and manual paging, the other with a fixed page size and automatic paging. Both require the user's absolute attention and focus for optimal performance. In either case, as humans search and find relevant results, the system can invisibly re-rank its previous best guesses using a number of knowledge sources, such as image similarity, text similarity, and temporal proximity. Experimental evidence shows a significant improvement using the combined extremes of human and machine power over either approach alone.
Active Learning to Recognize Multiple Types of Plankton
- Journal of Machine Learning Research, 2004
Cited by 16 (3 self)
Active learning has been applied with support vector machines to reduce the data labeling effort in pattern recognition domains. However, most of those applications only deal with two-class problems. In this paper, we extend the active learning approach to multiple class support vector machines. The experimental results from a plankton recognition system indicate that our approach often requires significantly fewer labeled images to maintain the same accuracy level as random sampling.
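The abstract does not say which selection rule the authors use, so the following is only a sketch of one common multi-class criterion: query the unlabeled instance whose two highest class scores are closest together. The function name `smallest_margin_query` and the score values are hypothetical, and the scores are assumed to come from some already trained multi-class classifier.

```python
def smallest_margin_query(scores):
    """Return the index of the instance with the smallest top-two score gap.

    `scores[i]` is a list of per-class confidence values for unlabeled
    instance i (assumed to come from a trained multi-class classifier).
    """
    def margin(class_scores):
        top, second = sorted(class_scores, reverse=True)[:2]
        return top - second

    return min(range(len(scores)), key=lambda i: margin(scores[i]))


# Hypothetical confidence values for three unlabeled images and three classes.
unlabeled_scores = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # two classes nearly tied, small margin
    [0.60, 0.30, 0.10],
]
query_index = smallest_margin_query(unlabeled_scores)
print(query_index)
```

A small gap between the two leading classes suggests the classifier cannot separate them for that instance, which is why such an instance is a plausible candidate for labeling.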
Mean version space: a new active learning method for content-based image retrieval
- Proc. Workshop on Multimedia Information Retrieval, in conjunction with ACM Multimedia, 2004
Cited by 11 (1 self)
In content-based image retrieval, relevance feedback has been introduced to narrow the gap between low-level image features and high-level semantic concepts. Furthermore, to speed up the convergence to the query concept, several active learning methods have been proposed instead of random sampling to select images for labeling by the user. In this paper, we propose a novel active learning method named mean version space, aiming to select the optimal image in each round of relevance feedback. Firstly, by examining the lemma that motivates the support vector machine active learning method (SVM active), we come up with a new criterion which is tailored for each specific learning task and will lead to the fastest shrinkage of the version space in all cases. The criterion takes both the size of the version space and the posterior probabilities into consideration, while existing methods are based on only one of them. Moreover, although our criterion is designed for SVM, it can be justified in a general framework. Secondly, to reduce processing time, we design two schemes to construct a small candidate set and evaluate the criterion for images in the set instead of all the unlabeled images. Systematic experimental results demonstrate the superiority of our method over existing active learning methods.
Putting Active Learning into Multimedia Applications: Dynamic Definition and Refinement of Concept Classifiers
- In Proceedings of ACM Multimedia, 2005
Cited by 7 (0 self)
The authors developed an extensible system for video exploitation that puts the user in control to better accommodate novel situations and source material. Visually dense displays of thumbnail imagery in storyboard views are used for shot-based video exploration and retrieval. The user can identify a need for a class of audiovisual detection, adeptly and fluently supply training material for that class, and iteratively evaluate and improve the resulting automatic classification produced via multiple modality active learning and SVM. By iteratively reviewing the output of the classifier and updating the positive and negative training samples with less effort than typical for relevance feedback systems, the user can play an active role in directing the classification process while still needing to truth only a very small percentage of the multimedia data set. Examples are given illustrating the iterative creation of a classifier for a concept of interest to be included in subsequent investigations, and for a concept typically deemed irrelevant to be weeded out in follow-up queries. Filtering and browsing tools making use of existing and iteratively added concepts put the user further in control of the multimedia browsing and retrieval process.
Curious Machines: Active Learning with Structured Instances
- 2008
Cited by 5 (1 self)
Supervised machine learning is a branch of artificial intelligence concerned with automatically inducing predictive models from labeled data. Such learning approaches are useful for many interesting real-world applications, but particularly shine for tasks involving the automatic organization, extraction, and retrieval of information from large collections of data (e.g., text, images, and other digital media). In traditional supervised learning, one uses “labeled” training data to induce a model. However, labeled instances for real-world applications are often difficult, expensive, or time-consuming to obtain. Consider a complex task such as extracting key person and organization names from text documents. While gathering large amounts of unlabeled documents for these tasks is often relatively easy (e.g., from the World Wide Web), labeling these texts usually requires experienced human annotators with specific domain knowledge and training. There are implicit costs associated with obtaining these labels from domain experts, such as limited time and financial resources. This
How to Select a Good Training-data Subset for Transcription: Submodular Active Selection for Sequences
Cited by 4 (3 self)
Given a large un-transcribed corpus of speech utterances, we address the problem of how to select a good subset for word-level transcription under a given fixed transcription budget. We employ submodular active selection on a Fisher-kernel based graph over un-transcribed utterances. The selection is theoretically guaranteed to be near-optimal. Moreover, our approach is able to bootstrap without requiring any initial transcribed data, whereas traditional approaches rely heavily on the quality of an initial model trained on some labeled data. Our experiments on phone recognition show that our approach significantly outperforms both average-case random selection and uncertainty sampling.
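A near-optimality guarantee of this kind is what greedy maximization of a monotone submodular function provides: the greedy subset is within a factor of 1 - 1/e of the best subset of the same size. The sketch below applies that greedy rule to a facility-location objective over a small hand-made similarity matrix. The objective, the matrix values, and the function names are assumptions for illustration; the paper builds its graph from a Fisher kernel, which is not reproduced here.

```python
def facility_location(sim, selected):
    """Sum over all items of their best similarity to any selected item."""
    if not selected:
        return 0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_select(sim, budget):
    """Greedily pick up to `budget` items, each time taking the largest gain."""
    selected = []
    remaining = list(range(len(sim)))
    for _ in range(min(budget, len(sim))):
        base = facility_location(sim, selected)
        # Ties are broken in favour of the lowest index.
        best = max(
            remaining,
            key=lambda j: facility_location(sim, selected + [j]) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical symmetric similarities: items 0-2 form one cluster, 3-5 another.
sim = [
    [10, 9, 7, 1, 1, 1],
    [9, 10, 9, 1, 1, 1],
    [7, 9, 10, 1, 1, 1],
    [1, 1, 1, 10, 8, 6],
    [1, 1, 1, 8, 10, 8],
    [1, 1, 1, 6, 8, 10],
]
chosen = greedy_select(sim, budget=2)
print(chosen)
```

With a budget of two, the greedy rule takes the most central item of each cluster rather than two items from the same cluster, which is the coverage behavior that makes this family of objectives attractive for choosing what to transcribe.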
Active Learning for Networked Data
Cited by 4 (1 self)
We introduce a novel active learning algorithm for classification of network data. In this setting, training instances are connected by a set of links to form a network, the labels of linked nodes are correlated, and the goal is to exploit these dependencies and accurately label the nodes. This problem arises in many domains, including social and biological network analysis and document classification, and there has been much recent interest in methods that collectively classify the nodes in the network. While labeled examples are expensive in many cases, network information is often available. We show how an active learning algorithm can take advantage of network structure. Our algorithm effectively exploits the links between instances and the interaction between the local and collective aspects of a classifier to improve the accuracy of learning from fewer labeled examples. We experiment with two real-world benchmark collective classification domains, and show that we are able to achieve extremely accurate results even when only a small fraction of the data is labeled.
Why Label when you can Search? Alternatives to Active Learning for Applying Human Resources to Build Classification Models Under Extreme Class Imbalance
Cited by 4 (0 self)
This paper analyses alternative techniques for deploying low-cost human resources for data acquisition for classifier induction in domains exhibiting extreme class imbalance, where traditional labeling strategies, such as active learning, can be ineffective. Consider the problem of building classifiers to help brands control the content adjacent to their on-line advertisements. Although frequent enough to worry advertisers, objectionable categories are rare in the distribution of impressions encountered by most on-line advertisers, so rare that traditional sampling techniques do not find enough positive examples to train effective models. An alternative way to deploy human resources for training-data acquisition is to have them “guide” the learning by searching explicitly for training examples of each class. We show that under extreme skew, even basic techniques for guided learning completely dominate smart (active) strategies for applying human resources to select cases for labeling. Therefore, it is critical to consider the relative cost of search versus labeling, and we demonstrate the tradeoffs for different relative costs. We show that in cost/skew settings where the choice between search and active labeling is equivocal, a hybrid strategy can combine the benefits.
Interactive search by direct manipulation of dissimilarity space
- IEEE Transactions on Multimedia, vol. 9, 2007
Cited by 2 (0 self)
In this paper, we argue for learning dissimilarity for interactive search in content-based image retrieval. In the literature, dissimilarity is often learned via the feature space by feature selection, feature weighting or by adjusting the parameters of a function of the features. Unlike existing techniques, we use feedback to adjust the dissimilarity space independent of feature space. This has the great advantage that it manipulates dissimilarity directly. To create a dissimilarity space, we use the method proposed by Pekalska and Duin, selecting a set of images called prototypes and computing distances to those prototypes for all images in the collection. After the user gives feedback, we apply active learning with a one-class support vector machine to decide the movement of images such that relevant images stay close together while irrelevant ones are pushed away (the work of Guo et al.). The dissimilarity space is then adjusted accordingly. Results on a Corel dataset of 10000 images and a TrecVid collection of 43907 keyframes show that our proposed approach is not only intuitive but also significantly improves the retrieval performance. Index Terms: Active learning, dissimilarity learning, interactive image search, visualization.
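The construction of the dissimilarity space described in this abstract (choose prototypes, then describe every image by its distances to them) can be sketched as follows. The two-dimensional points standing in for images, the Euclidean distance, and the function names are assumptions for the example; the abstract does not state which distance measure or prototype selection rule the paper uses.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity_space(items, prototypes, dist=euclidean):
    """Represent each item by its vector of distances to the prototypes."""
    return [[dist(item, p) for p in prototypes] for item in items]


# Hypothetical feature vectors standing in for images.
images = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
prototypes = [images[0], images[1]]  # assumed prototype choice
embedded = dissimilarity_space(images, prototypes)
print(embedded)
```

Each row of `embedded` has one coordinate per prototype, so later adjustments driven by user feedback can act on these distance coordinates without touching the original features.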

