Adapting visual category models to new domains. In ECCV, 2010

by K. Saenko
Results 1 - 10 of 163

DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

by Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell
Abstract - Cited by 203 (22 self)
We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be repurposed to novel generic tasks. Our generic tasks may differ significantly from the originally trained tasks and there may be insufficient labeled or unlabeled data to conventionally train or adapt a deep architecture to the new tasks. We investigate and visualize the semantic clustering of deep convolutional features with respect to a variety of such tasks, including scene recognition, domain adaptation, and fine-grained recognition challenges. We compare the efficacy of relying on various network levels to define a fixed feature, and report novel results that significantly outperform the state-of-the-art on several important vision challenges. We are releasing DeCAF, an open-source implementation of these deep convolutional activation features, along with all associated network parameters to enable vision researchers to be able to conduct experimentation with deep representations across a range of visual concept learning paradigms.

Citation Context

...en a user is defining a category “on-the-fly” using specific examples, or for fine-grained recognition challenges (Welinder et al., 2010), attributes (Bourdev et al., 2011), and/or domain adaptation (Saenko et al., 2010). In this paper we investigate semi-supervised multi-task learning of deep convolutional representations, where represent...
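The core recipe the abstract describes, run new-task images through a network whose weights were trained elsewhere and are kept frozen, then fit a simple classifier on a late-layer activation, can be sketched in a few lines. This is a toy NumPy illustration only: the random matrices `W1`/`W2` stand in for pretrained convolutional weights, and the two-Gaussian data stands in for a "novel task"; nothing here reproduces the paper's actual network or datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pretrained" layers. In DeCAF these weights come from supervised
# ImageNet training; random matrices are placeholders for them here.
W1 = rng.normal(size=(256, 64)) / 16.0
W2 = rng.normal(size=(64, 32)) / 8.0

def deep_feature(x):
    """Forward pass up to a late hidden layer; its activation is the feature."""
    h1 = np.maximum(x @ W1, 0.0)      # layer 1, frozen, ReLU
    return np.maximum(h1 @ W2, 0.0)   # layer 2 activation used as the feature

# Toy "novel task": two classes the frozen network never saw.
X = np.vstack([rng.normal(1.0, 1.0, (50, 256)),
               rng.normal(-1.0, 1.0, (50, 256))])
y = np.array([0] * 50 + [1] * 50)

F = deep_feature(X)

# Cheap classifier on top of the fixed features: nearest class centroid.
centroids = np.stack([F[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((F[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
accuracy = (pred == y).mean()
```

Swapping the random layers for an actual pretrained network, and the centroid rule for an SVM or logistic regression, recovers the DeCAF recipe; the point of the sketch is that nothing upstream of the final classifier is retrained.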

What you saw is not what you get: Domain adaptation using asymmetric kernel transforms

by Brian Kulis, Kate Saenko, Trevor Darrell - In Proc. of CVPR, 2011
Abstract - Cited by 111 (15 self)
In real-world applications, “what you saw” during training is often not “what you get” during deployment: the distribution and even the type and dimensionality of features can change from one dataset to the next. In this paper, we address the problem of visual domain adaptation for transferring object models from one dataset or visual domain to another. We introduce ARC-t, a flexible model for supervised learning of non-linear transformations between domains. Our method is based on a novel theoretical result demonstrating that such transformations can be learned in kernel space. Unlike existing work, our model is not restricted to symmetric transformations, nor to features of the same type and dimensionality, making it applicable to a significantly wider set of adaptation scenarios than previous methods. Furthermore, the method can be applied to categories that were not available during training. We demonstrate the ability of our method to adapt object recognition models under a variety of situations, such as differing imaging conditions, feature types and codebooks.

Citation Context

...st camera, flash consumer images Figure 1. We address the problem of adapting object models trained on a particular source dataset, or domain (left), to a target domain (right). Recently, the work of [19, 21, 12] examined the domain adaptation problem for computer vision tasks, such as video concept detection and visual object modeling. In particular, [19] learned a domain-invariant distance metric using a sm...
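ARC-t itself learns the transformation in kernel space from similarity and dissimilarity constraints, which is more machinery than fits in a snippet. The structural point, however, that the learned map is asymmetric and the two domains need not share a feature type or dimensionality, can be illustrated with a plain ridge-regression analogue. All data below is synthetic and `A_true` is a hypothetical ground-truth map, not anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Source features (e.g. an 80-dim codebook) and target features (50-dim),
# linked by an unknown linear map plus noise. The point: the learned W is
# non-square and non-symmetric, matching the ARC-t setting.
d_src, d_tgt, n = 80, 50, 200
A_true = rng.normal(size=(d_tgt, d_src)) / np.sqrt(d_src)
Xs = rng.normal(size=(d_src, n))                        # columns = source points
Xt = A_true @ Xs + 0.01 * rng.normal(size=(d_tgt, n))   # corresponding target points

lam = 1e-3
# Closed-form ridge solution: W = Xt Xs^T (Xs Xs^T + lam I)^{-1}
W = Xt @ Xs.T @ np.linalg.inv(Xs @ Xs.T + lam * np.eye(d_src))

# The learned map sends new source points near their target counterparts.
Xs_new = rng.normal(size=(d_src, 20))
err = np.linalg.norm(W @ Xs_new - A_true @ Xs_new) / np.linalg.norm(A_true @ Xs_new)
```

The ridge solve is a stand-in for the paper's constrained kernel-space optimization; what carries over is that nothing forces W to be square or symmetric.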

Geodesic flow kernel for unsupervised domain adaptation

by Boqing Gong, Yuan Shi, Fei Sha, Kristen Grauman - In CVPR, 2012
Abstract - Cited by 97 (6 self)
In real-world applications of visual recognition, many factors—such as pose, illumination, or image quality—can cause a significant mismatch between the source domain on which classifiers are trained and the target domain to which those classifiers are applied. As such, the classifiers often perform poorly on the target domain. Domain adaptation techniques aim to correct the mismatch. Existing approaches have concentrated on learning feature representations that are invariant across domains, and they often do not directly exploit low-dimensional structures that are intrinsic to many vision datasets. In this paper, we propose a new kernel-based method that takes advantage of such structures. Our geodesic flow kernel models domain shift by integrating an infinite number of subspaces that characterize changes in geometric and statistical properties from the source to the target domain. Our approach is computationally advantageous, automatically inferring important algorithmic parameters without requiring extensive cross-validation or labeled data from either domain. We also introduce a metric that reliably measures the adaptability between a pair of source and target domains. For a given target domain and several source domains, the metric can be used to automatically select the optimal source domain to adapt and avoid less desirable ones. Empirical studies on standard datasets demonstrate the advantages of our approach over competing methods.

Citation Context

...omain adaptation has been extensively studied in many areas, including in statistics and machine learning [26, 18, 2, 23], speech and language processing [7, 5, 21], and more recently computer vision [3, 14, 25, 20]. Of particular relevance to our work is the idea of learning new feature representations that are domain-invariant, thus enabling transferring classifiers from the source domain to the target domain ...

Visual event recognition in videos by learning from web data

by Lixin Duan, Dong Xu, Ivor W. Tsang, Jiebo Luo - In CVPR, IEEE, 2010
Abstract - Cited by 84 (16 self)
We propose a visual event recognition framework for consumer domain videos by leveraging a large amount of loosely labeled web videos (e.g., from YouTube). First, we propose a new aligned space-time pyramid matching method to measure the distances between two video clips, where each video clip is divided into space-time volumes over multiple levels. We calculate the pair-wise distances between any two volumes and further integrate the information from different volumes with Integer-flow Earth Mover’s Distance (EMD) to explicitly align the volumes. Second, we propose a new cross-domain learning method in order to 1) fuse the information from multiple pyramid levels and features (i.e., space-time feature and static SIFT feature) and 2) cope with the considerable variation in feature distributions between videos from two domains (i.e., web domain and consumer domain). For each pyramid level and each type of local features, we train a set of SVM classifiers based on the combined training set from two domains using multiple base kernels of different kernel types and parameters, which are fused with equal weights to obtain an average classifier. Finally, we propose a cross-domain learning method, referred to as Adaptive Multiple Kernel Learning (A-MKL), to learn an adapted classifier based on multiple base kernels and the prelearned average classifiers by minimizing both the structural risk functional and the mismatch between data distributions from two domains. Extensive experiments demonstrate the effectiveness of our proposed framework that requires only a small number of labeled consumer videos by leveraging web data.

Citation Context

...(a.k.a., domain adaptation or cross-domain learning) has been studied for years in other fields (e.g., natural language processing [1], [6]), it is still an emerging research topic in computer vision [40]. In some vision applications, there is an existing domain (i.e., auxiliary domain) with a large number of labeled data, but we want to recognize the images or videos in another domain of interest (i....
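The "mismatch between data distributions" that A-MKL penalizes alongside the structural risk is, in this line of work, typically measured by Maximum Mean Discrepancy (MMD) computed from the joint kernel matrix. A minimal NumPy sketch of that mismatch term on synthetic data (the RBF bandwidth, sample sizes, and shift magnitude are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(X, Z, gamma=0.5):
    """Gaussian RBF kernel matrix between row-stacked sample sets."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(K, n_src):
    """(Biased) squared MMD from a joint kernel matrix whose first n_src
    rows/columns are source samples and the remainder target samples."""
    Kss, Ktt, Kst = K[:n_src, :n_src], K[n_src:, n_src:], K[:n_src, n_src:]
    return Kss.mean() + Ktt.mean() - 2.0 * Kst.mean()

Xs = rng.normal(0.0, 1.0, (60, 5))          # source (e.g. web) samples
Xt_same = rng.normal(0.0, 1.0, (60, 5))     # target drawn from the same law
Xt_shift = rng.normal(1.5, 1.0, (60, 5))    # target with a mean shift

def joint_mmd2(Xs, Xt):
    X = np.vstack([Xs, Xt])
    return mmd2(rbf(X, X), len(Xs))

m_same, m_shift = joint_mmd2(Xs, Xt_same), joint_mmd2(Xs, Xt_shift)
```

A method in this family adds a weighted version of such a term to the SVM objective, so that the chosen kernel combination makes the two domains look alike while still separating the classes.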

Learning and transferring mid-level image representations using convolutional neural networks

by Maxime Oquab, Leon Bottou, Ivan Laptev, Josef Sivic - In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014
Abstract - Cited by 71 (3 self)
Convolutional neural networks (CNN) have recently shown outstanding image classification performance in the large-scale visual recognition challenge (ILSVRC2012). The success of CNNs is attributed to their ability to learn rich mid-level image representations as opposed to hand-designed low-level features used in other image classification methods. Learning CNNs, however, amounts to estimating millions of parameters and requires a very large number of annotated image samples. This property currently prevents application of CNNs to problems with limited training data. In this work we show how image representations learned with CNNs on large-scale annotated datasets can be efficiently transferred to other visual recognition tasks with a limited amount of training data. We design a method to reuse layers trained on the ImageNet dataset to compute mid-level image representation for images in the PASCAL VOC dataset. We show that despite differences in image statistics and tasks in the two datasets, the transferred representation leads to significantly improved results for object and action classification, outperforming the current state of the art on Pascal VOC 2007 and 2012 datasets. We also show promising results for object and action localization.

Citation Context

...e. Will we need to collect millions of annotated images for each new visual recognition task in the future? It has been argued that computer vision datasets have significant differences in image statistics [50]. For example, while objects are typically centered in Caltech256 and ImageNet datasets, other datasets such as Pascal VOC and LabelMe are more likely to contain objects embedded in a scene (see Figure 3). Differences in viewpoints, scene context, “background” (negative class) and other factors, inevitably affect recognition performance when training and testing across different domains [38, 42, 50]. Similar phenomena have been observed in other areas such as NLP [22]. Given the “data-hungry” nature of CNNs and the difficulty of collecting large-scale image datasets, the applicability of CNNs to tasks with limited amount of training data appears as an important open problem. To address this problem, we propose to transfer image representations learned with CNNs on large datasets to other visual recognition tasks with limited training data. In particular, we design a method that uses ImageNet-trained layers of CNN to compute efficient mid-level image representation for images in Pascal VO...

Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach

by Alessandro Bergamo, Lorenzo Torresani
Abstract - Cited by 61 (0 self)
Most current image categorization methods require large collections of manually annotated training examples to learn accurate visual recognition models. The time-consuming human labeling effort effectively limits these approaches to recognition problems involving a small number of different object classes. In order to address this shortcoming, in recent years several authors have proposed to learn object classifiers from weakly-labeled Internet images, such as photos retrieved by keyword-based image search engines. While this strategy eliminates the need for human supervision, the recognition accuracies of these methods are considerably lower than those obtained with fully-supervised approaches, because of the noisy nature of the labels associated to Web data. In this paper we investigate and compare methods that learn image classifiers by combining very few manually annotated examples (e.g., 1-10 images per class) and a large number of weakly-labeled Web photos retrieved using keyword-based image search. We cast this as a domain adaptation problem: given a few strongly-labeled examples in a target domain (the manually annotated examples) and many source domain examples (the weakly-labeled Web photos), learn classifiers yielding small generalization error on the target domain. Our experiments demonstrate that, for the same number of strongly-labeled examples, our domain adaptation approach produces significant recognition rate improvements over the best published results (e.g., 65% better when using 5 labeled training examples per class) and that our classifiers are one order of magnitude faster to learn and to evaluate than the best competing method, despite our use of large weakly-labeled data sets.

Citation Context

...ion methods to address sample distribution differences in object categorization due to the use of weakly-labeled Web images as training data. We note that in work concurrent to our own, Saenko et al. [24] have also analyzed cross-domain adaptation of object classifiers. However, their work focuses on the statistical differences caused by varying lighting conditions (uncontrolled versus studio setups) ...

Tabula rasa: Model transfer for object category detection

by Yusuf Aytar, Andrew Zisserman - In Proc. ICCV, 2011
Abstract - Cited by 56 (1 self)
Our objective is transfer training of a discriminatively trained object category detector, in order to reduce the number of training images required. To this end we propose three transfer learning formulations where a template learnt previously for other categories is used to regularize the training of a new category. All the formulations result in convex optimization problems. Experiments (on PASCAL VOC) demonstrate significant performance gains by transfer learning from one class to another (e.g. motorbike to bicycle), including one-shot learning, specialization from class to a subordinate class (e.g. from quadruped to horse) and transfer using multiple components. In the case of multiple training samples it is shown that a detection performance approaching that of the state of the art can be achieved with substantially fewer training samples.

Citation Context

...t al. [21] consider a more geometric based transfer between models, though this is manual at the moment. There is another school of transfer learning where classifiers are transferred between domains [3, 4, 20, 28], for example by learning feature distributions for the source and target domains, but we are not concerned with this type of domain transfer problem here. In the next section we define the problem, a...
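The common thread of such "regularize toward a template" formulations is to replace the usual shrink-toward-zero regularizer with a shrink-toward-prior one, e.g. min_w ½‖w − w_prior‖² + C Σᵢ max(0, 1 − yᵢ w·xᵢ), which remains convex. A toy subgradient-descent sketch of that idea (fully synthetic data; the paper itself trains HOG-template detectors on PASCAL VOC, and the step sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

# Shrink toward a previously learnt template instead of toward zero:
#   min_w 0.5 * ||w - w_prior||^2 + C * sum_i max(0, 1 - y_i * w.x_i)
d, n = 30, 15                         # few samples: the transfer regime
w_prior = rng.normal(size=d)          # stand-in for a source-category template
X = rng.normal(size=(n, d))
y = np.sign(X @ w_prior + 0.3 * rng.normal(size=n))  # labels roughly follow the prior

def train(w_prior, X, y, C=1.0, lr=0.005, steps=3000):
    """Subgradient descent on the convex transfer-regularized objective."""
    w = w_prior.copy()
    for _ in range(steps):
        viol = y * (X @ w) < 1.0                            # active hinge terms
        grad = (w - w_prior) - C * (y[viol, None] * X[viol]).sum(axis=0)
        w -= lr * grad
    return w

w = train(w_prior, X, y)
```

With no labeled examples at all the solution is the prior template itself, and each added example pulls the detector away from it only where the hinge loss demands, which is what makes the one-shot regime workable.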

Marginalized Denoising Autoencoders for Domain Adaptation

by Minmin Chen, Zhixiang (Eddie) Xu, Kilian Q. Weinberger, Fei Sha
Abstract - Cited by 46 (11 self)
Stacked denoising autoencoders (SDAs) have been successfully used to learn new representations for domain adaptation. Recently, they have attained record accuracy on standard benchmark tasks of sentiment analysis across different text domains. SDAs learn robust data representations by reconstruction, recovering original features from data that are artificially corrupted with noise. In this paper, we propose marginalized SDA (mSDA) that addresses two crucial limitations of SDAs: high computational cost and lack of scalability to high-dimensional features. In contrast to SDAs, our approach of mSDA marginalizes noise and thus does not require stochastic gradient descent or other optimization algorithms to learn parameters — in fact, they are computed in closed form. Consequently, mSDA, which can be implemented in only 20 lines of MATLAB, significantly speeds up SDAs by two orders of magnitude. Furthermore, the representations learnt by mSDA are as effective as the traditional SDAs, attaining almost identical accuracies in benchmark tasks.

Citation Context

... Examples are computational biology (Liu et al., 2008), natural language processing (Daume III, 2007; McClosky et al., 2006) and computer vision (Saenko et al., 2010). Data in the source and the target are often distributed differently. This presents a major obstacle in adapting predictive models. Recent work has investigated several techniques for alleviating th...
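The closed-form layer really is tiny. Below is a NumPy transcription of one marginalized denoising layer along the lines the abstract describes: take the expected scatter of feature-dropout-corrupted inputs with corruption probability p marginalized analytically, solve the reconstruction in closed form, and stack layers through a tanh. The small ridge term and the toy data are illustrative choices, not the paper's exact setup.

```python
import numpy as np

def msda_layer(X, p=0.5):
    """One marginalized denoising layer, solved in closed form (no SGD).

    X : (d, n) matrix, columns are examples.
    p : probability of zeroing each input feature, marginalized analytically.
    """
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])              # append a bias feature
    q = np.concatenate([np.full(d, 1.0 - p), [1.0]])  # feature survival probs
    S = Xb @ Xb.T                                     # input scatter matrix
    Q = S * np.outer(q, q)                            # E[x_corrupt x_corrupt^T]
    np.fill_diagonal(Q, q * np.diag(S))               # a feature co-occurs with itself w.p. q_i
    P = S[:d] * q[None, :]                            # E[x x_corrupt^T]
    W = P @ np.linalg.inv(Q + 1e-5 * np.eye(d + 1))   # ridge-stabilized solve
    return np.tanh(W @ Xb), W

# Stacking: each layer's nonlinear output feeds the next, as in an SDA.
rng = np.random.default_rng(5)
X = rng.normal(size=(10, 200))
h1, W1 = msda_layer(X, p=0.5)
h2, W2 = msda_layer(h1, p=0.5)
```

Because W is the expectation over all corruptions in closed form, no sampling of noisy copies and no gradient steps are needed, which is where the claimed two-orders-of-magnitude speedup over SDAs comes from.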

Undoing the damage of dataset bias

by Aditya Khosla, Tinghui Zhou, Tomasz Malisiewicz, Alexei Efros, Antonio Torralba , 2012
Abstract - Cited by 35 (3 self)
The presence of bias in existing object recognition datasets is now well-known in the computer vision community. While it remains in question whether creating an unbiased dataset is possible given limited resources, in this work we propose a discriminative framework that directly exploits dataset bias during training. In particular, our model learns two sets of weights: (1) bias vectors associated with each individual dataset, and (2) visual world weights that are common to all datasets, which are learned by undoing the associated bias from each dataset. The visual world weights are expected to be our best possible approximation to the object model trained on an unbiased dataset, and thus tend to have good generalization ability. We demonstrate the effectiveness of our model by applying the learned weights to a novel, unseen dataset, and report superior results for both classification and detection tasks compared to a classical SVM that does not account for the presence of bias. Overall, we find that it is beneficial to explicitly account for bias when combining multiple datasets.

Citation Context

...ecognition problems. This line of research addresses the problem of domain shift [6], i.e. mismatch of the joint distribution of inputs between source and target domains. In particular, Saenko et al. [7] provide one of the first studies of domain adaptation for object recognition. The key idea of their work is to learn a regularized transformation using information-theoretic metric learning that maps...
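The two-sets-of-weights idea, shared visual-world weights plus a heavily penalized per-dataset bias vector, can be reproduced for a ridge-regression toy via feature augmentation in the style of Evgeniou & Pontil: stack [x, x, 0] for dataset 0 and [x, 0, x] for dataset 1, then penalize the bias blocks harder than the shared block. Everything below is synthetic and simplified (the paper's actual model uses max-margin classifiers and a detection pipeline, not ridge regression).

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic stand-in: two "datasets" share common (visual world) weights,
# but each adds its own bias direction. All numbers are illustrative.
d, n = 8, 100
w_vw_true = rng.normal(size=d)
bias_true = [0.5 * rng.normal(size=d), 0.5 * rng.normal(size=d)]

Xs, ys = [], []
for i in range(2):
    X = rng.normal(size=(n, d))
    Xs.append(X)
    ys.append(X @ (w_vw_true + bias_true[i]) + 0.05 * rng.normal(size=n))

# Feature augmentation: dataset 0 rows become [x, x, 0], dataset 1 rows
# [x, 0, x]. Ridge on the stacked design learns shared weights w_vw plus
# per-dataset bias vectors delta_i in one solve.
rows = []
for i, X in enumerate(Xs):
    block = [X, np.zeros((n, d)), np.zeros((n, d))]
    block[1 + i] = X
    rows.append(np.hstack(block))
A, y = np.vstack(rows), np.concatenate(ys)

lam_shared, lam_bias = 0.1, 10.0     # "undo the bias": shrink delta_i strongly
reg = np.diag(np.concatenate([np.full(d, lam_shared), np.full(2 * d, lam_bias)]))
w_all = np.linalg.solve(A.T @ A + reg, A.T @ y)
w_vw = w_all[:d]                      # approximation to the unbiased model
delta = [w_all[d:2 * d], w_all[2 * d:]]
```

Predictions on dataset i use w_vw + delta_i, while a novel unseen dataset gets only w_vw; the asymmetric penalties push everything the datasets agree on into the shared block.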

A survey on metric learning for feature vectors and structured data

by A. Bellet, Amaury Habrard, Marc Sebban , 2014
Abstract - Cited by 35 (2 self)
Abstract not found

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University