B.: Learning realistic human actions from movies
- In: CVPR (2008)
Cited by 143 (16 self)
The aim of this paper is to address recognition of natural human actions in diverse and realistic video settings. This challenging but important subject has mostly been ignored in the past due to several problems, one of which is the lack of realistic and annotated video datasets. Our first contribution is to address this limitation and to investigate the use of movie scripts for automatic annotation of human actions in videos. We evaluate alternative methods for action retrieval from scripts and show benefits of a text-based classifier. Using the retrieved action samples for visual learning, we next turn to the problem of action classification in video. We present a new method for video classification that builds upon and extends several recent ideas including local space-time features, space-time pyramids and multichannel non-linear SVMs. The method is shown to improve state-of-the-art results on the standard KTH action dataset by achieving 91.8% accuracy. Given the inherent problem of noisy labels in automatic annotation, we particularly investigate and show high tolerance of our method to annotation errors in the training set. We finally apply the method to learning and classifying challenging action classes in movies and show promising results.
ImageNet: A large-scale hierarchical image database
- In CVPR, 2009
Cited by 110 (7 self)
The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full-resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.
Multi-Level Active Prediction of Useful Image Annotations for Recognition
- 2008
Cited by 13 (4 self)
We introduce a framework for actively learning visual categories from a mixture of weakly and strongly labeled image examples. We propose to allow the category-learner to strategically choose what annotations it receives, based on both the expected reduction in uncertainty and the relative costs of obtaining each annotation. We construct a multiple-instance discriminative classifier based on the initial training data. Then all remaining unlabeled and weakly labeled examples are surveyed to actively determine which annotation ought to be requested next. After each request, the current classifier is incrementally updated. Unlike previous work, our approach accounts for the fact that the optimal use of manual annotation may call for a combination of labels at multiple levels of granularity (e.g., a full segmentation on some images and a present/absent flag on others). As a result, it is possible to learn more accurate category models with a lower total expenditure of manual annotation effort.
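The selection step in the abstract above trades expected uncertainty reduction against annotation cost. The following is only a minimal sketch of that trade-off, not the authors' method: the scoring rule (reduction minus weighted cost), the `cost_weight` parameter, and the candidate fields are all assumptions made for illustration, and the numbers are made up.

```python
def choose_next_annotation(candidates, cost_weight=1.0):
    """Pick the candidate with the best benefit/cost trade-off.

    Each candidate is a dict with hypothetical fields:
      'expected_reduction': estimated drop in classifier uncertainty
      'cost': relative effort of obtaining that annotation
    The score (reduction minus weighted cost) is an assumed rule.
    """
    if not candidates:
        raise ValueError("no candidate annotations to choose from")
    return max(
        candidates,
        key=lambda c: c["expected_reduction"] - cost_weight * c["cost"],
    )


# Illustrative values only: a costly full segmentation versus a cheap flag.
candidates = [
    {"image": "A", "kind": "segmentation", "expected_reduction": 0.9, "cost": 0.7},
    {"image": "B", "kind": "present/absent flag", "expected_reduction": 0.4, "cost": 0.1},
]
best = choose_next_annotation(candidates)
print(best["image"], best["kind"])
```

With these made-up numbers the cheap flag wins once cost is counted, while setting `cost_weight=0` would favor the full segmentation.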
Building text features for object image classifications
- In CVPR, 2009. 124
Cited by 12 (1 self)
We introduce a text-based image feature and demonstrate that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the internet. We do not inspect or correct the tags and expect that they are noisy. We obtain the text feature of an unannotated image from the tags of its k-nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples. Our text feature may not change, because the auxiliary dataset likely contains a similar picture. While the tags associated with images are noisy, they are more stable when appearance changes. We test the performance of this feature using PASCAL VOC 2006 and 2007 datasets. Our feature performs well, consistently improves the performance of visual object classifiers, and is particularly effective when the training dataset is small.
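The idea of deriving a text feature from the tags of an image's k nearest neighbors can be sketched as below. This is an illustration under stated assumptions rather than the paper's implementation: the Euclidean distance, the normalized tag histogram, the function name `knn_tag_feature`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_tag_feature(query, aux_features, aux_tags, vocabulary, k=3):
    """Fraction of the k nearest auxiliary images carrying each tag.

    aux_features[i] is the visual feature of auxiliary image i and
    aux_tags[i] its (possibly noisy) tag list; both are toy stand-ins.
    """
    order = sorted(
        range(len(aux_features)),
        key=lambda i: euclidean(query, aux_features[i]),
    )
    counts = Counter()
    for i in order[:k]:
        counts.update(set(aux_tags[i]))  # count each tag once per image
    return [counts[tag] / k for tag in vocabulary]


aux_features = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
aux_tags = [["dog", "grass"], ["dog"], ["car"], ["car", "road"]]
vocabulary = ["car", "dog", "grass", "road"]

feature = knn_tag_feature([0.1, 0.2], aux_features, aux_tags, vocabulary, k=2)
print(feature)
```

The resulting vector could then be concatenated with, or used alongside, visual features when training a classifier; how the two are combined is left open here.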
PLDA: Parallel Latent Dirichlet Allocation for Large-scale Applications
Cited by 10 (1 self)
This paper presents PLDA, our parallel implementation of Latent Dirichlet Allocation on MPI and MapReduce. PLDA smooths out storage and computation bottlenecks and provides fault recovery for lengthy distributed computations. We show that PLDA can be applied to large, real-world applications and achieves good scalability. We have released MPI-PLDA to open source at http://code.google.com/p/plda under the Apache License.
Keywords to Visual Categories: Multiple-Instance Learning for Weakly Supervised Object Categorization
Cited by 9 (2 self)
Conventional supervised methods for image categorization rely on manually annotated (labeled) examples to learn good object models, which means their generality and scalability depend heavily on the amount of human effort available to help train them. We propose an unsupervised approach to construct discriminative models for categories specified simply by their names. We show that multiple-instance learning enables the recovery of robust category models from images returned by keyword-based search engines. By incorporating constraints that reflect the expected sparsity of true positive examples into a large-margin objective function, our approach remains accurate even when the available text annotations are imperfect and ambiguous. In addition, we show how to iteratively improve the learned classifier by automatically refining the representation of the ambiguously labeled examples. We demonstrate our method with benchmark datasets, and show that it performs well relative to both state-of-the-art unsupervised approaches and traditional fully supervised techniques.
Learning Image Similarity from Flickr Groups Using Stochastic Intersection Kernel Machines
Cited by 9 (0 self)
Measuring image similarity is a central topic in computer vision. In this paper, we learn similarity from Flickr groups and use it to organize photos. Two images are similar if they are likely to belong to the same Flickr groups. Our approach is enabled by a fast Stochastic Intersection Kernel MAchine (SIKMA) training algorithm, which we propose. This proposed training method will be useful for many vision problems, as it can produce a classifier that is more accurate than a linear classifier, trained on tens of thousands of examples in two minutes. The experimental results show our approach performs better on image matching, retrieval, and classification than using conventional visual features.
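For readers unfamiliar with the kernel named in the title, the standard histogram intersection kernel sums the element-wise minimum of two histograms. The sketch below shows only that kernel; it does not reproduce the stochastic training algorithm proposed in the paper, and the example histograms are made up.

```python
def intersection_kernel(x, y):
    """Histogram intersection: sum of element-wise minima of x and y."""
    if len(x) != len(y):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(x, y))


# Toy normalized histograms (illustrative values only).
h1 = [0.5, 0.3, 0.2]
h2 = [0.1, 0.6, 0.3]
similarity = intersection_kernel(h1, h2)
print(similarity)
```

For histograms normalized to sum to one, the value lies between 0 (no overlap) and 1 (identical histograms).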
Classifier Grids for Robust Adaptive Object Detection
Cited by 4 (3 self)
In this paper we present an adaptive but robust object detector for static cameras by introducing classifier grids. Instead of using a sliding window for object detection, we propose to train a separate classifier for each image location, obtaining a very specific object detector with a low false alarm rate. For each classifier corresponding to a grid element we estimate two generative representations in parallel, one describing the object’s class and one describing the background. These are combined in order to obtain a discriminative model. To adapt to changing environments, these classifiers are learned on-line (i.e., boosting). Continuous learning (24 hours a day, 7 days a week) requires a stable system. In our method this is ensured by a fixed object representation while updating only the representation of the background. We demonstrate the stability in a long-term experiment by running the system for a whole week, which shows a stable performance over time. In addition, we compare the proposed approach to state-of-the-art methods in the field of person and car detection. In both cases we obtain competitive results.
Improving Activity Classification for Health Applications on Mobile Devices using Active and Semi-Supervised Learning
Cited by 4 (1 self)
Mobile phones’ increasing ubiquity has created many opportunities for personal context sensing. Personal activity is an important part of a user’s context, and automatically recognizing it is vital for health and fitness monitoring applications. Recording a stream of activity data enables monitoring patients with chronic conditions affecting ambulation and motion, as well as those undergoing rehabilitation treatments. Modern mobile phones are powerful enough to perform activity classification in real time, but they typically use a static classifier that is trained in advance or require the user to manually add training data after the application is on his/her device. This paper investigates ways of automatically augmenting activity classifiers after they are deployed in an application. It compares active learning and three different semi-supervised learning methods (self-learning, En-Co-Training, and democratic co-learning) to determine which show promise for this purpose. The results show that active learning, En-Co-Training, and democratic co-learning perform well when the initial classifier’s accuracy is low (75-80%). When the initial accuracy is already high (90%), these methods are no longer effective, but they do not hurt the accuracy either. Overall, active learning gave the highest improvement, but democratic co-learning was almost as good and does not require user interaction. Thus, democratic co-learning would be the best choice for most applications, since it would significantly increase the accuracy for initial classifiers that performed poorly.
Geometric LDA: A generative model for particular object discovery
- In Proceedings of the British Machine Vision Conference, 2008
Cited by 3 (0 self)
Automatically organizing collections of images presents serious challenges to the current state-of-the-art methods in image data mining. Often, what is required is that images taken in the same place, of the same thing, or of the same person be conceptually grouped together. To achieve this, we introduce the Geometric Latent Dirichlet Allocation (gLDA) model for unsupervised particular object discovery in unordered image collections. This explicitly represents documents as mixtures of particular objects or facades, and builds rich latent topic models which incorporate the identity and locations of visual words specific to the topic in a geometrically consistent way. Applying standard inference techniques to this model enables images likely to contain the same object to be probabilistically grouped and ranked. We demonstrate the model on a publicly available dataset of Oxford images, and show examples of spatially consistent groupings.

