Results 1 - 10
of
215
Practical bayesian optimization of machine learning algorithms
, 2012
"... In this section we specify additional details of our Bayesian optimization algorithm which, for brevity, were omitted from the paper. For more detail, the code used in this work is made publicly available at ..."
Abstract
-
Cited by 130 (16 self)
- Add to MetaCart
(Show Context)
In this section we specify additional details of our Bayesian optimization algorithm which, for brevity, were omitted from the paper. For more detail, the code used in this work is made publicly available at
MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification
"... Supervised topic models utilize document’s side information for discovering predictive low dimensional representations of documents; and existing models apply likelihoodbased estimation. In this paper, we present a max-margin supervised topic model for both continuous and categorical response variab ..."
Abstract
-
Cited by 93 (27 self)
- Add to MetaCart
(Show Context)
Supervised topic models utilize document’s side information for discovering predictive low dimensional representations of documents; and existing models apply likelihoodbased estimation. In this paper, we present a max-margin supervised topic model for both continuous and categorical response variables. Our approach, the maximum entropy discrimination latent Dirichlet allocation (MedLDA), utilizes the max-margin principle to train supervised topic models and estimate predictive topic representations that are arguably more suitable for prediction. We develop efficient variational methods for posterior inference and demonstrate qualitatively and quantitatively the advantages of MedLDA over likelihood-based topic models on movie review and 20 Newsgroups data sets. 1.
Latent Hierarchical Structural Learning for Object Detection
, 2010
"... We present a latent hierarchical structural learning method for object detection. An object is represented by a mixture of hierarchical tree models where the nodes represent object parts. The nodes can move spatially to allow both local and global shape deformations. The models can be trained discri ..."
Abstract
-
Cited by 87 (7 self)
- Add to MetaCart
We present a latent hierarchical structural learning method for object detection. An object is represented by a mixture of hierarchical tree models where the nodes represent object parts. The nodes can move spatially to allow both local and global shape deformations. The models can be trained discriminatively using latent structural SVM learning, where the latent variables are the node positions and the mixture component. But current learning methods are slow, due to the large number of parameters and latent variables, and have been restricted to hierarchies with two layers. In this paper we describe an incremental concaveconvex procedure (iCCCP) which allows us to learn both two and three layer models efficiently. We show that iCCCP leads to a simple training algorithm which avoids complex multi-stage layer-wise training, careful part selection, and achieves good performance without requiring elaborate initialization. We perform object detection using our learnt models and obtain performance comparable with state-ofthe-art methods when evaluated on challenging public PAS-CAL datasets. We demonstrate the advantages of three layer hierarchies – outperforming Felzenszwalb et al.’s two layer models on all 20 classes.
Minimal Loss Hashing for Compact Binary Codes
"... We propose a method for learning similaritypreserving hash functions that map highdimensional data onto binary codes. The formulation is based on structured prediction with latent variables and a hinge-like loss function. It is efficient to train for large datasets, scales well to large code lengths ..."
Abstract
-
Cited by 81 (3 self)
- Add to MetaCart
We propose a method for learning similaritypreserving hash functions that map highdimensional data onto binary codes. The formulation is based on structured prediction with latent variables and a hinge-like loss function. It is efficient to train for large datasets, scales well to large code lengths, and outperforms state-of-the-art methods. 1.
Learning latent temporal structure for complex event detection
- In CVPR
, 2012
"... In this paper, we tackle the problem of understanding the temporal structure of complex events in highly varying videos obtained from the Internet. Towards this goal, we utilize a conditional model trained in a max-margin framework that is able to automatically discover discriminative and interestin ..."
Abstract
-
Cited by 75 (2 self)
- Add to MetaCart
(Show Context)
In this paper, we tackle the problem of understanding the temporal structure of complex events in highly varying videos obtained from the Internet. Towards this goal, we utilize a conditional model trained in a max-margin framework that is able to automatically discover discriminative and interesting segments of video, while simultaneously achieving competitive accuracies on difficult detection and recognition tasks. We introduce latent variables over the frames of a video, and allow our algorithm to discover and assign sequences of states that are most discriminative for the event. Our model is based on the variable-duration hidden Markov model, and models durations of states in addition to the transitions between states. The simplicity of our model allows us to perform fast, exact inference using dynamic programming, which is extremely important when we set our sights on being able to process a very large number of videos quickly and efficiently. We show promising results on the Olympic Sports dataset [16] and the 2011 TRECVID Multimedia Event Detection task [18]. We also illustrate and visualize the semantic understanding capabilities of our model. 1.
An Introduction to Conditional Random Fields
- Foundations and Trends in Machine Learning
, 2012
"... ..."
(Show Context)
Learning human activities and object affordances from rgb-d videos. IJRR
, 2013
"... such as making cereal and arranging objects in a room (see Fig. 9). For example, the making cereal activity consists of around 12 sub-activities on average, which includes reaching the pitcher, moving the pitcher to the bowl, and then pouring the milk into the bowl. This proves to be a very challeng ..."
Abstract
-
Cited by 59 (16 self)
- Add to MetaCart
(Show Context)
such as making cereal and arranging objects in a room (see Fig. 9). For example, the making cereal activity consists of around 12 sub-activities on average, which includes reaching the pitcher, moving the pitcher to the bowl, and then pouring the milk into the bowl. This proves to be a very challenging task given the variability across individuals in performing each sub-activity, and other environment induced conditions such as cluttered background and viewpoint changes. (See Fig. 2 for some examples.) In most previous works, object detection and activity recognition have been addressed as separate tasks. Only recently, some works have shown that modeling mutual context is beneficial (Gupta et al., 2009; Yao and Fei-Fei, 2010). The key idea in our work is to note that, in activity detection, it is sometimes more informative to know how an object is being used (associated affordances, Gibson, 1979) rather than knowing what the object is (i.e. the object category). For example, both chair and sofa might be categorized as ‘sittable, ’ and a cup might be categorized as both ‘drinkable ’ and ‘pourable. ’ Note that the affordances of an object change over time depending on its use, e.g., a pitcher may first be reachable, then movable and finally pourable. In addition to helping activity recognition, recognizing object affordances is important by itself because of their use in robotic applications (e.g., Kormushev et al., 2010; Jiang et al., 2012a; Jiang and Saxena, 2012). We propose a method to learn human activities by modarXiv:1210.1207v2
Object detection with grammar models
- In NIPS
, 2011
"... Compositional models provide an elegant formalism for representing the visual appearance of highly variable objects. While such models are appealing from a theoretical point of view, it has been difficult to demonstrate that they lead to performance advantages on challenging datasets. Here we develo ..."
Abstract
-
Cited by 59 (4 self)
- Add to MetaCart
(Show Context)
Compositional models provide an elegant formalism for representing the visual appearance of highly variable objects. While such models are appealing from a theoretical point of view, it has been difficult to demonstrate that they lead to performance advantages on challenging datasets. Here we develop a grammar model for person detection and show that it outperforms previous high-performance systems on the PASCAL benchmark. Our model represents people using a hierarchy of deformable parts, variable structure and an explicit model of occlusion for partially visible objects. To train the model, we introduce a new discriminative framework for learning structured prediction models from weakly-labeled data. 1
Recognizing human actions in still images: a study of bag-of-features and part-based representations
, 2010
"... ..."
Multiresolution models for object detection
"... Abstract. Most current approaches to recognition aim to be scaleinvariant. However, the cues available for recognizing a 300 pixel tall object are qualitatively different from those for recognizing a 3 pixel tall object. We argue that for sensors with finite resolution, one should instead use scale- ..."
Abstract
-
Cited by 55 (6 self)
- Add to MetaCart
(Show Context)
Abstract. Most current approaches to recognition aim to be scaleinvariant. However, the cues available for recognizing a 300 pixel tall object are qualitatively different from those for recognizing a 3 pixel tall object. We argue that for sensors with finite resolution, one should instead use scale-variant, or multiresolution representations that adapt in complexity to the size of a putative detection window. We describe a multiresolution model that acts as a deformable part-based model when scoring large instances and a rigid template with scoring small instances. We also examine the interplay of resolution and context, and demonstrate that context is most helpful for detecting low-resolution instances when local models are limited in discriminative power. We demonstrate impressive results on the Caltech Pedestrian benchmark, which contains object instances at a wide range of scales. Whereas recent state-of-theart methods demonstrate missed detection rates of 86%-37 % at 1 falsepositive-per-image, our multiresolution model reduces the rate to 29%. 1