Large margin training for hidden Markov models with partially observed states. In: ICML. (2009)

by T M T Do, T Artieres

Results 1 - 10 of 42

Recognizing Human Actions from Still Images with Latent Poses

by Weilong Yang, Yang Wang, Greg Mori
Abstract - Cited by 86 (7 self)
We consider the problem of recognizing human actions from still images. We propose a novel approach that treats the pose of the person in the image as latent variables that will help with recognition. Different from other work that learns separate systems for pose estimation and action recognition, then combines them in an ad-hoc fashion, our system is trained in an integrated fashion that jointly considers poses and actions. Our learning objective is designed to directly exploit the pose information for action recognition. Our experimental results demonstrate that by inferring the latent poses, we can improve the final action recognition results.

Citation Context

...rs. Those SVM classifiers are learned using the ground-truth information of the pose on the training data. The training problem in Eqn. (10) can be solved by the non-convex cutting plane algorithm in [5], which is an extension of the popular convex cutting plane algorithm [12] for learning structural SVM [1]. We briefly outline the algorithm here. Consider the following unconstrained formulation whic...

A Discriminative Latent Model of Object Classes and Attributes

by Yang Wang, Greg Mori
Abstract - Cited by 81 (5 self)
We present a discriminatively trained model for joint modelling of object class labels (e.g. “person”, “dog”, “chair”, etc.) and their visual attributes (e.g. “has head”, “furry”, “metal”, etc.). We treat attributes of an object as latent variables in our model and capture the correlations among attributes using an undirected graphical model built from training data. The advantage of our model is that it allows us to infer object class labels using the information of both the test image itself and its (latent) attributes. Our model unifies object class prediction and attribute prediction in a principled framework. It is also flexible enough to deal with different performance measurements. Our experimental results provide quantitative evidence that attributes can improve object naming.

Citation Context

...ure work. 4 Non-Convex Cutting Plane Training The optimization problem in Eq. (3) can be solved in many different ways. In our implementation, we adopt a non-convex cutting plane method proposed in [4] due to its ease of use. First, it is easy to show that Eq. (3) is equivalent to $\min_w L(w) = \beta \|w\|^2 + \sum_{n=1}^{N} R^n(w)$, where $R^n(w)$ is a hinge loss function defined as: $R^n(w) = \max_y \Delta(y, y^{(n)})$ ...

Beyond actions: Discriminative models for contextual group activities

by Tian Lan, Weilong Yang, Yang Wang, Greg Mori - In Advances in Neural Information Processing Systems, 2010
Abstract - Cited by 46 (6 self)
We propose a discriminative model for recognizing group activities. Our model jointly captures the group activity, the individual person actions, and the interactions among them. Two new types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. Different from most of the previous latent structured models which assume a predefined structure for the hidden layer, e.g. a tree structure, we treat the structure of the hidden layer as a latent variable and implicitly infer it during learning and inference. Our experimental results demonstrate that by inferring this contextual information together with adaptive structures, the proposed model can significantly improve activity recognition performance.

Citation Context

...$\frac{1}{2}\|w\|^2 + C \sum_{n=1}^{N} (L^n - R^n)$ (12a), where $L^n = \max_y \max_{h_y} \max_{G_y} \big( \Delta(y, y^n) + f_w(x^n, h_y, y; G_y) \big)$, $R^n = \max_{G_{y^n}} f_w(x^n, h^n, y^n; G_{y^n})$ (12b). We use the non-convex bundle optimization in [7] to solve Eq. 12. In a nutshell, the algorithm iteratively builds an increasingly accurate piecewise quadratic approximation to the objective function. During each iteration, a new linear cutting plan...

Discriminative Latent Models for Recognizing Contextual Group Activities

by Tian Lan, Yang Wang, Weilong Yang, Stephen N. Robinovitch, Greg Mori - IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL., NO. 1
Abstract - Cited by 38 (5 self)
In this paper, we go beyond recognizing the actions of individuals and focus on group activities. This is motivated by the observation that human actions are rarely performed in isolation; the contextual information of what other people in the scene are doing provides a useful cue for understanding high-level activities. We propose a novel framework for recognizing group activities which jointly captures the group activity, the individual person actions, and the interactions among them. Two types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. In particular, we propose three different approaches to model the person-person interaction. One approach is to explore the structures of person-person interaction. Different from most of the previous latent structured models which assume a pre-defined structure for the hidden layer, e.g. a tree structure, we treat the structure of the hidden layer as a latent variable and implicitly infer it during learning and inference. The second approach explores person-person interaction at the feature level. We introduce a new feature representation called the action context (AC) descriptor. The AC descriptor encodes information about not only the action of an individual person in the video, but also the behaviour of other people nearby. The third approach combines the above two. Our experimental results demonstrate the benefit of using contextual information for disambiguating group activities.

Recognizing complex events using large margin joint low-level event model. In: ECCV

by Hamid Izadinia, Mubarak Shah, 2012
Abstract - Cited by 27 (4 self)
In this paper we address the challenging problem of complex event recognition by using low-level events. In this problem, each complex event is captured by a long video in which several low-level events happen. The dataset contains several videos and due to the large number of videos and complexity of the events, the available annotation for the low-level events is very noisy which makes the detection task even more challenging. To tackle these problems we model the joint relationship between the low-level events in a graph where we consider a node for each low-level event and whenever there is a correlation between two low-level events the graph has an edge between the corresponding nodes. In addition, for decreasing the effect of weak and/or irrelevant low-level event detectors we consider the presence/absence of low-level events as hidden variables and learn a discriminative model by using latent SVM formulation. Using our learned model for the complex event recognition, we can also apply it for improving the detection of the low-level events in video clips which enables us to discover a conceptual description of the video. Thus our model can do complex event recognition and explain a video in terms of low-level events in a single framework. We have evaluated our proposed method over the most challenging multimedia event detection dataset. The experimental results reveal that the proposed method performs well compared to the baseline method. Further, our results of conceptual description of video show that our model is learned quite well to handle the noisy annotation and surpass the low-level event detectors which are directly trained on the raw features.

Citation Context

...$R_i = G(x_i, z^*_{y^*}, y^*, \Theta) + \Delta(y^*, y_i) - G(x_i, z^*_{y_i}, y_i, \Theta)$. (6) Apparently, the risk function is non-zero if $y^* \neq y_i$. We minimize the objective function $f(\Theta)$ using non-convex regularized bundle method [24]. This method relies on the cutting plane technique, where a cutting plane is defined using the sub-gradient of objective function $f(\Theta)$ by $\delta_\Theta f = \lambda\Theta + \sum_{i=1}^{n} \big( \Phi(x_i, z^*_{y^*}, y^*) - \Phi(x_i, z^*_{y_i}, y_i) \big)$....
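
A minimal sketch of the per-example risk and sub-gradient term in the excerpt above, assuming a linear score G(x, z, y, Θ) = Θ · Φ(x, z, y) and label and latent sets small enough to enumerate. The function name, the toy feature map `phi`, and the 0/1 loss `delta` are hypothetical and chosen only for illustration; this is not the cited authors' implementation.

```python
def latent_risk_and_subgrad(theta, phi, x, y_true, labels, latents, delta):
    """Return (R_i, sub-gradient of R_i) for one training example.

    Assumes G(x, z, y, theta) = theta . phi(x, z, y) (illustrative assumption).
    """
    def score(z, y):
        return sum(t * f for t, f in zip(theta, phi(x, z, y)))

    # Loss-augmented inference: best (label, latent) pair under score + loss.
    y_star, z_star = max(
        ((y, z) for y in labels for z in latents),
        key=lambda yz: score(yz[1], yz[0]) + delta(yz[0], y_true),
    )
    # Best latent assignment for the ground-truth label.
    z_true = max(latents, key=lambda z: score(z, y_true))

    risk = score(z_star, y_star) + delta(y_star, y_true) - score(z_true, y_true)
    grad = [a - b for a, b in zip(phi(x, z_star, y_star), phi(x, z_true, y_true))]
    return risk, grad


# Toy setup (hypothetical): 2 labels, 2 latent values, scalar input x.
def phi(x, z, y):
    v = [0.0] * 4
    v[2 * y + z] = x  # one block per (label, latent) pair
    return v


def delta(y, y_true):
    return 0 if y == y_true else 1


risk, grad = latent_risk_and_subgrad(
    theta=[0.0] * 4, phi=phi, x=2.0, y_true=0,
    labels=[0, 1], latents=[0, 1], delta=delta,
)
```

Summing `grad` over all examples and adding λΘ gives the cutting-plane direction written in the excerpt. The risk is zero when the loss-augmented prediction coincides with the ground-truth label and its best latent value.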

Actively Selecting Annotations Among Objects and Attributes

by Adriana Kovashka, Sudheendra Vijayanarasimhan, Kristen Grauman
Abstract - Cited by 21 (2 self)
We present an active learning approach to choose image annotation requests among both object category labels and the objects' attribute labels. The goal is to solicit those labels that will best use human effort when training a multiclass object recognition model. In contrast to previous work in active visual category learning, our approach directly exploits the dependencies between human-nameable visual attributes and the objects they describe, shifting its requests in either label space accordingly. We adopt a discriminative latent model that captures object-attribute and attribute-attribute relationships, and then define a suitable entropy reduction selection criterion to predict the influence a new label might have throughout those connections. On three challenging datasets, we demonstrate that the method can more successfully accelerate object learning relative to both passive learning and traditional active learning approaches.

Citation Context

...parately trained traditional SVMs to produce feature values, which are then weighted by the learned parameters $w_y$ and $w_{h_j}$. To train the model (learn w), we use the non-convex cutting plane method of [4], which allows latent attribute labels for the training examples. We use a mixture of observed and latent attribute labels when dealing with partially labeled examples during the active learning loop ...

A Discriminative Latent Model of Image Region and Object Tag Correspondence

by Yang Wang, Greg Mori
Abstract - Cited by 16 (6 self)
We propose a discriminative latent model for annotating images with unaligned object-level textual annotations. Instead of using the bag-of-words image representation currently popular in the computer vision community, our model explicitly captures more intricate relationships underlying visual and textual information. In particular, we model the mapping that translates image regions to annotations. This mapping allows us to relate image regions to their corresponding annotation terms. We also model the overall scene label as latent information. This allows us to cluster test images. Our training data consist of images and their associated annotations. But we do not have access to the ground-truth region-to-annotation mapping or the overall scene label. We develop a novel variant of the latent SVM framework to model them as latent variables. Our experimental results demonstrate the effectiveness of the proposed model compared with other baseline methods.

Citation Context

...tly written as an unconstrained problem: $\min_\theta \frac{1}{2}\|\theta\|^2 + C \sum_{n=1}^{N} (L^n - R^n)$, where $L^n = \max_y \big( \Delta(y, y^n) + f_\theta(x^n, y) \big)$, $R^n = f_\theta(x^n, y^n)$ (15). We use the non-convex bundle optimization in [5] to solve (15). In a nutshell, the algorithm iteratively builds an increasingly accurate piecewise quadratic approximation to the objective function. During each iteration, a new linear cutting plane ...

Multi-View Latent Variable Discriminative Models For Action Recognition

by Yale Song, All Davis
Abstract - Cited by 12 (3 self)
Many human action recognition tasks involve data that can be factorized into multiple views such as body postures and hand shapes. These views often interact with each other over time, providing important cues to understanding the action. We present multi-view latent variable discriminative models that jointly learn both view-shared and view-specific sub-structures to capture the interaction between views. Knowledge about the underlying structure of the data is formulated as a multi-chain structured latent conditional model, explicitly learning the interaction between multiple views using disjoint sets of hidden variables in a discriminative manner. The chains are tied using a predetermined topology that repeats over time. We present three topologies – linked, coupled, and linked-coupled – that differ in the type of interaction between views that they model. We evaluate our approach on both segmented and unsegmented human action recognition tasks, using the ArmGesture, the NATOPS, and the ArmGesture-Continuous data. Experimental results show that our approach outperforms previous state-of-the-art action recognition models.

Citation Context

...s [19]), introducing hidden variables in Equation 7 makes our objective function non-convex. We find the optimal parameters Λ* using the recently proposed non-convex regularized bundle method (NRBM) [7], which has been proven to converge to a solution with an accuracy ε at the rate O(1/ε). The method aims at iteratively building an increasingly accurate piecewise quadratic lower bound of L(Λ) based ...

Latent maximum margin clustering

by Guang-tong Zhou, Tian Lan, Arash Vahdat, Greg Mori, 2013
Abstract - Cited by 9 (4 self)
We present a maximum margin framework that clusters data using latent variables. Using latent representations enables our framework to model unobserved information embedded in the data. We implement our idea by large margin learning, and develop an alternating descent algorithm to effectively solve the resultant non-convex optimization problem. We instantiate our latent maximum margin clustering framework with tag-based video clustering tasks, where each video is represented by a latent tag model describing the presence or absence of video tags. Experimental results obtained on three standard datasets show that the proposed method outperforms non-latent maximum margin clustering as well as conventional clustering approaches.

Citation Context

...is problem. Updating W: The next step of learning is the optimization over the model parameters W (Eq. 5). The learning problem is non-convex and we use the non-convex bundle optimization solver in [3]. In a nutshell, this method builds a piecewise quadratic approximation to the objective function of Eq. 5 by iteratively adding a linear cutting plane at the current optimum and updating the optimum....
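
The loop described in the excerpt above (add a linear cutting plane at the current optimum, then re-minimize the piecewise quadratic model) can be sketched for the simplest setting: a scalar parameter and a convex risk. This is an illustrative assumption, not the solver of [3]; the non-convex variant additionally has to handle cutting planes that conflict with earlier iterates, which is not shown here. The names `bundle_minimize`, `risk`, and `subgrad` are hypothetical.

```python
def bundle_minimize(risk, subgrad, lam=1.0, w0=0.0, tol=1e-6, max_iter=100):
    """Minimize lam/2 * w**2 + risk(w) for scalar w and convex risk."""
    def objective(v):
        return 0.5 * lam * v * v + risk(v)

    planes = []  # each (a, b) is a cutting plane: risk(v) >= a * v + b
    w = w0
    best_w, best_f = w0, objective(w0)
    for _ in range(max_iter):
        # Add a linear cutting plane tangent to the risk at the current point.
        a = subgrad(w)
        planes.append((a, risk(w) - a * w))

        def model(v):
            # Piecewise quadratic approximation of the objective.
            return 0.5 * lam * v * v + max(pa * v + pb for pa, pb in planes)

        # In 1-D the model minimizer is either the stationary point of one
        # quadratic piece or a point where two pieces intersect.
        candidates = [-pa / lam for pa, _ in planes]
        for i, (a1, b1) in enumerate(planes):
            for a2, b2 in planes[i + 1:]:
                if a1 != a2:
                    candidates.append((b2 - b1) / (a1 - a2))
        w = min(candidates, key=model)

        f = objective(w)
        if f < best_f:
            best_w, best_f = w, f
        # Stop when the gap between best value and model value is small.
        if best_f - model(w) <= tol:
            break
    return best_w, best_f


# Toy usage: hinge-style risk max(0, 1 - 2w) with regularizer w**2 / 2.
w_opt, f_opt = bundle_minimize(
    risk=lambda w: max(0.0, 1.0 - 2.0 * w),
    subgrad=lambda w: -2.0 if w < 0.5 else 0.0,
)
```

In higher dimensions the candidate enumeration is replaced by a small quadratic program over the cutting planes; the scalar case is used here only so that the sketch needs no external solver.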

Latent Pyramidal Regions for Recognizing Scenes

by Fereshteh Sadeghi
Abstract - Cited by 8 (2 self)
In this paper we propose a simple but efficient image representation for solving the scene classification problem. Our new representation combines the benefits of spatial pyramid representation using nonlinear feature coding and latent Support Vector Machine (LSVM) to train a set of Latent Pyramidal Regions (LPR). Each of our LPRs captures a discriminative characteristic of the scenes and is trained by searching over all possible sub-windows of the images in a latent SVM training procedure. Each LPR is represented in a spatial pyramid and uses non-linear locality constraint coding for learning both shape and texture patterns of the scene. The final response of the LPRs form a single feature vector which we call the LPR representation and can be used for the classification task. We tested our proposed scene representation model in three datasets which contain a variety of scene categories (15-Scenes, UIUC-Sports and MIT-indoor). Our LPR representation obtains state-of-the-art results on all these datasets which shows that it can simultaneously model the global and local scene characteristics in a single framework and is general enough to be used for both indoor and outdoor scene classification.

Citation Context

... we define Φ as zero vectors for the images of other categories (i.e. $y_j = -1$) during the training. For solving the non-convex optimization problem of Eq. (3), we use non-convex bundle optimization in [12] which is a recent variant of bundle methods for regularized risk minimization [13,14]. This method iterates between finding the best sub-window w and the optimal discriminative parameter vector θ, un...
