Results 1–10 of 13
Finding Actors and Actions in Movies
Abstract

Cited by 13 (3 self)
This is a preliminary version accepted for publication at ICCV 2013. We address the problem of learning a joint model of actors and actions in movies using weak supervision provided by scripts. Specifically, we extract actor/action pairs from the script and use them as constraints in a discriminative clustering framework. The corresponding optimization problem is formulated as a quadratic program under linear constraints. People in video are represented by automatically extracted and tracked faces together with corresponding motion features. First, we apply the proposed framework to the task of learning names of characters in the movie and demonstrate significant improvements over previous methods used for this task. Second, we explore the joint actor/action constraint and show its advantage for weakly supervised action learning. We validate our method in the challenging setting of localizing and recognizing characters and their actions in the feature-length movie Casablanca.
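The abstract above casts weak supervision as a quadratic program under linear constraints. As an illustrative sketch only (not the authors' actual discriminative clustering formulation), the following solves a tiny convex QP over the probability simplex by projected gradient descent; the matrices `Q` and `c` are made-up toy data:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def solve_qp_simplex(Q, c, steps=2000, lr=0.1):
    """Minimize 0.5 x'Qx + c'x subject to x >= 0, sum(x) = 1,
    by projected gradient descent (Q positive definite => convex QP)."""
    x = np.full(len(c), 1.0 / len(c))
    for _ in range(steps):
        x = project_simplex(x - lr * (Q @ x + c))
    return x

# Toy problem with a unique optimum at the simplex vertex (1, 0).
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
c = np.array([-2.0, 0.0])
x = solve_qp_simplex(Q, c)
```

Real instances would encode the script constraints as additional linear (in)equalities and use a general QP solver rather than this two-variable toy.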
A convex relaxation for weakly supervised classifiers
Abstract

Cited by 11 (1 self)
This paper introduces a general multiclass approach to weakly supervised classification. Inferring the labels and learning the parameters of the model is usually done jointly through a block-coordinate descent algorithm such as expectation-maximization (EM), which may lead to local minima. To avoid this problem, we propose a cost function based on a convex relaxation of the softmax loss. We then propose an algorithm specifically designed to efficiently solve the corresponding semidefinite program (SDP). Empirically, our method compares favorably to standard ones on different datasets for multiple instance learning and semi-supervised learning, as well as on clustering tasks.
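As a small numerical aside (not the paper's SDP relaxation itself): the softmax loss is convex in the score parameters, which is what makes convex treatments of it possible once the labels are relaxed. A midpoint check on random inputs, with all names illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def logsumexp(z):
    m = z.max()
    return m + np.log(np.exp(z - m).sum())

def softmax_loss(W, x, y):
    """Multiclass softmax (cross-entropy) loss for one example;
    rows of W are class weight vectors. The loss is convex in W."""
    scores = W @ x
    return logsumexp(scores) - scores[y]

# Midpoint convexity check: f((A+B)/2) <= (f(A)+f(B))/2 for random A, B.
x, y = rng.normal(size=4), 2
ok = True
for _ in range(100):
    A, B = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
    mid = softmax_loss((A + B) / 2, x, y)
    ok = ok and mid <= 0.5 * (softmax_loss(A, x, y) + softmax_loss(B, x, y)) + 1e-9
```

The nonconvexity the paper targets comes from inferring labels jointly with the parameters, not from the loss itself.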
Weakly Supervised Action Labeling in Videos Under Ordering Constraints
Abstract

Cited by 9 (1 self)
We are given a set of video clips, each one annotated with an ordered list of actions, such as “walk” then “sit” then “answer phone”, extracted from, for example, the associated text script. We seek to temporally localize the individual actions in each clip as well as to learn a discriminative classifier for each action. We formulate the problem as a weakly supervised temporal assignment with ordering constraints. Each video clip is divided into small time intervals and each time interval of each video clip is assigned one action label, while respecting the order in which the action labels appear in the given annotations. We show that the action label assignment can be determined together with learning a classifier for each action in a discriminative manner. We evaluate the proposed model on a new and challenging dataset of 937 video clips with a total of 787,720 frames containing sequences of 16 different actions from 69 Hollywood movies.
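The assignment step described above, one label per interval with labels following the annotated order, can be sketched as a simple monotone-alignment dynamic program. This is a toy illustration under assumed classifier scores, not the paper's joint learning procedure:

```python
import numpy as np

def ordered_assignment(scores):
    """Assign each time interval one action label so that labels follow the
    given order (each action covers one contiguous block of intervals).
    scores[t, k] = classifier score of action k for interval t."""
    T, K = scores.shape
    dp = np.full((T, K), -np.inf)
    back = np.zeros((T, K), dtype=int)
    dp[0, 0] = scores[0, 0]                 # must start with the first action
    for t in range(1, T):
        for k in range(K):
            stay = dp[t - 1, k]
            advance = dp[t - 1, k - 1] if k > 0 else -np.inf
            if stay >= advance:
                dp[t, k], back[t, k] = stay + scores[t, k], k
            else:
                dp[t, k], back[t, k] = advance + scores[t, k], k - 1
    labels = [K - 1]                        # must end with the last action
    for t in range(T - 1, 0, -1):
        labels.append(back[t, labels[-1]])
    return labels[::-1]

# Toy example: 5 intervals, ordered actions "walk" -> "sit" -> "answer phone".
scores = np.array([[5, 0, 0],
                   [4, 1, 0],
                   [0, 6, 0],
                   [0, 2, 3],
                   [0, 0, 7]], dtype=float)
labels = ordered_assignment(scores)
```

In the paper this assignment is optimized jointly with the action classifiers; here the scores are fixed made-up numbers.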
Concavity and Initialization for Unsupervised Dependency Grammar Induction
Abstract

Cited by 8 (1 self)
We examine models for unsupervised learning with concave log-likelihood functions. We begin with the most well-known example, IBM Model 1 for word alignment (Brown et al., 1993), and study its properties, discussing why other models for unsupervised learning are so seldom concave. We then present concave models for dependency grammar induction and validate them experimentally. Despite their simplicity, we find that initializing the dependency model with valence using our concave models can approach state-of-the-art grammar induction results for English and Chinese.
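For reference, IBM Model 1, the concave example this paper starts from, fits translation probabilities t(f|e) by EM; because the Model 1 log-likelihood is concave in t, the uniform initialization below is not a limitation. A minimal sketch (no NULL word, and the toy corpus is made up):

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=20):
    """EM for IBM Model 1 translation probabilities t(f|e),
    stored as t[(e, f)]. Uniform initialization suffices because
    the Model 1 likelihood is concave in t."""
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))       # uniform init
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(e, f)] for e in es)        # E-step: alignment posterior
                for e in es:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[e] += c
        for (e, f), c in count.items():               # M-step: renormalize
            t[(e, f)] = c / total[e]
    return t

# Toy English-French corpus.
pairs = [(["the", "house"], ["la", "maison"]),
         (["the", "book"],  ["le", "livre"]),
         (["a", "book"],    ["un", "livre"])]
t = ibm_model1(pairs)
```

On this corpus the repeated co-occurrence of "book" and "livre" pulls t("livre"|"book") above the alternatives, regardless of the starting point.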
Supervised Exponential Family Principal Component Analysis via Convex Optimization
Abstract

Cited by 5 (0 self)
Recently, supervised dimensionality reduction has been gaining attention, owing to the realization that data labels are often available and indicate important underlying structure in the data. In this paper, we present a novel convex supervised dimensionality reduction approach based on exponential family PCA, which is able to avoid the local optima of typical EM learning. Moreover, by introducing a sample-based approximation to exponential family models, it overcomes the limitation of the prevailing Gaussian assumptions of standard PCA, and produces a kernelized formulation for nonlinear supervised dimensionality reduction. A training algorithm is then devised based on a subgradient bundle method, whose scalability can be gained using a coordinate descent procedure. The advantage of our global optimization approach is demonstrated by empirical results over both synthetic and real data.
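For context only (this is the baseline, not the paper's method): standard PCA via the SVD is the unsupervised Gaussian special case that exponential family PCA generalizes. A minimal sketch:

```python
import numpy as np

def pca(X, k):
    """Standard PCA via the SVD: center the data, then project onto the
    top-k right singular vectors. This is the Gaussian special case that
    exponential family PCA generalizes to other exponential families."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k]          # scores Z and components V

# Rank-1 toy data: a single direction plus a column mean.
u = np.array([1.0, 2.0, 3.0])
v = np.array([1.0, 0.0, 2.0])
X = np.outer(u, v)
Z, V = pca(X, 1)
```

Because the centered toy data has rank 1, one component reconstructs it exactly; the paper's contribution is replacing this Gaussian model with a convex, supervised, kernelizable formulation.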
Convex Two-Layer Modeling
Abstract

Cited by 3 (1 self)
Latent variable prediction models, such as multilayer networks, impose auxiliary latent variables between inputs and outputs to allow automatic inference of implicit features useful for prediction. Unfortunately, such models are difficult to train because inference over latent variables must be performed concurrently with parameter optimization, creating a highly non-convex problem. Instead of proposing another local training method, we develop a convex relaxation of hidden-layer conditional models that admits global training. Our approach extends current convex modeling approaches to handle two nested nonlinearities separated by a nontrivial adaptive latent layer. The resulting methods are able to acquire two-layer models that cannot be represented by any single-layer model over the same features, while improving training quality over local heuristics.
Multi-label Classification with Output Kernels
Abstract

Cited by 3 (2 self)
Although multi-label classification has become an increasingly important problem in machine learning, current approaches remain restricted to learning in the original label space (or in a simple linear projection of the original label space). Instead, we propose to use kernels on output label vectors to significantly expand the forms of label dependence that can be captured. The main challenge is to reformulate standard multi-label losses to handle kernels between output vectors. We first demonstrate how a state-of-the-art large margin loss for multi-label classification can be reformulated, exactly, to handle output kernels as well as input kernels. Importantly, the pre-image problem for multi-label classification can be easily solved at test time, while the training procedure can still be simply expressed as a quadratic program in a dual parameter space. We then develop a projected gradient descent training procedure for this new formulation. Our empirical results demonstrate the efficacy of the proposed approach on complex image labeling tasks.
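A hedged sketch of the core idea, a kernel on output label vectors rather than the raw linear label space. The RBF choice and the label vectors below are illustrative assumptions, not the paper's exact loss reformulation:

```python
import numpy as np

def output_kernel(Y, gamma=0.5):
    """RBF kernel between multi-label output vectors (rows of Y).
    Unlike a plain dot product of label vectors, this captures
    nonlinear label co-occurrence structure."""
    sq = np.sum(Y ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T   # squared distances
    return np.exp(-gamma * d2)

# Three label vectors over 4 labels; the first two are identical.
Y = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float)
K = output_kernel(Y)
```

Identical label sets get kernel value 1, while disjoint sets decay toward 0; training against such a kernel is what raises the pre-image problem the abstract mentions.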
Convex relaxation of mixture regression with efficient algorithms
In Advances in Neural Information Processing Systems, 2010
Abstract

Cited by 2 (0 self)
We develop a convex relaxation of maximum a posteriori estimation of a mixture of regression models. Although our relaxation involves a semidefinite matrix variable, we reformulate the problem to eliminate the need for general semidefinite programming. In particular, we provide two reformulations that admit fast algorithms. The first is a max-min spectral reformulation exploiting quasi-Newton descent. The second is a min-min reformulation consisting of fast alternating steps of closed-form updates. We evaluate the methods against Expectation-Maximization in a real problem of motion segmentation from video data.
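For orientation, here is a sketch of the Expectation-Maximization baseline the relaxation is compared against: EM for a mixture of two linear regressions. All names and data are made up, and the deliberately separated initialization matters, since a symmetric start near zero is exactly the kind of local trap that motivates the convex approach:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixture_regression_em(X, y, W, iters=100):
    """EM for a mixture of linear regressions. W holds one weight vector
    per component; EM is sensitive to this initialization, so callers
    should start the components apart."""
    n, d = X.shape
    sigma2 = 1.0
    for _ in range(iters):
        resid = y[:, None] - X @ W.T                  # (n, k) residuals
        logp = -0.5 * resid ** 2 / sigma2
        logp -= logp.max(axis=1, keepdims=True)
        R = np.exp(logp)                              # E-step: responsibilities
        R /= R.sum(axis=1, keepdims=True)
        for j in range(W.shape[0]):                   # M-step: weighted least squares
            XtR = X.T * R[:, j]
            W[j] = np.linalg.solve(XtR @ X + 1e-8 * np.eye(d), XtR @ y)
        sigma2 = max(np.sum(R * resid ** 2) / n, 1e-8)
    return W, R

# Two noisy lines through the origin, slopes +2 and -2.
x = np.tile(np.linspace(0.5, 2.0, 20), 2)
y = np.concatenate([2.0 * x[:20], -2.0 * x[20:]]) + 0.05 * rng.normal(size=40)
W, R = mixture_regression_em(x[:, None], y, W=np.array([[1.0], [-1.0]]))
slopes = sorted(W[:, 0])
```

With the separated start EM recovers both slopes; a start with both components near zero collapses to a single averaged line, which is the failure mode the convex relaxation avoids by construction.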
Weakly-Supervised Alignment of Video With Text
2015
Abstract

Cited by 2 (0 self)
HAL is a multidisciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
A Convex Alternative to IBM Model 2
Abstract

Cited by 1 (1 self)
The IBM translation models have been hugely influential in statistical machine translation; they are the basis of the alignment models used in modern translation systems. Excluding IBM Model 1, the IBM translation models, and practically all variants proposed in the literature, have relied on the optimization of likelihood functions or similar functions that are non-convex, and hence have multiple local optima. In this paper we introduce a convex relaxation of IBM Model 2, and describe an optimization algorithm for the relaxation based on a subgradient method combined with exponentiated-gradient updates. Our approach gives the same level of alignment accuracy as IBM Model 2.
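The exponentiated-gradient step mentioned above is a multiplicative update followed by renormalization, so every iterate stays on the probability simplex, a natural fit for alignment and translation parameters. A minimal sketch on a toy linear objective (the objective and step size are made up, not the paper's actual subproblem):

```python
import numpy as np

def exponentiated_gradient(grad, x0, steps=200, eta=0.5):
    """Exponentiated-gradient descent on the probability simplex:
    multiply by exp(-eta * gradient), then renormalize."""
    x = x0.copy()
    for _ in range(steps):
        x = x * np.exp(-eta * grad(x))
        x /= x.sum()
    return x

# Toy linear objective f(x) = <c, x> over the simplex; its minimum is the
# vertex with the smallest cost, here index 1.
c = np.array([3.0, 1.0, 2.0])
x = exponentiated_gradient(lambda x: c, np.full(3, 1.0 / 3.0))
```

Unlike an additive gradient step, no projection is needed: positivity and normalization are preserved by the update itself.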