Results 1 – 3 of 3
G.: A practical transfer learning algorithm for face verification. In: ICCV (2013)
Abstract

Cited by 17 (0 self)
Face verification involves determining whether a pair of facial images belongs to the same or different subjects. This problem can prove quite challenging in many important applications where labeled training data is scarce, e.g., family album photo organization software. Herein we propose a principled transfer learning approach for merging plentiful source-domain data with limited samples from some target domain of interest, to create a classifier that ideally performs nearly as well as if rich target-domain data were present. Based upon a surprisingly simple generative Bayesian model, our approach combines a KL-divergence-based regularizer/prior with a robust likelihood function, leading to a scalable implementation via the EM algorithm. As justification for our design choices, we later use principles from convex analysis to recast our algorithm as an equivalent structured rank minimization problem, leading to a number of interesting insights related to solution structure and feature-transform invariance. These insights help both to explain the effectiveness of our algorithm and to elucidate a wide variety of related Bayesian approaches. Experimental testing with challenging datasets validates the utility of the proposed algorithm.
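The core idea of the abstract, borrowing statistical strength from a rich source domain to regularize estimates from a scarce target domain, can be illustrated in a minimal sketch. Assuming Gaussian models with fixed covariance (an assumption for illustration, not the paper's full model), a KL-divergence prior centered on the source-domain parameters reduces to a convex combination of the two estimates; `lam` and the variable names below are hypothetical:

```python
# Hypothetical sketch: shrink a scarce target-domain mean estimate toward a
# plentiful source-domain estimate. Under a Gaussian model with fixed
# covariance, a KL-divergence prior centered on the source parameters yields
# this closed-form MAP estimate; lam controls the prior strength.

def kl_shrunk_mean(target_samples, source_mean, lam):
    """MAP estimate of the target mean under a KL (Gaussian) prior."""
    n = len(target_samples)
    mle = sum(target_samples) / n          # target-only maximum likelihood
    return (n * mle + lam * source_mean) / (n + lam)

target = [1.2, 0.8, 1.0]   # scarce target-domain data
source_mean = 0.0          # estimate from rich source-domain data
print(kl_shrunk_mean(target, source_mean, lam=3.0))  # pulled toward source_mean
```

As `lam` grows the estimate collapses onto the source domain; as the target sample size `n` grows, the target data dominate, matching the abstract's goal of performing nearly as well as if rich target-domain data were present.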
A Sparse Plus Low Rank Maximum Entropy Language Model
Abstract

Cited by 1 (1 self)
This work introduces a new maximum entropy language model that decomposes the model parameters into a low-rank component that learns regularities in the training data and a sparse component that learns exceptions (e.g., multi-word expressions). The low-rank component corresponds to a continuous-space language model. This model generalizes the standard ℓ1-regularized maximum entropy model and has an efficient accelerated first-order training algorithm. In conversational speech language modeling experiments, we see perplexity reductions ...
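The scoring form the abstract describes can be sketched directly: the logit for a word in a context is a low-rank term (a dot product of learned context and word embeddings, i.e., the continuous-space component) plus a sparse per-pair exception weight. All names below are illustrative, not the paper's notation:

```python
import math

# Hypothetical sketch of a sparse-plus-low-rank maxent logit: the low-rank
# part is an embedding dot product (the continuous-space language model);
# the sparse part is a per-(context, word) exception weight, nonzero only
# for pairs the regularizer keeps (e.g., multi-word expressions).

def score(context_vec, word_vec, sparse_weight):
    low_rank = sum(c * w for c, w in zip(context_vec, word_vec))
    return low_rank + sparse_weight

def softmax_prob(scores, idx):
    """Maxent probability of the word at position idx given all logits."""
    z = sum(math.exp(s) for s in scores)
    return math.exp(scores[idx]) / z

logits = [score([1.0, 0.0], [2.0, 3.0], 0.5),   # frequent pair + exception
          score([1.0, 0.0], [0.5, 1.0], 0.0)]   # sparse weight pruned to 0
print(softmax_prob(logits, 0))
```

Training would fit the embeddings under a nuclear-norm-style penalty and the exception weights under an ℓ1 penalty, which is what makes the sparse component stay mostly zero; that optimization is omitted here.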
Near-Optimal Smoothing of Structured Conditional Probability Matrices
Abstract
Utilizing the structure of a probabilistic model can significantly increase its learning speed. Motivated by several recent applications, in particular bigram models in language processing, we consider learning low-rank conditional probability matrices under expected KL-risk. This choice makes smoothing, that is, the careful handling of low-probability elements, paramount. We derive an iterative algorithm that extends classical nonnegative matrix factorization to naturally incorporate additive smoothing and prove that it converges to the stationary points of a penalized empirical risk. We then derive sample-complexity bounds for the global minimizer of the penalized risk and show that it is within a small factor of the optimal sample complexity. This framework generalizes to more sophisticated smoothing techniques, including absolute discounting.
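The additive smoothing that the abstract builds into its factorization is the classical add-alpha estimator, shown here on one row of a bigram count matrix. This is only the baseline ingredient, not the paper's low-rank algorithm; the names are illustrative:

```python
# Hypothetical sketch: additive (add-alpha) smoothing of one row of a bigram
# count matrix. alpha > 0 keeps unseen next-words at nonzero probability,
# which is exactly the careful handling of low-probability elements that
# matters under KL-risk (a zero estimate for an observed event gives
# infinite KL loss).

def add_alpha_smooth(counts, alpha):
    """Smoothed conditional distribution over next words for one context."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]

row = [3, 1, 0, 0]   # next-word counts after some context word
print(add_alpha_smooth(row, alpha=0.5))
```

The paper's contribution is to apply this kind of smoothing inside a nonnegative matrix factorization of the whole conditional probability matrix, so the low-rank structure and the smoothing are fit jointly rather than row by row.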