Results 1–10 of 15
Probabilistic models for supervised dictionary learning
in CVPR, 2010
Abstract
Cited by 11 (1 self)
Dictionary generation is a core technique of the bag-of-visual-words (BOV) model when applied to image categorization. Most previous approaches generate dictionaries with unsupervised clustering techniques, e.g. k-means. However, the features obtained from such dictionaries may not be optimal for image classification. In this paper, we propose a probabilistic model for supervised dictionary learning (SDLM) that seamlessly combines an unsupervised model (a Gaussian mixture model) and a supervised model (a logistic regression model) in a probabilistic framework.
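The unsupervised baseline that this abstract contrasts against — a k-means visual dictionary followed by histogram encoding — can be sketched as follows. This is a minimal illustration, not the paper's SDLM model; the descriptor data, descriptor dimension, and dictionary size are all made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake local descriptors (stand-ins for SIFT-like features) pooled from training images.
descriptors = rng.normal(size=(500, 8))

# Step 1: build the visual dictionary with k-means (plain Lloyd iterations).
k = 16
centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
for _ in range(10):
    # Assign each descriptor to its nearest visual word.
    dists = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
    labels = dists.argmin(axis=1)
    # Recompute centers; keep the old center if a cluster went empty.
    centers = np.array([descriptors[labels == j].mean(axis=0)
                        if np.any(labels == j) else centers[j]
                        for j in range(k)])

# Step 2: encode one image's descriptors as a normalized BOV histogram,
# which would then feed a classifier such as logistic regression.
img_desc = rng.normal(size=(40, 8))
dists = np.linalg.norm(img_desc[:, None] - centers[None], axis=2)
hist = np.bincount(dists.argmin(axis=1), minlength=k).astype(float)
hist /= hist.sum()
print(hist.shape)
```

The paper's point is that this two-stage pipeline optimizes the dictionary without ever seeing class labels; SDLM couples the two stages instead.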
Distributed training of Large-scale Logistic models
Abstract
Cited by 4 (0 self)
Regularized multinomial logistic regression has emerged as one of the most common methods for data classification and analysis. With the advent of large-scale data, it is common to find scenarios where the number of possible multinomial outcomes is large (on the order of thousands to tens of thousands) and the dimensionality is high. In such cases, the computational cost of training logistic models, or even of simply iterating through all the model parameters, is prohibitively expensive. In this paper, we propose a training method for large-scale multinomial logistic models that breaks this bottleneck by enabling parallel optimization of the likelihood objective. Our experiments on large-scale datasets showed an order-of-magnitude reduction in training time.
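The bottleneck the abstract describes — every gradient step touching all K class parameters — is visible in a minimal softmax-regression update. This is a generic sketch with made-up dimensions, not the paper's parallel method:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, K = 32, 10, 1000          # K large: each step touches the full d x K matrix
X = rng.normal(size=(n, d))
y = rng.integers(0, K, size=n)
W = np.zeros((d, K))

# One full-batch gradient step of L2-regularized multinomial logistic regression.
logits = X @ W
logits -= logits.max(axis=1, keepdims=True)   # shift for numerical stability
P = np.exp(logits)
P /= P.sum(axis=1, keepdims=True)             # softmax over all K outcomes
P[np.arange(n), y] -= 1.0                     # gradient of the loss w.r.t. logits
grad = X.T @ P / n + 1e-2 * W                 # add the L2 regularizer's gradient
W -= 0.5 * grad
print(grad.shape)
```

Both the softmax normalization and the gradient are O(dK) per step, which is what motivates splitting the likelihood objective across machines.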
Latent Classification Models for Binary Data, 2009
Abstract
Cited by 2 (0 self)
One of the simplest, and yet most consistently well-performing, classes of classifiers is the naïve Bayes model (a special class of Bayesian network models). However, these models rely on the (naïve) assumption that all the attributes used to describe an instance are conditionally independent given the class of that instance. To relax this independence assumption, we have in previous work proposed a family of models, called latent classification models (LCMs). LCMs are defined for continuous domains and generalize the naïve Bayes model by using latent variables to model class-conditional dependencies between the attributes. In addition to providing good classification accuracy, the LCM model has several appealing properties, including a relatively small parameter space, making it less susceptible to overfitting. In this paper we take a first step towards generalizing LCMs to hybrid domains by proposing an LCM model for domains with binary attributes. We present algorithms for learning the proposed model, and we describe a variational-approximation-based inference procedure. Finally, we empirically compare the accuracy of the proposed model to that of other classifiers on a number of different domains, including the problem of recognizing symbols in black-and-white images.
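The independence assumption being relaxed here is easiest to see in the baseline itself: a Bernoulli naïve Bayes classifier whose class-conditional likelihood factorizes over attributes. A minimal sketch on made-up binary data (the data and smoothing constant are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(200, 5))   # 200 instances, 5 binary attributes
y = rng.integers(0, 2, size=200)        # binary class labels

# Fit: class priors and per-attribute Bernoulli parameters, Laplace-smoothed.
priors = np.array([(y == c).mean() for c in (0, 1)])
theta = np.array([(X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2)
                  for c in (0, 1)])     # theta[c, j] = P(x_j = 1 | class c)

def log_posterior(x):
    # The naive assumption: the joint likelihood is a product over attributes,
    # so the log-likelihood is a sum — no attribute interactions are modeled.
    ll = (x * np.log(theta) + (1 - x) * np.log(1 - theta)).sum(axis=1)
    return np.log(priors) + ll

pred = log_posterior(X[0]).argmax()
print(pred)
```

An LCM replaces this flat product with latent variables that mediate dependencies between the attributes within each class.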
Pattern Recognition
Abstract
Cited by 1 (0 self)
This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the author's institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third-party websites, are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or TeX form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier's archiving and manuscript policies are encouraged to visit:
Ranking via Robust Binary Classification and Parallel Parameter Estimation in Large-Scale Data
Abstract
Cited by 1 (0 self)
We propose RoBiRank, a ranking algorithm motivated by a close connection between evaluation metrics for learning to rank and loss functions for robust classification. The algorithm shows very competitive performance on standard benchmark datasets against other representative algorithms in the literature. Moreover, in large-scale problems where explicit feature vectors and scores are not given, our algorithm can be efficiently parallelized across a large number of machines; for a task that requires ...
Sparse additive text models with low rank background
in Advances in Neural Information Processing Systems, 2013
Abstract
Cited by 1 (0 self)
The sparse additive model for text modeling involves sum-of-exp computations, whose cost becomes prohibitive at large scale. Moreover, the assumption of an equal background across all classes/topics may be too strong. This paper proposes a sparse additive model with low-rank background (SAM-LRB) and obtains a simple yet efficient estimation procedure. In particular, employing a double majorization bound, we approximate the log-likelihood by a quadratic lower bound without the log-sum-exp terms. The constraints of low rank and sparsity are then simply embodied by nuclear-norm and ℓ1-norm regularizers. Interestingly, we find that the optimization task of SAM-LRB can be transformed into the same form as in Robust PCA. Consequently, the parameters of supervised SAM-LRB can be efficiently learned using an existing algorithm for Robust PCA based on accelerated proximal gradient. Besides the supervised case, we extend SAM-LRB to unsupervised and multifaceted scenarios. Experiments on three real datasets demonstrate the effectiveness and efficiency of SAM-LRB compared with several state-of-the-art models.
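The two regularizers the abstract pairs — nuclear norm for low rank, ℓ1-norm for sparsity — each admit a closed-form proximal operator, which is what makes Robust-PCA-style proximal-gradient solvers applicable. A generic sketch of the two operators (standard constructions, not the paper's full algorithm; the test matrix and thresholds are made up):

```python
import numpy as np

def soft_threshold(X, t):
    # Proximal operator of t * ||X||_1: elementwise shrinkage toward zero,
    # which produces sparsity.
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def singular_value_threshold(X, t):
    # Proximal operator of t * ||X||_* (nuclear norm): soft-threshold the
    # singular values, which reduces rank.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

rng = np.random.default_rng(5)
X = rng.normal(size=(6, 4))
S = soft_threshold(X, 0.8)
L = singular_value_threshold(X, 1.5)
print(S.shape, L.shape)
```

A proximal-gradient loop alternates a gradient step on the smooth (quadratically bounded) likelihood term with these two shrinkage steps.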
Diversity-Promoting Bayesian Learning of Latent Variable Models
Abstract
In learning latent variable models (LVMs), it is important to effectively capture infrequent patterns and shrink model size without sacrificing modeling power. Various studies have aimed to "diversify" an LVM, i.e., to learn a diverse set of latent components. Most existing studies fall into a frequentist-style regularization framework, where the components are learned via point estimation. In this paper, we investigate how to "diversify" LVMs in the paradigm of Bayesian learning, which has advantages complementary to point estimation, such as alleviating overfitting via model averaging and quantifying uncertainty. We propose two approaches with complementary advantages. One is to define diversity-promoting mutual angular priors, which assign larger density to components with larger mutual angles, based on a Bayesian network and the von Mises-Fisher distribution, and to use these priors to affect the posterior via Bayes' rule. We develop two efficient approximate posterior inference algorithms based on variational inference and Markov chain Monte Carlo sampling. The other approach is to impose diversity-promoting regularization directly on the post-data distribution of components. These two methods are applied to the Bayesian mixture-of-experts model to encourage the "experts" to be diverse, and experimental results demonstrate the effectiveness and efficiency of our methods.
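The "mutual angle" quantity these priors act on is just the pairwise angle between component vectors after unit-normalization. A small sketch of computing it (illustrative only; the component matrix is random, and this shows the measured quantity, not the prior itself):

```python
import numpy as np

rng = np.random.default_rng(3)
# Rows are component vectors of a hypothetical LVM (e.g., expert weight vectors).
A = rng.normal(size=(4, 6))
U = A / np.linalg.norm(A, axis=1, keepdims=True)   # unit-normalize each component

# Pairwise mutual angles in degrees; a diversity-promoting prior assigns
# higher density to component sets whose mutual angles are large.
cos = np.clip(U @ U.T, -1.0, 1.0)
angles = np.degrees(np.arccos(cos))
i, j = np.triu_indices(4, k=1)
print(angles[i, j])
```

Larger off-diagonal angles mean the components point in more distinct directions, i.e., a more "diverse" model.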
One-vs-Each Approximation to Softmax for Scalable Estimation of Probabilities
Abstract
The softmax representation of probabilities for categorical variables plays a prominent role in modern machine learning, with numerous applications in areas such as large-scale classification, neural language modeling and recommendation systems. However, softmax estimation is very expensive for large-scale inference because of the high cost of computing the normalizing constant. Here, we introduce an efficient approximation to softmax probabilities which takes the form of a rigorous lower bound on the exact probability. This bound is expressed as a product over pairwise probabilities, and it leads to scalable estimation based on stochastic optimization. It allows us to perform doubly stochastic estimation by subsampling both training instances and class labels. We show that the new bound has interesting theoretical properties, and we demonstrate its use in classification problems.
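The product-over-pairwise-probabilities bound described here can be checked numerically: for scores f, the softmax probability of class i is lower-bounded by the product of pairwise sigmoids σ(f_i − f_j) over j ≠ i, since the full normalizer 1 + Σ exp(f_j − f_i) is at most Π (1 + exp(f_j − f_i)). A sketch with an arbitrary random score vector (the data is made up, but the inequality is exact):

```python
import numpy as np

def softmax(f):
    e = np.exp(f - f.max())       # shift for numerical stability
    return e / e.sum()

def one_vs_each_lower_bound(f, i):
    # Product over pairwise sigmoids sigma(f_i - f_j) for all j != i.
    others = np.delete(f, i)
    return np.prod(1.0 / (1.0 + np.exp(others - f[i])))

rng = np.random.default_rng(4)
f = rng.normal(size=10)
p = softmax(f)
bounds = np.array([one_vs_each_lower_bound(f, i) for i in range(10)])
print(np.all(bounds <= p + 1e-12))
```

Each pairwise factor involves only two classes, which is what permits subsampling class labels during stochastic optimization instead of computing the full normalizing constant.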
unknown title
"... Bayesian embedding of co-occurrence data for query-based visualization ..."